Fix ASAN false positive on aarch64 from fiber stack reuse#12899
Fix ASAN false positive on aarch64 from fiber stack reuse#12899shumbo wants to merge 2 commits intobytecodealliance:mainfrom
Conversation
|
The ASAN behavior here isn't aarch64-specific, right? If not, could you apply the same change to the other architectures as well? |
In theory, yes; however, for some reason, it doesn't reproduce on x86 (likely due to an ASAN implementation/memory layout difference). Happy to apply the same change defensively to the other architectures if you'd prefer. |
|
Yes, unless we have reason to believe (based on documentation) that ASAN only expects annotations on one platform, we should apply the same annotations on all platforms. Thanks! |
|
eb48a43 applied the same change to other architectures and I can confirm this doesn't break x86_68. I don't have easy access to other architectures to confirm -- I believe this change is low risk (ASAN only + same pattern across architectures) but I'd appreciate some eyes on this. |
|
Do you know why ASAN is flaky about this? Locally I can't reproduce at all except for |
|
I did some digging into this, and I think one answer for my confusion is that there's a small set of fiber functions which never actually return. When a fiber exits we switch off them but never switch back, and this is leaving behind corrupt state I think. I tried out this diff: diff --git a/crates/fiber/src/unix.rs b/crates/fiber/src/unix.rs
index fc202a2e79..f0f43759d1 100644
--- a/crates/fiber/src/unix.rs
+++ b/crates/fiber/src/unix.rs
@@ -389,7 +389,19 @@ mod asan {
// If this fiber is finishing then NULL is passed to asan to let it know
// that it can deallocate the "fake stack" that it's tracking for this
// fiber.
+ //
+ // Additionally inform asan that this function is itself never going to
+ // return because once we switch away we'll never switch back. It seems
+ // like this isn't part of `__sanitizer_start_switch_fiber` upon
+ // switching away, and this helps resolve the issue identified in #12899
+ // for example.
let private_asan_pointer_ref = if is_finishing {
+ unsafe extern "C" {
+ fn __asan_handle_no_return();
+ }
+ unsafe {
+ __asan_handle_no_return();
+ }
None
} else {
Some(&mut private_asan_pointer)and it looks to fix the issue for me locally (similar to this). Does this fix the issue locally for you too @shumbo? If so I think I'd personally prefer to go this route as I feel it strikes at the heart of the issue a bit more which is cleaning up stale state proactively instead of lazily. |
|
@alexcrichton, your patch solves the problem and is a better fix than this PR. It seems like (1) |
Overview
When fiber stacks are reused across async calls (via the Store-level
last_fiber_stackcache), ASAN's shadow memory retains poisoning from the previous fiber execution. On aarch64, whenwasmtime_fiber_initwrites theInitialStackstruct to reinitialize a reused stack, ASAN reports a false stack-buffer-overflow because it sees a write to memory it still considers out-of-scope.This adds an
__asan_unpoison_memory_regioncall (gated behind#[cfg(asan)]) to unpoison exactly the InitialStack-sized region before writing to it.Reproducing
Any .wast file with two or more invocations triggers this, since wasmtime wast defaults to async mode and the Store caches the fiber stack after the first call:
The first invoke allocates and runs a fiber, poisoning shadow memory for the stack region. The second invoke reuses the cached stack, and wasmtime_fiber_init's initial_stack.
write(...)hits the stale poisoning.With this patch, it works as expected: