Replies: 1 comment
It's a valid approach, but in my opinion overkill. If performance is a concern, unsafe would be much faster and more straightforward.
---
Sorry if this is wrong or dumb or wasting your time. I am 100% vibe-coding a roguelike game. I ran across a theoretical performance problem I might hit later on and decided to solve it now. Claude Code suggested what is apparently the 'classic' fix for this issue, which involves unsafe Rust (Permanent Userdata + Unsafe Context Cells). Like always, when it says something I don't like at face value... I just ask it if there's 'another method that's actually better and more excellent, just as fast or faster, etc.'
Unlike always, it spent several minutes thinking about this and came up with what I posted below (though it's a generic version of the idea, taking my game code mostly out of it). I interrogated Claude thoroughly and couldn't get it to talk itself into saying this method is awful, so I implemented it and... things seem fine? If anything, it was one of those sessions where after all is said and done... it was extremely productive.
As far as I can tell, this may actually be a novel approach to solving this issue — something perhaps worth sharing with the community, at the risk of looking like an idiot if I'm wrong. I don't post this lightly. I realize everyone hates AI slop, but here it is in its own words:
The Coroutine Protocol: Zero-Unsafe Rust→Lua Interop for Borrowed State
The Problem
Every Rust program that embeds Lua via mlua hits the same wall when its API surface grows: scoped userdata rebuilds metatables on every call.
When Lua needs to call methods on Rust state that borrows non-`'static` data — a `&World`, a `&Database`, a `&mut AppState` — mlua requires scoped userdata. The type bound is `T: UserData + 'env`, not `T: UserData + 'static`. Because the type isn't `'static`, Rust's `TypeId` cannot be produced for it. Without a `TypeId`, mlua cannot cache the metatable — it must reconstruct the full method table from scratch on every scope entry.

The cost is ~8μs per method registered on the userdata. With a small API (10 methods), this is ~80μs — invisible. But APIs grow. At 50 methods it's 400μs. At 200 methods it's 1.6ms. Per call. Every time Lua invokes a function that needs access to your borrowed Rust state.
96% of the cost is infrastructure. The Lua code runs in microseconds. The metatable rebuild dominates everything.
This is not a bug. It is a fundamental consequence of Rust's type system: `TypeId::of::<T>()` requires `T: 'static`, and without a `TypeId`, mlua has no key to cache the metatable under. There is no upstream fix. The `'static` bound has been debated for 9 years — RFC 1849 (2017) proposed removing it but was retracted due to soundness concerns (a lifetime-erased `TypeId` would make `Foo<'a>` and `Foo<'b>` indistinguishable, enabling transmutation of lifetimes). As of 2026, the constraint remains, and every proposal to relax it has stalled. See mlua #175, mlua #126, rlua #20.

The Standard Solution: Permanent Userdata + Unsafe Context Cells
The known workaround is the pattern used by Love2D, Defold, and most game engines with Lua scripting: make the proxy type `'static` by replacing borrowed references with raw pointers stored in a thread-local cell.

This works. Metatables are cached (the type is `'static`). Per-call cost drops from ~1.6ms to ~30μs. But it introduces:

- `unsafe` — raw pointers to borrowed state; soundness depends on discipline
- `with_ctx()` scaffolding at every access to the borrowed state

The Coroutine Protocol: Invert the Calling Convention
Instead of letting Lua call into Rust (which requires userdata with metatables), Lua yields requests back to Rust. All borrowed references stay on the Rust stack where the borrow checker verifies them at compile time.
No scoped userdata. No permanent userdata. No metatables. No context cells. No `unsafe`. Zero.

How It Works
Traditional model — Lua calls Rust through userdata methods.

Coroutine model — Lua yields requests, Rust fulfills them.
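Sketched as request/response flows (the method name `get_hp` is illustrative):

```
Traditional: script -> userdata method (metatable) -> Rust fn(&World) -> return value
Coroutine:   script -> coroutine.yield("get_hp", id) -> Rust resume loop -> dispatch(&World) -> resume(value) -> script continues
```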
The key: all borrowed references (`&World`, `&Database`, etc.) live on the Rust stack inside the resume loop. The borrow checker verifies them. Lua never touches Rust memory — it sends string-tagged requests through yield and receives typed responses through resume.

The Lua Side: A Pure-Lua Proxy Table
Created once at startup. Permanent. Zero per-call construction cost.
The `rawset` self-memoization means `__index` fires at most once per method name. After the first call to `world:get_hp(...)`, subsequent calls are a direct table lookup to the cached yield-closure. Zero metamethod involvement.

Scripts look identical to the userdata approach.
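A minimal sketch of such a proxy table (the `world` binding and the `get_hp` method name are illustrative, not the post's exact code):

```lua
-- One permanent proxy table; __index manufactures and memoizes a
-- yield-closure per method name.
local world = setmetatable({}, {
  __index = function(self, method)
    local f = function(_, ...)           -- `_` is the proxy itself (colon call)
      return coroutine.yield(method, ...)
    end
    rawset(self, method, f)              -- memoize: __index never fires again
    return f
  end,
})

-- Script code reads like an ordinary method call:
-- local hp = world:get_hp(entity)
```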
The Rust Side: A Generic Resume Loop
The entire protocol is ~15 lines:
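The post's actual loop is mlua-specific; as a stdlib-only sketch of its shape, the Lua thread can be mocked as a scripted queue of `(method, args)` requests (`World`, `get_hp`, and the `Value` enum are assumptions for illustration):

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Int(i64),
}

type Request = (String, Vec<Value>);

// Mock of a yielded Lua coroutine: resume(response) delivers the host's
// answer and hands back the next yielded request, or None when finished.
struct MockThread {
    pending: VecDeque<Request>,
    received: Vec<Value>, // what the "script" saw come back
}

impl MockThread {
    fn resume(&mut self, response: Value) -> Option<Request> {
        if response != Value::Nil {
            self.received.push(response);
        }
        self.pending.pop_front()
    }
}

struct World {
    hp: HashMap<u32, i64>,
}

// All borrowed state stays on the Rust stack: dispatch takes &World directly.
fn dispatch(world: &World, method: &str, args: &[Value]) -> Value {
    match (method, args) {
        ("get_hp", [Value::Int(id)]) => {
            Value::Int(world.hp.get(&(*id as u32)).copied().unwrap_or(0))
        }
        _ => Value::Nil,
    }
}

// The resume loop itself: ferry requests out, responses back in.
fn run(world: &World, thread: &mut MockThread) {
    let mut response = Value::Nil;
    while let Some((method, args)) = thread.resume(response) {
        response = dispatch(world, &method, &args);
    }
}
```

With a real `mlua::Thread` the loop is the same shape: resume with the previous response, receive the next yielded request, dispatch it against the borrowed context.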
`C` is your context type — whatever struct holds your borrowed references. `dispatch` is your routing function. The resume loop doesn't know or care what your API does. It just ferries requests and responses between Lua and your dispatch function.

Thread Pool
Luau's `Thread::reset()` resets threads in any state (yielded, errored, finished). A simple pool avoids per-call thread allocation.

The Dispatch Side: A Declarative Macro
With potentially hundreds of methods, the dispatch function needs to be maintainable. A declarative macro turns it into a flat API specification — one line per method:
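The post's macro is not reproduced here; this is a hedged, stdlib-only sketch of the idea with mock types (`Value`, `World`, `get_hp`, `is_hurt`, and the `int_arg` helper are all illustrative assumptions):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Int(i64),
    Bool(bool),
}

struct World {
    hp: HashMap<u32, i64>,
    max_hp: HashMap<u32, i64>,
}

// Extract an integer argument, with a descriptive panic on mismatch.
fn int_arg(args: &[Value], i: usize, method: &str) -> i64 {
    match args.get(i) {
        Some(Value::Int(n)) => *n,
        other => panic!("{method}: arg {} expected Int, got {:?}", i + 1, other),
    }
}

// One line per method: name => handler. The macro expands the list
// into a single flat `dispatch` function.
macro_rules! api {
    ( $( $name:ident => $handler:expr ),+ $(,)? ) => {
        fn dispatch(world: &World, method: &str, args: &[Value]) -> Value {
            match method {
                $( stringify!($name) => ($handler)(world, args), )+
                other => panic!("unknown API method: {other}"),
            }
        }
    };
}

api! {
    get_hp => |w: &World, a: &[Value]| {
        Value::Int(*w.hp.get(&(int_arg(a, 0, "get_hp") as u32)).unwrap_or(&0))
    },
    is_hurt => |w: &World, a: &[Value]| {
        let id = int_arg(a, 0, "is_hurt") as u32;
        Value::Bool(w.hp.get(&id).unwrap_or(&0) < w.max_hp.get(&id).unwrap_or(&0))
    },
}
```

Adding a method means adding one `name => handler` line to the `api!` invocation.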
The macro generates:

- argument extraction via the `ExtractArg` trait (EntityId from userdata, Position from entity-or-table, String, i32, bool, `Option<T>`, etc.)
- unpacking of the incoming `MultiValue`
- descriptive errors like `"get_hp: arg 1 expected EntityId, got string"`

Adding a new API method = adding one declaration. The macro handles everything else.
The ExtractArg Trait
Polymorphic argument extraction — each type knows how to extract itself from a Lua value:
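A sketch of the trait over a mock `Value` type (the real trait works on mlua values; these names and impls are assumptions):

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Int(i64),
    Str(String),
    Bool(bool),
}

// Each argument type knows how to extract itself from a Lua value slot.
trait ExtractArg: Sized {
    fn extract(v: Option<&Value>) -> Result<Self, String>;
}

impl ExtractArg for i64 {
    fn extract(v: Option<&Value>) -> Result<Self, String> {
        match v {
            Some(Value::Int(n)) => Ok(*n),
            other => Err(format!("expected Int, got {other:?}")),
        }
    }
}

impl ExtractArg for String {
    fn extract(v: Option<&Value>) -> Result<Self, String> {
        match v {
            Some(Value::Str(s)) => Ok(s.clone()),
            other => Err(format!("expected Str, got {other:?}")),
        }
    }
}

// Option<T>: a missing or nil argument becomes None instead of an error.
impl<T: ExtractArg> ExtractArg for Option<T> {
    fn extract(v: Option<&Value>) -> Result<Self, String> {
        match v {
            None | Some(Value::Nil) => Ok(None),
            some => T::extract(some).map(Some),
        }
    }
}
```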
Performance
The coroutine protocol has ~1μs of overhead per API call (yield + resume + string dispatch + arg extraction) vs ~0.1μs for direct method calls through cached metatables. The crossover with context cells is ~30 API calls per invocation: the context-cell approach pays a fixed ~30μs per invocation, so its ~0.9μs-per-call advantage only amortizes that after roughly 30μs / 0.9μs ≈ 33 calls. Below that — which covers virtually all real-world scripting hooks — the coroutine protocol is faster because its zero setup cost dominates.
Why This Works
The scoped userdata problem exists because Rust's `TypeId` requires `'static`, and without `TypeId`, mlua cannot cache metatables. Every proposed solution in the mlua ecosystem — `Scope::create_userdata_ref` (requires `T: 'static`), `register_userdata_type` (requires `'static`), `create_proxy` (requires `'static`), the `AnyUserData` APIs (require `'static`) — hits the same wall. See the mlua v0.9 release notes.

The coroutine protocol sidesteps it entirely. There is no userdata. There are no metatables. Lua yields string-tagged requests. Rust dispatches them with all borrowed references on the stack. The `TypeId` constraint is irrelevant because no Rust type is ever exposed to Lua's metatable system.

Yield Safety
The yield does not happen inside a metamethod:

1. `world:get_hp(entity)` → Lua resolves `world.get_hp` via `__index` (first time) or table lookup (cached)
2. `__index` creates a closure, caches it via `rawset`, and returns. The metamethod frame is popped.
3. `coroutine.yield("get_hp", entity)` then runs inside a plain closure, not a metamethod frame.

After the first call, `rawset` has cached the closure. `__index` never fires again for that method.

pcall/xpcall
Luau (and LuaJIT, and Lua 5.2+) supports yielding across `pcall`/`xpcall` boundaries, so scripts can defensively wrap API calls. The yield inside `pcall` suspends the entire coroutine; resume continues inside the `pcall` frame. Standard Lua 5.1 does not support this.

Note: the protocol itself does not require pcall — yields happen in plain closures, not inside pcall. The Lua 5.2+ requirement is only about script authors being able to defensively wrap API calls. On vanilla Lua 5.1, the core protocol works, but scripts cannot use pcall around API calls. In practice, almost nobody embedding Lua in Rust in 2026 uses vanilla Lua 5.1 — Luau and LuaJIT dominate.
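A defensive wrapper of the kind described above might look like this (method name and error handling are illustrative):

```lua
local ok, hp = pcall(function()
  return world:get_hp(entity)   -- yields across the pcall boundary
end)
if not ok then
  print("get_hp failed: " .. tostring(hp))
end
```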
Expression temporaries
Multiple yields in a single expression are safe:
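The walkthrough that follows traces an expression of this shape (method names are assumptions; the sample returns are 45 and 100):

```lua
-- two yields inside one expression
if world:get_hp(entity) > world:get_max_hp(entity) / 2 then
  -- not reached in this trace: 45 > 50 is false
end
```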
The first yield returns 45, Lua stashes it. The second yield returns 100. Lua computes `100 / 2 = 50`, then `45 > 50`. The coroutine stack preserves all temporaries across yield boundaries.

Comparison Summary

Semantic Correctness
The natural concern: "you're turning synchronous method calls into asynchronous message passing — doesn't that change the execution model?"
It doesn't. Lua coroutines are not asynchronous. A yield is not "go do this later." It's "suspend me right here, return control to the caller, and when they resume me I continue from this exact point with whatever value they pass back." It's synchronous, deterministic, cooperative control transfer.
From the script's perspective, `world:get_hp(entity)` calls a function and gets a number back. The script is frozen at the yield point. Rust executes the dispatch. The script resumes with the result. There is no concurrency, no interleaving, no observable difference from a direct method call. The semantics are identical: the dispatch reads the same `&World` that a userdata method would. Same references, same data.

The closest theoretical alternative — a single scoped `__index` closure via `Scope::create_function()` — still crosses the Lua→Rust FFI boundary on every method call (C stack frame creation), whereas coroutine yields are cheaper (they suspend the Lua evaluation stack without creating new C frames).

Broader Applications
The coroutine protocol was invented to solve a Rust-specific problem (`TypeId` + scoped userdata), but some of its properties are valuable regardless of host language.

Requirements

- `Thread::resume()` support
- `Thread::reset()` for thread pooling (Luau-specific; without it, create new threads)

Measured Results
Implemented in a Rust roguelike engine with 226 API methods across 10 dispatch domains. Migration from scoped userdata to the coroutine protocol:

- Zero `unsafe`, zero scoped userdata, zero context cells