Replies: 1 comment
It's a valid approach, but in my opinion overkill. If performance is a concern, unsafe would be much faster and more straightforward.
---
Sorry if this is wrong or dumb or wasting your time. I am 100% vibe-coding a roguelike game. I ran across a theoretical performance problem I might hit later on and decided to solve it now. Claude Code suggested what is apparently the 'classic' fix for this issue, which involves unsafe Rust (Permanent Userdata + Unsafe Context Cells). Like always, when it says something I don't like at face value... I just ask it if there's 'another method that's actually better and more excellent, just as fast or faster, etc.'
Unlike always, it spent several minutes thinking about this and came up with what I posted below (though it's a generic version of the idea, taking my game code mostly out of it). I interrogated Claude thoroughly and couldn't get it to talk itself into saying this method is awful, so I implemented it and... things seem fine? If anything, it was one of those sessions where after all is said and done... it was extremely productive.
As far as I can tell, this may actually be a novel approach to solving this issue — something perhaps worth sharing with the community, at the risk of looking like an idiot if I'm wrong. I don't post this lightly. I realize everyone hates AI slop, but here it is in its own words:
The Coroutine Protocol: Zero-Unsafe Rust→Lua Interop for Borrowed State
The Problem
Every Rust program that embeds Lua via mlua hits the same wall when its API surface grows: scoped userdata rebuilds metatables on every call.
When Lua needs to call methods on Rust state that borrows non-`'static` data — a `&World`, a `&Database`, a `&mut AppState` — mlua requires scoped userdata. The type bound is `T: UserData + 'env`, not `T: UserData + 'static`. Because the type isn't `'static`, Rust's `TypeId` cannot be produced for it. Without a `TypeId`, mlua cannot cache the metatable — it must reconstruct the full method table from scratch on every scope entry.

The cost is ~8μs per method registered on the userdata. With a small API (10 methods), this is ~80μs — invisible. But APIs grow. At 50 methods it's 400μs. At 200 methods it's 1.6ms. Per call. Every time Lua invokes a function that needs access to your borrowed Rust state.
96% of the cost is infrastructure. The Lua code runs in microseconds. The metatable rebuild dominates everything.
This is not a bug. It is a fundamental consequence of Rust's type system: `TypeId::of::<T>()` requires `T: 'static`, and without a `TypeId`, mlua has no key to cache the metatable under. There is no upstream fix. The `'static` bound has been debated for 9 years — RFC 1849 (2017) proposed removing it but was retracted due to soundness concerns (a lifetime-erased `TypeId` would make `Foo<'a>` and `Foo<'b>` indistinguishable, enabling transmutation of lifetimes). As of 2026, the constraint remains, and every proposal to relax it has stalled. See mlua #175, mlua #126, rlua #20.

The Standard Solution: Permanent Userdata + Unsafe Context Cells
The known workaround is the pattern used by Love2D, Defold, and most game engines with Lua scripting: make the proxy type `'static` by replacing borrowed references with raw pointers stored in a thread-local cell.

This works. Metatables are cached (the type is `'static`). Per-call cost drops from ~1.6ms to ~30μs. But it introduces:

- `unsafe` — raw pointers to borrowed state; soundness depends on discipline
- `with_ctx()` scaffolding at every access to the borrowed state

The Coroutine Protocol: Invert the Calling Convention
Instead of letting Lua call into Rust (which requires userdata with metatables), Lua yields requests back to Rust. All borrowed references stay on the Rust stack where the borrow checker verifies them at compile time.
No scoped userdata. No permanent userdata. No metatables. No context cells. No `unsafe`. Zero.

How It Works
Traditional model — Lua calls Rust through userdata methods.

Coroutine model — Lua yields requests, Rust fulfills them.
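Sketched as request/response flows (the method name `get_hp` is illustrative):

```
Traditional: script -> userdata method (metatable) -> Rust fn(&World) -> return value
Coroutine:   script -> coroutine.yield("get_hp", id) -> Rust resume loop -> dispatch(&World) -> resume(value) -> script continues
```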
The key: all borrowed references (`&World`, `&Database`, etc.) live on the Rust stack inside the resume loop. The borrow checker verifies them. Lua never touches Rust memory — it sends string-tagged requests through yield and receives typed responses through resume.

The Lua Side: A Pure-Lua Proxy Table
Created once at startup. Permanent. Zero per-call construction cost.
The `rawset` self-memoization means `__index` fires at most once per method name. After the first call to `world:get_hp(...)`, subsequent calls are a direct table lookup to the cached yield-closure. Zero metamethod involvement.

Scripts look identical to the userdata approach.
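A minimal sketch of such a proxy table (the `world` binding and the `get_hp` method name are illustrative, not the post's exact code):

```lua
-- One permanent proxy table; __index manufactures and memoizes a
-- yield-closure per method name.
local world = setmetatable({}, {
  __index = function(self, method)
    local f = function(_, ...)           -- `_` is the proxy itself (colon call)
      return coroutine.yield(method, ...)
    end
    rawset(self, method, f)              -- memoize: __index never fires again
    return f
  end,
})

-- Script code reads like an ordinary method call:
-- local hp = world:get_hp(entity)
```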
The Rust Side: A Generic Resume Loop
The entire protocol is ~15 lines:
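The post's actual loop is mlua-specific; as a stdlib-only sketch of its shape, the Lua thread can be mocked as a scripted queue of `(method, args)` requests (`World`, `get_hp`, and the `Value` enum are assumptions for illustration):

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Int(i64),
}

type Request = (String, Vec<Value>);

// Mock of a yielded Lua coroutine: resume(response) delivers the host's
// answer and hands back the next yielded request, or None when finished.
struct MockThread {
    pending: VecDeque<Request>,
    received: Vec<Value>, // what the "script" saw come back
}

impl MockThread {
    fn resume(&mut self, response: Value) -> Option<Request> {
        if response != Value::Nil {
            self.received.push(response);
        }
        self.pending.pop_front()
    }
}

struct World {
    hp: HashMap<u32, i64>,
}

// All borrowed state stays on the Rust stack: dispatch takes &World directly.
fn dispatch(world: &World, method: &str, args: &[Value]) -> Value {
    match (method, args) {
        ("get_hp", [Value::Int(id)]) => {
            Value::Int(world.hp.get(&(*id as u32)).copied().unwrap_or(0))
        }
        _ => Value::Nil,
    }
}

// The resume loop itself: ferry requests out, responses back in.
fn run(world: &World, thread: &mut MockThread) {
    let mut response = Value::Nil;
    while let Some((method, args)) = thread.resume(response) {
        response = dispatch(world, &method, &args);
    }
}
```

With a real `mlua::Thread` the loop is the same shape: resume with the previous response, receive the next yielded request, dispatch it against the borrowed context.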
`C` is your context type — whatever struct holds your borrowed references. `dispatch` is your routing function. The resume loop doesn't know or care what your API does. It just ferries requests and responses between Lua and your dispatch function.

Thread Pool
Luau's `Thread::reset()` resets threads in any state (yielded, errored, finished). A simple pool avoids per-call thread allocation.

The Dispatch Side: A Declarative Macro
With potentially hundreds of methods, the dispatch function needs to be maintainable. A declarative macro turns it into a flat API specification — one line per method:
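The post's macro is not reproduced here; this is a hedged, stdlib-only sketch of the idea with mock types (`Value`, `World`, `get_hp`, `is_hurt`, and the `int_arg` helper are all illustrative assumptions):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Int(i64),
    Bool(bool),
}

struct World {
    hp: HashMap<u32, i64>,
    max_hp: HashMap<u32, i64>,
}

// Extract an integer argument, with a descriptive panic on mismatch.
fn int_arg(args: &[Value], i: usize, method: &str) -> i64 {
    match args.get(i) {
        Some(Value::Int(n)) => *n,
        other => panic!("{method}: arg {} expected Int, got {:?}", i + 1, other),
    }
}

// One line per method: name => handler. The macro expands the list
// into a single flat `dispatch` function.
macro_rules! api {
    ( $( $name:ident => $handler:expr ),+ $(,)? ) => {
        fn dispatch(world: &World, method: &str, args: &[Value]) -> Value {
            match method {
                $( stringify!($name) => ($handler)(world, args), )+
                other => panic!("unknown API method: {other}"),
            }
        }
    };
}

api! {
    get_hp => |w: &World, a: &[Value]| {
        Value::Int(*w.hp.get(&(int_arg(a, 0, "get_hp") as u32)).unwrap_or(&0))
    },
    is_hurt => |w: &World, a: &[Value]| {
        let id = int_arg(a, 0, "is_hurt") as u32;
        Value::Bool(w.hp.get(&id).unwrap_or(&0) < w.max_hp.get(&id).unwrap_or(&0))
    },
}
```

Adding a method means adding one `name => handler` line to the `api!` invocation.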
The macro generates:

- argument extraction via the `ExtractArg` trait (EntityId from userdata, Position from entity-or-table, String, i32, bool, `Option<T>`, etc.)
- unpacking of the incoming `MultiValue`
- descriptive errors like `"get_hp: arg 1 expected EntityId, got string"`

Adding a new API method = adding one declaration. The macro handles everything else.
The ExtractArg Trait
Polymorphic argument extraction — each type knows how to extract itself from a Lua value:
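A sketch of the trait over a mock `Value` type (the real trait works on mlua values; these names and impls are assumptions):

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Int(i64),
    Str(String),
    Bool(bool),
}

// Each argument type knows how to extract itself from a Lua value slot.
trait ExtractArg: Sized {
    fn extract(v: Option<&Value>) -> Result<Self, String>;
}

impl ExtractArg for i64 {
    fn extract(v: Option<&Value>) -> Result<Self, String> {
        match v {
            Some(Value::Int(n)) => Ok(*n),
            other => Err(format!("expected Int, got {other:?}")),
        }
    }
}

impl ExtractArg for String {
    fn extract(v: Option<&Value>) -> Result<Self, String> {
        match v {
            Some(Value::Str(s)) => Ok(s.clone()),
            other => Err(format!("expected Str, got {other:?}")),
        }
    }
}

// Option<T>: a missing or nil argument becomes None instead of an error.
impl<T: ExtractArg> ExtractArg for Option<T> {
    fn extract(v: Option<&Value>) -> Result<Self, String> {
        match v {
            None | Some(Value::Nil) => Ok(None),
            some => T::extract(some).map(Some),
        }
    }
}
```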
Performance
The coroutine protocol has ~1μs of overhead per API call (yield + resume + string dispatch + arg extraction) vs ~0.1μs for direct method calls through cached metatables. The crossover with context cells is ~30 API calls per invocation: the context-cell approach pays a fixed ~30μs per invocation, so its ~0.9μs-per-call advantage only amortizes that after roughly 30μs / 0.9μs ≈ 33 calls. Below that — which covers virtually all real-world scripting hooks — the coroutine protocol is faster because its zero setup cost dominates.
Why This Works
The scoped userdata problem exists because Rust's `TypeId` requires `'static`, and without `TypeId`, mlua cannot cache metatables. Every proposed solution in the mlua ecosystem — `Scope::create_userdata_ref` (requires `T: 'static`), `register_userdata_type` (requires `'static`), `create_proxy` (requires `'static`), the `AnyUserData` APIs (require `'static`) — hits the same wall. See the mlua v0.9 release notes.

The coroutine protocol sidesteps it entirely. There is no userdata. There are no metatables. Lua yields string-tagged requests. Rust dispatches them with all borrowed references on the stack. The `TypeId` constraint is irrelevant because no Rust type is ever exposed to Lua's metatable system.

Yield Safety
The yield does not happen inside a metamethod:

1. `world:get_hp(entity)` → Lua resolves `world.get_hp` via `__index` (first time) or table lookup (cached)
2. `__index` creates a closure, caches it via `rawset`, and returns. The metamethod frame is popped.
3. `coroutine.yield("get_hp", entity)` then runs inside a plain closure, not a metamethod frame.

After the first call, `rawset` has cached the closure. `__index` never fires again for that method.

pcall/xpcall
Luau (and LuaJIT, and Lua 5.2+) supports yielding across `pcall`/`xpcall` boundaries, so scripts can defensively wrap API calls. The yield inside `pcall` suspends the entire coroutine; resume continues inside the `pcall` frame. Standard Lua 5.1 does not support this.

Note: the protocol itself does not require pcall — yields happen in plain closures, not inside pcall. The Lua 5.2+ requirement is only about script authors being able to defensively wrap API calls. On vanilla Lua 5.1, the core protocol works, but scripts cannot use pcall around API calls. In practice, almost nobody embedding Lua in Rust in 2026 uses vanilla Lua 5.1 — Luau and LuaJIT dominate.
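A defensive wrapper of the kind described above might look like this (method name and error handling are illustrative):

```lua
local ok, hp = pcall(function()
  return world:get_hp(entity)   -- yields across the pcall boundary
end)
if not ok then
  print("get_hp failed: " .. tostring(hp))
end
```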
Expression temporaries
Multiple yields in a single expression are safe:
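The walkthrough that follows traces an expression of this shape (method names are assumptions; the sample returns are 45 and 100):

```lua
-- two yields inside one expression
if world:get_hp(entity) > world:get_max_hp(entity) / 2 then
  -- not reached in this trace: 45 > 50 is false
end
```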
The first yield returns 45, Lua stashes it. The second yield returns 100. Lua computes `100 / 2 = 50`, then `45 > 50`. The coroutine stack preserves all temporaries across yield boundaries.

Comparison Summary

Semantic Correctness
The natural concern: "you're turning synchronous method calls into asynchronous message passing — doesn't that change the execution model?"
It doesn't. Lua coroutines are not asynchronous. A yield is not "go do this later." It's "suspend me right here, return control to the caller, and when they resume me I continue from this exact point with whatever value they pass back." It's synchronous, deterministic, cooperative control transfer.
From the script's perspective, `world:get_hp(entity)` calls a function and gets a number back. The script is frozen at the yield point. Rust executes the dispatch. The script resumes with the result. There is no concurrency, no interleaving, no observable difference from a direct method call. The semantics are identical: the dispatch reads the same `&World` that a userdata method would. Same references, same data.

The closest theoretical alternative — a single scoped `__index` closure via `Scope::create_function()` — still crosses the Lua→Rust FFI boundary on every method call (C stack frame creation), whereas coroutine yields are cheaper (they suspend the Lua evaluation stack without creating new C frames).

Broader Applications
The coroutine protocol was invented to solve a Rust-specific problem (`TypeId` + scoped userdata), but some of its properties are valuable regardless of host language.

Requirements

- `Thread::resume()` support
- `Thread::reset()` for thread pooling (Luau-specific; without it, create new threads)

Measured Results
Implemented in a Rust roguelike engine with 226 API methods across 10 dispatch domains. Migration from scoped userdata to the coroutine protocol:

- Zero `unsafe`, zero scoped userdata, zero context cells