Skip to content

bugfix: prevent uthread crash by checking coroutine reference before …#398

Merged
zhuizhuhaomeng merged 2 commits intoopenresty:masterfrom
oowl:fix-uthread-crash
Mar 10, 2026
Merged

bugfix: prevent uthread crash by checking coroutine reference before …#398
zhuizhuhaomeng merged 2 commits intoopenresty:masterfrom
oowl:fix-uthread-crash

Conversation

@oowl
Copy link
Contributor

@oowl oowl commented Mar 10, 2026

…deletion

Copilot AI review requested due to automatic review settings March 10, 2026 14:16
@oowl oowl force-pushed the fix-uthread-crash branch from 496d61f to fdda724 Compare March 10, 2026 14:19
@zhuizhuhaomeng zhuizhuhaomeng merged commit e56dac2 into openresty:master Mar 10, 2026
2 checks passed
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a crash bug in stream-lua-nginx-module where the ctx->uthreads counter could be incorrectly decremented when a user thread (uthread) that was already killed via ngx.thread.kill terminates through the normal coroutine completion path. This happens specifically when a uthread contains a child coroutine created via coroutine.create (which doesn't get a co_ref in the Lua registry). The fix adds a co_ref != LUA_NOREF guard before calling ngx_stream_lua_del_thread and decrementing ctx->uthreads, preventing double-decrements that could lead to phantom uthreads count underflow and subsequent crashes.

Changes:

  • Added co_ref != LUA_NOREF guard in three locations in ngx_stream_lua_run_thread() to prevent phantom uthreads decrement when a thread's coroutine reference has already been cleaned up
  • Replaced all 6 existing test cases in t/127-uthread-kill.t with a single new test case that reproduces the specific bug scenario using Redis connections and child coroutines

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/ngx_stream_lua_util.c Guards three del_thread/uthreads-- code paths with co_ref != LUA_NOREF checks to prevent double-decrement
t/127-uthread-kill.t Removes TEST 1-6 (existing uthread kill tests) and adds TEST 7 to reproduce the phantom uthreads decrement bug
Comments suppressed due to low confidence (3)

t/127-uthread-kill.t:25

  • This PR removes TEST 1 through TEST 6 which covered critical ngx.thread.kill scenarios (killing pending sleep, already-waited threads, pending resolver, pending connect, already-terminated threads, and kill-self). These are important regression tests that should be preserved. The new TEST 7 only covers the specific bug being fixed (phantom uthreads decrement) and does not replace the coverage of the removed tests. Please keep the existing tests and add TEST 7 as an additional test case.
        function f()

t/127-uthread-kill.t:25

  • The test is numbered TEST 7 but it is the only test in the file (tests 1-6 were removed). If the intent is to keep only this test, it should be renumbered to TEST 1 for consistency. If tests 1-6 are restored (recommended), then TEST 7 is the correct number.
        function f()

t/127-uthread-kill.t:29

  • The variable is named dns_threads but these threads are performing Redis operations, not DNS lookups. This makes the test harder to understand. Consider renaming to something more descriptive like worker_threads or redis_threads.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


$ENV{TEST_NGINX_RESOLVER} ||= '8.8.8.8';
$ENV{TEST_NGINX_MEMCACHED_PORT} ||= '11211';
$ENV{TEST_NGINX_REDIS_PORT} ||= '6379';
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test plan formula blocks() * 5 + 1 was designed for the original 6 test blocks, each of which had stap_out, stream_response, no_error_log, and error_log sections. The new single test block (TEST 7) only has stream_response and no_error_log — it lacks stap_out and error_log sections. The multiplier 5 is now incorrect and will cause a test plan mismatch, leading to test failure. The plan should be updated to match the actual test sections present in the remaining test blocks. If the existing TEST 1-6 are restored (as they should be), the plan would need to account for both old and new test structures.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants