tooldata_db: Fix signal handling - waiting for multiple processes#3833
tooldata_db: Fix signal handling - waiting for multiple processes#3833smoe wants to merge 1 commit intoLinuxCNC:2.9from
Conversation
In tooldata_db.cc (line 67), the old code does: 1. waitpid(..., WNOHANG) in a while (...) ; loop with an empty body. 2. Then evaluates WIFSIGNALED(status) and WIFEXITED(status) once, after the loop. The problem is: 1. waitpid only writes status when it returns > 0. 2. With WNOHANG, it can return 0 immediately (no state change yet), or -1 on error. 3. In those cases, status is not guaranteed to be written. 4. The code then reads status anyway via WIF* macros, so it can read stale/uninitialized data. 5. That can cause false logs and incorrectly set db_live = 0. What the patch changes: 1. It moves WIFSIGNALED/WIFEXITED checks inside the while (waitpid(...) > 0) loop. 2. status is only inspected in iterations where waitpid definitely produced a valid child status. 3. If no child was reapable, the loop does not run, and nothing is logged or toggled. 4. If multiple children changed state, each valid status is handled, instead of only the last one.
|
Is there a real problem? To me it looks like SIGCHLD cause signal handler to fire, the while loop eats all available signal info until the WNOHANG case causes waitpid to return 0. If waitpid indeed doesn't alter status in this case (man page is avoiding that topic), it should contain the status of the last signal returned by watipid (and there should be one because we are in a signal handler). |
|
This is inside a signal handler... You cannot use fprintf in a signal handler, see signal-safety(7). The @smoe change in the while loop is correct, because, in theory, multiple child processes dying at the same time may only cause one SIGCHLD and that would leave a zombie if you only handle one. However, is it possible that multiple child processes are started in parallel at all? How are children created in the process? How do you handle a waitpid() return of -1? Reading the man page, I don't think it is possible for waitpid() to return -1 in this specific piece of code. Should we care? |
|
The while loop should handle all signals that are queued. I would leave this alone if there is no problem, i think @rene-dev has something new in the pipeline. |
that is probably the bigger correctness issue. also relevant for c++: https://en.cppreference.com/w/cpp/utility/program/signal |
I need some help with this one. It is all a bit confusing. Am I confused?
In tooldata_db.cc (line 67), the old code does:
The problem is:
What the patch changes:
I patched this against 2.9. But if that is too risky then maybe 2.10 is preferable to give this some time to mature? But then again, we should somehow figure out if this change is correct or not. :-)