Set thread QoS to USER_INITIATED on Apple Silicon #3278
base: develop
On Apple Silicon Macs, TBB worker threads are created with the default QoS class, which macOS may schedule to efficiency cores even when performance cores are available. This significantly degrades parallel performance. This adds a pthread_set_qos_class_self_np() call in on_scheduler_entry() to set USER_INITIATED QoS, signaling to macOS that these are compute threads the user is waiting for. This causes macOS to prefer performance cores when available. Fixes stan-dev#3277
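A minimal sketch of the idea, kept outside TBB so it compiles on its own. The helper names `request_user_initiated_qos` and `run_worker_with_qos` are hypothetical and not part of this PR, and a plain `std::thread` stands in for a TBB worker entering the scheduler. On platforms other than Apple Silicon the helper is assumed to be a no-op that reports success.

```cpp
#include <cassert>
#include <thread>

#if defined(__APPLE__)
#include <pthread.h>
#include <sys/qos.h>
#endif

// Hypothetical helper (not from the PR): asks macOS to treat the calling
// thread as USER_INITIATED work. Returns true if the request succeeded,
// or if the call is compiled out on this platform.
inline bool request_user_initiated_qos() {
#if defined(__APPLE__) && (defined(__arm64__) || defined(__aarch64__))
  // pthread_set_qos_class_self_np returns 0 on success, an errno value otherwise.
  return pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0) == 0;
#else
  return true;  // no-op outside Apple Silicon
#endif
}

// Stand-in for a TBB worker: the first thing the thread does is make the
// QoS request, mirroring what on_scheduler_entry() would do in the PR.
inline bool run_worker_with_qos() {
  bool ok = false;
  std::thread worker([&ok] {
    ok = request_user_initiated_qos();
    // ... parallel work would run here ...
  });
  worker.join();  // join() makes the write to `ok` visible to the caller
  return ok;
}
```

The request has to be made from the worker thread itself, since the `_self_` variant only affects the calling thread. That is why the PR places it in the scheduler entry hook rather than at thread-pool construction.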
Changes look good and relatively self-contained to me, but I don't have a machine to test on. @bob-carpenter is our resident Apple Silicon fan and has used some of this functionality recently, so maybe he can take a peek before we merge.
Jenkins Console Log
Machine information: Ubuntu 20.04.3 LTS (release 20.04, codename focal)
#if defined(__arm64__) || defined(__aarch64__)
  // Set thread QoS to USER_INITIATED so macOS prefers scheduling
  // TBB worker threads on performance cores rather than efficiency cores.
  pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
#endif
Leaving this as is should be fine, but pthread_set_qos_class_self_np should work for x86 as well.
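For reference, a sketch of what dropping the architecture check might look like, guarding on `__APPLE__` alone. The helper name `set_qos_on_any_mac` is hypothetical. Whether the call changes scheduling in any useful way on Intel Macs, which have no efficiency cores, is not something this thread establishes.

```cpp
#include <cassert>

#if defined(__APPLE__)
#include <pthread.h>
#include <sys/qos.h>
#endif

// Hypothetical variant (not from the PR): make the QoS request on every
// Apple platform, regardless of CPU architecture.
// Returns 0 on success (or when compiled out), an errno value otherwise.
inline int set_qos_on_any_mac() {
#if defined(__APPLE__)
  return pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
#else
  return 0;  // nothing to do on non-Apple platforms
#endif
}
```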
I only added those statements after @WardBrian or @SteveBronder told me about them. I haven't profiled without, which I should probably do. We
@bob-carpenter this might have gotten lost in the notification soup. This PR is about adding settings to the Math library similar to the ones you previously set in WALNUTS.
I may have wrongly assumed people reading this knew I was working on WALNUTS, but I understood the context here. @WardBrian pinged me and I was just responding that I haven't even verified that the commands do anything useful in the place where I am using them. You might want to ask @SteveBronder, who I believe is the one who recommended these commands for Apple Silicon, where there is a combination of "performance" cores and "efficiency" cores.
Summary
On Apple Silicon Macs, TBB worker threads are created with the default QoS class, which macOS may schedule to efficiency cores even when performance cores are available. This significantly degrades parallel performance.
This adds a pthread_set_qos_class_self_np() call in on_scheduler_entry() to set USER_INITIATED QoS, signaling to macOS that these are compute threads the user is waiting for. This causes macOS to prefer performance cores when available.
Details
- Adds #include <pthread.h> and #include <sys/qos.h> on Apple
- Calls pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0) when TBB worker threads enter the scheduler
- Applies only to Apple Silicon (__arm64__ or __aarch64__)
Testing
Tested on macOS 26.2 (Tahoe) with Apple M3 Ultra (24 P-cores, 8 E-cores).
Before: CPU usage drops from ~800% to ~100-300% per chain after ~4 minutes (threads demoted to E-cores)
After: CPU usage remains stable on P-cores
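Independently of CPU usage numbers, one way to check that the request is at least recorded is to read the QoS class back on the same thread. This is a sketch under assumptions: the helper names `current_qos_name` and `qos_request_took_effect` are hypothetical, and `qos_class_self()` reports the requested class only, so it does not by itself show that a thread is running on a P-core.

```cpp
#include <cassert>
#include <string>

#if defined(__APPLE__)
#include <pthread.h>
#include <sys/qos.h>
#endif

// Hypothetical helper: human-readable name of the calling thread's
// requested QoS class ("unsupported" on non-Apple platforms).
inline std::string current_qos_name() {
#if defined(__APPLE__)
  switch (qos_class_self()) {
    case QOS_CLASS_USER_INTERACTIVE: return "user-interactive";
    case QOS_CLASS_USER_INITIATED:   return "user-initiated";
    case QOS_CLASS_DEFAULT:          return "default";
    case QOS_CLASS_UTILITY:          return "utility";
    case QOS_CLASS_BACKGROUND:       return "background";
    default:                         return "unspecified";
  }
#else
  return "unsupported";
#endif
}

// Hypothetical check: make the same request as the PR, then confirm the
// thread now reports USER_INITIATED. Returns true when compiled out.
inline bool qos_request_took_effect() {
#if defined(__APPLE__)
  if (pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0) != 0) {
    return false;  // the request itself was rejected
  }
  return qos_class_self() == QOS_CLASS_USER_INITIATED;
#else
  return true;  // nothing to verify on non-Apple platforms
#endif
}
```

A check like this could be logged once per worker during profiling, alongside the CPU usage observations above.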
Fixes #3277