Skip to content

Fix subleaf in Intel CPUID#1278

Open
AntoinePrv wants to merge 1 commit intoxtensor-stack:masterfrom
AntoinePrv:cpuid-intel
Open

Fix subleaf in Intel CPUID#1278
AntoinePrv wants to merge 1 commit intoxtensor-stack:masterfrom
AntoinePrv:cpuid-intel

Conversation

@AntoinePrv
Copy link
Contributor

The original behaviour was maintained during the CPUID refactor.
The subleaf is ignored so the the behaviour must be incorrect.
I believe Intel compiler supports __cpuidex but I am not sure.

@AntoinePrv
Copy link
Contributor Author

@serge-sans-paille this is ready

@serge-sans-paille
Copy link
Contributor

It's actually supported only by decently recent intel compiler, but also by gcc and clang with the most recent version and providing <cpuid.h> is included. I've shared the minimal version through this link:
https://godbolt.org/z/jjx8z9dsh

@AntoinePrv
Copy link
Contributor Author

Alternatively, if we're going to have an assembly version anyways, should we always use it?

@serge-sans-paille
Copy link
Contributor

I think we can rely on the asm one, less code to maintain. Just double check that the assembly is actually as good as the one from __cpuidex

@AntoinePrv
Copy link
Contributor Author

They are very similar, I am am not knowledgeable enough to tell how it matters.
Though if I understand correctly, CPUID is typically ~100 clock cycles, and the result is cached, so this should not be what matters.

godbolt

@AntoinePrv
Copy link
Contributor Author

I can push the ASM fix, or we can also SFINAE detect if __cpuidex is available. Let me know what you think is best.

@serge-sans-paille
Copy link
Contributor

Let's keep only the asm, and the slight difference of clang saving one instruction compared to icc and gcc is negligible compared to the invocation cost of cpuid. Just keep a comment mentioning __cpuidex and why we just use the asm (same code in the end, less portability issues)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants