Add __vectorcall ABI support for Windows x64#124130
Add __vectorcall ABI support for Windows x64#124130reedz wants to merge 3 commits intodotnet:mainfrom
Conversation
Implements support for the __vectorcall calling convention on Windows x64, addressing dotnet#8300. ## Changes ### Public API - New CallConvVectorcall class in System.Runtime.CompilerServices - Reference assembly updated with the new type ### VM / Type System - Added Vectorcall and VectorcallMemberFunction to CorInfoCallConvExtension - Calling convention recognition in callconvbuilder.cpp, stubgen.cpp, corelib.h - NativeAOT type system support in UnmanagedCallingConventions.cs ### JIT ABI Classifier - VectorcallX64Classifier in argetamd64.cpp implementing positional argument passing with 6 XMM registers (XMM0-XMM5) for vector types - HVA (Homogeneous Vector Aggregate) support: structs of 2-4 identical SIMD fields passed in consecutive unused XMM registers - Pre-scan mechanism for discontiguous HVA register allocation - VectorcallX86Classifier skeleton in argetx86.cpp ### JIT Integration - Return type handling for vectorcall SIMD types (8/16/32/64 bytes) and HVA returns - Multi-register return support via SPK_ByValueAsHfa for HVA structs - FEATURE_MULTIREG_ARGS/RET enabled on Windows x64 for vectorcall HVA support - Struct argument morphing for SIMD-compatible types in XMM/YMM registers - Codegen, lowering, and LSRA updates for vectorcall register allocation ### Tests - Scalar PInvoke tests (int, float, double, mixed args, callbacks) - Vector128/Vector256 function pointer tests - HVA2/HVA3/HVA4 tests with field inspection - Discontiguous HVA allocation test - FunctionPointer calling convention reflection tests ## Scope - Windows x64 only (x86 classifier skeleton included but not the focus) - Vector128 and Vector256 support - Vector512 support is structural but untested
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
Adds initial CoreCLR/runtime support for the Windows __vectorcall calling convention, including exposing a new CallConvVectorcall type and updating the JIT/VM/type-system to classify and pass SIMD vectors and HVAs in registers (primarily for x64).
Changes:
- Introduces
CallConvVectorcalland plumbs it through CoreLib ref, VM stub generation, and type-system call-conv encoding/decoding. - Adds JIT-side vectorcall ABI classification for x64 (positional + HVA register allocation) and an initial x86 classifier.
- Adds new interop/native tests (CMake + managed) covering scalar, SIMD (Vector128/256), and HVA cases.
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/tests/Interop/VectorcallCallingConv/VectorcallVector128Test.csproj | New test project for Vector128/256 + HVA validation (Windows-gated). |
| src/tests/Interop/VectorcallCallingConv/VectorcallVector128Test.cs | Managed function-pointer tests for vectorcall SIMD/HVA passing/return. |
| src/tests/Interop/VectorcallCallingConv/VectorcallTest.csproj | New scalar/reverse-PInvoke vectorcall test project. |
| src/tests/Interop/VectorcallCallingConv/VectorcallTest.cs | Managed P/Invoke + reverse callback tests for vectorcall. |
| src/tests/Interop/VectorcallCallingConv/VectorcallPInvokes.cs | P/Invoke declarations using UnmanagedCallConv(CallConvVectorcall). |
| src/tests/Interop/VectorcallCallingConv/VectorcallNative.def | Windows export definitions for native vectorcall test library. |
| src/tests/Interop/VectorcallCallingConv/VectorcallNative.cpp | Native implementations using __vectorcall and SIMD intrinsics. |
| src/tests/Interop/VectorcallCallingConv/CMakeLists.txt | Builds VectorcallNative and applies .def exports on Windows. |
| src/tests/Interop/CMakeLists.txt | Adds the new VectorcallCallingConv native subdir under Windows builds. |
| src/libraries/System.Runtime/ref/System.Runtime.cs | Public ref surface adds CallConvVectorcall. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/CallingConventions.cs | Implements/documentation for CallConvVectorcall. |
| src/libraries/Common/tests/System/FunctionPointerCallingConventionTests.cs | Validates modopt encoding/visibility for vectorcall function pointers. |
| src/coreclr/vm/stubgen.cpp | Emits vectorcall modopts into native function pointer signatures. |
| src/coreclr/vm/corelib.h | Adds CoreLib binder entry for CallConvVectorcall. |
| src/coreclr/vm/callstubgenerator.cpp | Treats VectorcallMemberFunction as an unmanaged “this” callconv. |
| src/coreclr/vm/callconvbuilder.cpp | Adds vectorcall parsing/building support and member-function mapping. |
| src/coreclr/tools/Common/TypeSystem/Interop/UnmanagedCallingConventions.cs | Adds Vectorcall enum + encoding/decoding logic for type-system tools. |
| src/coreclr/jit/targetx86.h / targetx86.cpp | Defines/implements x86 vectorcall classifier (XMM0–XMM5 for float/SIMD). |
| src/coreclr/jit/targetamd64.h / targetamd64.cpp | Enables multireg infrastructure on Windows AMD64 and adds x64 vectorcall classifier + HVA handling. |
| src/coreclr/jit/morph.cpp | Selects vectorcall classifier based on unmanaged callconv and pre-scans for discontiguous HVA allocation. |
| src/coreclr/jit/lsrabuild.cpp | Adjusts return-type handling for vectorcall HVA cases and FIELD_LIST call-arg handling. |
| src/coreclr/jit/lower.cpp | Relaxes Windows-xarch multireg-return asserts for vectorcall SIMD/HVA returns. |
| src/coreclr/jit/lclvars.cpp | Uses vectorcall classifier for parameter ABI classification when applicable. |
| src/coreclr/jit/layout.h | Expands SIMD register-type mapping to include 32-byte SIMD on xarch. |
| src/coreclr/jit/importercalls.cpp | Always records unmanaged callconv on call nodes for correct return handling. |
| src/coreclr/jit/gentree.cpp | Names new callconvs; adjusts return register selection for vectorcall HVA returns on Windows AMD64. |
| src/coreclr/jit/compiler.h / compiler.cpp | Adds helper routines for vectorcall HVA detection and vectorcall struct-return rules. |
| src/coreclr/jit/codegenxarch.cpp | Uses varTypeUsesFloatReg for selecting return regs and relaxes struct assertion for SIMD. |
| src/coreclr/jit/codegencommon.cpp | Makes FIELD_LIST reg-arg placement unconditional (needed for vectorcall HVAs). |
| src/coreclr/jit/abi.h / abi.cpp | Declares vectorcall classifiers and extends register-type inference for SIMD sizes. |
| src/coreclr/inc/corinfo.h | Adds Vectorcall and VectorcallMemberFunction to CorInfoCallConvExtension. |
| src/coreclr/inc/corhdr.h | Adds CMOD_CALLCONV_NAME_VECTORCALL. |
|
|
||
| // Vectorcall is supported on Windows x86 | ||
| #define VECTORCALL_SUPPORT |
There was a problem hiding this comment.
VECTORCALL_SUPPORT is defined unconditionally for all x86 builds, but vectorcall is a Windows-only calling convention. This enables the vectorcall ABI classifier on Unix x86 (UNIX_X86_ABI), which can lead to incorrect ABI decisions if Vectorcall is ever specified there. Gate this define to Windows x86 (e.g., #if !defined(UNIX_X86_ABI) or an equivalent OS-specific build define).
| // Vectorcall is supported on Windows x86 | |
| #define VECTORCALL_SUPPORT | |
| #if !defined(UNIX_X86_ABI) | |
| // Vectorcall is supported on Windows x86 | |
| #define VECTORCALL_SUPPORT | |
| #endif // !UNIX_X86_ABI |
| case 8: | ||
| useType = TYP_DOUBLE; // 8-byte vector type | ||
| break; | ||
| case 16: | ||
| useType = TYP_SIMD16; // __m128 / Vector128 | ||
| break; |
There was a problem hiding this comment.
In the vectorcall return-type special-casing, 8-byte SIMD structs are mapped to TYP_DOUBLE. This risks treating a Vector64/__m64-like value as an FP scalar (affecting IR typing and optimizations). Use the corresponding SIMD type (e.g., TYP_SIMD8) for 8-byte vector returns so the JIT preserves vector semantics consistently with the 16/32/64-byte cases.
| // Stack alignment: SIMD types need proper alignment | ||
| m_stackArgSize = roundUp(m_stackArgSize, 16); | ||
| unsigned stackSize = max(typeSize, (unsigned)TARGET_POINTER_SIZE); | ||
| segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize); |
There was a problem hiding this comment.
When spilling a vector/float argument to the stack (position >= 6), stackSize is computed as max(typeSize, TARGET_POINTER_SIZE). For TYP_SIMD12 (Vector3) this will reserve only 12 bytes even though SIMD12 values are 16-byte aligned/padded on xarch, which will skew subsequent stack argument offsets. Consider rounding up the stack slot size to the required SIMD alignment (e.g., 16) for SIMD12 and similar cases.
| segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize); | |
| if (isVectorType) | |
| { | |
| // Ensure the reserved stack slot respects SIMD alignment/padding (e.g., SIMD12 -> 16 bytes) | |
| stackSize = roundUp(stackSize, 16); | |
| } | |
| segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize); |
src/coreclr/jit/targetamd64.cpp
Outdated
| unsigned stackSize = roundUp(structLayout->GetSize(), (unsigned)TARGET_POINTER_SIZE); | ||
| ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize()); | ||
| m_stackArgSize += stackSize; |
There was a problem hiding this comment.
In the HVA path when there aren’t enough XMM registers, the entire HVA is placed on the stack using only pointer-size rounding. Since the HVA elements are SIMD values, the stack slot likely needs the same 16-byte alignment/padding used for other SIMD stack arguments; otherwise later stack arg offsets/alignment can be incorrect. Consider aligning m_stackArgSize to 16 and rounding the stack slot size appropriately when spilling HVAs.
| unsigned stackSize = roundUp(structLayout->GetSize(), (unsigned)TARGET_POINTER_SIZE); | |
| ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize()); | |
| m_stackArgSize += stackSize; | |
| // HVAs consist of SIMD elements; ensure their stack slot is 16-byte aligned, | |
| // consistent with other SIMD stack arguments, to keep later stack arg | |
| // offsets and alignment correct. | |
| const unsigned hvaAlignment = 16; | |
| m_stackArgSize = roundUp(m_stackArgSize, hvaAlignment); | |
| unsigned stackSize = roundUp(structLayout->GetSize(), hvaAlignment); | |
| ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize()); | |
| m_stackArgSize += stackSize; |
| <!-- Needed for CMakeProjectReference --> | ||
| <RequiresProcessIsolation>true</RequiresProcessIsolation> | ||
| <AllowUnsafeBlocks>true</AllowUnsafeBlocks> | ||
| <!-- Vectorcall is only meaningful on Windows x86/x64, but the test handles this gracefully --> | ||
| <!-- On non-Windows, vectorcall falls back to the default calling convention --> | ||
| </PropertyGroup> |
There was a problem hiding this comment.
This test project doesn’t mark itself as unsupported on non-Windows targets, but its native dependency is only built under if(CLR_CMAKE_TARGET_WIN32) in src/tests/Interop/CMakeLists.txt. As a result, the test will run on non-Windows and fail to load VectorcallNative. Add a CLRTestTargetUnsupported condition for non-Windows (consistent with other Interop tests) or otherwise ensure the native asset is produced cross-platform.
| [Fact] | ||
| [ActiveIssue("https://github.com/dotnet/runtime/issues/91388", typeof(TestLibrary.PlatformDetection), nameof(TestLibrary.PlatformDetection.PlatformDoesNotSupportNativeTestAssets))] | ||
| public static int TestEntryPoint() | ||
| { | ||
| try | ||
| { | ||
| TestIntegerArgs(); | ||
| TestFloatArgs(); | ||
| TestDoubleArgs(); | ||
| TestMixedIntFloat(); | ||
| TestSixFloats(); | ||
| TestSixDoubles(); | ||
| TestReturnFloat(); | ||
| TestReturnDouble(); | ||
| TestReverseCallback(); | ||
| } |
There was a problem hiding this comment.
TestEntryPoint unconditionally invokes P/Invokes into VectorcallNative. Since the native library is only built/available for Windows targets, this entry point needs a platform/architecture gate (e.g., skip/return 100 when not Windows x86/x64) to avoid runtime failures when the test is executed elsewhere.
|
I studied this recently. It seems that MSVC x64 uses YMM/ZMM for The most reasonable to disable interop for |
Adds initial support for the __vectorcall calling convention on Windows x64, addressing #8300.
This introduces a new CallConvVectorcall type, a VectorcallX64Classifier that implements positional argument passing with up to 6 XMM/YMM registers for vector types, and Homogeneous Vector Aggregate (HVA) support for passing and returning structs of 2–4 identical SIMD fields in consecutive registers.
The scope of this PR is limited to x64 with Vector128/Vector256 support; an x86 classifier skeleton is included but not the focus. Covered by new interop tests for scalar PInvokes, Vector128/256 function pointer calls, and HVA2/3/4 patterns including discontiguous register allocation.
cc @tannergooding (issue author)