Skip to content

Add __vectorcall ABI support for Windows x64#124130

Open
reedz wants to merge 3 commits intodotnet:mainfrom
reedz:feature/vector-call-abi
Open

Add __vectorcall ABI support for Windows x64#124130
reedz wants to merge 3 commits intodotnet:mainfrom
reedz:feature/vector-call-abi

Conversation

@reedz
Copy link
Contributor

@reedz reedz commented Feb 7, 2026

Adds initial support for the __vectorcall calling convention on Windows x64, addressing #8300.

This introduces a new CallConvVectorcall type, a VectorcallX64Classifier that implements positional argument passing with up to 6 XMM/YMM registers for vector types, and Homogeneous Vector Aggregate (HVA) support for passing and returning structs of 2–4 identical SIMD fields in consecutive registers.

The scope of this PR is limited to x64 with Vector128/Vector256 support; an x86 classifier skeleton is included but not the focus. Covered by new interop tests for scalar PInvokes, Vector128/256 function pointer calls, and HVA2/3/4 patterns including discontiguous register allocation.

cc @tannergooding (issue author)

Implements support for the __vectorcall calling convention on Windows x64,
addressing dotnet#8300.

## Changes

### Public API
- New CallConvVectorcall class in System.Runtime.CompilerServices
- Reference assembly updated with the new type

### VM / Type System
- Added Vectorcall and VectorcallMemberFunction to CorInfoCallConvExtension
- Calling convention recognition in callconvbuilder.cpp, stubgen.cpp, corelib.h
- NativeAOT type system support in UnmanagedCallingConventions.cs

### JIT ABI Classifier
- VectorcallX64Classifier in 	argetamd64.cpp implementing positional argument
  passing with 6 XMM registers (XMM0-XMM5) for vector types
- HVA (Homogeneous Vector Aggregate) support: structs of 2-4 identical SIMD fields
  passed in consecutive unused XMM registers
- Pre-scan mechanism for discontiguous HVA register allocation
- VectorcallX86Classifier skeleton in 	argetx86.cpp

### JIT Integration
- Return type handling for vectorcall SIMD types (8/16/32/64 bytes) and HVA returns
- Multi-register return support via SPK_ByValueAsHfa for HVA structs
- FEATURE_MULTIREG_ARGS/RET enabled on Windows x64 for vectorcall HVA support
- Struct argument morphing for SIMD-compatible types in XMM/YMM registers
- Codegen, lowering, and LSRA updates for vectorcall register allocation

### Tests
- Scalar PInvoke tests (int, float, double, mixed args, callbacks)
- Vector128/Vector256 function pointer tests
- HVA2/HVA3/HVA4 tests with field inspection
- Discontiguous HVA allocation test
- FunctionPointer calling convention reflection tests

## Scope
- Windows x64 only (x86 classifier skeleton included but not the focus)
- Vector128 and Vector256 support
- Vector512 support is structural but untested
Copilot AI review requested due to automatic review settings February 7, 2026 14:42
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 7, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 7, 2026
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds initial CoreCLR/runtime support for the Windows __vectorcall calling convention, including exposing a new CallConvVectorcall type and updating the JIT/VM/type-system to classify and pass SIMD vectors and HVAs in registers (primarily for x64).

Changes:

  • Introduces CallConvVectorcall and plumbs it through CoreLib ref, VM stub generation, and type-system call-conv encoding/decoding.
  • Adds JIT-side vectorcall ABI classification for x64 (positional + HVA register allocation) and an initial x86 classifier.
  • Adds new interop/native tests (CMake + managed) covering scalar, SIMD (Vector128/256), and HVA cases.

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/tests/Interop/VectorcallCallingConv/VectorcallVector128Test.csproj New test project for Vector128/256 + HVA validation (Windows-gated).
src/tests/Interop/VectorcallCallingConv/VectorcallVector128Test.cs Managed function-pointer tests for vectorcall SIMD/HVA passing/return.
src/tests/Interop/VectorcallCallingConv/VectorcallTest.csproj New scalar/reverse-PInvoke vectorcall test project.
src/tests/Interop/VectorcallCallingConv/VectorcallTest.cs Managed P/Invoke + reverse callback tests for vectorcall.
src/tests/Interop/VectorcallCallingConv/VectorcallPInvokes.cs P/Invoke declarations using UnmanagedCallConv(CallConvVectorcall).
src/tests/Interop/VectorcallCallingConv/VectorcallNative.def Windows export definitions for native vectorcall test library.
src/tests/Interop/VectorcallCallingConv/VectorcallNative.cpp Native implementations using __vectorcall and SIMD intrinsics.
src/tests/Interop/VectorcallCallingConv/CMakeLists.txt Builds VectorcallNative and applies .def exports on Windows.
src/tests/Interop/CMakeLists.txt Adds the new VectorcallCallingConv native subdir under Windows builds.
src/libraries/System.Runtime/ref/System.Runtime.cs Public ref surface adds CallConvVectorcall.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/CallingConventions.cs Implements/documentation for CallConvVectorcall.
src/libraries/Common/tests/System/FunctionPointerCallingConventionTests.cs Validates modopt encoding/visibility for vectorcall function pointers.
src/coreclr/vm/stubgen.cpp Emits vectorcall modopts into native function pointer signatures.
src/coreclr/vm/corelib.h Adds CoreLib binder entry for CallConvVectorcall.
src/coreclr/vm/callstubgenerator.cpp Treats VectorcallMemberFunction as an unmanaged “this” callconv.
src/coreclr/vm/callconvbuilder.cpp Adds vectorcall parsing/building support and member-function mapping.
src/coreclr/tools/Common/TypeSystem/Interop/UnmanagedCallingConventions.cs Adds Vectorcall enum + encoding/decoding logic for type-system tools.
src/coreclr/jit/targetx86.h / targetx86.cpp Defines/implements x86 vectorcall classifier (XMM0–XMM5 for float/SIMD).
src/coreclr/jit/targetamd64.h / targetamd64.cpp Enables multireg infrastructure on Windows AMD64 and adds x64 vectorcall classifier + HVA handling.
src/coreclr/jit/morph.cpp Selects vectorcall classifier based on unmanaged callconv and pre-scans for discontiguous HVA allocation.
src/coreclr/jit/lsrabuild.cpp Adjusts return-type handling for vectorcall HVA cases and FIELD_LIST call-arg handling.
src/coreclr/jit/lower.cpp Relaxes Windows-xarch multireg-return asserts for vectorcall SIMD/HVA returns.
src/coreclr/jit/lclvars.cpp Uses vectorcall classifier for parameter ABI classification when applicable.
src/coreclr/jit/layout.h Expands SIMD register-type mapping to include 32-byte SIMD on xarch.
src/coreclr/jit/importercalls.cpp Always records unmanaged callconv on call nodes for correct return handling.
src/coreclr/jit/gentree.cpp Names new callconvs; adjusts return register selection for vectorcall HVA returns on Windows AMD64.
src/coreclr/jit/compiler.h / compiler.cpp Adds helper routines for vectorcall HVA detection and vectorcall struct-return rules.
src/coreclr/jit/codegenxarch.cpp Uses varTypeUsesFloatReg for selecting return regs and relaxes struct assertion for SIMD.
src/coreclr/jit/codegencommon.cpp Makes FIELD_LIST reg-arg placement unconditional (needed for vectorcall HVAs).
src/coreclr/jit/abi.h / abi.cpp Declares vectorcall classifiers and extends register-type inference for SIMD sizes.
src/coreclr/inc/corinfo.h Adds Vectorcall and VectorcallMemberFunction to CorInfoCallConvExtension.
src/coreclr/inc/corhdr.h Adds CMOD_CALLCONV_NAME_VECTORCALL.

Comment on lines 353 to 355

// Vectorcall is supported on Windows x86
#define VECTORCALL_SUPPORT
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VECTORCALL_SUPPORT is defined unconditionally for all x86 builds, but vectorcall is a Windows-only calling convention. This enables the vectorcall ABI classifier on Unix x86 (UNIX_X86_ABI), which can lead to incorrect ABI decisions if Vectorcall is ever specified there. Gate this define to Windows x86 (e.g., #if !defined(UNIX_X86_ABI) or an equivalent OS-specific build define).

Suggested change
// Vectorcall is supported on Windows x86
#define VECTORCALL_SUPPORT
#if !defined(UNIX_X86_ABI)
// Vectorcall is supported on Windows x86
#define VECTORCALL_SUPPORT
#endif // !UNIX_X86_ABI

Copilot uses AI. Check for mistakes.
Comment on lines +1069 to +1074
case 8:
useType = TYP_DOUBLE; // 8-byte vector type
break;
case 16:
useType = TYP_SIMD16; // __m128 / Vector128
break;
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the vectorcall return-type special-casing, 8-byte SIMD structs are mapped to TYP_DOUBLE. This risks treating a Vector64/__m64-like value as an FP scalar (affecting IR typing and optimizations). Use the corresponding SIMD type (e.g., TYP_SIMD8) for 8-byte vector returns so the JIT preserves vector semantics consistently with the 16/32/64-byte cases.

Copilot uses AI. Check for mistakes.
// Stack alignment: SIMD types need proper alignment
m_stackArgSize = roundUp(m_stackArgSize, 16);
unsigned stackSize = max(typeSize, (unsigned)TARGET_POINTER_SIZE);
segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize);
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When spilling a vector/float argument to the stack (position >= 6), stackSize is computed as max(typeSize, TARGET_POINTER_SIZE). For TYP_SIMD12 (Vector3) this will reserve only 12 bytes even though SIMD12 values are 16-byte aligned/padded on xarch, which will skew subsequent stack argument offsets. Consider rounding up the stack slot size to the required SIMD alignment (e.g., 16) for SIMD12 and similar cases.

Suggested change
segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize);
if (isVectorType)
{
// Ensure the reserved stack slot respects SIMD alignment/padding (e.g., SIMD12 -> 16 bytes)
stackSize = roundUp(stackSize, 16);
}
segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize);

Copilot uses AI. Check for mistakes.
Comment on lines 679 to 681
unsigned stackSize = roundUp(structLayout->GetSize(), (unsigned)TARGET_POINTER_SIZE);
ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize());
m_stackArgSize += stackSize;
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the HVA path when there aren’t enough XMM registers, the entire HVA is placed on the stack using only pointer-size rounding. Since the HVA elements are SIMD values, the stack slot likely needs the same 16-byte alignment/padding used for other SIMD stack arguments; otherwise later stack arg offsets/alignment can be incorrect. Consider aligning m_stackArgSize to 16 and rounding the stack slot size appropriately when spilling HVAs.

Suggested change
unsigned stackSize = roundUp(structLayout->GetSize(), (unsigned)TARGET_POINTER_SIZE);
ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize());
m_stackArgSize += stackSize;
// HVAs consist of SIMD elements; ensure their stack slot is 16-byte aligned,
// consistent with other SIMD stack arguments, to keep later stack arg
// offsets and alignment correct.
const unsigned hvaAlignment = 16;
m_stackArgSize = roundUp(m_stackArgSize, hvaAlignment);
unsigned stackSize = roundUp(structLayout->GetSize(), hvaAlignment);
ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize());
m_stackArgSize += stackSize;

Copilot uses AI. Check for mistakes.
Comment on lines 3 to 8
<!-- Needed for CMakeProjectReference -->
<RequiresProcessIsolation>true</RequiresProcessIsolation>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<!-- Vectorcall is only meaningful on Windows x86/x64, but the test handles this gracefully -->
<!-- On non-Windows, vectorcall falls back to the default calling convention -->
</PropertyGroup>
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test project doesn’t mark itself as unsupported on non-Windows targets, but its native dependency is only built under if(CLR_CMAKE_TARGET_WIN32) in src/tests/Interop/CMakeLists.txt. As a result, the test will run on non-Windows and fail to load VectorcallNative. Add a CLRTestTargetUnsupported condition for non-Windows (consistent with other Interop tests) or otherwise ensure the native asset is produced cross-platform.

Copilot uses AI. Check for mistakes.
Comment on lines +140 to +155
[Fact]
[ActiveIssue("https://github.com/dotnet/runtime/issues/91388", typeof(TestLibrary.PlatformDetection), nameof(TestLibrary.PlatformDetection.PlatformDoesNotSupportNativeTestAssets))]
public static int TestEntryPoint()
{
try
{
TestIntegerArgs();
TestFloatArgs();
TestDoubleArgs();
TestMixedIntFloat();
TestSixFloats();
TestSixDoubles();
TestReturnFloat();
TestReturnDouble();
TestReverseCallback();
}
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestEntryPoint unconditionally invokes P/Invokes into VectorcallNative. Since the native library is only built/available for Windows targets, this entry point needs a platform/architecture gate (e.g., skip/return 100 when not Windows x86/x64) to avoid runtime failures when the test is executed elsewhere.

Copilot uses AI. Check for mistakes.
@huoyaoyuan
Copy link
Member

I studied this recently. It seems that MSVC x64 uses YMM/ZMM for __m256/__m512 return values unconditionally even when AVX is not enabled, but on x86 when not enabled they are passed by reference if AVX not enabled.

The most reasonable to disable interop for Vector128/Vector256/Vector512 when hardware support is not enabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants