Add __vectorcall ABI support for Windows x64 by reedz · Pull Request #124130 · dotnet/runtime

reedz · 2026-02-07T14:42:12Z

Adds initial support for the __vectorcall calling convention on Windows x64, addressing #8300.

This introduces a new CallConvVectorcall type, a VectorcallX64Classifier that implements positional argument passing with up to 6 XMM/YMM registers for vector types, and Homogeneous Vector Aggregate (HVA) support for passing and returning structs of 2–4 identical SIMD fields in consecutive registers.

The scope of this PR is limited to x64 with Vector128/Vector256 support; an x86 classifier skeleton is included but not the focus. Covered by new interop tests for scalar PInvokes, Vector128/256 function pointer calls, and HVA2/3/4 patterns including discontiguous register allocation.

cc @tannergooding (issue author)

Implements support for the __vectorcall calling convention on Windows x64, addressing dotnet#8300. ## Changes ### Public API - New CallConvVectorcall class in System.Runtime.CompilerServices - Reference assembly updated with the new type ### VM / Type System - Added Vectorcall and VectorcallMemberFunction to CorInfoCallConvExtension - Calling convention recognition in callconvbuilder.cpp, stubgen.cpp, corelib.h - NativeAOT type system support in UnmanagedCallingConventions.cs ### JIT ABI Classifier - VectorcallX64Classifier in argetamd64.cpp implementing positional argument passing with 6 XMM registers (XMM0-XMM5) for vector types - HVA (Homogeneous Vector Aggregate) support: structs of 2-4 identical SIMD fields passed in consecutive unused XMM registers - Pre-scan mechanism for discontiguous HVA register allocation - VectorcallX86Classifier skeleton in argetx86.cpp ### JIT Integration - Return type handling for vectorcall SIMD types (8/16/32/64 bytes) and HVA returns - Multi-register return support via SPK_ByValueAsHfa for HVA structs - FEATURE_MULTIREG_ARGS/RET enabled on Windows x64 for vectorcall HVA support - Struct argument morphing for SIMD-compatible types in XMM/YMM registers - Codegen, lowering, and LSRA updates for vectorcall register allocation ### Tests - Scalar PInvoke tests (int, float, double, mixed args, callbacks) - Vector128/Vector256 function pointer tests - HVA2/HVA3/HVA4 tests with field inspection - Discontiguous HVA allocation test - FunctionPointer calling convention reflection tests ## Scope - Windows x64 only (x86 classifier skeleton included but not the focus) - Vector128 and Vector256 support - Vector512 support is structural but untested

dotnet-policy-service · 2026-02-07T14:47:32Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

Adds initial CoreCLR/runtime support for the Windows __vectorcall calling convention, including exposing a new CallConvVectorcall type and updating the JIT/VM/type-system to classify and pass SIMD vectors and HVAs in registers (primarily for x64).

Changes:

Introduces CallConvVectorcall and plumbs it through CoreLib ref, VM stub generation, and type-system call-conv encoding/decoding.
Adds JIT-side vectorcall ABI classification for x64 (positional + HVA register allocation) and an initial x86 classifier.
Adds new interop/native tests (CMake + managed) covering scalar, SIMD (Vector128/256), and HVA cases.

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
src/tests/Interop/VectorcallCallingConv/VectorcallVector128Test.csproj	New test project for Vector128/256 + HVA validation (Windows-gated).
src/tests/Interop/VectorcallCallingConv/VectorcallVector128Test.cs	Managed function-pointer tests for vectorcall SIMD/HVA passing/return.
src/tests/Interop/VectorcallCallingConv/VectorcallTest.csproj	New scalar/reverse-PInvoke vectorcall test project.
src/tests/Interop/VectorcallCallingConv/VectorcallTest.cs	Managed P/Invoke + reverse callback tests for vectorcall.
src/tests/Interop/VectorcallCallingConv/VectorcallPInvokes.cs	P/Invoke declarations using `UnmanagedCallConv(CallConvVectorcall)`.
src/tests/Interop/VectorcallCallingConv/VectorcallNative.def	Windows export definitions for native vectorcall test library.
src/tests/Interop/VectorcallCallingConv/VectorcallNative.cpp	Native implementations using `__vectorcall` and SIMD intrinsics.
src/tests/Interop/VectorcallCallingConv/CMakeLists.txt	Builds `VectorcallNative` and applies `.def` exports on Windows.
src/tests/Interop/CMakeLists.txt	Adds the new VectorcallCallingConv native subdir under Windows builds.
src/libraries/System.Runtime/ref/System.Runtime.cs	Public ref surface adds `CallConvVectorcall`.
src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/CallingConventions.cs	Implements/documentation for `CallConvVectorcall`.
src/libraries/Common/tests/System/FunctionPointerCallingConventionTests.cs	Validates modopt encoding/visibility for vectorcall function pointers.
src/coreclr/vm/stubgen.cpp	Emits vectorcall modopts into native function pointer signatures.
src/coreclr/vm/corelib.h	Adds CoreLib binder entry for `CallConvVectorcall`.
src/coreclr/vm/callstubgenerator.cpp	Treats VectorcallMemberFunction as an unmanaged “this” callconv.
src/coreclr/vm/callconvbuilder.cpp	Adds vectorcall parsing/building support and member-function mapping.
src/coreclr/tools/Common/TypeSystem/Interop/UnmanagedCallingConventions.cs	Adds `Vectorcall` enum + encoding/decoding logic for type-system tools.
src/coreclr/jit/targetx86.h / targetx86.cpp	Defines/implements x86 vectorcall classifier (XMM0–XMM5 for float/SIMD).
src/coreclr/jit/targetamd64.h / targetamd64.cpp	Enables multireg infrastructure on Windows AMD64 and adds x64 vectorcall classifier + HVA handling.
src/coreclr/jit/morph.cpp	Selects vectorcall classifier based on unmanaged callconv and pre-scans for discontiguous HVA allocation.
src/coreclr/jit/lsrabuild.cpp	Adjusts return-type handling for vectorcall HVA cases and FIELD_LIST call-arg handling.
src/coreclr/jit/lower.cpp	Relaxes Windows-xarch multireg-return asserts for vectorcall SIMD/HVA returns.
src/coreclr/jit/lclvars.cpp	Uses vectorcall classifier for parameter ABI classification when applicable.
src/coreclr/jit/layout.h	Expands SIMD register-type mapping to include 32-byte SIMD on xarch.
src/coreclr/jit/importercalls.cpp	Always records unmanaged callconv on call nodes for correct return handling.
src/coreclr/jit/gentree.cpp	Names new callconvs; adjusts return register selection for vectorcall HVA returns on Windows AMD64.
src/coreclr/jit/compiler.h / compiler.cpp	Adds helper routines for vectorcall HVA detection and vectorcall struct-return rules.
src/coreclr/jit/codegenxarch.cpp	Uses `varTypeUsesFloatReg` for selecting return regs and relaxes struct assertion for SIMD.
src/coreclr/jit/codegencommon.cpp	Makes FIELD_LIST reg-arg placement unconditional (needed for vectorcall HVAs).
src/coreclr/jit/abi.h / abi.cpp	Declares vectorcall classifiers and extends register-type inference for SIMD sizes.
src/coreclr/inc/corinfo.h	Adds `Vectorcall` and `VectorcallMemberFunction` to `CorInfoCallConvExtension`.
src/coreclr/inc/corhdr.h	Adds `CMOD_CALLCONV_NAME_VECTORCALL`.

Copilot · 2026-02-07T14:52:31Z

src/coreclr/jit/targetx86.h

+
+  // Vectorcall is supported on Windows x86
+  #define VECTORCALL_SUPPORT


VECTORCALL_SUPPORT is defined unconditionally for all x86 builds, but vectorcall is a Windows-only calling convention. This enables the vectorcall ABI classifier on Unix x86 (UNIX_X86_ABI), which can lead to incorrect ABI decisions if Vectorcall is ever specified there. Gate this define to Windows x86 (e.g., #if !defined(UNIX_X86_ABI) or an equivalent OS-specific build define).

Suggested change

// Vectorcall is supported on Windows x86

#define VECTORCALL_SUPPORT

#if !defined(UNIX_X86_ABI)

// Vectorcall is supported on Windows x86

#define VECTORCALL_SUPPORT

#endif // !UNIX_X86_ABI

Copilot · 2026-02-07T14:52:31Z

src/coreclr/jit/compiler.cpp

+                    case 8:
+                        useType = TYP_DOUBLE; // 8-byte vector type
+                        break;
+                    case 16:
+                        useType = TYP_SIMD16; // __m128 / Vector128
+                        break;


In the vectorcall return-type special-casing, 8-byte SIMD structs are mapped to TYP_DOUBLE. This risks treating a Vector64/__m64-like value as an FP scalar (affecting IR typing and optimizations). Use the corresponding SIMD type (e.g., TYP_SIMD8) for 8-byte vector returns so the JIT preserves vector semantics consistently with the 16/32/64-byte cases.

Copilot · 2026-02-07T14:52:32Z

src/coreclr/jit/targetamd64.cpp

+            // Stack alignment: SIMD types need proper alignment
+            m_stackArgSize = roundUp(m_stackArgSize, 16);
+            unsigned stackSize = max(typeSize, (unsigned)TARGET_POINTER_SIZE);
+            segment            = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize);


When spilling a vector/float argument to the stack (position >= 6), stackSize is computed as max(typeSize, TARGET_POINTER_SIZE). For TYP_SIMD12 (Vector3) this will reserve only 12 bytes even though SIMD12 values are 16-byte aligned/padded on xarch, which will skew subsequent stack argument offsets. Consider rounding up the stack slot size to the required SIMD alignment (e.g., 16) for SIMD12 and similar cases.

Suggested change

segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize);

if (isVectorType)

{

// Ensure the reserved stack slot respects SIMD alignment/padding (e.g., SIMD12 -> 16 bytes)

stackSize = roundUp(stackSize, 16);

}

segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize);

Copilot · 2026-02-07T14:52:32Z

src/coreclr/jit/targetamd64.cpp

+                    unsigned stackSize = roundUp(structLayout->GetSize(), (unsigned)TARGET_POINTER_SIZE);
+                    ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize());
+                    m_stackArgSize += stackSize;


In the HVA path when there aren’t enough XMM registers, the entire HVA is placed on the stack using only pointer-size rounding. Since the HVA elements are SIMD values, the stack slot likely needs the same 16-byte alignment/padding used for other SIMD stack arguments; otherwise later stack arg offsets/alignment can be incorrect. Consider aligning m_stackArgSize to 16 and rounding the stack slot size appropriately when spilling HVAs.

Suggested change

unsigned stackSize = roundUp(structLayout->GetSize(), (unsigned)TARGET_POINTER_SIZE);

ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize());

m_stackArgSize += stackSize;

// HVAs consist of SIMD elements; ensure their stack slot is 16-byte aligned,

// consistent with other SIMD stack arguments, to keep later stack arg

// offsets and alignment correct.

const unsigned hvaAlignment = 16;

m_stackArgSize = roundUp(m_stackArgSize, hvaAlignment);

unsigned stackSize = roundUp(structLayout->GetSize(), hvaAlignment);

ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize());

m_stackArgSize += stackSize;

Copilot · 2026-02-07T14:52:32Z

src/tests/Interop/VectorcallCallingConv/VectorcallTest.csproj

+    <!-- Needed for CMakeProjectReference -->
+    <RequiresProcessIsolation>true</RequiresProcessIsolation>
+    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
+    <!-- Vectorcall is only meaningful on Windows x86/x64, but the test handles this gracefully -->
+    <!-- On non-Windows, vectorcall falls back to the default calling convention -->
+  </PropertyGroup>


This test project doesn’t mark itself as unsupported on non-Windows targets, but its native dependency is only built under if(CLR_CMAKE_TARGET_WIN32) in src/tests/Interop/CMakeLists.txt. As a result, the test will run on non-Windows and fail to load VectorcallNative. Add a CLRTestTargetUnsupported condition for non-Windows (consistent with other Interop tests) or otherwise ensure the native asset is produced cross-platform.

Copilot · 2026-02-07T14:52:32Z

src/tests/Interop/VectorcallCallingConv/VectorcallTest.cs

+    [Fact]
+    [ActiveIssue("https://github.com/dotnet/runtime/issues/91388", typeof(TestLibrary.PlatformDetection), nameof(TestLibrary.PlatformDetection.PlatformDoesNotSupportNativeTestAssets))]
+    public static int TestEntryPoint()
+    {
+        try
+        {
+            TestIntegerArgs();
+            TestFloatArgs();
+            TestDoubleArgs();
+            TestMixedIntFloat();
+            TestSixFloats();
+            TestSixDoubles();
+            TestReturnFloat();
+            TestReturnDouble();
+            TestReverseCallback();
+        }


TestEntryPoint unconditionally invokes P/Invokes into VectorcallNative. Since the native library is only built/available for Windows targets, this entry point needs a platform/architecture gate (e.g., skip/return 100 when not Windows x86/x64) to avoid runtime failures when the test is executed elsewhere.

huoyaoyuan · 2026-02-07T15:41:31Z

I studied this recently. It seems that MSVC x64 uses YMM/ZMM for __m256/__m512 return values unconditionally even when AVX is not enabled, but on x86 when not enabled they are passed by reference if AVX not enabled.

The most reasonable to disable interop for Vector128/Vector256/Vector512 when hardware support is not enabled.

reedz requested a review from MichalStrehovsky as a code owner February 7, 2026 14:42

Copilot AI review requested due to automatic review settings February 7, 2026 14:42

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 7, 2026

Copilot started reviewing on behalf of reedz February 7, 2026 14:43 View session

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 7, 2026

Copilot AI reviewed Feb 7, 2026

View reviewed changes

reedz added 2 commits February 7, 2026 23:00

merge main

8eaf7c7

copilot review fixes

2632881

build-analysis bot mentioned this pull request Feb 8, 2026

Test failure: baseservices/exceptions/stackoverflow/stackoverflowtester/stackoverflowtester.cmd #110173

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add __vectorcall ABI support for Windows x64#124130

Add __vectorcall ABI support for Windows x64#124130
reedz wants to merge 3 commits intodotnet:mainfrom
reedz:feature/vector-call-abi

reedz commented Feb 7, 2026 •

edited

Loading

Uh oh!

dotnet-policy-service bot commented Feb 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 7, 2026

Uh oh!

Copilot AI Feb 7, 2026

Uh oh!

Copilot AI Feb 7, 2026

Uh oh!

Copilot AI Feb 7, 2026

Uh oh!

Copilot AI Feb 7, 2026

Uh oh!

Copilot AI Feb 7, 2026

Uh oh!

huoyaoyuan commented Feb 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		// Vectorcall is supported on Windows x86
		#define VECTORCALL_SUPPORT

-            segment            = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize);
+            if (isVectorType)
+            {
+                // Ensure the reserved stack slot respects SIMD alignment/padding (e.g., SIMD12 -> 16 bytes)
+                stackSize = roundUp(stackSize, 16);
+            }
+            segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, typeSize);

-                    unsigned stackSize = roundUp(structLayout->GetSize(), (unsigned)TARGET_POINTER_SIZE);
-                    ABIPassingSegment segment = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize());
-                    m_stackArgSize += stackSize;
+                    // HVAs consist of SIMD elements; ensure their stack slot is 16-byte aligned,
+                    // consistent with other SIMD stack arguments, to keep later stack arg
+                    // offsets and alignment correct.
+                    const unsigned hvaAlignment = 16;
+                    m_stackArgSize             = roundUp(m_stackArgSize, hvaAlignment);
+                    unsigned stackSize         = roundUp(structLayout->GetSize(), hvaAlignment);
+                    ABIPassingSegment segment  = ABIPassingSegment::OnStack(m_stackArgSize, 0, structLayout->GetSize());
+                    m_stackArgSize            += stackSize;

Conversation

reedz commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dotnet-policy-service bot commented Feb 7, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

huoyaoyuan commented Feb 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

reedz commented Feb 7, 2026 •

edited

Loading