[Cross] Broadcast Refactor to 1-to-1 numpy match#538
Open
Conversation
NumPy's broadcast_to is unilateral — it only stretches source dimensions that are size 1 to match the target shape. If the source has a dimension larger than the target, or more dimensions than the target, it raises ValueError. NumSharp's broadcast_to was delegating directly to the bilateral Broadcast(Shape, Shape) which allows both sides to stretch. Added ValidateBroadcastTo() helper called from all 9 broadcast_to overloads before the bilateral Broadcast call. The check enforces: - source ndim <= target ndim - each source dimension (right-aligned) must be 1 or equal to target This cannot live inside Broadcast() itself because arithmetic operations (a + b) require bilateral stretching of both operands. Verified with dotnet_run scripts against NumPy: broadcast_to(ones(3), (1,)) → now throws (was accepted) broadcast_to(ones(1,2), (2,1)) → now throws (was accepted) broadcast_to(ones(1,3), (2,3)) → still works (valid unilateral)
…ug 4) 2-arg Broadcast(Shape, Shape) threw NotSupportedException when either input was already broadcast. This blocked legitimate operations like np.clip on broadcast arrays (which internally re-broadcasts) and explicit re-broadcasting via broadcast_to. Changes to the 2-arg path: - Removed the IsBroadcasted guard at line 299 - When an input IsBroadcasted, resolve BroadcastInfo.OriginalShape as the root original for chain tracking — stride=0 dims from prior broadcasts naturally propagate through the stride computation loop - ViewInfo for sliced inputs now uses the resolved original shape Changes to the N-arg Broadcast(Shape[]) path: - Added ViewInfo handling for sliced inputs, matching the 2-arg path. Without this, N-arg broadcast_arrays with sliced inputs produced wrong values (GetOffset couldn't resolve slice strides) - Added re-broadcast support via BroadcastInfo.OriginalShape - Removed dead code: `it.size = tmp` (was immediately overwritten by ComputeHashcode which recalculates size from dimensions) Verified both paths produce identical results for sliced inputs: arange(12).reshape(3,4)[:,1:2] broadcast to (3,3) now correctly returns [[1,1,1],[5,5,5],[9,9,9]] through both 2-arg and N-arg paths. Re-broadcast chains tested up to triple depth: broadcast_to(broadcast_to(broadcast_to(x, s1), s2), s3) works.
Updated and added tests for the broadcast system audit: - BroadcastTo_UnilateralSemantics_RejectsInvalidCases: replaces the old BroadcastTo_BilateralBroadcast_KnownDiscrepancy test. Now verifies that (3,)→(1,), (1,2)→(2,1), and (1,1)→(1,) all throw, matching NumPy. - ReBroadcast_2Arg_SameShape: broadcast → re-broadcast same shape - ReBroadcast_2Arg_HigherDim: (3,1)→(3,3)→(2,3,3) chain - ReBroadcast_2Arg_ClipOnBroadcast: np.clip on broadcast (Bug 4 variant) - BroadcastArrays_NArg_SlicedInput_CorrectValues: N-arg path with sliced column input (was returning [0,0,0],[1,1,1],[2,2,2] instead of [1,1,1],[5,5,5],[9,9,9] due to missing ViewInfo) - BroadcastPaths_2Arg_vs_NArg_SlicedInput_Identical: verifies 2-arg and N-arg paths produce identical results for the same sliced input
New bugs found by running 65 stress tests against the broadcast system after Phase 3/4 fixes. All are pre-existing, not regressions. Bug 23a — reshape col-broadcast wrong element order: reshape(broadcast_to([[10],[20],[30]], (3,3)), (9,)) returns [10,20,30,10,20,30,...] instead of [10,10,10,20,20,20,...]. _reshapeBroadcast uses offset % OriginalShape.size modular arithmetic which walks original storage linearly instead of logical row-major. Workaround: np.copy(a).reshape(...). Bug 23b — np.abs on broadcast throws IncorrectShapeException: Cast creates UnmanagedStorage with mismatched shape size (broadcast size=6 vs storage size=0). The abs implementation doesn't handle broadcast arrays that have storage smaller than the broadcast shape. Bug 24 — transpose col-broadcast returns wrong values: broadcast_to([[10],[20],[30]], (3,3)).T returns [[10,10,10],...×3] instead of [[10,20,30],...×3]. Transpose materializes via Clone and creates plain strides [3,1], losing the stride=0 broadcast semantics. Should swap strides to [0,1] (zero-copy, like NumPy). Row-broadcast .T works by coincidence.
Entry point for planning the rewrite of NumSharp's view/offset resolution to match NumPy's base_offset + strides architecture. Current model: ViewInfo + BroadcastInfo chains with 6+ GetOffset code paths, recursive ParentShape resolution, and complex lazy-loaded UnreducedBroadcastedShape computation. Target model: base_offset (int) + strides[] + dimensions[]. Offset computation becomes a single loop: sum(stride[i] * coord[i]). All slice/broadcast/transpose operations just adjust the base_offset and strides — no chains, no special cases. The plan covers: - Investigation checklist (12 items): catalog all consumers, understand IArraySlice bounds checking, NDIterator relationship, reshape-after- slice interactions, generated template code, IsSliced/IsBroadcasted derivation from strides, memory management - Risk assessment with mitigations - Suggested incremental approach (prototype Shape2, verify parity, migrate) - NumPy reference files in src/numpy/ for each subsystem
Adds a 'GitHub Issues' section to .claude/CLAUDE.md that documents using `gh issue create` for SciSharp/NumSharp and notes GH_TOKEN availability via the env-tokens skill. Provides structured templates for Feature/Enhancement and Bug Report issues (checklists and fields such as overview, problem/proposal, evidence, scope, benchmarks, breaking changes, reproduction, expected/actual behavior, workaround, root cause, and related issues) to standardize reporting.
3f446e0 to
ae86956
Compare
…ape_unsafe, re-broadcast Shape.GetCoordinates: use dimension-based decomposition for broadcast shapes instead of stride-based, which breaks on zero-stride dims. Matches NumPy's PyArray_ITER_GOTO1D factor-based approach. NDArray.flatten (both overloads): guard broadcast arrays by delegating to np.ravel() — flat.copy() produced wrong element order, and non-clone path caused out-of-bounds reads on the small backing buffer. Default.Reduction.CumAdd: strip broadcast metadata via shape.Clean() before allocating the result array, preventing slice writes from going to a detached clone (IsBroadcasted clone path in GetViewInternal). NdArray.ReShape: fix reshape_unsafe to pass ref newshape instead of the instance's shape — was silently ignoring the requested shape. Tests: update re-broadcast test to expect success (not throw) after Bug 4 fix; fix GetCoordinates_Broadcasted to validate correct logical coordinates; clean up OpenBugs.cs (remove fixed bugs, keep reference comments).
Replace broken type-switch + GetCoordinates/GetOffset implementation with NumPy's empty_like + slice-copy approach. Fixes 5 tracked bugs: - Bug 27: np.roll returns int instead of NDArray - Bug 45: no-axis roll returns null - Bug 50: roll only supports Int32/Single/Double (now all 12 dtypes) - Bug 14a/b: broadcast roll produces zeros - Bug 19a/b: broadcast roll Data<T> reads garbage NDArray.roll.cs: rewritten from 104-line type-switch to 2-line delegation to np.roll(this, shift, axis). np.roll.cs: new 70-line static method — no axis: ravel→roll→reshape; with axis: empty_like + 2 slice-copy pairs (body shift + tail wrap). Handles negative axis, shift modulo, all dtypes via slicing. np.array_manipulation.cs: removed broken static np.roll that returned int. Add 110 roll tests (100 pass, 10 OpenBugs for multi-axis tuple shift API gap and empty 2D with axis=1). Add 52 ravel tests (50 pass, 2 OpenBugs for upstream Shape.IsContiguous too conservative on contiguous slices). Document C-order-only as architectural constraint in CLAUDE.md Key Design Decisions table.
…verload Fix aliasing bug: prototype.shape (raw int[]) was passed by reference to the new Shape, causing both arrays to share the same dimensions array. Now clones via (int[])prototype.shape.Clone(), matching full_like's existing pattern. Add shape override parameter (Shape shape = default): when provided, overrides the prototype's shape while preserving its dtype. Matches NumPy's empty_like(a, shape=(4,5)) signature. Add NPTypeCode overload: empty_like(NDArray, NPTypeCode, Shape) for callers that already have an NPTypeCode, avoiding Type→NPTypeCode conversion. Delegates to np.empty() for consistency. Add 103 tests verified against NumPy 2.4.2 ground truth covering: shape/dtype preservation (1D–4D, scalar), dtype override (Type and NPTypeCode, all 12 types), shape override (2D→1D/2D/3D, with dtype, same/diff size, scalar, broadcast/slice sources), empty arrays (zero-dim), sliced/broadcast/transposed prototypes, memory independence, writeability, aliasing fix verification, sibling contract comparison (zeros_like/ones_like), chained operations, and integration with np.roll pattern.
… 5 assertion bugs
FluentAssertions went proprietary after v8. AwesomeAssertions is the Apache 2.0
community fork — permanently free, actively maintained.
Package upgrade:
- FluentAssertions 5.10.3 → AwesomeAssertions 9.3.0 in csproj
- Renamed `using FluentAssertions` → `using AwesomeAssertions` across 83 test files
- Adapted to AA 9.x API: ReferenceTypeAssertions now requires (subject, AssertionChain)
constructor, Execute.Assertion replaced with AssertionChain field, Subject read-only
Bugs fixed in FluentExtension.cs:
- Bug 1: AllValuesBe error messages showed literal "0","1","2" instead of actual values
due to unescaped {0}/{1}/{2} inside $"" strings — fixed to {{0}}/{{1}}/{{2}}
- Bug 2: BeOfValuesApproximately all 12 dtype branches said "(dtype: Boolean)" — fixed
each branch to show correct dtype name (Byte, Int16, Double, etc.)
- Bug 3: NDArrayAssertions.Identifier returned "shape" (copy-paste) — changed to "ndarray"
- Bug 4: BeShaped(ITuple)/BeEquivalentTo had no bounds check — added dimension count
assertion before accessing dimensions[i] to prevent IndexOutOfRangeException
- Bug 5: BeShaped used order-insensitive BeEquivalentTo, so BeShaped(3,2) would pass
on a (2,3) shape — changed to order-sensitive Equal(). This correctly exposed 5
pre-existing NumSharp bugs (np.moveaxis, NewAxis indexing, SlicingWithNewAxis)
AA 9.x API compatibility fixes across test files:
- .Array.Should().ContainInOrder() → .Data<int>().Should().ContainInOrder() (typed)
- .Array.Should().BeEquivalentTo(.Array) → .Data<bool>().Should().Equal() (typed)
- .Should().BeInAscendingOrder() → .Data<double>().Should().BeInAscendingOrder()
- BeEquivalentTo(params) → BeEquivalentTo(new[]{}) for type inference
- BeLessOrEqualTo → BeLessThanOrEqualTo (renamed in AA 8.x)
- .And.HaveCount() → .Which.Should().HaveCount() (chain semantics)
- Cast<T>().Should().BeEquivalentTo(NDArray) → .Should().Be(NDArray)
Infrastructure:
- Added FluentExtensionTests.cs with 72 tests covering all custom assertion methods,
error message quality (catches Bug 1/2 regressions), chaining, all 12 dtypes,
edge cases (scalar, sliced, broadcast, 2D), UnmanagedStorage entry point
- Removed OpenBugs.DeprecationAudit.cs (duplicate method names conflicting with OpenBugs.cs)
Test results: 1644 passed, 5 failed (pre-existing), 34 skipped — both net8.0 and net10.0
…new tests Correctness fixes in FluentExtension.cs: - Fix NotBe/NotBeShaped error messages: was "Expected shape to be X" when shapes ARE equal — now correctly says "Did not expect shape to be X" - Fix UInt64 overflow in BeOfValuesApproximately: unsigned subtraction (expected - nextval) wraps on underflow; cast to double before subtraction - Remove dead System.IO import New assertion capabilities added to ShapeAssertions and NDArrayAssertions: - BeContiguous() / NotBeContiguous() — asserts Shape.IsContiguous - HaveStrides(params int[]) — asserts exact stride values - BeEmpty() — asserts size == 0 (NDArrayAssertions only) - NotBe(NDArray) — complement to Be(), uses np.array_equal negation - NotBeOfType(NPTypeCode) / NotBeOfType<T>() — complement to BeOfType New infrastructure tests (16, total now 88): - Contiguous assertions: fresh array, sliced step, shape-level (4) - HaveStrides: shape pass, shape fail, ndarray pass (3) - BeEmpty: empty pass, non-empty fail (2) - NotBeOfType: mismatch pass, match fail, generic form (3) - NotBe: different pass, equal fail (2) - Error message correctness: NotBe/NotBeShaped say "Did not expect" (2) - UInt64 overflow regression: both directions (3UL vs 5UL) (1) All 88 infrastructure tests pass on net8.0 and net10.0. Full suite: 1644 pass, 5 fail (pre-existing NumSharp bugs), 34 skipped.
# Conflicts: # test/NumSharp.UnitTest/Selection/NDArray.Indexing.Test.cs
…rge files These 4 test files were added to broadcast-refactor after the tests branch diverged, so they weren't included in the AwesomeAssertions migration. The merge brought AwesomeAssertions as the package but these files still referenced `using FluentAssertions` — renamed to `using AwesomeAssertions`. Files fixed: - np.empty_like.Test.cs - np.ravel.Test.cs - np.reshape.Test.cs (new file, untracked) - np.roll.Test.cs
… correct offset resolution _reshapeBroadcast previously only set ViewInfo when the broadcast shape was also sliced (guarded by `if (IsSliced)`). Without ViewInfo, the reshaped shape's GetOffset fell through to the `offset % OriginalShape.size` modular arithmetic path, which happened to produce correct results for row broadcasts (where data is already laid out linearly) but produced wrong element ordering for column broadcasts and other non-trivial broadcast patterns. The fix removes the `if (IsSliced)` guard so ViewInfo is always set. This forces offset resolution through the recursive GetOffset path, which walks up to the parent broadcast shape and uses its strides (with zeros for broadcast dimensions) to compute the correct physical offset via GetCoordinates → parent.GetOffset. Validated against NumPy 2.4.2 output across 80+ individual checks: - Column, row, scalar, 3D, 4D, 5D broadcast reshapes - Slice→broadcast→reshape, broadcast→slice→reshape chains - Step slices, reverse slices, non-contiguous sources - Double/triple reshape chains, copy equivalence - All access patterns: flat, ToString, ravel, multi-dim indexing, copy Also adds comprehensive np.reshape test suite (61 tests) covering basic reshapes, -1 dimension inference, view semantics, scalar/empty arrays, sliced+reshape, broadcast+reshape, all 12 dtypes, large arrays, error cases, static vs instance API, transposed arrays.
Bug 66 (3 tests): swapaxes produces C-contiguous strides instead of permuted strides. For arange(24).reshape(2,3,4) with strides [12,4,1], swapaxes(0,2) should give [1,4,12] but gives [6,2,1]. Root cause: Default.Transpose.cs allocates new C-contiguous storage and copies data via MultiIterator.Assign, discarding the permuted strides. Direct consequence of Bug 64 (transpose copies instead of returning a view). Bug 67 (1 test): swapaxes on 0D scalar succeeds instead of throwing. NumPy scalar has shape=(), ndim=0 so any axis is out of bounds. NumSharp represents scalars as shape=[1], ndim=1, so swapaxes(0,0) is valid. Bug 68 (2 tests): swapaxes on empty arrays (shape with 0 dimension) crashes with InvalidOperationException from NDIterator. NumPy handles this correctly — just swaps dimensions. Resolves automatically when Bug 64 is fixed (no iteration needed for view). Bug 69 (2 tests): Out-of-bounds axis throws IndexOutOfRangeException (accidental leak from array access) instead of descriptive AxisError. Root cause: check_and_adjust_axis only adjusts negative indices but never validates bounds.
Migrate the test suite (156 files, ~2,076 tests) from MSTest to TUnit,
a modern .NET testing framework using source generators instead of reflection.
**csproj changes:**
- Add TUnit 1.13.11 as test framework (source-generated test discovery)
- Add OutputType=Exe (required by TUnit's Microsoft.Testing.Platform)
- Add TUnitAssertionsImplicitUsings=false (prevent TUnit.Assertions.Assert
from conflicting with MSTest's Assert class)
- Remove MSTest.TestAdapter 2.1.1 (replaced by TUnit engine)
- Remove Microsoft.NET.Test.Sdk 16.7.1 (replaced by Microsoft.Testing.Platform)
- Remove coverlet.collector 1.3.0 (incompatible with TUnit)
- Keep MSTest.TestFramework 2.1.1 for Assert.* compatibility (1,252 calls
across 85+ files — converting these risks argument-reorder bugs)
- Keep AwesomeAssertions 9.3.0 (~3,689 .Should() calls unchanged)
**New files:**
- global.json: Microsoft.Testing.Platform runner config, required for
`dotnet test` on .NET 10 SDK (MTP mode replaces VSTest)
- AssemblyAttributes.cs: [assembly: NotInParallel] disables TUnit's
default parallel execution for safety (MSTest ran sequentially)
**Attribute replacements across 156 test files:**
- [TestClass] → deleted (152 lines, TUnit doesn't need class-level markers)
- [TestMethod] → [Test] (2,056 occurrences)
- [DataTestMethod] → [Test] (11 files)
- [DataRow(] → [Arguments(] (195 parameterized test rows)
- [TestCategory(] → [Category(] (34 occurrences)
- [Ignore] / [Ignore("...")] → [Skip("...")] (12 occurrences)
- [ExpectedException(...)] → deleted (3 in np.any.Test.cs, all OpenBugs)
- [TestMethod, Ignore("...")] → [Test, Skip("...")] (combined attrs)
- [TestMethod, Timeout(10000)] → [Test, TUnit.Core.Timeout(10000)]
with CancellationToken parameter (TUnit requirement)
**Compile-time fixes:**
- TestClass.cs: Fully qualify System.Reflection.Assembly in 3 methods
to resolve ambiguity with TUnit's HookType.Assembly enum member
- Shape.Test.cs: Add CancellationToken parameter + System.Threading using
for TUnit's [Timeout] attribute requirement
**Test results (both net8.0 and net10.0):**
total: 2,076 | passed: 2,040 | failed: 25 (all pre-existing) | skipped: 11
All 25 failures are pre-existing dead-code/known-bug tests (AND/OR operators,
isnan/isfinite/isclose/allclose, memory allocation, broadcast/newaxis).
**Usage changes:**
- `dotnet test --project <path> --treenode-filter "/*/*/*/*[Category!=OpenBugs]"`
replaces the old `--filter "TestCategory!=OpenBugs"` syntax
- `dotnet run --project <path> -- --treenode-filter ...` also works directly
… for TUnit
Enable TUnit's default parallel test execution by removing the
[assembly: NotInParallel] guard. Tests run ~43% faster in parallel
(~8s vs ~14s sequential for 2,076 tests).
**Parallel race condition fixes:**
- np.load.Test.cs: Add [NotInParallel] on NumpyLoad class — tests share
a read-only data file (data/1-dim-int32_4_comma_empty.npy) that np.Load
opens with exclusive access
- np.tofromfile.Test.cs: Fix copy-paste bug in NumpyToFromFileTestUShort1
that used nameof(NumpyToFromFileTestByte1) — both tests wrote to the same
file "test.NumpyToFromFileTestByte1" causing race conditions
**WindowsOnly platform auto-skip:**
- Add WindowsOnlyAttribute (extends TUnit.Core.SkipAttribute) that
auto-skips tests on non-Windows via OperatingSystem.IsWindows()
- Replace [Category("WindowsOnly")] with [WindowsOnly] on 3 bitmap
test classes (BitmapExtensionsTests, BitmapWithAlphaTests, OpenBugsBitmap)
- Eliminates need for separate CI filter logic per OS
**CI workflow update (build-and-release.yml):**
- Switch from `dotnet test --filter "TestCategory!=OpenBugs"` (VSTest) to
`dotnet run -- --treenode-filter "/*/*/*/*[Category!=OpenBugs]"` (MTP)
- Remove per-OS filter matrix (WindowsOnly now handled by runtime skip)
- Simplify matrix to just os: [windows-latest, ubuntu-latest, macos-latest]
- Add --report-trx for TRX artifact upload
**Stability:** 8 consecutive runs (5 net10.0 + 3 net8.0), all identical:
2,076 total | 2,040 passed | 25 failed (pre-existing) | 11 skipped
Closes #539
The previous CI config used `dotnet run` without --framework, which only runs one TFM. Split into two explicit steps (net8.0 and net10.0) to ensure both target frameworks are tested on all 3 OS runners.
Targeted optimizations on the tests dominating wall-clock time:
**Allocate_1GB (1,113ms → 70ms, 16x faster):**
np.ones → np.empty — test verifies large allocation succeeds,
not that 4GB of memory is filled with ones
**GcDoesntCollectArraySliceAlone (361ms → 95ms, 3.8x faster):**
Reduce iterations from 100K+1M to 10K+100K — still 110K allocations
with GC.Collect + sleep, more than sufficient to test GC correctness
**Dot product tests (removed redundant work):**
- Remove Console.WriteLine(np.dot(x,y).ToString(false)) calls that
recomputed the entire dot product AND stringified the result array
- Dot2x2, Dot2222x2222, Dot3412x5621, Dot311x511: each was calling
np.dot twice — once for debug output, once for assertion
- Dot30_300x30_300: remove Stopwatch + Console.WriteLine benchmark
scaffolding — the test just verifies the operation completes
**Net effect on total suite (2,076 tests, Release, parallel):**
Before: ~8.0s wall clock
After: ~6.6s wall clock (18% faster)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes two broadcasting bugs to align NumSharp with NumPy 2.x semantics:
broadcast_tonow enforces unilateral validation — only source dimensions of size 1 can stretch, matching NumPy's behavior. Previously used bilateralBroadcast()which allowed invalid shape expansions silently.IsBroadcastedguard inBroadcast(Shape, Shape)that prevented re-broadcasting and brokenp.clipon broadcast arrays. Re-broadcast now resolves viaBroadcastInfo.OriginalShapechain tracking. Unified N-arg path withViewInfohandling for sliced inputs and removed dead code.Additionally includes:
base_offset + stridesmodelCloses #536
Key Changes
np.broadcast_to.csValidateBroadcastTo()helper, called from all 9 overloads before bilateralBroadcast()Default.Broadcasting.csIsBroadcastedthrow guard; resolve originals viaOriginalShape; addedViewInfofor sliced inputs in N-arg path; removed deadit.size = tmpNpBroadcastFromNumPyTests.csOpenBugs.csoffset-model-rewrite.mdTest Results
Test Plan
broadcast_torejects invalid unilateral broadcasts (e.g.,(2,3)→(4,3))broadcast_to(broadcast_to(a, shape1), shape2)np.clipon broadcast arrays no longer throws