
CI: Add benchmark reports to releases and regression detection on main #543

@Nucs

Description


Overview

Integrate the benchmarking infrastructure (#542) into the CI/CD pipeline to automatically detect performance regressions and include benchmark reports as release artifacts.

Parent Issue

This is a sub-issue of #542 (Benchmarking Infrastructure).

Proposal

1. Benchmark Stage on Main Branch

Add a GitHub Actions workflow that runs on every push/merge to main and on pull requests targeting it:

```yaml
name: Performance Regression Check

on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

jobs:
  benchmark:
    runs-on: ubuntu-latest  # or windows-latest for consistency
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install NumPy
        run: pip install numpy tabulate

      - name: Run Benchmarks
        run: |
          cd benchmark
          pwsh -File run-benchmarks.ps1 -Quick

      - name: Upload Benchmark Report
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-report
          path: benchmark/benchmark-report.md

      - name: Compare with Baseline
        run: |
          # Compare current results with the stored baseline and
          # fail the job if any key metric regresses by more than 10%
          # (uses the -CompareBaseline switch proposed in section 3)
          cd benchmark
          pwsh -File run-benchmarks.ps1 -Quick -CompareBaseline
```

2. Release Artifacts

Include benchmark reports in GitHub Releases:

```yaml
- name: Attach Benchmark Report to Release
  uses: softprops/action-gh-release@v1
  with:
    files: |
      benchmark/benchmark-report.md
      benchmark/NumSharp.Benchmark.GraphEngine/BenchmarkDotNet.Artifacts/results/*.html
```

3. Regression Detection

Implement baseline comparison:

  • Store baseline results in benchmark/baseline.json
  • Compare current run against baseline
  • Fail CI if regression > 10% on key metrics
  • Auto-update baseline on release tags
```powershell
# Add to run-benchmarks.ps1
param(
    [switch]$CompareBaseline,
    [string]$BaselinePath = 'baseline.json',
    [int]$RegressionThreshold = 10  # percent
)

# ... the benchmark run above is assumed to populate $current:
# a hashtable mapping metric name -> measured mean time ...

if ($CompareBaseline) {
    # -AsHashtable (PowerShell 6+) so metrics can be indexed by name
    $baseline = Get-Content $BaselinePath | ConvertFrom-Json -AsHashtable
    $regressions = @()

    foreach ($metric in $current.Keys) {
        # Percent change relative to the baseline; positive means slower
        $change = (($current[$metric] - $baseline[$metric]) / $baseline[$metric]) * 100
        if ($change -gt $RegressionThreshold) {
            $regressions += "$metric regressed by $([math]::Round($change, 1))%"
        }
    }

    if ($regressions.Count -gt 0) {
        Write-Error "Performance regressions detected:`n$($regressions -join "`n")"
        exit 1
    }
}
```
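The "auto-update baseline on release tags" bullet could be sketched as a conditional workflow step. This is an assumption, not existing workflow code: the step name, the `-ExportBaseline` switch, and the commit-back approach are all hypothetical (the switch would need to be added to run-benchmarks.ps1 to write baseline.json from the current results):

```yaml
# Hypothetical sketch: refresh the stored baseline when a release tag is pushed
- name: Update Baseline on Release Tag
  if: startsWith(github.ref, 'refs/tags/')
  run: |
    cd benchmark
    pwsh -File run-benchmarks.ps1 -Quick -ExportBaseline
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add baseline.json
    git commit -m "chore: update benchmark baseline for ${{ github.ref_name }}"
    git push origin HEAD:main
```

Committing from CI requires a token with write access; alternatively the refreshed baseline could be uploaded as a release artifact instead of committed.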

4. PR Comments

Post benchmark results as PR comments:

```yaml
- name: Comment PR with Benchmark Results
  uses: actions/github-script@v7
  if: github.event_name == 'pull_request'
  with:
    script: |
      const fs = require('fs');
      const report = fs.readFileSync('benchmark/benchmark-report.md', 'utf8');

      // Extract summary section; fall back to the full report if the heading is missing
      const match = report.match(/## Key Comparisons[\s\S]*?(?=##|$)/);
      const summary = match ? match[0] : report;

      github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: `## 📊 Benchmark Results\n\n${summary}\n\n<details><summary>Full Report</summary>\n\n${report}\n</details>`
      });
```
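The summary extraction above can be pulled into a small helper so the script never throws when the heading is absent. A minimal sketch — `extractSummary` is a hypothetical name, and the "## Key Comparisons" heading is assumed to match the report format produced by run-benchmarks.ps1:

```javascript
// extractSummary: pull the "## Key Comparisons" section out of a markdown
// report, stopping at the next "## " heading; fall back to the full report
// when the heading is missing.
function extractSummary(report) {
  const match = report.match(/## Key Comparisons[\s\S]*?(?=\n## |$)/);
  return match ? match[0] : report;
}

const report = '# Report\n\n## Key Comparisons\n\n| op | ratio |\n\n## Environment\n\ndetails';
console.log(extractSummary(report));
```

Keeping the fallback means a malformed report still produces a PR comment (just a longer one) instead of failing the workflow step.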

Implementation Checklist

  • Create .github/workflows/benchmark.yml
  • Add benchmark step to release workflow
  • Create benchmark/baseline.json with current results
  • Add -CompareBaseline switch to run-benchmarks.ps1
  • Add regression threshold configuration
  • Add PR comment integration
  • Document the CI workflow in benchmark README

Key Metrics to Track

| Metric | Baseline | Threshold |
| --- | --- | --- |
| np.add (int32, N=10M) | TBD | ±10% |
| aa + 2b (float64, N=10M) | TBD | ±10% |
| np.var (float64, N=10M) | TBD | ±10% |
| Memory allocation per op | TBD | ±20% |
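A baseline.json matching these metrics might use the display names as keys so the PowerShell comparison can index both sides the same way. The format and the numeric values below are illustrative assumptions, not measured results:

```json
{
  "np.add (int32, N=10M)": 12.3,
  "aa + 2b (float64, N=10M)": 45.6,
  "np.var (float64, N=10M)": 78.9,
  "Memory allocation per op": 1024
}
```

A flat name-to-number map keeps the comparison loop trivial; per-metric thresholds could later be encoded as nested objects if the ±20% memory threshold needs to live in the file rather than in the script.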

Benefits

  1. Automated regression detection - Catch performance issues before release
  2. Historical tracking - See performance trends over time
  3. Release documentation - Users can see benchmark results per version
  4. PR feedback - Contributors see impact of their changes
  5. Data-driven decisions - Objective metrics for optimization PRs

Labels

enhancement (New feature or request), infrastructure (CI/CD, build system, testing framework, tooling)