ci: split integration tests into parallel LLVM and Haskell jobs with shared build by Stevengre · Pull Request #972 · runtimeverification/mir-semantics

Stevengre · 2026-03-04T15:14:03Z

Summary

Extract the kompile build into a dedicated build job that pushes a pre-built Docker image to GHCR, avoiding duplicate compilation across test jobs
Split integration tests into two parallel jobs — LLVM and Haskell — each pulling the shared image
All tests still run; no tests are skipped or removed

Approach

Instead of the original approach of skipping tests via pytest -k filters, this PR restructures the CI workflow to:

Build once: A new build job runs make stable-mir-json build, commits the Docker container state, and pushes it to ghcr.io/runtimeverification/mir-semantics/ci:<sha>
Test in parallel: integration-tests-llvm and integration-tests-haskell jobs pull the pre-built image and run their respective test suites (test-integration-llvm / test-integration-haskell) concurrently

This eliminates the ~40min duplicate kompile that previously ran before each test suite, while keeping full test coverage.
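The build-once, test-in-parallel flow above can be sketched as the command sequences each job might issue. This is a minimal sketch, not the workflow's actual code: the container name `mir-semantics-ci` and the helper functions are hypothetical, and it assumes the build runs through `docker exec` in an already-running container. Only the make targets and the image path come from the description.

```python
from typing import List

# Image repository as given in the PR description.
IMAGE_REPO = "ghcr.io/runtimeverification/mir-semantics/ci"


def image_tag(sha: str) -> str:
    """Return the per-commit image reference, <repo>:<sha>."""
    return f"{IMAGE_REPO}:{sha}"


def build_job_commands(sha: str, container: str = "mir-semantics-ci") -> List[List[str]]:
    """Commands a build job could run (the container name is hypothetical)."""
    tag = image_tag(sha)
    return [
        # Run the build inside the running container.
        ["docker", "exec", container, "make", "stable-mir-json", "build"],
        # Snapshot the container's filesystem as an image.
        ["docker", "commit", container, tag],
        # Publish the image so both test jobs can pull it.
        ["docker", "push", tag],
    ]


def parallel_job_commands(sha: str, make_target: str) -> List[List[str]]:
    """Commands a test job could run: pull the shared image, then run one suite."""
    tag = image_tag(sha)
    return [
        ["docker", "pull", tag],
        ["docker", "run", "--rm", tag, "make", make_target],
    ]


if __name__ == "__main__":
    for cmd in build_job_commands("abc123"):
        print(" ".join(cmd))
    for target in ("test-integration-llvm", "test-integration-haskell"):
        for cmd in parallel_job_commands("abc123", target):
            print(" ".join(cmd))
```

Tagging the image with the commit SHA lets both test jobs refer to exactly the build produced for that commit, without sharing any state beyond the registry.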

Test plan

CI pipeline passes with the new parallel structure
Both LLVM and Haskell integration test suites complete successfully
No tests are lost compared to the previous single-job configuration

Resolves #971

Add TEST_ARGS to the CI integration test step to skip:

- test_exec_smir[*-llvm]: keep Haskell backend only, since it's the backend used for proving and bugs there have higher impact
- test_prove_termination: the same 19 programs are already executed via test_exec_smir[*-haskell]

This deselects 58 of 247 tests (39 LLVM exec + 19 prove_termination) without modifying any test code; tests remain available for local use. Expected CI time reduction: ~2h37m → ~1h20m.

Resolves #971 (Phase 1)
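The figures in this commit message can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
# Counts quoted in the commit message.
total_tests = 247
llvm_exec = 39           # test_exec_smir[*-llvm]
prove_termination = 19   # test_prove_termination

deselected = llvm_exec + prove_termination
remaining = total_tests - deselected

# Quoted approximate CI times, converted to minutes.
before_min = 2 * 60 + 37   # ~2h37m
after_min = 1 * 60 + 20    # ~1h20m
saved_min = before_min - after_min

print(deselected, remaining, saved_min)  # prints "58 189 77"
```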

The parentheses in the -k expression were interpreted by the shell inside the docker exec / make pipeline. Rewrite the filter to avoid parentheses: "not llvm" is sufficient since only test_exec_smir tests have "llvm" in their test IDs.
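The quoting problem can be illustrated with Python's `shlex`, which follows POSIX shell quoting rules. This is a sketch under assumptions: the PR does not quote the original expression, so `with_parens` is a hypothetical stand-in, the test IDs are made up, and `select_not_llvm` is a simplified substring model rather than pytest's actual `-k` matcher.

```python
import shlex
from typing import List

# Hypothetical original filter; the exact expression is not given in the PR.
with_parens = "not (test_exec_smir and llvm)"
# Filter described in the commit message.
without_parens = "not llvm"

# Unquoted, "(" and ")" are shell syntax. Each shell layer strips one level
# of quoting, so an argument crossing two layers needs two levels to survive.
once = shlex.quote(with_parens)
twice = shlex.quote(once)
assert shlex.split(twice) == [once]        # after the first shell layer
assert shlex.split(once) == [with_parens]  # after the second shell layer

# The simpler filter still contains a space, so it needs quoting too,
# but it no longer depends on parentheses reaching pytest intact.
assert shlex.split(shlex.quote(without_parens)) == [without_parens]


def select_not_llvm(test_ids: List[str]) -> List[str]:
    """Simplified model of -k "not llvm": keep IDs lacking the substring."""
    return [t for t in test_ids if "llvm" not in t.lower()]


# Hypothetical test IDs in the shape the commit message describes.
ids = [
    "test_exec_smir[arith-llvm]",
    "test_exec_smir[arith-haskell]",
    "test_prove_termination[arith]",
]
print(select_not_llvm(ids))
```

Note that in this model "not llvm" keeps `test_prove_termination`, so skipping those tests as well would need an additional condition.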

mariaKt · 2026-03-05T22:54:39Z

LLVM backend regression will be missed in the previous test, which should be handled by future test framework refactoring. But if we don't add new exec_smir test or add new exec_smir test with llvm to update expected files, the result is the same as we run CI before.

I am not sure I understand what you mean here, could you clarify?

Stevengre · 2026-03-06T03:01:50Z

LLVM backend regression will be missed in the previous test, which should be handled by future test framework refactoring.

I think the exec_smir tests only check that mir-semantics runs on the LLVM backend. If we assume the backends are correct, all we need to do is leave a few quick tests for the LLVM backend to make sure our semantics can run on it. That's what I mean by refactoring. But it's just a thought for now.

But if we don't add new exec_smir test or add new exec_smir test with llvm to update expected files, the result is the same as we run CI before.

Existing tests have been validated on both backends. If we assume that the backends are correct, the semantics can only cause problems when new rules introduce nondeterminism (nd). In this case, the Haskell backend will produce a different expected file and show errors in CI.

@mariaKt I don't think this description is enough. Please let me know if you have more questions.

dkcumming · 2026-03-06T05:55:27Z

If we assume the backends are correct

Is it true that we can assume the backends are correct? I thought the main way that we were finding regressions in the backends was from the test suites of the semantics using them. I thought the main way @jberthold was finding out about problems in the Haskell backend was from the KEVM test suite, and I feel that when Pi Squared changed the LLVM backend in the past we noticed it in our semantics tests. @ehildenb @palinatolmach what do you think? I am interested in the speedup, but is this the right way to go? It feels to me the solution we would really like is to have #853 implemented, but I don't know how likely that is.

ehildenb · 2026-03-18T16:55:28Z

I've split the test-suite across runners rather than removing it entirely, and upped the default parallelism. We should be getting better performance out of it now. I'm also going to add timeouts to each stage too.

ehildenb · 2026-03-18T18:43:35Z

I've stratified the tests a bit. Someone should check that all the expected test-suites are running somewhere, then we need to update the expected quality checks. This should run significantly faster than it was before.

ehildenb · 2026-03-18T20:28:10Z

.github/workflows/test.yml

+          - name: 'Haskell Exec SMIR'
+            test-args: '-k "test_exec_smir and haskell"'
+            parallel: 6
+            timeout: 20


These are the tests originally proposed to be removed I believe. In other semantics, we do not test the concrete tests on both backends, they are just to test that the semantics is correct (assuming the backends agree on them). But this is also not dominating execution time, the Haskell Proofs phase does.

ehildenb · 2026-03-18T20:31:24Z

New testsuite has:

LLVM Concrete Tests: 59 tests in 10.5m.
Haskell Exec SMIR: 39 tests in 14m.
Haskell Termination tests: 19 tests in 5m.
Haskell Proof tests: 93 tests in 20.5m.
Remaining Integration tests: 124 in 10m.

Total: 59 + 39 + 19 + 93 + 124 = 334 tests

This makes me think we're running some tests redundantly.
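One way to check both concerns raised in this thread (every expected test runs somewhere, and no test runs in more than one job) is to compare the sets of test IDs each job selects. The sketch below is illustrative only: `audit_strata` is a hypothetical helper, the IDs are made up, and it assumes the per-job ID sets have been collected separately, for example with `pytest --collect-only -q` and each job's `-k` filter.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def audit_strata(
    expected: Set[str], strata: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Return (tests run by no job, tests run by more than one job)."""
    # Count how many jobs select each test ID.
    counts = Counter(t for ids in strata.values() for t in ids)
    missing = expected - set(counts)
    duplicated = {
        t: sorted(job for job, ids in strata.items() if t in ids)
        for t, n in counts.items()
        if n > 1
    }
    return missing, duplicated


# Hypothetical test IDs and job contents, for illustration only.
expected = {
    "test_exec_smir[a-llvm]",
    "test_exec_smir[a-haskell]",
    "test_prove_termination[a]",
    "test_other[x]",
}
strata = {
    "LLVM Concrete Tests": {"test_exec_smir[a-llvm]"},
    "Haskell Exec SMIR": {"test_exec_smir[a-haskell]"},
    "Remaining Integration tests": {"test_exec_smir[a-llvm]", "test_other[x]"},
}

missing, duplicated = audit_strata(expected, strata)
print("missing:", sorted(missing))
print("duplicated:", duplicated)
```

If the per-job counts sum to more than the number of distinct test IDs, as the total above suggests, the `duplicated` map would show which jobs overlap.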

dkcumming

Okay I am going to approve this as it is blocking other PRs for merging with the newly added branch protection rules

I did some checking of logs and I feel that no tests are getting skipped.

@ehildenb @Stevengre I think that the description of this PR needs to be changed to reflect the actual changes that are being merged. Right now it is misleading about what the PR actually does.

Also I think the timeouts are going to need to be immediately increased as we will be adding many tests for verification challenges asap and hopefully the ui test suite will be getting included at some point soon too.

Stevengre force-pushed the jh/reduce-integration-test-time branch 2 times, most recently from 448018a to eb489e6 on March 4, 2026 15:16

Stevengre marked this pull request as draft March 4, 2026 15:17

Stevengre self-assigned this Mar 4, 2026

Stevengre added 2 commits March 4, 2026 23:22

fix(ci): quote pytest -k expression in TEST_ARGS

871776b

Stevengre requested review from dkcumming, ehildenb and mariaKt March 5, 2026 02:36

Stevengre marked this pull request as ready for review March 5, 2026 02:36

Stevengre requested a review from palinatolmach March 6, 2026 02:58

ehildenb added 2 commits March 18, 2026 16:50

.github/workflows/test.yml: split test-suite rather than removing

dc5eda8

.github/workflows/test: increase test parallelism

fc600b9

ehildenb added 6 commits March 18, 2026 17:24

.github/workflows/test.yml: more stratification of the tests

c6707c4

.github/test.yml: more stratification

171ec97

.github/test: fix quotations

ea89e5f

.github/test: refine test.yml more

d64d30e

.github/test: compact tests, up timeouts for testing

dd534bf

.github/test: tighten timeouts

e4d0cd8

ehildenb reviewed Mar 18, 2026

Stevengre mentioned this pull request Mar 19, 2026

ci: split integration tests into parallel LLVM and Haskell jobs #993

Closed

dkcumming and others added 2 commits March 20, 2026 14:04

Merge branch 'master' into jh/reduce-integration-test-time

f47db99

.github/test: adjust timeouts

6026f61

ehildenb added the automerge label Mar 20, 2026

dkcumming approved these changes Mar 20, 2026

dkcumming merged commit f0257bc into master Mar 20, 2026
11 checks passed

dkcumming deleted the jh/reduce-integration-test-time branch March 20, 2026 20:24

Stevengre changed the title ~~perf(test): remove redundant integration test executions~~ ci: split integration tests into parallel LLVM and Haskell jobs with shared build Mar 23, 2026

dkcumming merged 13 commits into master from jh/reduce-integration-test-time

4 participants
