ci: split integration tests into parallel LLVM and Haskell jobs with shared build#972
ci: split integration tests into parallel LLVM and Haskell jobs with shared build#972
Conversation
Add TEST_ARGS to the CI integration test step to skip: - test_exec_smir[*-llvm]: keep Haskell backend only, since it's the backend used for proving and bugs there have higher impact - test_prove_termination: the same 19 programs are already executed via test_exec_smir[*-haskell] This deselects 58 of 247 tests (39 LLVM exec + 19 prove_termination) without modifying any test code — tests remain available for local use. Expected CI time reduction: ~2h37m → ~1h20m. Resolves #971 (Phase 1)
448018a to
eb489e6
Compare
The parentheses in the -k expression were interpreted by the shell inside the docker exec / make pipeline. Rewrite the filter to avoid parentheses: "not llvm" is sufficient since only test_exec_smir tests have "llvm" in their test IDs.
I am not sure I understand what you mean here, could you clarify? |
I thin
Existing tests have been validated by both backends. If we assume that the backends are correct, the semantics may only cause problem because new rules will introduce nd. This case, haskell backend will produce different expected file and will show errors in CI. @mariaKt I don't this description is enough. Please let me know if you have more questions. |
Is it true that we can assume the backends are correct? I thought the main way that we were finding regressions in the backends was from the test suites of the semantics using them. I thought the main way @jberthold was finding out about problems in the haskell backend was from the KEVM test suite, and I feel that when Pi Squared changed the LLVM backend in the past we noticed it in our semantics tests. @ehildenb @palinatolmach what do you think? I am interested in the speed up, but is this the right way to go? It feels to me the solution we would really like is to have #853 implemented - but I don't know how likely that is |
|
I've split the test-suite across runners rather than removing it entirely, and upped the default parallelism. We should be getting better performance out of it now. I'm also going to add timeouts to each stage too. |
|
I've stratified the tests a bit. Someone shoudl check that all the expected test-suites are running somewhere, then we need to update the expected quality checks. This should run significantly faster than it was before. |
| - name: 'Haskell Exec SMIR' | ||
| test-args: '-k "test_exec_smir and haskell"' | ||
| parallel: 6 | ||
| timeout: 20 |
There was a problem hiding this comment.
These are the tests originally proposed to be removed I believe. In other semantics, we do not test the concrete tests on both backends, they are just to test that the semantics is correct (assuming the backends agree on them). But this is also not dominating execution time, the Haskell Proofs phase does.
|
New testsuite has:
Total: 59 + 39 + 19 + 93 + 124 = 334 tests This makes me think we're running some tests redundantly. |
dkcumming
left a comment
There was a problem hiding this comment.
Okay I am going to approve this as it is blocking other PRs for merging with the newly added branch protection rules
I did some checking of logs and I feel that no tests are getting skipped.
@ehildenb @Stevengre I think that the description of this PR need to be changed to reflect the actual changes that are being merged. Right now it is misleading to what the PR actually does.
Also I think the timeouts are going to need to be immediately increased as we will be adding many tests for verification challenges asap and hopefully the ui test suite will be getting included at some point soon too.
Summary
kompilebuild into a dedicatedbuildjob that pushes a pre-built Docker image to GHCR, avoiding duplicate compilation across test jobsApproach
Instead of the original approach of skipping tests via pytest
-kfilters, this PR restructures the CI workflow to:buildjob runsmake stable-mir-json build, commits the Docker container state, and pushes it toghcr.io/runtimeverification/mir-semantics/ci:<sha>integration-tests-llvmandintegration-tests-haskelljobs pull the pre-built image and run their respective test suites (test-integration-llvm/test-integration-haskell) concurrentlyThis eliminates the ~40min duplicate
kompilethat previously ran before each test suite, while keeping full test coverage.Test plan
Resolves #971