feat: Add AUDIT_ONLY model kind for multi-table validation #5362
Draft
Lewis-Lyons wants to merge 7 commits into TobikoData:main from
Introduces a new model kind that validates data relationships across multiple tables without materializing results. Combines model benefits (DAG participation, dependencies) with audit behavior (validation only).
- Add AUDIT_ONLY to ModelKindName enum and create AuditOnlyKind class
- Implement AuditOnlyStrategy for execution without materialization
- Add comprehensive unit and integration tests
- Update documentation with usage examples and best practices
- Add three example models to sushi project demonstrating use cases
- Handle potential None return from fetchone() properly
- Apply ruff formatting

- Update model counts in analytics and integration tests
- Account for 3 new AUDIT_ONLY models in sushi example
- Fix snapshot count assertions

- Fix test_forward_only_plan_with_effective_date to handle audit_waiter_revenue_anomalies
- Update assertions to check snapshot IDs in a set rather than exact order
- Revert incorrect change to test_migrate_rows (uses fixtures, not live models)

This file should not have been committed to the repository. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
AUDIT_ONLY models are symbolic and don't create physical tables, but they still need to be included in plans so their validation queries can run.
Changes:
- Exclude symbolic models from missing_intervals in Plan to prevent them from being scheduled for backfill
- Update integration tests to filter out AUDIT_ONLY models when counting new snapshots and checking intervals
- Fix test validation to skip table existence checks for symbolic models
- Distinguish between AUDIT_ONLY and EXTERNAL models (both symbolic but EXTERNAL models still track intervals)
This ensures AUDIT_ONLY models serve their validation purpose without participating in the physical deployment lifecycle.
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
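The distinction this commit message draws between symbolic kinds can be illustrated with a small, self-contained sketch. It is not the code in this PR: `Kind`, `Snap`, `snapshots_to_backfill`, and `snapshots_tracking_intervals` are hypothetical names invented for illustration, and the kind list is deliberately reduced to three entries.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    # Hypothetical, reduced set of model kinds (illustration only).
    FULL = "FULL"
    EXTERNAL = "EXTERNAL"
    AUDIT_ONLY = "AUDIT_ONLY"


# Both of these kinds are symbolic: they create no physical table.
SYMBOLIC_KINDS = {Kind.EXTERNAL, Kind.AUDIT_ONLY}


@dataclass(frozen=True)
class Snap:
    # Hypothetical stand-in for a snapshot: just a name and a kind.
    name: str
    kind: Kind

    @property
    def is_symbolic(self) -> bool:
        return self.kind in SYMBOLIC_KINDS


def snapshots_to_backfill(snapshots):
    """Symbolic models are skipped so they are never scheduled for backfill."""
    return [s.name for s in snapshots if not s.is_symbolic]


def snapshots_tracking_intervals(snapshots):
    """AUDIT_ONLY is filtered out; EXTERNAL, although symbolic, still tracks intervals."""
    return [s.name for s in snapshots if s.kind is not Kind.AUDIT_ONLY]
```

Under these assumptions, a plan containing one FULL, one EXTERNAL, and one AUDIT_ONLY model would backfill only the FULL model, while interval checks would cover the FULL and EXTERNAL models.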
Add AUDIT_ONLY Model Kind for Multi-Table Validation
Summary
This PR introduces a new AUDIT_ONLY model kind to SQLMesh, addressing the gap in validating relationships between multiple tables without materializing unnecessary tables. This feature combines the benefits of models (DAG participation, dependencies, scheduling) with audit behavior (validation without materialization).
Problem Statement
Previously, SQLMesh users had to choose between:
Solution
The AUDIT_ONLY model kind enables users to:
Implementation Details
Core Changes
1. Model Kind Definition (sqlmesh/core/model/kind.py)
- Added AUDIT_ONLY to ModelKindName enum
- Created AuditOnlyKind class with configuration:
  - blocking (default: True): Whether failures stop the pipeline
  - max_failing_rows (default: 10): Number of sample rows in error messages
- is_symbolic=True (no materialization)
2. Execution Strategy (sqlmesh/core/snapshot/evaluator.py)
- Implemented AuditOnlyStrategy extending SymbolicStrategy
- Raises AuditError with sample data if validation fails
3. Parser Support (sqlmesh/core/dialect.py)
- Added AUDIT_ONLY to list of model kinds that accept properties
4. Snapshot Definition (sqlmesh/core/snapshot/definition.py)
- Updated evaluatable property to include audit-only models
Testing
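The execution behavior described under Core Changes (run the validation query without materializing anything, and raise an AuditError with sample rows when it returns data) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses Python's built-in `sqlite3` in place of a real engine adapter, `AuditError` is a local stand-in class, and `AuditOnlySettings`, `run_audit_only`, and `build_demo_db` are hypothetical names. Returning the sample instead of raising when `blocking` is false is also an assumption. The query is borrowed from the Usage Example in this description.

```python
import sqlite3
from dataclasses import dataclass


class AuditError(Exception):
    """Local stand-in so the sketch is self-contained (not SQLMesh's class)."""


@dataclass(frozen=True)
class AuditOnlySettings:
    # Hypothetical container mirroring the two documented options and defaults.
    blocking: bool = True
    max_failing_rows: int = 10


def run_audit_only(conn, query, settings=AuditOnlySettings()):
    """Run a validation query; any returned row counts as a failure.

    Returns the sampled failing rows (an empty list means success).
    Raises AuditError when rows are found and the audit is blocking.
    """
    cursor = conn.execute(query)
    # Fetch one extra row to detect whether the sample had to be truncated.
    rows = cursor.fetchmany(settings.max_failing_rows + 1)
    if not rows:
        return []
    sample = rows[: settings.max_failing_rows]
    truncated = len(rows) > settings.max_failing_rows
    if settings.blocking:
        suffix = " (sample truncated)" if truncated else ""
        raise AuditError(f"validation failed; sample rows{suffix}: {sample}")
    return sample


def build_demo_db():
    """Create an in-memory database with one order that has no customer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1), (2);
        INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3);
        """
    )
    return conn


# Query taken from the Usage Example: zero rows means the check passes.
ORPHAN_ORDERS = """
SELECT o.order_id, o.customer_id, 'Missing customer record' AS issue
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
"""
```

With the demo data, `run_audit_only(build_demo_db(), ORPHAN_ORDERS)` raises `AuditError` because order 12 references a missing customer, while passing `AuditOnlySettings(blocking=False)` returns the failing sample instead. No table is created or written by the check itself, which is the point of the symbolic kind.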
Unit Tests (tests/core/test_model.py)
Integration Tests (tests/core/test_integration.py)
Documentation
User Documentation Updates
- docs/concepts/audits.md: Added comprehensive AUDIT_ONLY section under Advanced Usage
- docs/concepts/models/model_kinds.md: Added detailed AUDIT_ONLY section with examples
- docs/reference/model_configuration.md: Added AUDIT_ONLY configuration reference
Example Models (examples/sushi/models/)
Added 3 demonstration models (all non-blocking for demo purposes):
- audit_order_integrity.sql: Validates referential integrity
- audit_waiter_revenue_anomalies.sql: Detects revenue anomalies
- audit_duplicate_orders.sql: Identifies duplicate orders
Usage Example
MODEL (
  name data_quality.order_validation,
  kind AUDIT_ONLY (
    blocking TRUE,
    max_failing_rows 20
  ),
  depends_on [orders, customers],
  cron '@daily'
);

-- Query returns 0 rows for success
SELECT
  o.order_id,
  o.customer_id,
  'Missing customer record' AS issue
FROM orders o
LEFT JOIN customers c
  ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
Key Differences from Traditional Audits
Location: audits/ directory (traditional audits) vs. models/ directory (AUDIT_ONLY models)
Migration Path
Testing Instructions
Run unit tests:
Run integration tests:
Try the sushi examples:
Create a test AUDIT_ONLY model:
Related Issues
Addresses the need for multi-table validation without materialization.
Notes for Reviewers
Future Enhancements (Not in this PR)