Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #189 +/- ##
==========================================
+ Coverage 96.48% 96.88% +0.40%
==========================================
Files 97 101 +4
Lines 3552 3978 +426
==========================================
+ Hits 3427 3854 +427
+ Misses 125 124 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
emyller
left a comment
There was a problem hiding this comment.
Looks good overall. Left a few questions around, none a blocker I believe. Also, couple non-action feedback:
- I think context-manager-based design makes it trippy to understand what's going on.
- Both unit and integration tests look very similar in purpose sometimes. Feels duplicated — but I also appreciate both unit and integration test suites are accomplishing their goal here. Still, in practice, are we maybe probing for the same thing twice?
It's essential to handle graceful shutdowns in a manageable way. Without this, we'll lose spans.
Fair observation. The difference is that unit tests configure structlog directly while integration tests go through I inventoried all tests and found three unit tests that were genuinely redundant with integration counterparts. Dropped all three in 7657026. The remaining unit tests cover edge cases the integration tests don't exercise: kebab-case normalisation, dunder vs single underscore distinction, all 5 severity levels (parametrised), module name fallback when no logger name is given, span event emission, and the factory/lifecycle functions. |
Contributes to #182.
Adds OpenTelemetry integration to flagsmith-common, enabling trace and structured log export over OTLP.
Tracing
When
OTEL_EXPORTER_OTLP_ENDPOINTis set,ensure_cli_env()configures:TracerProviderwith OTLP/HTTP span export, W3C TraceContext + Baggage propagationDjangoInstrumentor— creates a root span per HTTP requestPsycopg2Instrumentor— creates child spans for each SQL query withdb.system,db.statement, anddb.nameattributes. SQL commenter is enabled, adding trace context as SQL comments for database-side correlation.RedisInstrumentor— creates child spans for each Redis commandRouteLoggerMiddlewarerenames spans usingget_route_template(e.g.GET /api/v1/projects/{pk}/) and setshttp.routeto the same normalised route. URL paths can be excluded from tracing viaOTEL_TRACING_EXCLUDED_URL_PATHS.Structured logs
A custom structlog processor emits each log event as:
body,event_name,severity_text,severity_number, and attributes (double-underscore keys are converted to dots, e.g.organisation__id→organisation.id). W3C baggage entries from the current OTel context are copied into log attributes. Aflagsmith.eventattribute duplicates the event name for backends that don't surfaceEventName.A companion processor
add_otel_trace_contextinjectstrace_idandspan_idfrom the active span into the structlog event dict for JSON log output.Configuration
OTEL_EXPORTER_OTLP_ENDPOINThttp://collector:4318). If unset, no OTel setup occurs.OTEL_SERVICE_NAMEservice.nameresource attribute.flagsmith-apiOTEL_TRACING_EXCLUDED_URL_PATHSHow to review
Code structure
The entry point is
src/common/core/main.py— theif otel_endpoint:block inensure_cli_env()conditionally imports and wires up OTel. When the env var is unset, no OTel code is imported.All OTel logic lives in
src/common/core/otel.py:setup_tracing()— context manager that sets the globalTracerProvider, configures W3C propagators, and instruments Django, psycopg2, and Redis. Teardown uninstruments in reverse order and shuts down the provider.make_structlog_otel_processor()— factory that returns a structlog processor. The processor maps event dicts to OTLP log records (attribute key conversion, reserved key filtering,EventNameconstruction, baggage propagation) and attaches them as span events on the active span. Theflagsmith.eventattribute is a workaround for SigNoz not surfacing OTel'sEventNamefield.build_tracer_provider()/build_otel_log_provider()— factory functions for the trace and log providers, both using batch processors with OTLP/HTTP export.add_otel_trace_context— structlog processor that injectstrace_id/span_idinto the event dict.src/common/gunicorn/middleware.pyhas one addition:span.set_attribute("http.route", route_template)— overrides the raw regex from Django's instrumentor with the normalised route template.pyproject.tomladdsopentelemetry-instrumentation-psycopg2,opentelemetry-instrumentation-redis, andredisto thecommon-coreextra.Testing
tests/unit/common/core/test_otel.py) — test the structlog processor in isolation usingInMemoryLogExporter, and verify instrumentor setup/teardown via mocks.tests/integration/core/test_otel.py) — test the full stack with real Django test client, real database, andInMemorySpanExporter. The tracing tests assert the complete span output (exact span names, kinds, attributes, parent-child relationships, SQL statements) rather than spot-checking individual fields.The integration tracing tests use a module-scoped fixture because OTel only allows setting the global
TracerProvideronce per process. Per-test isolation is achieved via thespan_exporterfixture which clears the exporter between tests.