-
Notifications
You must be signed in to change notification settings - Fork 155
Add OpenTelemetry v2 integration with enhanced features and comprehensive testing #1314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit adds a new OpenTelemetry interceptor (opentelemetryv2) with enhanced capabilities for Temporal workflow integration: Features: - Deterministic ID generation for spans/traces in workflows using TemporalIdGenerator - Context propagation across workflow and activity boundaries - Support for workflow-level span creation via workflow.start_as_current_span - Enhanced interceptor with context propagation to activities and nexus operations - Compatible with existing opentelemetry module while providing additional functionality Implementation: - New TemporalIdGenerator uses workflow.random() for deterministic IDs in workflows - TracingInterceptor handles client, worker, activity, workflow, and nexus operations - Workflow-safe span creation context manager in workflow module - Comprehensive test coverage for trace propagation scenarios This is separate from the OpenAI agents OTEL integration and provides general-purpose OpenTelemetry improvements for Temporal workflows. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
…inting fixes This commit significantly improves the OpenTelemetry v2 integration for the Temporal SDK with the following enhancements: ## Core Features Added: - **Comprehensive test coverage**: Added `test_opentelemetryv2_comprehensive_tracing` covering all workflow operations including activities, local activities, child workflows, timers, signals, updates, queries, and Nexus operations - **Read-only mode detection**: Implemented `workflow.unsafe.is_read_only()` to prevent span ID generation errors during queries and update validators - **Test isolation**: Added pytest fixture to reset OpenTelemetry tracer provider state between test runs - **Span hierarchy validation**: Refactored tests to use `dump_spans()` hierarchy validation for better maintainability ## Linting and Documentation: - Fixed all import path issues for OpenTelemetry ID generators - Added comprehensive docstrings for all public classes and methods - Fixed type annotations and null handling throughout the codebase - Resolved Nexus headers access issues with proper type protocols - Achieved complete pydocstyle compliance ## Technical Improvements: - Enhanced `TemporalSpanProcessor` with proper replay handling - Improved `TemporalIdGenerator` with deterministic workflow-safe random generation - Updated span parenting validation to ensure proper trace relationships - Added max_cached_workflows=0 to all test workers for deterministic behavior 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
…rovider in a workflow will use a replay safe version. Care should still be taken if creating one from scratch inside a workflow
cretz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of minor things now...
| """Implementation of | ||
| :py:meth:`temporalio.worker.Interceptor.workflow_interceptor_class`. | ||
| """ | ||
| if not isinstance(self._provider, ReplaySafeTracerProvider): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So since we accept a provider instead of use the global, but expect users in workflows to use the global, should we be checking get_tracer_provider() instead? I'd be ok if we wanted to remove allowing custom provider at first even.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering it, they go through the plugin generally which doesn't accept one anyway.
| """Implementation of | ||
| :py:meth:`temporalio.worker.Interceptor.workflow_interceptor_class`. | ||
| """ | ||
| if not isinstance(self._provider, ReplaySafeTracerProvider): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are now asking users to use global provider, we should probably check get_tracer_provider instead of whatever provider is on the interceptor. It actually makes me wonder if we should not allow provider to be provided if we only use it half the time.
temporalio/plugin.py
Outdated
|
|
||
| client = config.get("client") | ||
|
|
||
| # Don't add new worker interceptors which are already registered in the client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused here. We should add all worker_interceptors that are set, right? That's the same as providing interceptors to worker constructor. We don't expect to go there and remove ones that are also in the client.
If the concern is that the plugin is on a worker but not on a client, then the plugin author can error if it wasn't applied to client or we can offer some approach for trying to be smart here. But having the plugin author provide the same interceptor instance to both is the opposite of what we want with interceptor propagation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I could do the opposite, which would be assume any client interceptors provided in the plugin should also be registered in the worker if the plugin wasn't given to the client.
Summary
This PR introduces a new OpenTelemetry v2 integration for the Temporal Python SDK with significant enhancements over the existing OpenTelemetry support. The integration provides deterministic tracing, comprehensive test coverage, and improved maintainability.
Key Features Added:
SimplePluginbase classtemporalio.contrib.opentelemetryv2.workflow.start_as_current_span()for user workflow tracingArchitecture Improvements:
TemporalSpanProcessorskips span export during workflow replay to prevent duplicate telemetryworkflow.unsafe.is_read_only()to handle queries and update validators safelyTracingInterceptorcovering all client and worker operationsTesting & Quality:
test_opentelemetryv2_comprehensive_tracingcovering all workflow operations with proper span hierarchy validationdump_spans()for maintainable hierarchy validation similar to existing OpenTelemetry testsTest plan
add_temporal_spans=False) and comprehensive tracing (add_temporal_spans=True)🤖 Generated with Claude Code