Skip to content

Conversation

@tconley1428
Copy link
Contributor

Summary

This PR introduces a new OpenTelemetry v2 integration for the Temporal Python SDK with significant enhancements over the existing OpenTelemetry support. The integration provides deterministic tracing, comprehensive test coverage, and improved maintainability.

Key Features Added:

  • Deterministic ID generation: Uses Temporal's workflow-safe random generator for consistent span/trace IDs across replays
  • Comprehensive operation support: Full tracing for workflows, activities, local activities, child workflows, timers, signals, updates, queries, and Nexus operations
  • Context propagation: Proper trace context propagation across all Temporal operation boundaries
  • Plugin architecture: Clean plugin-based integration using Temporal's SimplePlugin base class
  • Workflow-safe span creation: temporalio.contrib.opentelemetryv2.workflow.start_as_current_span() for user workflow tracing

Architecture Improvements:

  • Modular design: Restructured as proper package with separate modules for interceptor, plugin, processor, and workflow utilities
  • Replay-aware processing: TemporalSpanProcessor skips span export during workflow replay to prevent duplicate telemetry
  • Read-only mode detection: Added workflow.unsafe.is_read_only() to handle queries and update validators safely
  • Enhanced interceptor: Comprehensive TracingInterceptor covering all client and worker operations

Testing & Quality:

  • Comprehensive test suite: Added test_opentelemetryv2_comprehensive_tracing covering all workflow operations with proper span hierarchy validation
  • Test isolation: Pytest fixtures ensure OpenTelemetry state doesn't leak between tests
  • Span hierarchy validation: Tests use dump_spans() for maintainable hierarchy validation similar to existing OpenTelemetry tests
  • Complete linting compliance: All code passes ruff, pyright, mypy, and pydocstyle checks with comprehensive docstrings

Test plan

  • All existing tests continue to pass
  • New comprehensive test validates tracing across all Temporal operations
  • Span hierarchy validation ensures proper parent-child relationships
  • Test isolation prevents OpenTelemetry state conflicts between test runs
  • Integration works with both basic tracing (add_temporal_spans=False) and comprehensive tracing (add_temporal_spans=True)
  • Deterministic span IDs maintained across workflow replays
  • No span export during workflow replay to prevent duplicate telemetry
  • Read-only operations (queries, update validators) handle tracing safely
  • All linting and type checking passes

🤖 Generated with Claude Code

tconley1428 and others added 2 commits January 20, 2026 12:38
This commit adds a new OpenTelemetry interceptor (opentelemetryv2) with enhanced
capabilities for Temporal workflow integration:

Features:
- Deterministic ID generation for spans/traces in workflows using TemporalIdGenerator
- Context propagation across workflow and activity boundaries
- Support for workflow-level span creation via workflow.start_as_current_span
- Enhanced interceptor with context propagation to activities and nexus operations
- Compatible with existing opentelemetry module while providing additional functionality

Implementation:
- New TemporalIdGenerator uses workflow.random() for deterministic IDs in workflows
- TracingInterceptor handles client, worker, activity, workflow, and nexus operations
- Workflow-safe span creation context manager in workflow module
- Comprehensive test coverage for trace propagation scenarios

This is separate from the OpenAI agents OTEL integration and provides
general-purpose OpenTelemetry improvements for Temporal workflows.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…inting fixes

This commit significantly improves the OpenTelemetry v2 integration for the Temporal SDK with the following enhancements:

## Core Features Added:
- **Comprehensive test coverage**: Added `test_opentelemetryv2_comprehensive_tracing` covering all workflow operations including activities, local activities, child workflows, timers, signals, updates, queries, and Nexus operations
- **Read-only mode detection**: Implemented `workflow.unsafe.is_read_only()` to prevent span ID generation errors during queries and update validators
- **Test isolation**: Added pytest fixture to reset OpenTelemetry tracer provider state between test runs
- **Span hierarchy validation**: Refactored tests to use `dump_spans()` hierarchy validation for better maintainability

## Linting and Documentation:
- Fixed all import path issues for OpenTelemetry ID generators
- Added comprehensive docstrings for all public classes and methods
- Fixed type annotations and null handling throughout the codebase
- Resolved Nexus headers access issues with proper type protocols
- Achieved complete pydocstyle compliance

## Technical Improvements:
- Enhanced `TemporalSpanProcessor` with proper replay handling
- Improved `TemporalIdGenerator` with deterministic workflow-safe random generation
- Updated span parenting validation to ensure proper trace relationships
- Added max_cached_workflows=0 to all test workers for deterministic behavior

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@tconley1428 tconley1428 requested a review from a team as a code owner February 3, 2026 01:39
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor things now...

"""Implementation of
:py:meth:`temporalio.worker.Interceptor.workflow_interceptor_class`.
"""
if not isinstance(self._provider, ReplaySafeTracerProvider):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So since we accept a provider instead of use the global, but expect users in workflows to use the global, should we be checking get_tracer_provider() instead? I'd be ok if we wanted to remove allowing custom provider at first even.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering it, they go through the plugin generally which doesn't accept one anyway.

"""Implementation of
:py:meth:`temporalio.worker.Interceptor.workflow_interceptor_class`.
"""
if not isinstance(self._provider, ReplaySafeTracerProvider):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are now asking users to use global provider, we should probably check get_tracer_provider instead of whatever provider is on the interceptor. It actually makes me wonder if we should not allow provider to be provided if we only use it half the time.


client = config.get("client")

# Don't add new worker interceptors which are already registered in the client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused here. We should add all worker_interceptors that are set, right? That's the same as providing interceptors to worker constructor. We don't expect to go there and remove ones that are also in the client.

If the concern is that the plugin is on a worker but not on a client, then the plugin author can error if it wasn't applied to client or we can offer some approach for trying to be smart here. But having the plugin author provide the same interceptor instance to both is the opposite of what we want with interceptor propagation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I could do the opposite, which would be assume any client interceptors provided in the plugin should also be registered in the worker if the plugin wasn't given to the client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants