Operational Visibility for Message Queues #65

@coolguy1771

Proposal

The current message handling implementation is effectively a black box, which makes it difficult to monitor or debug the state of the scaleset. There is no way to inspect the queue without consuming messages, and no visibility into backlog depth, which is critical for monitoring and alerting.

Proposed API Additions

I suggest extending the MessageSessionClient with methods for non-destructive inspection and queue telemetry:

// PeekMessage allows inspecting the next message in the queue without marking it as delivered.
func (c *MessageSessionClient) PeekMessage(ctx context.Context, lastMessageID int) (*RunnerScaleSetMessage, error)

// GetQueueDepth returns the current number of pending messages in the scaleset.
func (c *MessageSessionClient) GetQueueDepth(ctx context.Context) (int, error)

Key Use Cases

  • Production Monitoring: GetQueueDepth is essential for exporting metrics to systems like Prometheus or Datadog, so that alerts can fire on scaling lag or stuck queues.
  • Debugging: PeekMessage allows engineers to inspect problematic messages that might be causing processing failures without permanently removing them from the queue.
  • Operational Control: These additions lay the groundwork for more advanced features like Dead Letter Queue (DLQ) handling and message priority management.

Benefits

  • Observability: Provides concrete data on queue health and backlog.
  • Safety: Enables "dry-run" inspections of the message stream.
  • Testability: Simplifies integration testing by allowing us to verify queue state without side effects.
