Resilience and Error Recovery #66

@coolguy1771

Proposal

The current implementation can fail permanently on transient errors. If a session expires or an endpoint enters a failing state, the listener can exit or hang without attempting recovery. I propose adding a resilience layer that handles session recreation and health monitoring automatically.

Proposed Solution

We should introduce a ResilienceConfig to govern how the client handles degraded states, and expose health and recovery methods that the listener loop can call.

type ResilienceConfig struct {
    AutoRecoverSession     bool
    CircuitBreakerSettings CircuitBreakerConfig
    HealthCheckInterval    time.Duration
}

// IsHealthy performs a lightweight check to ensure the client can still communicate with the API.
func (c *Client) IsHealthy(ctx context.Context) error

// Recover attempts to re-establish the session and refresh tokens without a full process restart.
func (c *Client) Recover(ctx context.Context) error

Technical Improvements

  • Automatic Session Recovery: If AutoRecoverSession is enabled, the client will attempt to negotiate a new session ID if the current one is invalidated by the server, preventing unnecessary listener crashes.
  • Circuit Breaking: A circuit breaker around the scaleset API keeps the client from hammering GitHub's infrastructure during an outage. The client backs off and periodically probes to see whether the API has recovered.
  • Health Probing: The IsHealthy method allows the listener (or an external orchestrator like Kubernetes) to verify the connection's integrity, enabling proactive restarts if the client enters an unrecoverable state.

Benefits

  • Self-Healing: Reduces manual intervention by recovering from expired sessions or transient network partitions.
  • Stability: Prevents cascading failures during upstream API degradations.
  • Operational Visibility: Provides a clear hook for liveness and readiness probes.
