Performance Optimizations for High-Scale Deployments #67

@coolguy1771

Description

Proposal

The current implementation relies on sequential, single-resource API calls, which can become a bottleneck during massive scale-up events (e.g., spinning up hundreds of runners simultaneously). To reduce RTT overhead and API pressure, I propose adding support for request batching and an optional caching layer for static metadata.

1. Request Batching (JIT Configs)

Generating JIT configurations one by one is inefficient during bursts. Adding a batching method allows the listener to request multiple configurations in a single round-trip.

// BatchGenerateJitConfigs reduces API overhead by fetching multiple runner configs in one call.
func (c *Client) BatchGenerateJitConfigs(
    ctx context.Context, 
    count int, 
    settings *RunnerScaleSetJitRunnerSetting, 
    scaleSetID int,
) ([]*RunnerScaleSetJitRunnerConfig, error)
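
Until the backend offers a true batch endpoint, a first iteration could still capture most of the latency win by fanning the per-runner calls out concurrently. Below is a minimal sketch of that approach; it assumes the context, fmt, and golang.org/x/sync/errgroup imports and uses a hypothetical generateJitConfig helper as a stand-in for the existing single-config call:

// Minimal sketch: bounded concurrent fan-out until a real batch endpoint exists.
// generateJitConfig is hypothetical and stands in for the current single-config call.
func (c *Client) BatchGenerateJitConfigs(
    ctx context.Context,
    count int,
    settings *RunnerScaleSetJitRunnerSetting,
    scaleSetID int,
) ([]*RunnerScaleSetJitRunnerConfig, error) {
    configs := make([]*RunnerScaleSetJitRunnerConfig, count)

    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(10) // bound concurrency so bursts do not spike connection counts

    for i := 0; i < count; i++ {
        i := i // capture for the goroutine (pre-Go 1.22)
        g.Go(func() error {
            cfg, err := c.generateJitConfig(ctx, settings, scaleSetID)
            if err != nil {
                return fmt.Errorf("jit config %d/%d: %w", i+1, count, err)
            }
            configs[i] = cfg
            return nil
        })
    }

    if err := g.Wait(); err != nil {
        return nil, err
    }
    return configs, nil
}

If the service later adds a slice-based endpoint, the same signature can switch to a single request without changing callers.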

2. Response Caching

Many resources, such as runner info and scale set settings, change infrequently. Implementing a TTL-based cache prevents redundant network calls and improves responsiveness.

type CacheConfig struct {
    TTL             time.Duration // how long cached entries remain valid
    CacheRunnerInfo bool          // cache runner info lookups
    CacheStatistics bool          // optional: for non-real-time telemetry
}

func WithCache(config CacheConfig) HTTPOption {
    return func(c *httpClientOption) {
        c.cacheConfig = config
    }
}
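
For context, consuming the option at client construction time might look roughly like the snippet below; the actions.NewClient name and its arguments are assumptions for illustration only, not the final API:

// Hypothetical wiring: constructor name and option plumbing are illustrative.
client, err := actions.NewClient(
    githubConfigURL,
    creds,
    actions.WithCache(actions.CacheConfig{
        TTL:             30 * time.Second,
        CacheRunnerInfo: true,
        CacheStatistics: false, // keep telemetry uncached (real-time)
    }),
)
if err != nil {
    return err
}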

Technical Benefits

  • Reduced Latency: Batching significantly cuts the total time spent in the "Scaling Up" state by avoiding repeated TCP/TLS handshakes and per-request overhead.
  • API Quota Preservation: Caching static metadata reduces total request volume, which is critical for staying within GitHub's primary rate limits.
  • Connection Pre-warming: With caching and batching, a "warm" pool of connections can be used more efficiently instead of spiking connection counts during bursts.

Implementation Strategy

  • Batching: Update the internal transport to handle slice-based payloads for JIT endpoints.
  • Caching: Use an in-memory LRU cache or a simple map with mutex protection, ensuring that CacheRunnerInfo honors the configured TTL to avoid stale runner states (a minimal sketch follows below).
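
As a starting point for the simple-map variant, here is a minimal sketch of a mutex-protected TTL cache; the key scheme (e.g. request URL or runner ID), value type, and names are assumptions, and it uses only sync and time from the standard library. An LRU library could be dropped in later without changing the call sites.

// cacheEntry pairs a cached value with its expiry time.
type cacheEntry struct {
    value     any
    expiresAt time.Time
}

// ttlCache is a simple map with mutex protection and lazy expiry on read.
type ttlCache struct {
    mu      sync.RWMutex
    ttl     time.Duration
    entries map[string]cacheEntry
}

func newTTLCache(ttl time.Duration) *ttlCache {
    return &ttlCache{ttl: ttl, entries: make(map[string]cacheEntry)}
}

// Get returns a cached value only while it is within the configured TTL.
func (c *ttlCache) Get(key string) (any, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    e, ok := c.entries[key]
    if !ok || time.Now().After(e.expiresAt) {
        return nil, false
    }
    return e.value, true
}

// Set stores a value stamped with an expiry derived from the TTL.
func (c *ttlCache) Set(key string, value any) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.entries[key] = cacheEntry{value: value, expiresAt: time.Now().Add(c.ttl)}
}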
