Performance Optimizations for High-Scale Deployments #67

@coolguy1771

Description

Proposal

The current implementation relies on sequential, single-resource API calls, which can become a bottleneck during massive scale-up events (e.g., spinning up hundreds of runners simultaneously). To reduce RTT overhead and API pressure, I propose adding support for request batching and an optional caching layer for static metadata.

1. Request Batching (JIT Configs)

Generating JIT configurations one by one is inefficient during bursts. Adding a batching method allows the listener to request multiple configurations in a single round-trip.

// BatchGenerateJitConfigs reduces API overhead by fetching multiple runner configs in one call.
func (c *Client) BatchGenerateJitConfigs(
    ctx context.Context, 
    count int, 
    settings *RunnerScaleSetJitRunnerSetting, 
    scaleSetID int,
) ([]*RunnerScaleSetJitRunnerConfig, error)
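
Until the backend offers a true batch endpoint, a first iteration could still capture most of the latency win by fanning the per-runner calls out concurrently. Below is a minimal sketch of that approach; it assumes the context, fmt, and golang.org/x/sync/errgroup imports and uses a hypothetical generateJitConfig helper as a stand-in for the existing single-config call:

// Minimal sketch: bounded concurrent fan-out until a real batch endpoint exists.
// generateJitConfig is hypothetical and stands in for the current single-config call.
func (c *Client) BatchGenerateJitConfigs(
    ctx context.Context,
    count int,
    settings *RunnerScaleSetJitRunnerSetting,
    scaleSetID int,
) ([]*RunnerScaleSetJitRunnerConfig, error) {
    configs := make([]*RunnerScaleSetJitRunnerConfig, count)

    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(10) // bound concurrency so bursts do not spike connection counts

    for i := 0; i < count; i++ {
        i := i // capture for the goroutine (pre-Go 1.22)
        g.Go(func() error {
            cfg, err := c.generateJitConfig(ctx, settings, scaleSetID)
            if err != nil {
                return fmt.Errorf("jit config %d/%d: %w", i+1, count, err)
            }
            configs[i] = cfg
            return nil
        })
    }

    if err := g.Wait(); err != nil {
        return nil, err
    }
    return configs, nil
}

If the service later adds a slice-based endpoint, the same signature can switch to a single request without changing callers.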

2. Response Caching

Many resources, such as runner info and scale set settings, change infrequently. Implementing a TTL-based cache prevents redundant network calls and improves responsiveness.

type CacheConfig struct {
    TTL             time.Duration // how long cached entries remain valid
    CacheRunnerInfo bool          // cache runner info lookups
    CacheStatistics bool          // optional: for non-real-time telemetry
}

func WithCache(config CacheConfig) HTTPOption {
    return func(c *httpClientOption) {
        c.cacheConfig = config
    }
}
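
For context, consuming the option at client construction time might look roughly like the snippet below; the actions.NewClient name and its arguments are assumptions for illustration only, not the final API:

// Hypothetical wiring: constructor name and option plumbing are illustrative.
client, err := actions.NewClient(
    githubConfigURL,
    creds,
    actions.WithCache(actions.CacheConfig{
        TTL:             30 * time.Second,
        CacheRunnerInfo: true,
        CacheStatistics: false, // keep telemetry uncached (real-time)
    }),
)
if err != nil {
    return err
}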

Technical Benefits

  • Reduced Latency: Batching significantly cuts the total time spent in the "Scaling Up" state by avoiding repeated TCP/TLS handshakes and per-request overhead.
  • API Quota Preservation: Caching static metadata reduces total request volume, which is critical for staying within GitHub's primary rate limits.
  • Connection Pre-warming: With caching and batching, a "warm" pool of connections can be used more efficiently instead of spiking connection counts during bursts.

Implementation Strategy

  • Batching: Update the internal transport to handle slice-based payloads for JIT endpoints.
  • Caching: Use an in-memory LRU cache or a simple map with mutex protection, ensuring that CacheRunnerInfo honors the configured TTL to avoid stale runner states (a minimal sketch follows below).
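
As a starting point for the simple-map variant, here is a minimal sketch of a mutex-protected TTL cache; the key scheme (e.g. request URL or runner ID), value type, and names are assumptions, and it uses only sync and time from the standard library. An LRU library could be dropped in later without changing the call sites.

// cacheEntry pairs a cached value with its expiry time.
type cacheEntry struct {
    value     any
    expiresAt time.Time
}

// ttlCache is a simple map with mutex protection and lazy expiry on read.
type ttlCache struct {
    mu      sync.RWMutex
    ttl     time.Duration
    entries map[string]cacheEntry
}

func newTTLCache(ttl time.Duration) *ttlCache {
    return &ttlCache{ttl: ttl, entries: make(map[string]cacheEntry)}
}

// Get returns a cached value only while it is within the configured TTL.
func (c *ttlCache) Get(key string) (any, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    e, ok := c.entries[key]
    if !ok || time.Now().After(e.expiresAt) {
        return nil, false
    }
    return e.value, true
}

// Set stores a value stamped with an expiry derived from the TTL.
func (c *ttlCache) Set(key string, value any) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.entries[key] = cacheEntry{value: value, expiresAt: time.Now().Add(c.ttl)}
}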
