Draft
Replaced individual createItem calls with executeBulkOperations for document pre-population in AsyncBenchmark, AsyncCtlWorkload, AsyncEncryptionBenchmark, and ReadMyWriteWorkflow. Also migrated ReadMyWriteWorkflow from internal Document/AsyncDocumentClient APIs to the public PojoizedJson/CosmosAsyncContainer v4 APIs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace pre-materialized List<CosmosItemOperation> with Flux.range().map() to lazily emit operations on demand. This avoids holding all N operations in memory simultaneously - the bulk executor consumes them as they are generated, allowing GC to reclaim processed operation wrappers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
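The difference between a pre-materialized list and on-demand generation can be sketched with the JDK alone. This is only an analogy: it swaps in java.util.stream.Stream for Reactor's Flux, and LazyOperationsSketch, Operation, and createdCount() are hypothetical names, not types from the benchmark.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical stand-in for lazily generated bulk operations (JDK Stream, not Flux).
final class LazyOperationsSketch {
    // Hypothetical placeholder for an operation wrapper such as CosmosItemOperation.
    static final class Operation {
        final String id;
        Operation(String id) { this.id = id; }
    }

    private int created; // counts how many wrappers were actually built

    // Nothing is allocated here; each Operation is built only when a consumer pulls it.
    Stream<Operation> lazyOperations(int count) {
        return IntStream.range(0, count).mapToObj(i -> {
            created++;
            return new Operation("doc-" + i);
        });
    }

    int createdCount() { return created; }

    public static void main(String[] args) {
        LazyOperationsSketch sketch = new LazyOperationsSketch();
        List<String> firstIds = sketch.lazyOperations(1_000_000)
            .limit(3) // the consumer only asks for three elements
            .map(op -> op.id)
            .collect(Collectors.toList());
        System.out.println(firstIds + " built=" + sketch.createdCount());
    }
}
```

With a List, all N wrappers would exist before the first one is consumed; here a wrapper becomes unreachable as soon as the consumer is done with it.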
If a bulk operation fails, fall back to individual createItem calls with retry logic (max 5 retries for transient errors: 410, 408, 429, 500, 503) and 409 conflict suppression. The retry helper is centralized in BenchmarkHelper.retryFailedBulkOperations(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
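The fallback policy described in this commit could look roughly like the sketch below. It is a simplified, synchronous illustration using only the JDK: RetryFallbackSketch, StatusException, and createWithRetry are hypothetical names, and the real BenchmarkHelper.retryFailedBulkOperations() presumably works with reactive types and Cosmos exceptions instead. No backoff is shown, since the commit message does not describe one.

```java
import java.util.Set;

// Hypothetical, synchronous illustration of the retry policy in the commit message.
final class RetryFallbackSketch {
    static final int MAX_RETRIES = 5;
    // Status codes the commit message lists as transient.
    static final Set<Integer> TRANSIENT = Set.of(410, 408, 429, 500, 503);

    // Hypothetical exception carrying an HTTP-like status code.
    static final class StatusException extends RuntimeException {
        final int statusCode;
        StatusException(int statusCode) {
            super("status " + statusCode);
            this.statusCode = statusCode;
        }
    }

    // Returns true once the item exists (created now, or already present on 409).
    static boolean createWithRetry(Runnable createItem) {
        int retries = 0;
        while (true) {
            try {
                createItem.run();
                return true;
            } catch (StatusException e) {
                if (e.statusCode == 409) {
                    return true; // conflict: the document already exists, so suppress it
                }
                if (!TRANSIENT.contains(e.statusCode) || retries >= MAX_RETRIES) {
                    throw e; // non-transient error, or retry budget exhausted
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        boolean ok = createWithRetry(() -> {
            if (attempts[0]++ < 2) {
                throw new StatusException(429); // throttled twice, then succeeds
            }
        });
        System.out.println("ok=" + ok + " attempts=" + attempts[0]);
    }
}
```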
1. HttpHeaders.set()/getHeader(): Add toLowerCaseIfNeeded() fast-path that skips String.toLowerCase() allocation when header name is already all-lowercase (common for x-ms-* and standard Cosmos headers).
2. RxGatewayStoreModel.getUri(): Build URI via StringBuilder instead of the 7-arg URI constructor which re-validates and re-encodes all components. Since components are already well-formed, the single-arg URI(String) constructor is sufficient and avoids URI$Parser overhead.
3. RxDocumentServiceRequest: Cache getCollectionName() result to avoid repeated O(n) slash-scanning across 14+ call sites per request lifecycle. Cache is invalidated when resourceAddress changes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
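The caching idea in item 3 can be sketched as follows. This is a minimal illustration, not the RxDocumentServiceRequest code: the class name, the scans counter, and the assumption that the collection name is the address prefix up to the fourth slash (for addresses shaped like dbs/{db}/colls/{coll}/docs/{doc}) are all hypothetical.

```java
// Hypothetical sketch of a cached, invalidate-on-write derived value.
final class CachedCollectionNameSketch {
    private String resourceAddress;
    private String cachedCollectionName; // null means "not computed yet"
    int scans; // counts O(n) scans, only to make the caching observable

    void setResourceAddress(String resourceAddress) {
        this.resourceAddress = resourceAddress;
        this.cachedCollectionName = null; // invalidate whenever the address changes
    }

    String getCollectionName() {
        if (cachedCollectionName == null) {
            cachedCollectionName = scanCollectionName(resourceAddress);
        }
        return cachedCollectionName;
    }

    // Assumed format: no leading slash; the result is everything before the 4th '/'.
    private String scanCollectionName(String address) {
        scans++;
        int slashes = 0;
        for (int i = 0; i < address.length(); i++) {
            if (address.charAt(i) == '/' && ++slashes == 4) {
                return address.substring(0, i);
            }
        }
        return address; // fewer than four slashes: the address is already the prefix
    }

    public static void main(String[] args) {
        CachedCollectionNameSketch request = new CachedCollectionNameSketch();
        request.setResourceAddress("dbs/db1/colls/c1/docs/d1");
        request.getCollectionName();
        request.getCollectionName(); // served from the cache, no second scan
        System.out.println(request.getCollectionName() + " scans=" + request.scans);
    }
}
```

The sketch is not thread-safe; whether that matters depends on how a request object is shared, which the commit message does not say.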
The char-by-char scan added method call + branch overhead that offset the toLowerCase savings. Profiling showed ConcurrentHashMap.get(), HashMap.putVal(), and the scan loop itself caused ~10% throughput regression. Reverting to original toLowerCase(Locale.ROOT) which the JIT handles as an intrinsic. The URI construction and collection name caching optimizations are retained as they don't have this issue. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkHelper.java
The JFR profiling showed URI$Parser.parse() consuming ~757 CPU samples per 60s recording, all from RxGatewayStoreModel.getUri(). The root cause was a String->URI->String round-trip: we built a URI string, parsed it into java.net.URI (expensive), then Reactor Netty called .toASCIIString() to convert it back to a String.
Changes:
- RxGatewayStoreModel.getUri() now returns String directly (no URI parse)
- HttpRequest: add uriString field with lazy URI parsing via uri()
- HttpRequest: new String-based constructor to skip URI parse entirely
- ReactorNettyClient: use request.uriString() instead of uri().toASCIIString()
- RxGatewayStoreModel: use uriString() for diagnostics/error paths
- URI is only parsed lazily on error paths that require a URI object
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
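The lazy-parsing shape described in this commit might look like the sketch below. LazyUriRequestSketch and isParsed() are hypothetical; only the uriString()/uri() accessor names come from the commit message, and the real HttpRequest carries far more state.

```java
import java.net.URI;

// Hypothetical request holder that keeps the URI as a String and parses it on demand.
final class LazyUriRequestSketch {
    private final String uriString;
    private URI uri; // stays null until some caller actually needs a URI object

    // String-based constructor: the hot path never pays for URI parsing.
    LazyUriRequestSketch(String uriString) {
        this.uriString = uriString;
    }

    // Hot path: hand the string straight to the HTTP client.
    String uriString() {
        return uriString;
    }

    // Cold path (for example error handling): parse once, then reuse.
    URI uri() {
        if (uri == null) {
            uri = URI.create(uriString);
        }
        return uri;
    }

    boolean isParsed() {
        return uri != null;
    }

    public static void main(String[] args) {
        LazyUriRequestSketch request =
            new LazyUriRequestSketch("https://localhost:8081/dbs/db1/colls/c1/docs");
        System.out.println(request.uriString() + " parsed=" + request.isParsed());
        System.out.println(request.uri().getHost() + " parsed=" + request.isParsed());
    }
}
```

The trade-off is that a malformed string is only detected when uri() is first called, so this pattern relies on the components being well-formed when the string is built.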
Add http2Enabled and http2MaxConcurrentStreams config options to TenantWorkloadConfig. When http2Enabled=true, configures Http2ConnectionConfig on GatewayConnectionConfig for AsyncBenchmark, AsyncCtlWorkload, and AsyncEncryptionBenchmark.
Usage in workload JSON config:
  "http2Enabled": true,
  "http2MaxConcurrentStreams": 30
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aults Add missing cases in applyField switch statement so these fields are properly inherited from tenantDefaults, not only from individual tenant entries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ensures every @JsonProperty field in TenantWorkloadConfig has a corresponding case in the applyField() switch statement. This prevents future fields from silently failing to inherit from tenantDefaults, which was the root cause of the http2Enabled bug. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously, every Gateway response copied ALL Netty response headers through a 3-step chain:
1. Netty headers → HttpHeaders (toLowerCase + new HttpHeader per entry)
2. HttpHeaders.toLowerCaseMap() → new HashMap<String,String>
3. StoreResponse constructor → String[] arrays
Now the flow is:
1. Netty headers → Map<String,String> directly (single toLowerCase pass)
2. StoreResponse constructor → String[] arrays
Changes:
- HttpResponse: add headerMap() returning Map<String,String> directly
- ReactorNettyHttpResponse: override headerMap() to build lowercase map from Netty headers without intermediate HttpHeaders object
- HttpTransportSerializer: unwrapToStoreResponse takes Map<String,String> instead of HttpHeaders
- RxGatewayStoreModel: use httpResponse.headerMap() instead of headers()
- ThinClientStoreModel: pass response.getHeaders().asMap() directly instead of wrapping in new HttpHeaders()
This eliminates per-response: ~20 HttpHeader object allocations, ~20 extra toLowerCase calls, and one intermediate HashMap.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StoreResponse now stores the response headers Map<String,String> directly instead of converting to parallel String[] arrays. This eliminates a redundant copy since RxDocumentServiceResponse and StoreClient were immediately converting back to Map.
Before: Map → String[] + String[] → Map (3 allocations, 2 iterations)
After: Map shared directly (0 extra allocations, 0 extra iterations)
Also upgrades StoreResponse.getHeaderValue() from O(n) linear scan to O(1) HashMap.get() with case-insensitive fallback. Null header values from Netty are skipped (matching old HttpHeaders.set behavior which removed null entries).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
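A lookup with an O(1) exact match and a case-insensitive fallback, as this commit describes (the header changes were reverted later in this PR), could be sketched like this. HeaderLookupSketch is a hypothetical class, not StoreResponse, and it does not show the null-value skipping that happens when the map is built.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder that shares the header map instead of copying it into arrays.
final class HeaderLookupSketch {
    private final Map<String, String> headers;

    // The map reference is stored as-is: no parallel String[] arrays, no extra copy.
    HeaderLookupSketch(Map<String, String> headers) {
        this.headers = headers;
    }

    String getHeaderValue(String name) {
        // Fast path: exact-key hash lookup, expected to hit for lowercase header names.
        String value = headers.get(name);
        if (value != null) {
            return value;
        }
        // Slow path: linear scan, only for callers using a different case.
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            if (entry.getKey().equalsIgnoreCase(name)) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("x-ms-request-charge", "1.0");
        HeaderLookupSketch response = new HeaderLookupSketch(headers);
        System.out.println(response.getHeaderValue("x-ms-request-charge"));
        System.out.println(response.getHeaderValue("X-MS-Request-Charge"));
    }
}
```

A lookup for a header that is absent still pays for the full scan, so the O(1) claim applies only to exact-case hits.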
The new toArray(new String[0]) calls in getResponseHeaderNames() and getResponseHeaderValues() created garbage arrays on every call. These methods have zero production callers; only test validators used them.
Changes:
- Mark getResponseHeaderNames/Values as @deprecated
- Update StoreResponseValidator to use getResponseHeaders() map directly instead of converting to arrays and doing indexOf lookups
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert the headerMap() direct-from-Netty path because the per-header toLowerCase() calls caused a throughput regression vs v4. The JIT optimizes the existing HttpHeaders.set() + toLowerCaseMap() path better.
Kept improvements:
- StoreResponse stores Map<String,String> directly (no String[] arrays)
- RxDocumentServiceResponse shares the Map reference (no extra copy)
- StoreClient uses getResponseHeaders() directly (no Map reconstruction)
- StoreResponse.getHeaderValue() uses HashMap.get() instead of O(n) scan
- unwrapToStoreResponse calls toLowerCaseMap() once, reuses the Map for both validateOrThrow and StoreResponse construction
Net effect vs v4: eliminates the Map→String[]→Map round-trip while preserving the JIT-optimized HttpHeaders copy path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Netty's HttpObjectDecoder starts with a 256-byte buffer for header parsing and resizes via ensureCapacityInternal() as headers grow. Cosmos responses have ~2-4KB of headers, triggering multiple resizes. Pre-sizing to 16KB (16384 bytes) avoids the resize overhead at the cost of ~16KB per connection (negligible vs connection pool size). JFR v6 showed AbstractStringBuilder.ensureCapacityInternal at 248 samples (1.6% CPU). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert all header copy chain changes (R3/v5/v6/v7) back to the v4 state which had the best throughput. Only addition on top of v4 is initialBufferSize(16384) to pre-size Netty's header parsing buffer and reduce ensureCapacityInternal() resize overhead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Benchmark showed initialBufferSize change also produced regression. Reverting to pure v4 state (URI elimination + collection name cache) which had the best throughput at 2,421 ops/s. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Replaced individual createItem calls with executeBulkOperations for document pre-population across all benchmark workloads. This leverages the SDK's built-in bulk executor, which handles throttling and retries internally, resulting in simpler and more efficient pre-population.
Changes
Files modified
- createItem loop + Flux.merge() with CosmosBulkOperations.getCreateItemOperation() + executeBulkOperations()
- createPrePopulatedDocs(), with success/failure tracking via CosmosBulkOperationResponse
- cosmosEncryptionAsyncContainer.executeBulkOperations()
- Document/AsyncDocumentClient APIs to public PojoizedJson/CosmosAsyncContainer v4 APIs
Key design points
- executeBulkOperations() (same pattern as the existing DataLoader in the linkedin subpackage)
- ReadMyWriteWorkflow no longer depends on internal SDK types (AsyncDocumentClient, Document, QueryFeedOperationState, etc.)