[Query]: Adds ability to choose global vs local/focused statistics for FullTextScore#48431
[Query]: Adds ability to choose global vs local/focused statistics for FullTextScore#48431aayush3011 wants to merge 6 commits intoAzure:mainfrom
Conversation
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR adds a CosmosFullTextScoreScope option to CosmosQueryRequestOptions that lets developers choose between GLOBAL (default, all partitions) and LOCAL (scoped to target partitions) BM25 statistics computation for hybrid search queries. It also fixes two bugs: a NPE in query plan caching for hybrid queries and a ConcurrentModificationException race condition in component query execution.
Changes:
- New
CosmosFullTextScoreScopeenum withGLOBAL/LOCALvalues, wired throughCosmosQueryRequestOptions - Bug fix: null guard for
queryInfointryCacheQueryPlan, and synchronized block ingetComponentQueryResultsto prevent concurrent modification - Tests updated with new partition key structure (
/pk) and new test methods for LOCAL/GLOBAL scope validation
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
CosmosFullTextScoreScope.java |
New enum defining GLOBAL and LOCAL scopes |
CosmosQueryRequestOptions.java |
Public getter/setter for fullTextScoreScope |
CosmosQueryRequestOptionsImpl.java |
Implementation field, copy constructor, getter/setter |
HybridSearchDocumentQueryExecutionContext.java |
Uses scope to select statistics target ranges; synchronized fix for race condition |
DocumentQueryExecutionContextFactory.java |
Null guard for queryInfo in tryCacheQueryPlan |
CHANGELOG.md |
Documents new feature and bug fixes |
HybridSearchQueryTest.java |
Updated partition key, new tests for LOCAL/GLOBAL scope, updated expected results |
You can also share your feedback on Copilot code review. Take the survey.
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
...in/java/com/azure/cosmos/implementation/query/HybridSearchDocumentQueryExecutionContext.java
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosFullTextScoreScope.java
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosQueryRequestOptions.java
Outdated
Show resolved
Hide resolved
Deep Review SummaryPR Intent: Adds Overall Assessment: The feature is well-designed and matches the .NET SDK's equivalent implementation. The bug fixes are correct. Main concerns are around the null-default inconsistency between public/Impl API layers, missing validation for the LOCAL-without-partition-key edge case, and test coverage gaps. Existing Comments: Only a Copilot summary review (0 inline comments). No overlap with findings below.
Top findings:
|
Description
Why?
Cosmos DB's implementation of FullTextScore computes BM25 statistics (term frequency, inverse document frequency, and document length) across all documents in the container, including all physical and logical partitions.
While this provides a valid and comprehensive representation of statistics for the entire dataset, it introduces challenges for several common use cases:
What?
This PR extends the flexibility of BM25 scoring so that developers can choose between:
How?
A new CosmosFullTextScoreScope enum and setFullTextScoreScope() method are added to CosmosQueryRequestOptions:
When CosmosFullTextScoreScope.LOCAL is set, the hybrid search aggregator uses only the query's target partition ranges (instead of all ranges) when executing the global statistics query. This is a client-side only change — no new HTTP headers are sent to the backend.
Bug Fixes (discovered during development)
When executing hybrid search queries with a partition key filter, getQueryInfo() returned null (hybrid search queries use hybridSearchQueryInfo instead), causing a NPE in query plan caching. Added a null guard.
The documentProducers field in ParallelDocumentQueryExecutionContextBase is shared mutable state that was reused across multiple logical operations (global statistics query, component queries). When component queries ran via flatMap (concurrent), or when the global statistics Mono was re-subscribed after component queries had reassigned the field, it caused ConcurrentModificationException and IllegalArgumentException: retries must not be negative. Fixed by introducing a createProducers() helper that wraps super.initialize(), captures the produced list into a local variable, and clears the shared field, ensuring each logical operation (global stats, each component query) gets its own isolated producer list. Additionally changed flatMap to concatMap for component queries to serialize initialization of the parent class's shared metrics state.
Testing
- GLOBAL scope (default) cross-partition returns all matching results
- Explicit GLOBAL matches default behavior
- LOCAL scope + pk="2" returns only pk="2" results
- LOCAL scope + pk="1" returns only pk="1" results
- RRF queries work with both LOCAL and GLOBAL scopes
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines