[Query Engine] Improve DistinctCountSmartHLL for dictionary-encoded columns #17411
Conversation
Codecov Report (Coverage Diff):
## master #17411 +/- ##
============================================
+ Coverage 63.26% 63.28% +0.02%
Complexity 1474 1474
============================================
Files 3152 3162 +10
Lines 187881 188725 +844
Branches 28765 28881 +116
============================================
+ Hits 118855 119433 +578
- Misses 59810 60039 +229
- Partials 9216 9253 +37
Pull request overview
This PR introduces an adaptive cardinality-aware execution path for DISTINCT_COUNT_SMART_HLL aggregation on dictionary-encoded columns. For high-cardinality scenarios (>100K distinct values), bypassing RoaringBitmap and directly updating HLL reduces server-side CPU time by 4x-10x. A new dictThreshold parameter (default: 100K) controls when to convert from RoaringBitmap to HLL during aggregation instead of waiting until finalization.
Key changes:
- Added adaptive conversion logic that switches from RoaringBitmap to HLL when cardinality exceeds the threshold
- Introduced `dictThreshold` parameter with a 100K default based on benchmark results
- Implemented early conversion checks during aggregation for both group-by and non-group-by queries
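The adaptive switch described above can be sketched roughly as follows. This is an illustrative model only, not Pinot's implementation: `BitSet` stands in for RoaringBitmap, an exact `HashSet` stands in for the (approximate) HLL, and all names here (`AdaptiveSketch`, `add`, the constructor's threshold) are hypothetical.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of cardinality-aware conversion; not Pinot's actual classes.
public class AdaptiveSketch {
  private final int _dictThreshold;  // mirrors the dictThreshold parameter (default 100_000)
  private BitSet _bitmap = new BitSet();  // stand-in for RoaringBitmap
  private Set<Integer> _hll;              // stand-in for HLL; null until converted

  public AdaptiveSketch(int dictThreshold) {
    _dictThreshold = dictThreshold;
  }

  public void add(int dictId) {
    if (_hll != null) {
      _hll.add(dictId);  // already converted: update the sketch directly
      return;
    }
    _bitmap.set(dictId);
    // Early conversion: switch during aggregation instead of waiting for finalization.
    if (_bitmap.cardinality() > _dictThreshold) {
      _hll = new HashSet<>();
      for (int id = _bitmap.nextSetBit(0); id >= 0; id = _bitmap.nextSetBit(id + 1)) {
        _hll.add(id);
      }
      _bitmap = null;  // drop the bitmap; no further per-row bitmap maintenance
    }
  }

  public long cardinality() {
    return _hll != null ? _hll.size() : _bitmap.cardinality();
  }
}
```

Once converted, each row costs only a direct sketch update, which is the source of the CPU savings at high cardinality.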
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| DistinctCountSmartHLLAggregationFunction.java | Added _dictIdCardinalityThreshold field, parameter parsing, and adaptive conversion logic for non-group-by aggregation |
| BaseDistinctCountSmartSketchAggregationFunction.java | Implemented group-by adaptive conversion with modified group tracking and cardinality checks |
| DistinctCountSmartHLLAggregationFunctionTest.java | Added unit tests for parameter parsing, HLL operations, and adaptive conversion behavior |
| BenchmarkDistinctCountHLLThreshold.java | Added JMH benchmark to measure performance across different cardinalities and record counts |
| pinot-perf/pom.xml | Registered new benchmark class in Maven build configuration |
 * - dictThreshold: Threshold for dictionary-encoded columns to trigger early conversion from RoaringBitmap to HLL
 *                  during aggregation. 100_000 by default. Set to Integer.MAX_VALUE to disable and convert only
 *                  at finalization.
Copilot AI · Jan 6, 2026
The documentation should clarify that non-positive values (≤0) are also treated as disabled and converted to Integer.MAX_VALUE, consistent with the implementation in the Parameters class where values ≤0 are set to Integer.MAX_VALUE.
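A minimal sketch of the clamping behavior this comment describes, assuming a hypothetical `parseDictThreshold` helper (the actual parsing lives in the PR's Parameters class, whose code is not shown here):

```java
// Hypothetical helper illustrating the clamping rule: values <= 0 are treated
// as "disabled" by mapping them to Integer.MAX_VALUE, so conversion only
// happens at finalization.
public class DictThresholdParsing {
  static final int DEFAULT_DICT_THRESHOLD = 100_000;

  static int parseDictThreshold(String value) {
    if (value == null) {
      return DEFAULT_DICT_THRESHOLD;  // parameter not supplied: use the 100K default
    }
    int threshold = Integer.parseInt(value.trim());
    // Non-positive values disable early conversion, per the documented behavior.
    return threshold <= 0 ? Integer.MAX_VALUE : threshold;
  }
}
```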
for (int i = 0; i < 1000; i++) {
  hll.offer(i);
}
Long cardinality = Long.valueOf(function.extractFinalResult(hll));
Copilot AI · Jan 6, 2026
Replace Long.valueOf() with direct cast (Long) since extractFinalResult() already returns a Long object. Using Long.valueOf() on an object is unnecessary and could cause a NullPointerException if the result is null.
Suggested change:
- Long cardinality = Long.valueOf(function.extractFinalResult(hll));
+ Long cardinality = (Long) function.extractFinalResult(hll);
function.extractFinalResult(hll) actually returns an int, hence Long.valueOf() is used
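To illustrate the point of the reply above: if the extraction method yields a primitive int, `Long.valueOf(...)` widens and boxes it safely, whereas a `(Long)` cast does not apply to the primitive at all (it would not compile). The `extractInt` method below is a stand-in, not the actual Pinot method:

```java
// Demonstrates why Long.valueOf is appropriate when the source value is an int:
// the int widens to long, then boxes to Long, with no cast or NPE risk.
public class BoxingDemo {
  // Stand-in for an extractFinalResult-style method returning a primitive int.
  static int extractInt() {
    return 1000;
  }

  static Long widen() {
    return Long.valueOf(extractInt());  // int -> long -> Long, always safe
  }
}
```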
xiangfu0 left a comment:
lgtm otherwise
@Override
public void aggregate(int length, AggregationResultHolder aggregationResultHolder,
    Map<ExpressionContext, BlockValSet> blockValSetMap) {
    Map<ExpressionContext, BlockValSet> blockValSetMap) {
revert this
also please add a release note for this enhancement

Also please fix the tests.

Sure will add one
overall lgtm!
One question, how was the default of 100_000 chosen as the threshold? Is that a desirable default for all queries going forward? Are there any scenarios where using the Integer.MAX_VALUE may be preferred? It'll be good to call this out in documentation as a guide for users
Also, can you take care of updating OSS documentation to reflect these changes? Thanks!
Thanks for the review @somandal. On the 100,000 threshold? When would Integer.MAX_VALUE be preferred?
Updating docs based on this optimization: apache/pinot#17411

Docs PR: pinot-contrib/pinot-docs#459
Summary
ISSUE=#17336
For dictionary-encoded columns, DISTINCT_COUNT_SMART_HLL currently uses a RoaringBitmap to deduplicate dictionary IDs before feeding values into HLL. While efficient for low cardinality, this approach becomes CPU-intensive for high cardinality (hundreds of thousands to millions of distinct values), where RoaringBitmap insertions dominate query execution time and negate the benefits of HLL.
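The pre-existing path the summary describes (dedup dictionary IDs in a bitmap during aggregation, feed HLL only at finalization) looks roughly like this sketch. `BitSet` stands in for RoaringBitmap and the HLL offer is modeled as a counter, since only the control flow is the point: every row pays a bitmap insertion, which is what dominates CPU at high cardinality.

```java
import java.util.BitSet;

// Illustrative model of the two-phase bitmap-then-HLL path; not Pinot's code.
public class BitmapThenHllPath {
  static long aggregate(int[] dictIds) {
    BitSet bitmap = new BitSet();
    for (int id : dictIds) {
      bitmap.set(id);  // every row pays a bitmap insertion, even duplicates
    }
    long offered = 0;
    // Finalization: each distinct dictId would be offered to HLL exactly once.
    for (int id = bitmap.nextSetBit(0); id >= 0; id = bitmap.nextSetBit(id + 1)) {
      offered++;
    }
    return offered;
  }
}
```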
Proposal
Introduce a cardinality-aware execution path for DISTINCT_COUNT_SMART_HLL:
Observed improvements
Testing Done
Added JMH benchmark covering:
This JMH benchmark isolates server-side aggregation cost for the DistinctCountHLLAggregationFunction under controlled parameters; each variation was run for 10 minutes:
- recordCount: {100K, 500K, 1M, 5M, 10M, 25M}
- cardinalityRatioPercent: {1, 10, 30, 50, 80, 100} → generates records with the configured cardinality
- useRoaringBitMap/HLL: controls whether the run uses the RoaringBitmap path or the direct HLL path

DictIds are pre-generated so benchmark timing includes only aggregation, not data generation.
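Under the parameters above, the pre-generation step could look like the following sketch. The method name and the derivation of cardinality from recordCount and cardinalityRatioPercent are assumptions for illustration, not the benchmark's actual code:

```java
import java.util.Random;

// Hypothetical data pre-generation: dictIds are built before the timed loop so
// the benchmark measures aggregation only, not data generation.
public class BenchmarkDataGen {
  static int[] generateDictIds(int recordCount, int cardinalityRatioPercent, long seed) {
    // e.g. recordCount=1_000_000, ratio=30 -> ~300_000 distinct dictIds
    int cardinality = Math.max(1, (int) ((long) recordCount * cardinalityRatioPercent / 100));
    Random random = new Random(seed);  // fixed seed for reproducible runs
    int[] dictIds = new int[recordCount];
    for (int i = 0; i < recordCount; i++) {
      dictIds[i] = random.nextInt(cardinality);
    }
    return dictIds;
  }
}
```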
Sample plots: [benchmark latency plot images not captured]

Flame graph after optimization, showing Aggregate no longer dominates CPU: [image not captured]
Benchmark Results (Average Latency, ms/op)
Record Count = 100,000
Record Count = 500,000
Record Count = 1,000,000
Record Count = 5,000,000
Record Count = 10,000,000
Record Count = 25,000,000
Recommendation:
Based on the micro-benchmark results across record counts and cardinalities, 100K distinct values is a good default threshold for switching away from the RoaringBitmap path. At this scale, RoaringBitmap remains efficient for low-cardinality cases, while higher cardinalities already show clear benefits from direct HLL updates. This threshold balances preserving deduplication benefits for low cardinality against excessive bitmap maintenance cost for high-cardinality workloads.