Skip to content

Conversation

@anuragrai16
Copy link
Contributor

@anuragrai16 anuragrai16 commented Dec 16, 2025

Motivation

Resolves #17262

Summary

This PR is a follow up of #17264 that added a computation of data-only CRC for segments and uploaded it to ZK. It introduces a new property in ZK metadata for segments useDataCrcForReplace only for realtime committing segments. If this flag is present, the replica servers when downloading and replacing the realtime segments will end up using the data CRC (if also available in ZK) to carry out in place replace.

Details

When This Optimization Applies
Only for realtime segments during CONSUMING → ONLINE transition
Only when the flag is set by the controller during segment commit
Only on replica servers that have built the segment locally
Fallback behavior: If data CRCs also don't match, segment is still downloaded (safe)

When Standard Behavior Still Applies
Offline table segments (flag never set)
Uploaded LLC segments (flag never set)
Manual segment replacement operations
When instance.check.crc.on.segment.load is disabled

Testing

UTs for replaceSegmentIFCRCMismatch() added.

Ran quick start for realtime tables,

Data CRC and useDataCRC flag present for realtime committed segments,
Screenshot 2025-12-30 at 10 51 30 AM

Flag not present for IN_PROGRESS consuming segments,
Screenshot 2025-12-30 at 10 51 35 AM

Data CRC present but Flag NOT present for offline table segments,
Screenshot 2025-12-30 at 10 57 58 AM

@anuragrai16 anuragrai16 changed the title Deterministic data-only CRC (2/2) - use data-only CRC for segment replace flow Deterministic data-only CRC (2/2) - use data-only CRC for some segment replace flow Dec 16, 2025
@codecov-commenter
Copy link

codecov-commenter commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 69.23077% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.26%. Comparing base (d47ea27) to head (ad01cab).
⚠️ Report is 47 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/core/data/manager/BaseTableDataManager.java 50.00% 2 Missing and 1 partial ⚠️
...not/common/metadata/segment/SegmentZKMetadata.java 83.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17380      +/-   ##
============================================
+ Coverage     63.22%   63.26%   +0.04%     
  Complexity     1475     1475              
============================================
  Files          3147     3161      +14     
  Lines        187575   188440     +865     
  Branches      28712    28837     +125     
============================================
+ Hits         118590   119220     +630     
- Misses        59794    59977     +183     
- Partials       9191     9243      +52     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.24% <69.23%> (+0.05%) ⬆️
java-21 63.21% <69.23%> (+0.03%) ⬆️
temurin 63.26% <69.23%> (+0.04%) ⬆️
unittests 63.26% <69.23%> (+0.04%) ⬆️
unittests1 55.64% <61.53%> (+0.01%) ⬆️
unittests2 33.98% <23.07%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@anuragrai16
Copy link
Contributor Author

@Jackie-Jiang PTAL as well.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces intelligent segment replacement logic by adding a useDataCrcIfAvailable parameter to distinguish between data changes and index-only changes. When enabled, the system avoids unnecessary segment downloads if only indexes have changed while the underlying data remains the same.

Key changes:

  • Modified replaceSegmentIfCrcMismatch method to accept a useDataCrcIfAvailable boolean parameter
  • Updated hasSameCRC method to implement fallback logic: check full CRC first, then optionally check data-only CRC
  • Applied data CRC logic selectively: enabled for CONSUMING→ONLINE transitions, offline segment additions, and server startup; disabled for explicit reload/replace operations

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
BaseTableDataManager.java Added useDataCrcIfAvailable parameter to replaceSegmentIfCrcMismatch and hasSameCRC methods; implemented fallback CRC comparison logic
RealtimeTableDataManager.java Enabled data CRC usage (true) for CONSUMING→ONLINE segment transitions
OfflineTableDataManager.java Enabled data CRC usage (true) for offline segment additions
BaseTableDataManagerTest.java Added test coverage for new CRC comparison scenarios including both CRCs matching, data CRC matching with index changes, and data CRC unavailability
BaseTableDataManagerEnqueueSegmentToReplaceTest.java Updated mock verifications to include the new useDataCrcIfAvailable parameter

Comment on lines 1673 to 1675
return zkMetadata.getDataCrc() >= 0
&& Long.parseLong(localMetadata.getDataCrc()) >= 0
&& zkMetadata.getDataCrc() == Long.parseLong(localMetadata.getDataCrc());
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code may throw NumberFormatException if localMetadata.getDataCrc() returns a non-numeric string. The check should validate that the string is parseable before attempting to parse it, or the method should use a try-catch block to handle potential parsing failures gracefully.

Suggested change
return zkMetadata.getDataCrc() >= 0
&& Long.parseLong(localMetadata.getDataCrc()) >= 0
&& zkMetadata.getDataCrc() == Long.parseLong(localMetadata.getDataCrc());
if (zkMetadata.getDataCrc() < 0) {
return false;
}
String localDataCrc = localMetadata.getDataCrc();
if (!StringUtils.isNumeric(localDataCrc)) {
LOGGER.warn("Local segment metadata data CRC is not numeric: {}", localDataCrc);
return false;
}
long localDataCrcValue;
try {
localDataCrcValue = Long.parseLong(localDataCrc);
} catch (NumberFormatException e) {
// Guard against values that are numeric but outside the range of a long
LOGGER.warn("Failed to parse local segment metadata data CRC: {}", localDataCrc, e);
return false;
}
if (localDataCrcValue < 0) {
return false;
}
return zkMetadata.getDataCrc() == localDataCrcValue;

Copilot uses AI. Check for mistakes.
Comment on lines 1286 to 1291
_logger.warn("Segment: {} has CRC changed from: {} to: {} and data CRC prev value: {} new value: {}",
segmentName,
segmentMetadata.getCrc(),
zkMetadata.getCrc(),
segmentMetadata.getDataCrc(),
zkMetadata.getDataCrc());
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log message uses positional parameters but the order is inconsistent with the description. 'from: {} to: {}' should show old then new, but the code logs local metadata first, then ZK metadata. Consider reordering to 'from: {} (local) to: {} (ZK)' for clarity, or swap the order of getCrc() calls to match the 'from→to' semantic.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to discuss when should we use data CRC instead of index CRC, and what is the side effect.
When using data CRC, index only change happening in the deep store (i.e. new index added) won't be honored. This could prevent users from creating the index from minion and reduce the index creation on server.
Given we want to solve the problem of real-time committed segment potentially having different CRC, I feel a better way to address this is to add a flag in ZK metadata to indicate that we can check only the data CRC. This flag only exists in committed segment, but not segment pushed from other ingestion flow

@Jackie-Jiang
Copy link
Contributor

@anuragrai16
Copy link
Contributor Author

We need to discuss when should we use data CRC instead of index CRC, and what is the side effect. When using data CRC, index only change happening in the deep store (i.e. new index added) won't be honored. This could prevent users from creating the index from minion and reduce the index creation on server. Given we want to solve the problem of real-time committed segment potentially having different CRC, I feel a better way to address this is to add a flag in ZK metadata to indicate that we can check only the data CRC. This flag only exists in committed segment, but not segment pushed from other ingestion flow

Thanks @Jackie-Jiang , we can do that. For my understanding, in the current code, I'm only using Data CRC in doAddOnlineSegment of the class OfflineTableDataManager and RealtimeTableDataManager , which are called in helix transition states during onBecomeOnlineFromConsuming and onBecomeOnlineFromOffline.

For the other flows of reload segment, replace segments (used by minions), data CRC is not used. So, in a way, is the code already handling this point ? Or are there other flows that might be accidentally included in this ?

@Jackie-Jiang
Copy link
Contributor

We need to discuss when should we use data CRC instead of index CRC, and what is the side effect. When using data CRC, index only change happening in the deep store (i.e. new index added) won't be honored. This could prevent users from creating the index from minion and reduce the index creation on server. Given we want to solve the problem of real-time committed segment potentially having different CRC, I feel a better way to address this is to add a flag in ZK metadata to indicate that we can check only the data CRC. This flag only exists in committed segment, but not segment pushed from other ingestion flow

Thanks @Jackie-Jiang , we can do that. For my understanding, in the current code, I'm only using Data CRC in doAddOnlineSegment of the class OfflineTableDataManager and RealtimeTableDataManager , which are called in helix transition states during onBecomeOnlineFromConsuming and onBecomeOnlineFromOffline.

For the other flows of reload segment, replace segments (used by minions), data CRC is not used. So, in a way, is the code already handling this point ? Or are there other flows that might be accidentally included in this ?

These helix state transitions apply to a lot of scenarios, not only for the committed segments. E.g. when server starts, all segments are loaded through these 2 state transitions, which will make server ignore changes applied by minions.

@anuragrai16 anuragrai16 changed the title Deterministic data-only CRC (2/2) - use data-only CRC for some segment replace flow Deterministic data-only CRC (2/2) - use data-only CRC for realtime committing segments Dec 30, 2025
@anuragrai16 anuragrai16 changed the title Deterministic data-only CRC (2/2) - use data-only CRC for realtime committing segments Deterministic data-only CRC (2/2) - use data-only CRC for realtime committing segments during replace Dec 30, 2025
@anuragrai16
Copy link
Contributor Author

We need to discuss when should we use data CRC instead of index CRC, and what is the side effect. When using data CRC, index only change happening in the deep store (i.e. new index added) won't be honored. This could prevent users from creating the index from minion and reduce the index creation on server. Given we want to solve the problem of real-time committed segment potentially having different CRC, I feel a better way to address this is to add a flag in ZK metadata to indicate that we can check only the data CRC. This flag only exists in committed segment, but not segment pushed from other ingestion flow

Thanks @Jackie-Jiang , we can do that. For my understanding, in the current code, I'm only using Data CRC in doAddOnlineSegment of the class OfflineTableDataManager and RealtimeTableDataManager , which are called in helix transition states during onBecomeOnlineFromConsuming and onBecomeOnlineFromOffline.
For the other flows of reload segment, replace segments (used by minions), data CRC is not used. So, in a way, is the code already handling this point ? Or are there other flows that might be accidentally included in this ?

These helix state transitions apply to a lot of scenarios, not only for the committed segments. E.g. when server starts, all segments are loaded through these 2 state transitions, which will make server ignore changes applied by minions.

@Jackie-Jiang Makes sense. Updated the logic to persist an optional ZK flag for realtime committing segments that will be used to tell the replicas to use data CRC for replace. PTAL.

public static final String TOTAL_DOCS = "segment.total.docs";
public static final String CRC = "segment.crc";
public static final String DATA_CRC = "segment.data.crc";
public static final String USE_DATA_CRC_FOR_REPLACE = "segment.useDataCRCForReplace";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we keep the name consistent with other segment configs? segment.replacement.strategy.data.crc can be one suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compute deterministic CRC for segment data to avoid Lucene segment CRC mismatch issue

4 participants