HDDS-14242. Make RDBStoreAbstractIterator set bounds in ReadOptions for prefix based iteration #9559
+398
−348
What changes were proposed in this pull request?
Currently, for prefix-based iteration, the prefix comparison happens on the Java side. This is inefficient because every hasNext call during iteration incurs an extra buffer copy: for RDBStoreCodecBufferIterator it is a copy between two direct byte buffers, and for RDBStoreByteArrayIterator it is a copy from a direct buffer into a newly allocated heap byte array, which is expensive.
Moreover, the prefix check itself is slower on the Java side. RocksDB does this comparison internally in C++ with memcmp, which is far more efficient than the byte-by-byte comparator written in Java.
This patch sets lowerBound and upperBound in ReadOptions when rocksItr is initialized, offloading all prefix comparisons to RocksDB.
Here the lowerBound is the prefix itself and the upperBound is the nextHigherByteArray entry for that prefix. If every byte of the prefix is 0xFF, or if the prefix is empty, there is no upper bound for the iterator: only the lower bound is set, and iteration starts at the prefix and continues to the end of the table.
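As an illustration of the bound computation described above, here is a minimal sketch of one way to derive the upper bound from a prefix. The class name `PrefixBounds` and the exact carry handling (dropping trailing 0xFF bytes, then incrementing the last remaining byte) are assumptions for illustration and may differ from what the patch actually does. A `null` return stands for "no upper bound".

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the patch.
public final class PrefixBounds {
  private PrefixBounds() { }

  /**
   * Returns an exclusive upper bound for all keys that start with the
   * given prefix (unsigned lexicographic order), or null when no such
   * bound exists (empty prefix, or every byte is 0xFF).
   */
  public static byte[] nextHigherByteArray(byte[] prefix) {
    // Trailing 0xFF bytes cannot be incremented without a carry, so skip them.
    int end = prefix.length;
    while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
      end--;
    }
    if (end == 0) {
      return null; // iterate from the prefix to the end of the table
    }
    // Keep the bytes before the trailing 0xFF run and bump the last one.
    byte[] upper = Arrays.copyOf(prefix, end);
    upper[end - 1]++;
    return upper;
  }

  public static void main(String[] args) {
    // prints [97, 99]
    System.out.println(Arrays.toString(nextHigherByteArray(new byte[] {'a', 'b'})));
    // prints [2]
    System.out.println(Arrays.toString(nextHigherByteArray(new byte[] {1, (byte) 0xFF})));
    // prints null
    System.out.println(Arrays.toString(nextHigherByteArray(new byte[] {(byte) 0xFF})));
  }
}
```

In RocksJava, the prefix and this result would presumably be wrapped in a `Slice` and passed to `ReadOptions.setIterateLowerBound` and `ReadOptions.setIterateUpperBound`, skipping the upper bound when the helper returns `null`.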
BTW, we already use this model in the snapshot code; it was introduced with the PersistentMap impl in
#4722
I plan to remove the PersistentMap impl entirely, since I think it is redundant, as part of this umbrella Jira:
https://issues.apache.org/jira/browse/HDDS-14031
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14242
How was this patch tested?
Added unit tests; this also relies on the existing RocksDB test cases in TestRDBTableStore.