feat: Add nullable vector support for proxy and querynode #46305
base: master
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: marcelo-cjl
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
[ci-v2-notice]
To rerun ci-v2 checks, comment with:
If you have any questions or requests, please contact @zhikunyao. |
|
@marcelo-cjl Please associate the related issue to the body of your Pull Request. (eg. "issue: #") |
Supported types: FloatVector, BinaryVector, Float16Vector, BFloat16Vector, Int8Vector and SparseFloatVector.

C++ layer changes:
- Add FieldDataVectorImpl class with LogicalToPhysicalMapping for sparse storage
- LogicalToPhysicalMapping uses adaptive strategy: map when valid_ratio < 10%, otherwise uses vector for O(1) lookup
- Override FillFieldData() in FieldDataVectorImpl to handle nullable vectors
- Move FieldDataSparseVectorImpl from FieldDataInterface.h to FieldData.h
- Remove "vector not support null" restriction in Util.cpp

Go layer changes:
- Add LogicalToPhysicalMapping struct with validCount and l2pMap
- Add ValidData, Nullable, L2PMapping fields to all vector structs
- Update GetRow() to use L2PMapping.GetPhysicalOffset() for physical index
- Update AppendRow()/AppendValidDataRows() to build mapping incrementally
- Update GetMemorySize() to include ValidData, Nullable and L2PMapping

Storage strategy: nullable vectors use sparse storage where Data array only contains non-null vectors, ValidData bitmap tracks null positions, L2PMapping translates logical offset to physical offset.

Signed-off-by: marcelo-cjl <[email protected]>
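The sparse storage strategy in this commit can be illustrated with a minimal Go sketch. It is not the PR's code: `l2pSketch` and `nullableFloatVectors` are hypothetical stand-ins, only the field names `validCount`, `l2pMap`, `ValidData` and `Data` come from the commit message, and the convention that a null row is passed as `nil` and that `GetPhysicalOffset` returns -1 for a null row is assumed here for illustration.

```go
package main

import "fmt"

// l2pSketch is a hypothetical stand-in for LogicalToPhysicalMapping.
// Only valid (non-null) logical rows get an entry in l2pMap.
type l2pSketch struct {
	validCount int
	l2pMap     map[int]int // logical offset -> physical offset
}

// add registers one valid logical row; the mapping is built incrementally.
func (m *l2pSketch) add(logical int) {
	if m.l2pMap == nil {
		m.l2pMap = make(map[int]int)
	}
	m.l2pMap[logical] = m.validCount
	m.validCount++
}

// GetPhysicalOffset returns -1 for a null row (assumed convention).
func (m *l2pSketch) GetPhysicalOffset(logical int) int {
	if p, ok := m.l2pMap[logical]; ok {
		return p
	}
	return -1
}

// nullableFloatVectors stores only non-null vectors in Data.
type nullableFloatVectors struct {
	Dim       int
	Data      []float32 // physical storage: validCount * Dim values
	ValidData []bool    // one entry per logical row
	L2P       l2pSketch
}

// AppendRow appends one logical row; a nil row is treated as null.
// The sketch does not check that len(row) == Dim.
func (v *nullableFloatVectors) AppendRow(row []float32) {
	logical := len(v.ValidData)
	if row == nil {
		v.ValidData = append(v.ValidData, false)
		return
	}
	v.ValidData = append(v.ValidData, true)
	v.Data = append(v.Data, row...)
	v.L2P.add(logical)
}

// GetRow takes a logical index and returns nil for a null row.
func (v *nullableFloatVectors) GetRow(logical int) []float32 {
	p := v.L2P.GetPhysicalOffset(logical)
	if p < 0 {
		return nil
	}
	return v.Data[p*v.Dim : (p+1)*v.Dim]
}

func main() {
	v := &nullableFloatVectors{Dim: 2}
	v.AppendRow([]float32{1, 2})
	v.AppendRow(nil)
	v.AppendRow([]float32{3, 4})
	// Three logical rows, two physical vectors.
	fmt.Println(len(v.ValidData), len(v.Data)/v.Dim, v.GetRow(1) == nil, v.GetRow(2))
}
```

The sketch always uses a map; the adaptive map-versus-vector choice described for the C++ side is left out.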
This commit adapts the storage layer (binlog read/write) to correctly handle nullable vectors with sparse storage, where Data array only contains valid vectors, ValidData bitmap tracks null positions, and L2PMapping translates logical offset to physical offset.

C++ storage layer changes:
- Update PayloadReader to extract dim from Arrow schema metadata for nullable vectors and relax IsFull() check for nullable vector fields
- Update PayloadWriter to use BinaryBuilder for nullable vectors with dim stored in Arrow metadata
- Update Util add_vector_payload() to handle nullable vectors via BinaryBuilder
- Add FieldDataSparseVectorImpl using declaration to access base FillFieldData
- Update DataCodecTest and DiskFileManagerTest with nullable vector test cases

Go storage layer changes:
- Update PayloadReader to handle nullable vectors reading with sparse data
- Update PayloadWriter to serialize nullable vectors with validity bitmap
- Update serde.go serialization to iterate by logical count (len(ValidData)) and use GetRow(j) with logical index for nullable vectors
- Update data_codec.go binlog deserialization for nullable vector fields
- Update insert_data.go RowNum() to return logical count (len(ValidData)) for nullable vectors
- Update utils.go to remove obsolete NumRows field references
- Add comprehensive test coverage for nullable vector payload operations

Storage format: nullable vectors stored as Arrow Binary with dim in metadata, validity bitmap indicates null positions, data contains only valid vectors.

Signed-off-by: marcelo-cjl <[email protected]>
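A rough sketch of the storage format described in this commit follows. It deliberately avoids the Arrow library: `encodeNullable` is a hypothetical helper, not a function from the PR, and it only imitates the layout (one binary payload per valid vector plus a validity bitmap). The bitmap uses least-significant-bit numbering with 1 meaning valid, which is the Arrow convention; the little-endian float32 encoding of each payload is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeNullable is a hypothetical helper. A nil row is treated as null.
// It returns a validity bitmap covering every logical row and one byte
// payload per valid row, so null rows occupy no payload space.
func encodeNullable(rows [][]float32, dim int) (validity []byte, payloads [][]byte, err error) {
	validity = make([]byte, (len(rows)+7)/8)
	for i, row := range rows {
		if row == nil {
			continue // bit stays 0: null
		}
		if len(row) != dim {
			return nil, nil, fmt.Errorf("row %d: got dim %d, want %d", i, len(row), dim)
		}
		validity[i/8] |= 1 << (i % 8) // LSB-first bit numbering
		buf := make([]byte, 4*dim)
		for j, x := range row {
			binary.LittleEndian.PutUint32(buf[4*j:], math.Float32bits(x))
		}
		payloads = append(payloads, buf)
	}
	return validity, payloads, nil
}

func main() {
	rows := [][]float32{{1, 2}, nil, {3, 4}}
	validity, payloads, err := encodeNullable(rows, 2)
	if err != nil {
		panic(err)
	}
	// Logical row count is len(rows); physical count is len(payloads).
	fmt.Printf("logical=%d physical=%d bitmap=%08b\n", len(rows), len(payloads), validity[0])
}
```

A reader of such a layout would need the dimension from elsewhere (the commit stores it in Arrow metadata), since an all-null column carries no payload to infer it from.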
This commit adds nullable vector support for both growing and sealed segments, including search, retrieval, and index building operations. It introduces OffsetMapping for efficient logical-to-physical offset translation.

Core infrastructure changes:
- Add OffsetMapping class with Build()/BuildIncremental() for bitmap-based offset mapping, supporting both dense lookup and efficient valid count tracking
- Add TransformBitset() and TransformOffset() utilities for converting between logical and physical coordinate spaces during search operations
- Update SearchResult to include has_raw_data_ flag and physical offset handling

Growing segment changes:
- Update ConcurrentVector to support nullable vectors with OffsetMapping
- Add get_offset_mapping() and valid data tracking in AckSeal()
- Update FieldIndexing to handle nullable vectors with physical offset storage
- Modify SearchOnGrowing to transform bitset/offsets for nullable vector search
- Add nullable vector tests in SegmentGrowingTest covering search and retrieval

Sealed segment changes:
- Update ChunkedColumn/ChunkedColumnGroup to support OffsetMapping
- Modify ChunkedSegmentSealedImpl to handle nullable vector field data loading
- Update SearchOnSealedIndex/SearchOnSealedColumn with early return for 100% null case to prevent crash when searching empty index
- Add FilterVectorValidOffsets() for output field retrieval with nullable vectors
- Expand ChunkedSegmentSealedBinlogIndexTest with comprehensive nullable test cases

Index building changes:
- Update VectorMemIndex/VectorDiskIndex to store and serialize OffsetMapping
- Add BuildValidData()/UpdateValidData() for nullable vector index building
- Update VecIndexCreator and index_c API with valid data parameter support
- Modify InterimSealedIndexTranslator to handle nullable vector binlog index

Search and reduce changes:
- Update Reduce.cpp to transform physical offsets back to logical for output
- Update StreamReduce with nullable vector offset transformation support
- Add early return in SearchOnSealed when offset_mapping has zero valid count

Signed-off-by: marcelo-cjl <[email protected]>
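The logical/physical round trip during search can be sketched as below. The PR implements this in C++; the Go code here is only an illustration, and `offsetMapping`, `transformBitset`, `transformOffsets` and `search` are hypothetical names modelled on the commit message rather than the actual signatures. The sketch assumes a set bit in the filter means "exclude this row", and the "search" is a stand-in that returns every row that is not excluded.

```go
package main

import "fmt"

// offsetMapping is a hypothetical dense mapping built from a validity bitmap.
type offsetMapping struct {
	l2p []int // logical -> physical, -1 for null rows
	p2l []int // physical -> logical
}

func buildOffsetMapping(valid []bool) offsetMapping {
	m := offsetMapping{l2p: make([]int, len(valid))}
	for i, ok := range valid {
		if ok {
			m.l2p[i] = len(m.p2l)
			m.p2l = append(m.p2l, i)
		} else {
			m.l2p[i] = -1
		}
	}
	return m
}

func (m offsetMapping) validCount() int { return len(m.p2l) }

// transformBitset projects a logical-space filter onto physical rows.
func (m offsetMapping) transformBitset(logical []bool) []bool {
	physical := make([]bool, len(m.p2l))
	for p, l := range m.p2l {
		physical[p] = logical[l]
	}
	return physical
}

// transformOffsets converts physical result offsets back to logical ones.
func (m offsetMapping) transformOffsets(physical []int) []int {
	logical := make([]int, len(physical))
	for i, p := range physical {
		logical[i] = m.p2l[p]
	}
	return logical
}

// search is a stand-in for an index lookup over physical rows only.
func search(m offsetMapping, filter []bool) []int {
	if m.validCount() == 0 {
		return nil // every row is null: nothing to search
	}
	bits := m.transformBitset(filter)
	var hits []int
	for p, excluded := range bits {
		if !excluded {
			hits = append(hits, p)
		}
	}
	return m.transformOffsets(hits)
}

func main() {
	m := buildOffsetMapping([]bool{true, false, true, true})
	fmt.Println(search(m, []bool{false, true, true, false}))
	fmt.Println(search(buildOffsetMapping([]bool{false, false}), []bool{false, false}))
}
```

The zero-valid-count check mirrors the early return the commit adds for the 100% null case.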
This commit extends nullable vector support to the proxy layer, querynode, and adds comprehensive validation, search reduce, and field data handling for nullable vectors with sparse storage.

Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows() helper to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use idxComputers for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector handling
- Update msg_pack.go with nullable vector field data processing

QueryNode layer changes:
- Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset translation

Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector indexing

Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs parameter
- Update funcutil.go with nullable vector support functions

Signed-off-by: marcelo-cjl <[email protected]>
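The alignment rule this commit describes for the proxy (a nullable vector field carries only as many vectors as it has valid rows) might look roughly like the sketch below. `expectedVectorRows` and `checkFloatVectorAligned` are hypothetical names inspired by getExpectedVectorRows() and checkAligned(); their signatures, and the convention that an empty `validData` means "not nullable", are assumptions made for illustration.

```go
package main

import "fmt"

// expectedVectorRows returns how many vectors the data array should hold.
// Assumed convention: an empty validData means the field is not nullable.
func expectedVectorRows(validData []bool, numRows int) int {
	if len(validData) == 0 {
		return numRows
	}
	n := 0
	for _, ok := range validData {
		if ok {
			n++
		}
	}
	return n
}

// checkFloatVectorAligned validates a flat float vector payload against
// the logical row count and the validity information.
func checkFloatVectorAligned(data []float32, dim int, validData []bool, numRows int) error {
	if dim <= 0 {
		return fmt.Errorf("invalid dim %d", dim)
	}
	if len(validData) != 0 && len(validData) != numRows {
		return fmt.Errorf("validData has %d entries, want %d", len(validData), numRows)
	}
	if len(data)%dim != 0 {
		return fmt.Errorf("data length %d is not a multiple of dim %d", len(data), dim)
	}
	want := expectedVectorRows(validData, numRows)
	if got := len(data) / dim; got != want {
		return fmt.Errorf("got %d vectors, want %d", got, want)
	}
	return nil
}

func main() {
	valid := []bool{true, false, true}
	// Two valid rows out of three, dim 2: four floats are expected.
	fmt.Println(checkFloatVectorAligned([]float32{1, 2, 3, 4}, 2, valid, 3))
	// Supplying a vector for the null row as well is rejected.
	fmt.Println(checkFloatVectorAligned([]float32{1, 2, 0, 0, 3, 4}, 2, valid, 3))
}
```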
3fc9569 to 43e8106
|
@marcelo-cjl go-sdk check failed, comment |