
@marcelo-cjl (Contributor)

This commit extends nullable vector support to the proxy layer, querynode,
and adds comprehensive validation, search reduce, and field data handling
for nullable vectors with sparse storage.

Proxy layer changes:
- Update checkAligned() in validate_util.go with a getExpectedVectorRows() helper
  that validates nullable vector field alignment against the valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData so that
  nullable vector fields are validated against the expected (valid) row count
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
  index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use idxComputers
  for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector handling
- Update msg_pack.go with nullable vector field data processing
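For illustration only, here is a minimal Go sketch of the alignment rule described for checkAligned()/getExpectedVectorRows(). It assumes that a nullable vector field carries one ValidData entry per logical row and exactly one vector per valid row. The vectorField type and both function bodies are hypothetical simplifications, not the actual validate_util.go code.

```go
package main

import "fmt"

// vectorField is a hypothetical, simplified stand-in for a vector column in an
// insert request. With sparse storage, Data holds only the non-null vectors,
// while ValidData carries one flag per logical row for nullable fields.
type vectorField struct {
	Name      string
	Dim       int
	Nullable  bool
	ValidData []bool    // one entry per logical row (nullable fields only)
	Data      []float32 // flattened; physical (non-null) rows only
}

// getExpectedVectorRows returns how many vectors Data must hold: every logical
// row for a non-nullable field, only the valid rows for a nullable one.
func getExpectedVectorRows(f vectorField, numRows int) int {
	if !f.Nullable {
		return numRows
	}
	valid := 0
	for _, ok := range f.ValidData {
		if ok {
			valid++
		}
	}
	return valid
}

// checkAligned compares the field against the request's logical row count.
func checkAligned(f vectorField, numRows int) error {
	if f.Nullable && len(f.ValidData) != numRows {
		return fmt.Errorf("field %s: len(ValidData)=%d, want %d", f.Name, len(f.ValidData), numRows)
	}
	if f.Dim <= 0 || len(f.Data)%f.Dim != 0 {
		return fmt.Errorf("field %s: data length %d is not a multiple of dim %d", f.Name, len(f.Data), f.Dim)
	}
	if got, want := len(f.Data)/f.Dim, getExpectedVectorRows(f, numRows); got != want {
		return fmt.Errorf("field %s: got %d vectors, want %d", f.Name, got, want)
	}
	return nil
}

func main() {
	f := vectorField{
		Name: "emb", Dim: 2, Nullable: true,
		ValidData: []bool{true, false, true},
		Data:      []float32{0.1, 0.2, 0.3, 0.4}, // vectors for rows 0 and 2 only
	}
	fmt.Println(checkAligned(f, 3)) // <nil>
}
```

The key design point is that a nullable field is checked twice: ValidData against the logical row count, and the vector payload against the number of valid rows.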

QueryNode layer changes:
- Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset translation

Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector indexing

Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
  logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs parameter
- Update funcutil.go with nullable vector support functions
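The logical-to-physical translation that FieldDataIdxComputer performs during reduce can be sketched as follows. The sketch reuses the name FieldDataIdxComputer, but its fields, constructor, and prefix-count approach are assumptions for illustration, not the code in typeutil/schema.go. Non-nullable fields keep logical == physical. For nullable fields, the physical index is the number of valid rows before the logical row.

```go
package main

import "fmt"

// fieldData is a hypothetical minimal view of one result column. For a nullable
// vector column, ValidData has one entry per logical row while the payload
// stores only the valid rows.
type fieldData struct {
	Nullable  bool
	ValidData []bool
}

// FieldDataIdxComputer (illustrative) translates a logical row index into the
// physical index for every field. Prefix counts are built once, so each
// Compute call costs O(number of fields).
type FieldDataIdxComputer struct {
	prefix [][]int64 // prefix[f][i] = valid rows before logical row i; nil if not nullable
}

func NewFieldDataIdxComputer(fields []fieldData) *FieldDataIdxComputer {
	c := &FieldDataIdxComputer{prefix: make([][]int64, len(fields))}
	for f, fd := range fields {
		if !fd.Nullable {
			continue
		}
		p := make([]int64, len(fd.ValidData)+1)
		for i, valid := range fd.ValidData {
			p[i+1] = p[i]
			if valid {
				p[i+1]++
			}
		}
		c.prefix[f] = p
	}
	return c
}

// Compute returns one physical index per field for logical row idx. For a null
// row the value does not address a stored vector, so callers check ValidData first.
func (c *FieldDataIdxComputer) Compute(idx int64) []int64 {
	out := make([]int64, len(c.prefix))
	for f, p := range c.prefix {
		if p == nil {
			out[f] = idx // non-nullable: logical == physical
		} else {
			out[f] = p[idx]
		}
	}
	return out
}

func main() {
	c := NewFieldDataIdxComputer([]fieldData{
		{Nullable: false},
		{Nullable: true, ValidData: []bool{false, true, false, true}},
	})
	fmt.Println(c.Compute(3)) // [3 1]
}
```

Precomputing prefix counts trades O(rows) memory per nullable field for constant-time lookups. This matters because reduce may translate many result rows.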

@sre-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: marcelo-cjl
To complete the pull request process, please assign czs007 after the PR has been reviewed.
You can assign the PR to them by writing /assign @czs007 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sre-ci-robot added the size/XXL (denotes a PR that changes 1000+ lines), area/test and sig/testing labels Dec 12, 2025
mergify bot added the dco-passed label (DCO check passed) Dec 12, 2025
@sre-ci-robot (Contributor)

[ci-v2-notice]
Notice: We are gradually rolling out the new ci-v2 system.

  • Legacy CI jobs remain unaffected; you can simply ignore ci-v2 if you don't want to run it.
  • Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
  • For tests that exist in both v1 and v2, passing in either system is considered PASS.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-ut-integration // for ci-v2/ut-integration
  • /ci-rerun-ut-go // for ci-v2/ut-go
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm [master branch only]
  • /ci-rerun-e2e-default // for ci-v2/e2e-default [master branch only]

If you have any questions or requests, please contact @zhikunyao.

mergify bot added the kind/feature label (issues related to feature requests from users) Dec 12, 2025
@mergify (Contributor)

mergify bot commented Dec 12, 2025

@marcelo-cjl Please add the related issue to the body of your Pull Request (e.g. "issue: #").

Supported types: FloatVector, BinaryVector, Float16Vector, BFloat16Vector,
Int8Vector and SparseFloatVector.

C++ layer changes:
- Add FieldDataVectorImpl class with LogicalToPhysicalMapping for sparse storage
- LogicalToPhysicalMapping uses an adaptive strategy: a map when valid_ratio < 10%,
  otherwise a vector for O(1) lookup
- Override FillFieldData() in FieldDataVectorImpl to handle nullable vectors
- Move FieldDataSparseVectorImpl from FieldDataInterface.h to FieldData.h
- Remove "vector not support null" restriction in Util.cpp
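The adaptive LogicalToPhysicalMapping strategy is implemented in C++ in this PR. The following Go sketch only illustrates the idea. The type, the field names, and the exact threshold test (validCount*10 < rows, i.e. valid_ratio < 10%) are assumptions. Below the threshold, only valid rows are stored in a map. Otherwise, a dense slice with one slot per logical row (-1 for null) gives O(1) indexed lookup.

```go
package main

import "fmt"

// l2pMapping is a hypothetical Go rendering of the adaptive strategy.
// Exactly one of dense or sparse is non-nil after buildL2P.
type l2pMapping struct {
	dense      []int64         // one slot per logical row, -1 for null (valid_ratio >= 10%)
	sparse     map[int64]int64 // valid rows only (valid_ratio < 10%)
	validCount int64
}

func buildL2P(valid []bool) *l2pMapping {
	m := &l2pMapping{}
	for _, v := range valid {
		if v {
			m.validCount++
		}
	}
	useMap := m.validCount*10 < int64(len(valid)) // valid_ratio < 10%
	if useMap {
		m.sparse = make(map[int64]int64, m.validCount)
	} else {
		m.dense = make([]int64, len(valid))
	}
	var next int64 // next physical offset to hand out
	for i, v := range valid {
		p := int64(-1)
		if v {
			p = next
			next++
		}
		if !useMap {
			m.dense[i] = p
		} else if v {
			m.sparse[int64(i)] = p // null rows are simply absent from the map
		}
	}
	return m
}

// physical returns the physical offset of a logical row and whether it is valid.
func (m *l2pMapping) physical(logical int64) (int64, bool) {
	if m.sparse != nil {
		p, ok := m.sparse[logical]
		return p, ok
	}
	if logical < 0 || logical >= int64(len(m.dense)) || m.dense[logical] < 0 {
		return -1, false
	}
	return m.dense[logical], true
}

func main() {
	valid := make([]bool, 20)
	valid[7] = true // 1 of 20 rows valid (5% < 10%), so the map is used
	m := buildL2P(valid)
	p, ok := m.physical(7)
	fmt.Println(m.sparse != nil, p, ok) // true 0 true
}
```

The rationale is memory. A map entry costs more than a slice slot, but when almost every row is null, storing only the valid rows is still far smaller than one slot per logical row.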

Go layer changes:
- Add LogicalToPhysicalMapping struct with validCount and l2pMap
- Add ValidData, Nullable, L2PMapping fields to all vector structs
- Update GetRow() to use L2PMapping.GetPhysicalOffset() for physical index
- Update AppendRow()/AppendValidDataRows() to build mapping incrementally
- Update GetMemorySize() to include ValidData, Nullable and L2PMapping

Storage strategy: nullable vectors use sparse storage, where the Data array
contains only non-null vectors, the ValidData bitmap tracks null positions, and
L2PMapping translates logical offsets to physical offsets.
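The following is a minimal Go sketch of the storage strategy above, assuming a simplified FloatVector-like column. The nullableFloatVector type, its unexported l2p slice, and the nil-means-null convention for AppendRow are hypothetical, not the actual field data structs. The sketch shows how AppendRow can extend the mapping incrementally, how GetRow resolves a logical index through it, and why RowNum reports the logical count, len(ValidData).

```go
package main

import "fmt"

// nullableFloatVector is a hypothetical simplified nullable vector column:
// Data holds only non-null vectors, ValidData has one flag per logical row,
// and l2p maps each logical row to its physical row (-1 for null).
type nullableFloatVector struct {
	Dim        int
	Data       []float32
	ValidData  []bool
	l2p        []int
	validCount int
}

// AppendRow appends one logical row; a nil row means null. The mapping grows
// with each append, so earlier entries never need to be rebuilt.
func (v *nullableFloatVector) AppendRow(row []float32) error {
	if row == nil {
		v.ValidData = append(v.ValidData, false)
		v.l2p = append(v.l2p, -1)
		return nil
	}
	if len(row) != v.Dim {
		return fmt.Errorf("dim mismatch: got %d, want %d", len(row), v.Dim)
	}
	v.Data = append(v.Data, row...)
	v.ValidData = append(v.ValidData, true)
	v.l2p = append(v.l2p, v.validCount)
	v.validCount++
	return nil
}

// RowNum reports the logical row count, i.e. len(ValidData).
func (v *nullableFloatVector) RowNum() int { return len(v.ValidData) }

// GetRow returns the vector at a logical index, or nil for a null row.
func (v *nullableFloatVector) GetRow(i int) []float32 {
	p := v.l2p[i]
	if p < 0 {
		return nil
	}
	return v.Data[p*v.Dim : (p+1)*v.Dim]
}

func main() {
	col := &nullableFloatVector{Dim: 2}
	_ = col.AppendRow([]float32{1, 2})
	_ = col.AppendRow(nil)
	_ = col.AppendRow([]float32{3, 4})
	fmt.Println(col.RowNum(), len(col.Data)/col.Dim, col.GetRow(1) == nil, col.GetRow(2)) // 3 2 true [3 4]
}
```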

Signed-off-by: marcelo-cjl <[email protected]>
This commit adapts the storage layer (binlog read/write) to correctly handle
nullable vectors with sparse storage, where the Data array contains only valid
vectors, the ValidData bitmap tracks null positions, and L2PMapping translates
logical offsets to physical offsets.

C++ storage layer changes:
- Update PayloadReader to extract dim from Arrow schema metadata for nullable
  vectors and relax IsFull() check for nullable vector fields
- Update PayloadWriter to use BinaryBuilder for nullable vectors with dim
  stored in Arrow metadata
- Update Util add_vector_payload() to handle nullable vectors via BinaryBuilder
- Add a using-declaration in FieldDataSparseVectorImpl to access the base-class FillFieldData
- Update DataCodecTest and DiskFileManagerTest with nullable vector test cases

Go storage layer changes:
- Update PayloadReader to read nullable vectors stored as sparse data
- Update PayloadWriter to serialize nullable vectors with validity bitmap
- Update serde.go serialization to iterate by logical count (len(ValidData))
  and use GetRow(j) with logical index for nullable vectors
- Update data_codec.go binlog deserialization for nullable vector fields
- Update insert_data.go RowNum() to return logical count (len(ValidData))
  for nullable vectors
- Update utils.go to remove obsolete NumRows field references
- Add comprehensive test coverage for nullable vector payload operations

Storage format: nullable vectors are stored as Arrow Binary with dim in the
schema metadata; the validity bitmap marks null positions and the data contains
only valid vectors.
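To make the format concrete, here is a dependency-free Go sketch that lays out the same buffers an Arrow Binary array uses. It does not call the Arrow library. The binaryColumn type, the encode/decode helpers, the little-endian float32 encoding, and the "dim" metadata key are assumptions for illustration. Null rows get a zero-length slot in the offsets, so the value buffer holds only valid vectors. The validity bitmap is LSB-first, with 1 meaning valid, as in Arrow.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"strconv"
)

// binaryColumn mirrors the buffers of an Arrow Binary array plus schema metadata.
type binaryColumn struct {
	Validity []byte  // LSB-first bitmap, 1 = valid
	Offsets  []int32 // len(rows)+1; equal neighbours mark a null (empty) slot
	Values   []byte  // concatenated bytes of the valid vectors only
	Metadata map[string]string
}

func encodeNullableFloatVectors(rows [][]float32, dim int) (*binaryColumn, error) {
	col := &binaryColumn{
		Validity: make([]byte, (len(rows)+7)/8),
		Offsets:  make([]int32, 1, len(rows)+1),
		Metadata: map[string]string{"dim": strconv.Itoa(dim)}, // hypothetical key
	}
	for i, row := range rows {
		if row != nil {
			if len(row) != dim {
				return nil, fmt.Errorf("row %d: got dim %d, want %d", i, len(row), dim)
			}
			col.Validity[i/8] |= 1 << (i % 8)
			for _, x := range row {
				col.Values = binary.LittleEndian.AppendUint32(col.Values, math.Float32bits(x))
			}
		}
		col.Offsets = append(col.Offsets, int32(len(col.Values)))
	}
	return col, nil
}

// decodeNullableFloatVectors reads dim from the metadata, as the reader must
// for nullable vectors, and returns nil for null rows.
func decodeNullableFloatVectors(col *binaryColumn) ([][]float32, error) {
	dim, err := strconv.Atoi(col.Metadata["dim"])
	if err != nil {
		return nil, fmt.Errorf("missing dim metadata: %w", err)
	}
	rows := make([][]float32, len(col.Offsets)-1)
	for i := range rows {
		if col.Validity[i/8]&(1<<(i%8)) == 0 {
			continue // null row stays nil
		}
		buf := col.Values[col.Offsets[i]:col.Offsets[i+1]]
		if len(buf) != 4*dim {
			return nil, fmt.Errorf("row %d: %d bytes, want %d", i, len(buf), 4*dim)
		}
		row := make([]float32, dim)
		for j := range row {
			row[j] = math.Float32frombits(binary.LittleEndian.Uint32(buf[4*j:]))
		}
		rows[i] = row
	}
	return rows, nil
}

func main() {
	col, _ := encodeNullableFloatVectors([][]float32{{1, 2}, nil, {3, 4}}, 2)
	fmt.Println(col.Validity, col.Offsets, len(col.Values), col.Metadata["dim"]) // [5] [0 8 8 16] 16 2
	rows, _ := decodeNullableFloatVectors(col)
	fmt.Println(rows) // [[1 2] [] [3 4]]
}
```

Because Binary values are variable-length, the dimension cannot be recovered from the column type alone (unlike a fixed-size layout). This is why the reader needs dim from the metadata.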

Signed-off-by: marcelo-cjl <[email protected]>
This commit adds nullable vector support for both growing and sealed segments,
including search, retrieval, and index building operations. It introduces
OffsetMapping for efficient logical-to-physical offset translation.

Core infrastructure changes:
- Add OffsetMapping class with Build()/BuildIncremental() for bitmap-based
  offset mapping, supporting both dense lookup and efficient valid count tracking
- Add TransformBitset() and TransformOffset() utilities for converting between
  logical and physical coordinate spaces during search operations
- Update SearchResult to include has_raw_data_ flag and physical offset handling
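The OffsetMapping class itself is C++. The following Go sketch only illustrates what Build() and BuildIncremental() are described as doing, assuming a dense logical-to-physical table plus its physical-to-logical inverse. The method names LogicalToPhysical, PhysicalToLogical, and ValidCount are hypothetical. Building incrementally over growing-segment batches should yield the same mapping as one Build() over the full bitmap.

```go
package main

import "fmt"

// OffsetMapping (hypothetical Go sketch) keeps a dense logical->physical table
// (-1 for null rows) and the physical->logical inverse.
type OffsetMapping struct {
	l2p []int64
	p2l []int64
}

// Build creates the mapping from a complete validity bitmap (sealed data).
func (m *OffsetMapping) Build(valid []bool) {
	m.l2p, m.p2l = nil, nil
	m.BuildIncremental(valid)
}

// BuildIncremental appends a new batch of rows (growing data) without
// recomputing entries for rows that were already mapped.
func (m *OffsetMapping) BuildIncremental(valid []bool) {
	for _, v := range valid {
		logical := int64(len(m.l2p))
		if v {
			m.l2p = append(m.l2p, int64(len(m.p2l)))
			m.p2l = append(m.p2l, logical)
		} else {
			m.l2p = append(m.l2p, -1)
		}
	}
}

// ValidCount is the number of non-null rows, i.e. the physical row count.
func (m *OffsetMapping) ValidCount() int64 { return int64(len(m.p2l)) }

func (m *OffsetMapping) LogicalToPhysical(l int64) int64 { return m.l2p[l] }
func (m *OffsetMapping) PhysicalToLogical(p int64) int64 { return m.p2l[p] }

func main() {
	var m OffsetMapping
	m.BuildIncremental([]bool{true, false})       // logical rows 0-1
	m.BuildIncremental([]bool{false, true, true}) // logical rows 2-4
	fmt.Println(m.ValidCount(), m.LogicalToPhysical(3), m.PhysicalToLogical(2)) // 3 1 4
}
```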

Growing segment changes:
- Update ConcurrentVector to support nullable vectors with OffsetMapping
- Add get_offset_mapping() and valid data tracking in AckSeal()
- Update FieldIndexing to handle nullable vectors with physical offset storage
- Modify SearchOnGrowing to transform bitset/offsets for nullable vector search
- Add nullable vector tests in SegmentGrowingTest covering search and retrieval

Sealed segment changes:
- Update ChunkedColumn/ChunkedColumnGroup to support OffsetMapping
- Modify ChunkedSegmentSealedImpl to handle nullable vector field data loading
- Update SearchOnSealedIndex/SearchOnSealedColumn with an early return when all
  vectors are null (100% null), preventing a crash when searching an empty index
- Add FilterVectorValidOffsets() for output field retrieval with nullable vectors
- Expand ChunkedSegmentSealedBinlogIndexTest with comprehensive nullable test cases

Index building changes:
- Update VectorMemIndex/VectorDiskIndex to store and serialize OffsetMapping
- Add BuildValidData()/UpdateValidData() for nullable vector index building
- Update VecIndexCreator and index_c API with valid data parameter support
- Modify InterimSealedIndexTranslator to handle nullable vector binlog index

Search and reduce changes:
- Update Reduce.cpp to transform physical offsets back to logical for output
- Update StreamReduce with nullable vector offset transformation support
- Add early return in SearchOnSealed when offset_mapping has zero valid count
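The coordinate changes described above can be traced end to end in a small Go sketch. A logical filter bitset is transformed to physical space, the search runs over physical vectors only, and hit offsets are mapped back to logical before returning. When nothing is valid, the sketch returns early. It uses brute-force squared-L2 search as a stand-in for the real index, and transformBitset, searchNullable, and the hit type are hypothetical names, not the C++ TransformBitset()/Reduce.cpp code.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical search result: an offset plus a squared L2 distance.
type hit struct {
	Offset   int64
	Distance float32
}

// transformBitset keeps only the filter bits of valid rows, turning a logical
// bitset (true = excluded) into one indexed by physical offset.
func transformBitset(logical, valid []bool) []bool {
	physical := make([]bool, 0, len(logical))
	for i, excluded := range logical {
		if valid[i] {
			physical = append(physical, excluded)
		}
	}
	return physical
}

func squaredL2(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// searchNullable brute-forces top-k over the physical (non-null) vectors and
// rewrites each hit's offset from physical back to logical before returning.
func searchNullable(query []float32, vectors [][]float32, valid, logicalFilter []bool, k int) []hit {
	p2l := make([]int64, 0, len(vectors))
	for i, ok := range valid {
		if ok {
			p2l = append(p2l, int64(i))
		}
	}
	if len(p2l) == 0 {
		return nil // 100% null: return early instead of searching an empty index
	}
	filter := transformBitset(logicalFilter, valid)
	var hits []hit
	for p, vec := range vectors {
		if !filter[p] {
			hits = append(hits, hit{Offset: int64(p), Distance: squaredL2(query, vec)})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Distance < hits[j].Distance })
	if len(hits) > k {
		hits = hits[:k]
	}
	for i := range hits {
		hits[i].Offset = p2l[hits[i].Offset] // physical -> logical
	}
	return hits
}

func main() {
	valid := []bool{false, true, false, true, true}
	vectors := [][]float32{{1, 0}, {5, 5}, {2, 0}}     // logical rows 1, 3, 4
	filter := []bool{false, false, false, false, true} // exclude logical row 4
	fmt.Println(searchNullable([]float32{1, 1}, vectors, valid, filter, 2)) // [{1 1} {3 32}]
}
```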

Signed-off-by: marcelo-cjl <[email protected]>
@mergify (Contributor)

mergify bot commented Dec 12, 2025

@marcelo-cjl The go-sdk check failed; comment "rerun go-sdk" to trigger the job again.


Labels

area/test, dco-passed, do-not-merge/missing-related-issue, kind/feature, sig/testing, size/XXL
