Problem
When embedding large datasets (thousands or millions of texts), the current embed() method accumulates all results in memory before returning. This causes:
- Out-of-memory errors for very large datasets
- Memory pressure when processing many texts sequentially
- No way to process results incrementally (e.g., save to database as embeddings arrive)
For enterprise workloads processing large document corpora, this is a significant limitation.
Proposed Solution
A new embed_stream() method that:
- Processes texts in configurable batches
- Yields embeddings one at a time via an iterator
- Keeps memory usage proportional to `batch_size` rather than total dataset size
- Works with both v1 and v2 clients
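The batching logic can be sketched as a plain generator. The following is a minimal illustration of the idea, not the actual PR implementation; `embed_batch` is a hypothetical stand-in for a single SDK `embed()` call that returns one vector per input text:

```python
from typing import Callable, Iterator, List, NamedTuple, Sequence


class StreamedEmbedding(NamedTuple):
    index: int             # position of the text in the original input
    embedding: List[float]  # the embedding vector for that text


def embed_stream(
    embed_batch: Callable[[List[str]], List[List[float]]],
    texts: Sequence[str],
    batch_size: int = 20,
) -> Iterator[StreamedEmbedding]:
    """Yield embeddings one at a time, holding at most one batch in memory."""
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start : start + batch_size])
        for offset, vector in enumerate(embed_batch(batch)):
            yield StreamedEmbedding(start + offset, vector)
```

Because the generator only materializes one batch of vectors at a time, peak memory tracks `batch_size` regardless of how long `texts` is.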
Usage Example
```python
import cohere

client = cohere.Client()

# Process a large dataset incrementally
for embedding in client.embed_stream(
    texts=large_text_list,  # Can be thousands of texts
    model="embed-english-v3.0",
    input_type="classification",
    batch_size=20,
):
    save_to_database(embedding.index, embedding.embedding)
    # Only batch_size worth of embeddings is in memory at a time
```
Memory Impact
| Dataset Size | Current `embed()` | Proposed `embed_stream()` |
|---|---|---|
| 1,000 texts | ~4 MB | ~20 KB |
| 100,000 texts | ~400 MB | ~20 KB |
| 1,000,000 texts | ~4 GB+ (OOM) | ~20 KB |
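The figures above are consistent with roughly 1024-dimensional vectors stored as 4-byte floats (about 4 KB per embedding); this back-of-envelope model is an assumption for illustration and ignores Python object overhead:

```python
def approx_embedding_memory_mb(
    n_texts: int, dim: int = 1024, bytes_per_value: int = 4
) -> float:
    """Rough estimate: n_texts dense vectors of `dim` float32 values."""
    return n_texts * dim * bytes_per_value / 1e6


print(approx_embedding_memory_mb(1_000))      # ~4 MB
print(approx_embedding_memory_mb(1_000_000))  # ~4096 MB, i.e. ~4 GB
```

With streaming, the same model applies to a single batch: 20 texts at ~4 KB each is on the order of the ~20 KB shown in the table.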
Context
We are using the Cohere Python SDK at Oracle for processing large embedding workloads. We have a working implementation in PR #698 that has been tested with the real Cohere API, passes all unit tests, and is backward compatible (no changes to existing embed()).
Additional Details
- No breaking changes to existing APIs
- Optional dependency on `ijson` for more efficient incremental parsing (works without it)
- Supports both `embeddings_floats` and `embeddings_by_type` response formats
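Supporting both response formats could come down to a small normalization step. The shapes below are hypothetical stand-ins for illustration (the real SDK returns typed response objects, not bare lists/dicts):

```python
from typing import Any, Dict, List, Union


def extract_float_vectors(
    embeddings: Union[List[List[float]], Dict[str, Any]]
) -> List[List[float]]:
    """Normalize two hypothetical response shapes to a flat list of vectors.

    - embeddings_floats style: already a plain list of float vectors
    - embeddings_by_type style: a mapping keyed by type, e.g. {"float": [...]}
    """
    if isinstance(embeddings, dict):
        return embeddings["float"]
    return embeddings
```

The streaming iterator can then yield the same `(index, vector)` pairs regardless of which format the server returned.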