Feature request: Memory-efficient streaming API for embed() #733

@fede-kamel

Description

Problem

When embedding large datasets (thousands or millions of texts), the current embed() method accumulates all results in memory before returning. This causes:

  • Out-of-memory errors for very large datasets
  • Memory pressure when processing many texts sequentially
  • No way to process results incrementally (e.g., save to database as embeddings arrive)

For enterprise workloads processing large document corpora, this is a significant limitation.

Proposed Solution

A new embed_stream() method that:

  • Processes texts in configurable batches
  • Yields embeddings one at a time via an iterator
  • Keeps memory usage proportional to batch_size rather than total dataset size
  • Works with both v1 and v2 clients

Usage Example

import cohere

client = cohere.Client()

# Process large dataset incrementally
for embedding in client.embed_stream(
    texts=large_text_list,  # Can be thousands of texts
    model="embed-english-v3.0",
    input_type="classification",
    batch_size=20
):
    save_to_database(embedding.index, embedding.embedding)
    # Only batch_size worth of embeddings in memory at a time
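The proposed method could be implemented as a plain generator that wraps the existing embed(). A minimal sketch, assuming client.embed() accepts a list of texts and returns an object with an embeddings list; the EmbeddingResult record and the embed_stream signature here are illustrative, not part of the SDK (see PR #698 for the actual implementation):

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class EmbeddingResult:
    # Hypothetical result record: global index of the text plus its vector.
    index: int
    embedding: List[float]


def embed_stream(client, texts: Sequence[str], model: str,
                 input_type: str, batch_size: int = 20) -> Iterator[EmbeddingResult]:
    """Yield embeddings one at a time, holding at most one batch in memory."""
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        response = client.embed(texts=batch, model=model, input_type=input_type)
        for offset, vector in enumerate(response.embeddings):
            yield EmbeddingResult(index=start + offset, embedding=vector)
```

Because the generator yields as each batch completes, callers can persist results incrementally (as in the save_to_database example above) without ever materializing the full result list.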

Memory Impact

Dataset Size       Current embed()   Proposed embed_stream()
1,000 texts        ~4 MB             ~20 KB
100,000 texts      ~400 MB           ~20 KB
1,000,000 texts    ~4 GB+ (OOM)      ~20 KB
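The "Current embed()" column is consistent with roughly 4 KB per vector, e.g. 1,024 float32 dimensions (an assumed output size for embed-english-v3.0), since embed() retains every vector until it returns:

```python
DIMENSIONS = 1024      # assumed embedding width for embed-english-v3.0
BYTES_PER_FLOAT = 4    # float32

def resident_bytes(n_texts: int) -> int:
    """Approximate memory held by embed(), which keeps all vectors in memory."""
    return n_texts * DIMENSIONS * BYTES_PER_FLOAT

print(resident_bytes(1_000))      # ~4 MB, matching the first table row
print(resident_bytes(1_000_000))  # ~4 GB, matching the last table row
```

With streaming, resident memory is instead bounded by batch_size vectors, independent of the total dataset size.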

Context

We are using the Cohere Python SDK at Oracle for processing large embedding workloads. We have a working implementation in PR #698 that has been tested with the real Cohere API, passes all unit tests, and is backward compatible (no changes to existing embed()).

Additional Details

  • No breaking changes to existing APIs
  • Optional dependency on ijson for more efficient incremental parsing (works without it)
  • Supports both embeddings_floats and embeddings_by_type response formats
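The two response shapes in the last bullet could be normalized to a single list of vectors roughly as follows. A sketch over raw JSON payloads, assuming the embeddings_floats shape carries a plain list of vectors and the embeddings_by_type shape maps a type name such as "float" to vectors; field names are assumptions, not verified SDK attributes:

```python
from typing import Any, Dict, List


def extract_vectors(payload: Dict[str, Any]) -> List[List[float]]:
    """Return a flat list of float vectors from either Embed response shape."""
    embeddings = payload["embeddings"]
    if isinstance(embeddings, list):
        # embeddings_floats: "embeddings" is already a list of vectors.
        return embeddings
    # embeddings_by_type: "embeddings" maps a type name to its vectors.
    return embeddings["float"]
```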
