
Conversation


@drborges drborges commented Nov 28, 2025

Summary

Add support for a Stale-While-Revalidate (SWR) prompt cache strategy, based on the design document: docs/future-enhancements/STALE_WHILE_REVALIDATE_DESIGN.md

The GIF below shows the new feature in action against a self-hosted production instance of Langfuse using the following config:

Langfuse.configure do |config|
  config.public_key = ENV["LANGFUSE_PUB_KEY"]
  config.secret_key = ENV["LANGFUSE_SECRET_KEY"]
  config.base_url = ENV["LANGFUSE_BASE_URL"]

  config.cache_backend = :rails
  config.cache_ttl = 2 # cache entries are fresh for 2 seconds, after that, they become stale

  # Enable SWR
  config.cache_stale_while_revalidate = true
  config.cache_stale_ttl = 2 # stale entries are served for 2 more seconds until they expire completely (grace period).
  config.cache_refresh_threads = 5
end

(GIF demo: 2025-11-28 14:23:40)

📊 Impact

  • Performance: Enables near-instant responses for 99% of requests by serving stale data while refreshing in the background
  • Backward Compatibility: Fully backward compatible - SWR is opt-in via configuration (requires :rails backend)
  • Dependencies: Requires concurrent-ruby gem for thread pool management

Usage

Configure the client:

Langfuse.configure do |config|
  config.public_key = ENV['LANGFUSE_PUBLIC_KEY']
  config.secret_key = ENV['LANGFUSE_SECRET_KEY']
  
  config.cache_backend = :rails
  config.cache_ttl = 300 # Fresh for 5 minutes
  
  # Enable SWR
  config.cache_stale_while_revalidate = true
  config.cache_stale_ttl = 300 # Grace period: 5 more minutes
  config.cache_refresh_threads = 10 # Background thread pool size
end

Once configured, SWR works transparently:

client = Langfuse.client

# First request - populates cache
prompt = client.get_prompt("greeting") # ~100ms (API call)

# Subsequent requests while fresh
prompt = client.get_prompt("greeting") # ~1ms (cache hit)

# After cache_ttl expires but within grace period
prompt = client.get_prompt("greeting") # ~1ms (stale data + background refresh)

# Background refresh completes, next request gets fresh data
prompt = client.get_prompt("greeting") # ~1ms (fresh cache)

More details in the proposed documentation.

@drborges drborges force-pushed the feature/stale-while-revalidate branch 3 times, most recently from 979705c to ae8746b Compare November 28, 2025 14:55
@drborges drborges marked this pull request as draft November 28, 2025 17:13
@drborges (Author)

@NoahFisher I still have to deal with some rubocop offenses and do one final review of the test coverage, but I could use some feedback on this draft. I also have one open question to double-check in the PR description.

@drborges drborges force-pushed the feature/stale-while-revalidate branch from da9e215 to d798182 Compare November 28, 2025 18:00
@drborges drborges force-pushed the feature/stale-while-revalidate branch 3 times, most recently from 43fcf9b to 7ff496b Compare December 3, 2025 20:03
Add three new configuration options to support SWR caching:
- cache_stale_while_revalidate: Enable/disable SWR (default: false)
- cache_stale_ttl: Grace period for serving stale data (default: 300s)
- cache_refresh_threads: Background thread pool size (default: 5)

SWR caching requires Rails cache backend and will be validated
during configuration. This enables serving stale data instantly
while refreshing in the background for better performance.
Restructure CacheEntry to support stale-while-revalidate pattern:
- Replace single expires_at with fresh_until and stale_until
- Add fresh? method: entry is fresh and can be served immediately
- Add stale? method: entry is stale but usable (revalidate in background)
- Update expired? method: entry must be revalidated synchronously

This three-state model enables SWR caching where stale entries
can be served instantly while a background refresh occurs.
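A minimal sketch of this three-state entry (the field and predicate names follow the commit message; the actual class may differ in detail):

```ruby
# Three-state cache entry: FRESH -> STALE -> EXPIRED.
CacheEntry = Struct.new(:value, :fresh_until, :stale_until) do
  # Entry is fresh and can be served immediately.
  def fresh?(now = Time.now)
    now < fresh_until
  end

  # Stale but still usable: serve it, revalidate in the background.
  def stale?(now = Time.now)
    now >= fresh_until && now < stale_until
  end

  # Fully expired: must be revalidated synchronously.
  def expired?(now = Time.now)
    now >= stale_until
  end
end
```

With ttl = 300 and stale_ttl = 300, an entry written at t0 is fresh until t0+300, stale (servable with background refresh) until t0+600, and expired after that.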
Add full SWR caching support with concurrent background refresh:
- fetch_with_stale_while_revalidate: Main SWR fetch method
- Three-state handling: FRESH (instant), STALE (instant + refresh), MISS (sync fetch)
- Concurrent thread pool for background refresh operations
- Stampede protection during background refresh
- Logger integration for debugging cache states
- Graceful shutdown of thread pool

The adapter now accepts stale_ttl and refresh_threads parameters.
When stale_ttl is set, SWR is enabled. When nil, falls back to
standard fetch_with_lock behavior.

This provides instant responses for 99% of requests by serving
stale data while refreshing in the background.
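The FRESH/STALE/MISS flow described above can be sketched roughly as follows. This is an illustrative, dependency-free version: SwrFetcher, Entry, and write are hypothetical names, and a plain Thread stands in for the concurrent-ruby thread pool the real adapter uses:

```ruby
Entry = Struct.new(:value, :fresh_until, :stale_until)

class SwrFetcher
  def initialize(ttl:, stale_ttl:)
    @ttl = ttl
    @stale_ttl = stale_ttl
    @store = {}
    @refreshes = [] # the real adapter uses a concurrent-ruby thread pool
  end

  def fetch(key, &fetcher)
    entry = @store[key]
    now = Time.now
    if entry && now < entry.fresh_until
      entry.value                                            # FRESH: instant hit
    elsif entry && now < entry.stale_until
      @refreshes << Thread.new { write(key, fetcher.call) }  # STALE: serve + refresh
      entry.value
    else
      write(key, fetcher.call)                               # MISS: synchronous fetch
    end
  end

  # Graceful shutdown: wait for in-flight refreshes.
  def shutdown
    @refreshes.each(&:join)
  end

  private

  def write(key, value)
    now = Time.now
    @store[key] = Entry.new(value, now + @ttl, now + @ttl + @stale_ttl)
    value
  end
end
```

The sketch omits the stampede protection (locking around the background refresh) and logging that the real adapter provides.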
Update Client to pass SWR configuration options to RailsCacheAdapter:
- Pass stale_ttl (only when cache_stale_while_revalidate is enabled)
- Pass refresh_threads for background thread pool size
- Pass logger for debugging and error reporting

Extract create_rails_cache_adapter method for clarity.
When SWR is disabled, stale_ttl is nil, keeping standard behavior.

Tests verify correct configuration propagation and thread pool
creation based on SWR settings.
Refactor get_prompt to choose optimal caching strategy based on
available cache capabilities:
1. SWR cache (fetch_with_stale_while_revalidate) - best performance
2. Distributed cache (fetch_with_lock) - stampede protection
3. Simple cache (get/set) - basic in-memory caching
4. No cache - direct API fetch

Extract separate methods for each strategy for better maintainability
and testability. The strategy is selected at runtime based on which
methods the cache adapter responds to.

This enables automatic use of SWR when available without breaking
existing cache implementations.
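The runtime capability check can be sketched like this (get_prompt_with_best_strategy is a hypothetical name; only the respond_to?-based dispatch is taken from the commit message):

```ruby
# Pick the best available caching strategy for the given adapter.
def get_prompt_with_best_strategy(cache, key, &api_fetch)
  if cache.respond_to?(:fetch_with_stale_while_revalidate)
    cache.fetch_with_stale_while_revalidate(key, &api_fetch)  # 1. SWR cache
  elsif cache.respond_to?(:fetch_with_lock)
    cache.fetch_with_lock(key, &api_fetch)                    # 2. distributed cache
  elsif cache
    cache.get(key) || cache.set(key, api_fetch.call)          # 3. simple cache
  else
    api_fetch.call                                            # 4. no cache
  end
end
```

Because dispatch is purely duck-typed, third-party cache adapters that only implement get/set keep working unchanged.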
This was the pattern already in place; this commit simply expands its scope to some of the new spec examples. However, it would be interesting to discuss whether we need to allow a larger number of memoized helpers or whether these specs need some refactoring.
@drborges drborges force-pushed the feature/stale-while-revalidate branch from 7ff496b to ab7628d Compare December 4, 2025 14:02
Add spec examples to verify that SWR (Stale-While-Revalidate) is correctly
enabled only when stale_ttl is positive, and disabled for zero or negative
values. Adds thread pool initialization coverage for both cache adapters.

All 785 tests pass with 97.36% coverage maintained.
@drborges drborges marked this pull request as ready for review December 4, 2025 18:08

drborges commented Dec 4, 2025

@NoahFisher, @kxzk I think this is in a good spot for some review. Let me know if there are any concerns or change suggestions you'd like me to address. Thanks.


drborges commented Dec 4, 2025

My last commit disables Rubocop's Metrics/ClassLength for the ApiClient class, but it may be worth moving parts of the class into mixin modules to keep the file under the desired length. Thoughts @NoahFisher?


kxzk commented Dec 4, 2025

@NoahFisher, @kxzk I think this is in a good spot for some review. Let me know if there are any concerns or change suggestions you'd like me to address. Thanks.

Will try and review today.


@kxzk kxzk left a comment


Few comments but overall looks solid


drborges commented Dec 9, 2025

@kxzk thanks for all the feedback. I'll address them next.

Consolidates all cache locking mechanisms to use a single, simplified API:
- Unified lock keys: Both fetch and refresh operations use the same lock
- Single timeout: All locks use lock_timeout (10s) instead of separate values
- Simplified method names: build_lock_key, acquire_lock, release_lock

Changes:
- StaleWhileRevalidate module now defines single set of abstract lock methods
- RailsCacheAdapter implements distributed locking with Rails.cache
- PromptCache implements in-memory thread-safe locking
- All tests updated to reflect unified lock key format (:lock suffix)

Benefits:
- Simpler, more maintainable code
- Single lock prevents duplicate work for both fetch and refresh scenarios
- Clearer intent and easier to reason about
- Reduced code duplication

All tests passing (785 examples, 97.58% coverage)
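The unified lock API might look roughly like this (build_lock_key, acquire_lock, and release_lock come from the commit message; the store here is any object with a Rails.cache-style write supporting unless_exist, and the bodies are illustrative):

```ruby
LOCK_TIMEOUT = 10 # single timeout shared by fetch and refresh locks

# Both fetch and refresh operations use the same :lock-suffixed key.
def build_lock_key(key)
  "#{key}:lock"
end

# unless_exist makes the write atomic: only one caller wins the lock,
# and expires_in guarantees the lock self-releases after the timeout.
def acquire_lock(store, key)
  store.write(build_lock_key(key), true, unless_exist: true, expires_in: LOCK_TIMEOUT)
end

def release_lock(store, key)
  store.delete(build_lock_key(key))
end
```

Using a single lock per prompt key is what prevents a fetch and a background refresh from doing duplicate work for the same entry.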
Resolves TODO by moving fetch_with_lock out of StaleWhileRevalidate module
and into RailsCacheAdapter where it belongs.

Rationale:
- StaleWhileRevalidate module is specifically for SWR functionality
- fetch_with_lock is for non-SWR scenarios (when stale_ttl is 0)
- Only RailsCacheAdapter needs distributed stampede protection
- PromptCache uses in-memory locking, not suitable for fetch_with_lock

Changes:
- Moved fetch_with_lock from module to RailsCacheAdapter (public method)
- Moved wait_for_cache helper to RailsCacheAdapter
- Removed cache_set_simple (no longer needed, uses set() directly)
- Kept build_lock_key in module (used by schedule_refresh)
- Updated tests to reflect PromptCache doesn't support fetch_with_lock

Benefits:
- Cleaner separation of concerns
- StaleWhileRevalidate module reduced from 370 to 283 lines
- No rubocop disable needed for module length
- Clearer API: only RailsCacheAdapter provides distributed locking

All tests passing (786 examples, 97.01% coverage)
Simplifies cache API by deriving expiration values from ttl and stale_ttl
instead of requiring explicit expires_in parameter.

Rationale:
- Cache implementations already know their ttl and stale_ttl values
- No need for callers to calculate and pass expires_in
- Expiration logic is encapsulated within each cache implementation
- Cleaner, simpler API with fewer parameters

Changes:
- Removed expires_in parameter from PromptCache#set
- Removed expires_in parameter from RailsCacheAdapter#set
- Removed expires_in parameter from cache_set abstract method
- PromptCache now calculates: fresh_until = ttl, stale_until = ttl + stale_ttl
- RailsCacheAdapter calculates: expires_in = ttl + stale_ttl (if SWR) else ttl
- Updated all call sites and tests

Benefits:
- Simpler API - fewer parameters to think about
- Encapsulation - expiration logic lives in cache implementation
- Less error-prone - no risk of passing wrong expires_in value
- More maintainable - changing TTL logic only affects cache class

All tests passing (786 examples, 96.93% coverage)
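The derived-expiration rule reduces to a couple of one-liners (the helper names are hypothetical; the arithmetic mirrors the commit message):

```ruby
# Backend expiry for RailsCacheAdapter: keep the entry through the
# grace period when SWR is on, otherwise just the fresh TTL.
def backend_expires_in(ttl:, stale_ttl:)
  swr_enabled = !stale_ttl.nil? && stale_ttl.positive?
  swr_enabled ? ttl + stale_ttl : ttl
end

# In-memory entry boundaries for PromptCache, derived the same way.
def entry_boundaries(now, ttl:, stale_ttl:)
  [now + ttl, now + ttl + stale_ttl] # [fresh_until, stale_until]
end
```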
Ensures cache thread pools are properly shutdown when client is disposed.

Rationale:
- Client#shutdown already calls score_client.shutdown for cleanup
- SWR-enabled caches use thread pools that need proper shutdown
- Thread pools should be gracefully terminated to prevent leaks
- Consistent cleanup pattern across all client components

Changes:
- Added ApiClient#shutdown method to shutdown cache if it supports it
- Client#shutdown now calls api_client.shutdown in addition to score_client
- Uses respond_to?(:shutdown) check for backward compatibility
- StaleWhileRevalidate#shutdown safely handles nil thread pools

Test Coverage:
- Added 5 new tests for ApiClient#shutdown (all scenarios)
- Added 3 new tests for Client#shutdown with cache scenarios
- Tests verify shutdown is called on SWR-enabled caches
- Tests verify shutdown is safe when cache is nil or doesn't support SWR
- All 794 examples passing, 96.86% coverage maintained

Benefits:
- Proper resource cleanup for long-running applications
- No thread pool leaks when client is shutdown
- Safe for all cache configurations (SWR, TTL-only, or no cache)
- Graceful degradation for caches without shutdown support
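A sketch of the shutdown cascade (class and method names mirror the commit message; the bodies are illustrative):

```ruby
class ApiClient
  def initialize(cache)
    @cache = cache
  end

  # respond_to? keeps this backward compatible with caches that have
  # no thread pool to shut down, and safe when cache is nil.
  def shutdown
    @cache.shutdown if @cache.respond_to?(:shutdown)
  end
end
```

Client#shutdown then calls api_client.shutdown alongside score_client.shutdown, so every component with background threads gets the same cleanup path.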
@drborges drborges force-pushed the feature/stale-while-revalidate branch from f325ea2 to dff1a15 Compare December 10, 2025 20:22
…or cache_stale_ttl

Improves API ergonomics by replacing Float::INFINITY with a more Ruby-idiomatic
:indefinite symbol for never-expiring cache configuration.

Rationale:
- :indefinite is more readable and Ruby-idiomatic than Float::INFINITY
- Symbols are the conventional way to represent special values in Ruby config
- Makes intent clearer in configuration code
- Aligns with Ruby community conventions (e.g., Rails cache :expires_in)

Changes:
- Renamed THOUSAND_YEARS_IN_SECONDS constant to INDEFINITE_SECONDS
- Updated constant comment to clarify intent ("indefinite cache duration")
- Changed normalize_stale_ttl from class method to instance method (normalized_stale_ttl)
- Instance method reads from cache_stale_ttl and returns normalized value on-demand
- Removed normalization from Config#initialize (now done lazily via method)
- Updated create_memory_cache to use config.normalized_stale_ttl
- Updated create_rails_cache_adapter to use config.normalized_stale_ttl
- Updated validate_swr_config! to handle :indefinite symbol properly
- Updated validation error message: "must be non-negative or :indefinite"

Documentation Updates:
- docs/CACHING.md: Changed Float::INFINITY to :indefinite in examples
- docs/CONFIGURATION.md: Changed Float::INFINITY to :indefinite in examples
- lib/langfuse/config.rb: Updated attr_accessor doc to mention :indefinite
- lib/langfuse/prompt_cache.rb: Updated comment to reflect :indefinite normalization

Test Coverage:
- Updated all tests to use :indefinite instead of Float::INFINITY
- Updated all tests to use INDEFINITE_SECONDS instead of THOUSAND_YEARS_IN_SECONDS
- Updated test expectations for new validation error message
- All 797 examples passing, 96.86% coverage maintained

Benefits:
- More intuitive API for developers (config.cache_stale_ttl = :indefinite)
- Clearer intent in configuration code
- Normalization happens on-demand via instance method
- Original config value preserved (not mutated during initialization)
- Better alignment with Ruby conventions and best practices
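The lazy normalization might look like this (INDEFINITE_SECONDS and normalized_stale_ttl are named in the commit message; the constant's value here is illustrative):

```ruby
# Large-but-finite stand-in for an indefinite cache duration (~1000 years).
INDEFINITE_SECONDS = 1000 * 365 * 24 * 60 * 60

class Config
  attr_accessor :cache_stale_ttl # Integer seconds, or :indefinite

  # Normalize on demand so the configured value is never mutated.
  def normalized_stale_ttl
    cache_stale_ttl == :indefinite ? INDEFINITE_SECONDS : cache_stale_ttl
  end
end
```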
@drborges drborges force-pushed the feature/stale-while-revalidate branch from dff1a15 to f7e9fdc Compare December 10, 2025 20:24
Adds specific validation to catch when cache_stale_while_revalidate is enabled
but cache_stale_ttl is nil, providing a helpful error message with recommended
solutions.

Rationale:
- When SWR is enabled, cache_stale_ttl must have a value
- nil is invalid regardless of SWR status
- Users need clear guidance on how to fix the configuration
- Prevents confusing runtime errors from occurring later

Changes:
- Added validate_swr_stale_ttl! private method to handle stale_ttl validation
- Added validate_refresh_threads! private method to validate refresh threads
- Split validate_swr_config! into smaller, focused methods
- New error message guides users to set cache_ttl or :indefinite
- Maintains existing validation that cache_stale_ttl cannot be nil
- Removed apply_swr_defaults! method from Config class
- Removed apply_swr_defaults! call from Config#initialize
- Removed test for auto-defaulting behavior
- Validation now catches missing cache_stale_ttl with helpful message

Error Message:
"cache_stale_ttl cannot be nil when cache_stale_while_revalidate is enabled.
Set it to cache_ttl for a logical default, or use :indefinite for never-expiring cache."

Test Coverage:
- Added test for SWR enabled with nil cache_stale_ttl scenario
- All 802 examples passing with 96.87% coverage maintained
- RuboCop: No offenses detected

Benefits:
- Clear, actionable error messages for misconfiguration
- Helps users understand the relationship between SWR and stale_ttl
- Reduces cyclomatic complexity through method extraction
- Easier to maintain and test individual validation rules
- More explicit, less magical configuration
- Easier to understand what values are actually set
- No hidden side effects during initialization
@drborges (Author)

@kxzk I think I was able to address all your feedback. This is ready for another round of reviews. Thanks again.

@drborges (Author)

@kxzk, @NoahFisher let me know if this look good and what the next steps would be for us getting this merged to main.

@drborges drborges requested a review from kxzk December 11, 2025 19:42

@NoahFisher NoahFisher left a comment


Additional Considerations:

  1. Memory Leak Potential with In-Memory Locks

In PromptCache, locks are stored in a hash but only deleted on release_lock.

If a refresh thread crashes between acquire_lock and release_lock (despite the ensure block), the lock is never released. The ensure should handle this, but if the thread is killed externally (e.g., Thread#kill), the lock persists forever. Unlike Redis locks which have TTL expiration, these in-memory locks have no timeout.

Suggestion: Accept this as a known limitation of the in-memory backend, but document it.
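One possible mitigation, purely a hypothetical sketch and not part of the PR, is a timestamped in-memory lock that can be stolen after a timeout, mirroring the TTL behavior of the Redis-backed locks:

```ruby
# In-memory locks that expire: a stuck lock (e.g. from a killed
# thread) can be stolen once its timeout has elapsed.
class TimedLocks
  def initialize(timeout: 10)
    @timeout = timeout
    @locks = {} # key => acquisition Time
    @mutex = Mutex.new
  end

  def acquire(key, now: Time.now)
    @mutex.synchronize do
      held_at = @locks[key]
      return false if held_at && now - held_at < @timeout # still held
      @locks[key] = now # free, or stale enough to steal
      true
    end
  end

  def release(key)
    @mutex.synchronize { @locks.delete(key) }
  end
end
```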

  2. No Backpressure Signal to Callers

When the thread pool queue fills up and tasks are discarded, callers have no idea. This is fine for SWR (they get stale data), but it means:

  • Monitoring is blind to refresh failures
  • If the API is down for extended periods, stale data keeps being served with no signal

Question: Should there be a hook or callback for observability? e.g., on_refresh_dropped, on_refresh_failed?

  3. Cold Start Behavior

On application boot with an empty cache, the first request for each prompt will be a synchronous API call (cache miss). With SWR enabled, you've documented cache warming, which is good. But worth calling out:

SWR doesn't help cold starts - it only helps after the cache has been populated at least once.

  4. Ruby's GVL (Global VM Lock)

Worth noting that Ruby's GVL means these threads aren't truly parallel for CPU work. However, for I/O-bound work (HTTP calls to Langfuse API), threads do release the GVL during I/O waits, so this is fine. Just worth being aware that max_threads: 5 doesn't mean 5x throughput for CPU-bound work.

  5. Should SWR be Default-On for Rails Backend?

The PR makes SWR opt-in (cache_stale_while_revalidate = false by default). Given the latency benefits and that the distributed lock prevents thundering herd, is there an argument for making it default-on for the :rails backend specifically? Users with Redis already have the infrastructure for distributed locks.

Counter-argument: Surprising behavior changes are bad. Opt-in is safer.

Good work so far, I wouldn't consider the above blocking comments (I feel more strongly about the inline ones), but curious if we should document our thinking for next steps.

Comment on lines +43 to +45
| `cache_stale_while_revalidate` | Boolean | No | `false` | Enable stale-while-revalidate |
| `cache_stale_ttl` | Integer | No | `60` when SWR is enabled | Stale TTL (seconds) |
| `cache_refresh_threads` | Integer | No | `5` | Background refresh threads |
Member


On Configuration Complexity

Concern: This PR adds 3 new configuration options (cache_stale_while_revalidate, cache_stale_ttl, cache_refresh_threads). I'd push back on exposing all of these to users.

Principle: Every configuration option is a decision the user has to make. Good defaults mean fewer decisions.

Suggestion: Consider collapsing to a single option:

Instead of:

config.cache_stale_while_revalidate = true
config.cache_stale_ttl = 300
config.cache_refresh_threads = 5

Just:

config.cache_stale_while_revalidate = true  # Uses sensible defaults internally

Reasoning:

  • cache_stale_ttl defaulting to cache_ttl is already the right choice 99% of the time - why expose it?
  • cache_refresh_threads at 5 is fine for virtually all deployments - this is an implementation detail, not a user concern
  • Users who truly need to tune these can always subclass or we can add them later

The bar for adding config options should be: "Will a significant number of users need to change this from the default?" If not, hardcode it.


We absolutely need to set the TTL to infinite. Not exposing the TTL setting won't work for us.

Member


Can you explain what the use case for a non-infinite TTL would be?


@drborges drborges Dec 12, 2025


Every configuration option is a decision the user has to make. Good defaults mean fewer decisions.

I agree, it certainly makes sense to hide cache_refresh_threads. Exposing cache_stale_ttl, on the other hand, offers flexibility in case clients have very specific needs.

With that said, if I chose to enable SWR, I would more often than not want an infinite grace period (e.g. cache_stale_ttl = :indefinite) for high-availability reasons, and then use cache_ttl to control how often the cache is refreshed.

Based on that, only exposing config.cache_stale_while_revalidate would make sense to me if that defaults cache_stale_ttl to :indefinite as per the proposed implementation.

Does that make sense?

@kxzk kxzk added the enhancement New feature or request label Dec 12, 2025
@drborges (Author)

Thanks for all of the great feedback so far @NoahFisher! I've addressed some of the points you raised and wanted to follow up on a few of them:

Point 2

No Backpressure Signal to Callers
When the thread pool queue fills up and tasks are discarded, callers have no idea. This is fine for SWR (they get stale data), but it means:

Monitoring is blind to refresh failures
If the API is down for extended periods, stale data keeps being served with no signal
Question: Should there be a hook or callback for observability? e.g., on_refresh_dropped, on_refresh_failed?

This is a great point and I think it would be interesting to provide such hooks. Is the following somewhat aligned with what you had in mind?

Langfuse.configure do |config|
  # Hook called whenever a refresh attempt fails for whatever reason
  config.on_cache_refresh_failed do |error, cache_key|
    # handle refresh error accordingly... e.g. publish to Sentry, log something out, etc...
  end
  
  # Hook called whenever a refresh attempt is dropped due to not being able to acquire the lock
  config.on_cache_refresh_dropped do |cache_key|
    # handle refresh drops accordingly...
  end
end

Can you think of any other scenario we'd like to cover?

Also, would it make sense to ship that as another enhancement PR?

Point 3

With SWR enabled, you've documented cache warming, which is good.

Could you point me to that doc entry so I can double-check? Cache warming should not be impacted by SWR, as you pointed out, so that mention is likely a mishap on my part.

Point 5

Should SWR be Default-On for Rails Backend?
The PR makes SWR opt-in (cache_stale_while_revalidate = false by default). Given the latency benefits and that the distributed lock prevents thundering herd, is there an argument for making it default-on for the :rails backend specifically? Users with Redis already have the infrastructure for distributed locks.

Counter-argument: Surprising behavior changes are bad. Opt-in is safer.

My initial thinking was exactly that: opt-in seemed safer and more intentional. Now, you do raise a good point. If 99% of the time Rails clients end up turning SWR on, then perhaps it should be the default? I could go either way on this one, to be honest.


drborges commented Dec 16, 2025

@NoahFisher, @kxzk do you think we may be able to cut a new release including the work in this PR at some point this week?

Also, one small snag I just ran into is that the project we plan on using langfuse-rb in already has the faraday gem pinned to 1.0 due to some old dependency limitations we have yet to deal with.

Now, the simplest solution to our problem seems to be having langfuse-rb depend on faraday >= 1.0, as it does not appear to require any feature specific to faraday 2.0. Would this be something we could consider for this PR, or a follow-up?

@drborges (Author)

Moving the faraday convo to a new issue: #38

@drborges (Author)

@NoahFisher, @kxzk any thoughts on the last update? Do you think we could move on with the solution as is and address anything that is non-blocking in a follow-up PR perhaps?

@benlangfeld
@kxzk @NoahFisher Happy new year. Could we please get this merged? It's been open quite a long time.

