Messages timing out while Backlog is SpinningDown

In which: I notice a lot of issues with (intermittent?) timeouts that say bw: SpinningDown and wonder if it is a significant pattern. 

Client reports timeouts from a slow EVAL command. But there was no specific symptom of timeout or slow EVAL command on the server side.

It occurs to me that there are a lot of issues mentioning "SpinningDown", and maybe they arise from a race condition related to the backlog, around the _backlogProcessorIsRunning and _backlogAutoReset state. So, let's see..

_The theoretical analysis_

The job of `StartBacklogProcessor` is to ensure a backlog processing thread is running and handling any backlog work. It handles this either by starting a new backlog processor, or by trying to wake up an existing one by setting the `_backlogAutoReset` event, if it sees that _backlogProcessorIsRunning.

The event is observed by doing `WaitOne(_backlogAutoReset, 5 seconds)`, but there is a race as to:

a) when anybody *sets* the event, and whether WaitOne randomly returns after 5 seconds, or the event is set first.
b) if multiple people set the event how many times it is observed - in theory though, being observed once should be enough, so I'm not convinced this is an actual problem

If `WaitOne` sees the event, great, the backlog processing thread wakes up and loops to handle the work.
If `WaitOne` doesn't see the event, the event is set, and the backlog processing thread exits, releasing its hold on _backlogProcessorIsRunning, and leaving the _backlogStatus equal to SpinningDown. No new backlog thread got started up, since the event was triggered... just not in time.

So yes, we can get to the following undesirable state:

a) we have an item in the backlog
b) backlog status is spinning down
c) backlog processor is no long running, and released `_backlogProcessorIsRunning`
d) somebody might have tried to restart the backlog processor, but backlog thread didn't observe that

_Example repro error message that I feel matches this scenario and motivated me to look deeper_

(from 2025)
>StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=1KiB, 3168ms elapsed, timeout is 3000ms), command=EVAL, next: EVAL, inst: 0, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, last-in: 0, cur-in: 0, sync-ops: 0, async-ops: 63738, serverEndpoint: x.x.x.x:15011, conn-sec: 2617.06, aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, clientName: xxx(SE.Redis-v2.10.1.65101), PerfCounterHelperkeyHashSlot: 13549, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=3,Free=32764,Min=10,Max=32767), POOL: (Threads=13,QueuedItems=0,CompletedItems=8851526,Timers=1526), v: 2.10.1.6510
 
Now I just wonder if there's an actual decent way to repro this to prove the theory...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Messages timing out while Backlog is SpinningDown #2989

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Messages timing out while Backlog is SpinningDown #2989

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions