process readiness #273

whoiskatrin · 2025-12-03T21:07:35Z

fixes this #270

…preview URL; added `waitFor` method to processes, allowing them to wait for specific conditions (log patterns or port availability)

changeset-bot · 2025-12-03T21:07:38Z

🦋 Changeset detected

Latest commit: 2c70430

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@cloudflare/sandbox	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

pkg-pr-new · 2025-12-03T21:10:38Z

Open in StackBlitz

npm i https://pkg.pr.new/cloudflare/sandbox-sdk/@cloudflare/sandbox@273

commit: bb7006a

github-actions · 2025-12-03T21:10:39Z

🐳 Docker Images Published

Default (no Python):

FROM cloudflare/sandbox:0.0.0-pr-273-bcab50c

With Python:

FROM cloudflare/sandbox:0.0.0-pr-273-bcab50c-python

Version: 0.0.0-pr-273-bcab50c

Use the -python variant if you need Python code execution.

ghostwriternr

Okay I absolutely love the overall DX win here. Before diving deep into the underlying changes, the biggest comment in my head is how many controls we offer the user. I'll use this snippet as reference:

// Pattern 1: Inline readiness - blocks until pattern appears (recommended)
const proc = await sandbox.startProcess('npm start', {
  ready: 'Server listening on port 3000',
  readyTimeout: 30000
});

// Pattern 2: Sequential waits - for multiple conditions
const proc = await sandbox.startProcess('npm start');
await proc.waitFor('Database connected');
await proc.waitFor(3000); // Wait for port 3000 to be available

// Pattern 3: Server shorthand - start, wait, and expose in one call
const { url, process } = await sandbox.serve('npm start', {
  port: 3000,
  hostname: 'example.com'
});

I'm in favour of keeping only pattern 2, and not doing patterns 1 & 3.

I want to avoid 3 because we already have too many methods to run commands on sandbox and I'd prefer to get everything working using the existing paradigms - which this PR already does. We've seen many questions from users already on when to use exec vs execStream vs startProcess, and this might add to the dilemma.
Both 1 & 3 also offer lesser programmatic control overall, because there will certainly be cases where the user might want to look for a PORT to be healthy AND for a specific log to show up. By giving users primarily .waitFor, we can let them chain any number of these (like in your pattern 2 example) and the flow can continue after all the conditions are met.

And that being said, I think we can also be a lot more explicit with the parameters for waitFor - mainly so it becomes very clear and obvious what should be specified. Determining the intent based on the parameter feels flaky. We could perhaps switch to:
a. Explicit waitForPort and waitForLog methods - I like this more
b. waitFor({ports: string[], logs: number[]}) - I like this less because it implicitly encourages adding everything in one go and doesn't give the same flexibility as:

const proc = await sandbox.startProcess('npm start');
await proc.waitForLog('DB startup complete');
await setupConnectionPool();
await proc.waitFor(3000);
startExecuting();

What do you think of this? The TL;DR is wondering whether keeping this change a bit more narrow in terms of what we offer lets us provide a straightforward and more programmable approach by default.

packages/sandbox/README.md

packages/sandbox/src/sandbox.ts

packages/shared/src/errors/contexts.ts

Adds documentation for waitForLog() and waitForPort() methods that allow waiting for processes to be ready before proceeding. Updates examples throughout the docs to use these new simpler patterns instead of manual delays or streaming logs. Changes: - API reference: Document waitForLog() and waitForPort() on Process interface - Background processes guide: Show new readiness methods as primary pattern - Expose services guide: Update all examples to use waitForPort() Synced from cloudflare/sandbox-sdk#273 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Add documentation for new waitForLog() and waitForPort() methods added to the Process interface. These methods allow users to wait for processes to become ready before continuing execution. Changes: - Add Process method reference documentation in commands.mdx - Update background-processes guide with readiness checking examples - Show error handling for ProcessReadyTimeoutError - Demonstrate regex pattern matching for log analysis - Include timeout configuration examples Related to cloudflare/sandbox-sdk#273 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

ghostwriternr · 2025-12-04T16:50:48Z

packages/sandbox/src/index.ts

+export type {
+  ProcessExitedBeforeReadyContext,
+  ProcessReadyTimeoutContext
+} from './errors';


Do end users need these?

ghostwriternr · 2025-12-04T17:30:21Z

packages/sandbox/src/sandbox.ts

+        const result = await this.exec(
+          `bash -c 'echo > /dev/tcp/localhost/${port}' 2>/dev/null`,
+          { timeout: execTimeoutMs }
+        );


okay these 3 lines are a bit deceptive but perhaps force us to think about this design some more. So when developers say waitForPort, they're basically saying - "Block until this port is serving requests properly".

That could mean different things, based on what developers are starting. For example, properly could mean a health endpoint returning 200, or a database is now accepting connections, or some other custom logic too. But the current check essentially just validates if something is bound and listening to the port or if a TCP handshake works, but all of this might happen much before the port is actually ready to be used.

And so we need to improve this check. But also, improve our interface to our now separate waitForPort method. What if, we define an interface like so:

interface WaitForPortOptions { // Check mode (default: 'http') mode?: 'http' | 'tcp'; // HTTP mode options path?: string; // default '/' -> this is useful because even for the sandbox sdk, the health check port does not determine if the interpreter is up, due to delayed startup status?: number; // default 200-399 -> in case the health check endpoint actually returns 201 or something // Timing timeout?: number; // default no timeout interval?: number; // default 500 }

For TCP, we could just do Bun.connect and for HTTP we can do a fetch

AND expose a third method waitForCommand that covers all advanced use-cases:

interface WaitForCommandOptions { // this command is 'successful' if it exits with code 0 command: string; // Timing timeout?: number; // default no timeout interval?: number; // default 500 }

useful to do things like redis-cli ping | grep PONG. And we can even run this in the same session.

And most importantly - all of these ideally are implemented within sandbox-container, i.e. on the container side and exposed as new simple API(s) that the DO side polls.

For this PR though - we can just do waitForPort and keep waitForCommand as a future enhancement, to prevent scope creep. And might keep the PR clean too, because waitForCommand will use the session (although I don't think it'll be messy at all).

ghostwriternr · 2025-12-04T17:36:11Z

and changeset too pls

claude

Claude Code Review

All previous review threads are resolved. The implementation is solid with good test coverage and proper error handling.

One critical issue in the pattern matching debouncing logic needs addressing. The other issues are less urgent but should be fixed.

Critical: Debouncing Race Condition

Pattern matching uses 100ms debouncing (sandbox.ts:1350), but if a pattern spans multiple chunks that arrive within 100ms, the final chunk may never trigger a check if no more data arrives afterward. The pending check at exit (lines 1387-1390) catches this, but only when the stream ends.

Scenario: Chunk 1 "Server listen" arrives, check fails. 50ms later, chunk 2 "ing on port 3000" arrives (pattern now complete), but debounce skips the check. If no more chunks arrive, the pattern won't be detected until stream ends (process exit).

Fix: Schedule a deferred check when debouncing:

if (now - lastCheckTime >= DEBOUNCE_MS) {
  lastCheckTime = now;
  pendingCheck = false;
  const result = checkPattern();
  if (result) return result;
} else if (pendingCheck) {
  // Schedule check for when debounce window expires
  const remaining = DEBOUNCE_MS - (now - lastCheckTime);
  setTimeout(() => {
    if (pendingCheck) {
      const result = checkPattern();
      if (result) /* need async handling here */
    }
  }, remaining);
}

Or simplify by reducing debounce to 50ms and checking more frequently.

Other Issues

TCP port check missing timeout (port-service.ts:439): Bun.connect() has no timeout. If port is in weird state (SYN sent, no SYN-ACK), could hang.

Process status check latency (sandbox.ts:1481): Every 3rd iteration checks process status, but this duration isn't subtracted from sleep time like port check duration is (line 1512).

Terminal status check fragile (sandbox.ts:1483): Manual list 'completed' || 'failed' || 'killed' should use a type guard. Also missing 'error' status?

Global regex not handled (sandbox.ts:1547): User could pass /pattern/g which has stateful issues. Strip global flag or document restriction.

claude · 2025-12-04T23:31:20Z

packages/sandbox/src/sandbox.ts

+            pendingCheck = true;
+
+            // Debounce pattern matching - check at most every 100ms
+            const now = Date.now();


Critical: Pattern may be missed if chunks arrive within debounce window

If pattern spans chunks that arrive <100ms apart and no more data follows, the pending check won't run until stream ends. Lines 1387-1390 catch this at exit, but the pattern won't be detected until process exits.

Schedule a deferred check when debouncing:

if (now - lastCheckTime >= DEBOUNCE_MS) { lastCheckTime = now; pendingCheck = false; const result = checkPattern(); if (result) return result; } else if (pendingCheck && !deferredCheckScheduled) { const remaining = DEBOUNCE_MS - (now - lastCheckTime); deferredCheckScheduled = true; setTimeout(() => { deferredCheckScheduled = false; if (pendingCheck) { pendingCheck = false; const result = checkPattern(); // Handle async return } }, remaining); }

claude · 2025-12-04T23:31:21Z

packages/sandbox-container/src/services/port-service.ts

+
+  private async checkTcpReady(port: number): Promise<PortCheckResponse> {
+    try {
+      const socket = await Bun.connect({


Missing timeout on TCP connection

Bun.connect() has no timeout. If port is in weird state (e.g., SYN sent but no SYN-ACK response), this could hang indefinitely. The HTTP check has a 5s timeout (line 473), but TCP doesn't.

Add timeout via Promise.race or Bun API if available.

claude · 2025-12-04T23:31:21Z

packages/sandbox/src/sandbox.ts

+
+      // Check process status less frequently (every 3rd iteration) to reduce latency
+      if (checkCount % 3 === 0) {
+        const processInfo = await this.getProcess(processId);


Process status check duration not accounted for

Port check duration is subtracted from sleep time (line 1512), but process status check duration isn't. Every 3rd iteration adds 50-200ms latency that isn't compensated.

Track both:

let operationDuration = 0; if (checkCount % 3 === 0) { const start = Date.now(); const processInfo = await this.getProcess(processId); operationDuration += Date.now() - start; // ... } const checkStart = Date.now(); const result = await this.client.ports.checkPortReady(checkRequest); operationDuration += Date.now() - checkStart; const sleepTime = Math.max(0, targetInterval - operationDuration);

claude · 2025-12-04T23:31:21Z

packages/sandbox/src/sandbox.ts

+      if (checkCount % 3 === 0) {
+        const processInfo = await this.getProcess(processId);
+        if (
+          !processInfo ||


Fragile terminal status check

Manual list 'completed' || 'failed' || 'killed' will miss new terminal states. Also, should this include 'error' status?

Create type guard:

function isTerminalStatus(status: ProcessStatus): boolean { return status === 'completed' || status === 'failed' || status === 'killed' || status === 'error'; }

claude · 2025-12-04T23:31:21Z

packages/sandbox/src/sandbox.ts

+      }
+    } else {
+      // Regex match
+      const match = text.match(pattern);


Global regex flag not handled

User could pass /pattern/g which has stateful issues with lastIndex. While not reused here, it's fragile.

Strip global flag:

const safePattern = new RegExp(pattern.source, pattern.flags.replace('g', '')); const match = text.match(safePattern);

The `Process` object now includes methods to detect readiness based on port and log patterns, enhancing process management.

claude

Claude Code Review

All 4 issues from the previous review have been resolved. The implementation is solid:

Fixed issues:

✅ Debounce timing bug resolved (reduced to 50ms with terminal checks)
✅ TCP timeout added (5s matching HTTP)
✅ Process status check duration now accounted for in sleep calculations
✅ Terminal status check uses isTerminalStatus() type guard

Minor observations:

Debounce logic could be micro-optimized for very high-throughput scenarios, but current approach prioritizes correctness
Edge case exists where process exit detection can lag up to 1.5s (3 iterations × 500ms), but this is acceptable given the performance trade-offs

Test coverage is comprehensive - 770 lines of unit tests + 495 lines of E2E tests covering string/regex patterns, timeouts, exit handling, HTTP/TCP modes, pattern spanning chunks, and stderr detection.

Architecture compliance - follows SDK patterns correctly: proper error classes, types in shared package, no any types, client architecture pattern.

Looks good to merge.

Documents new waitForLog() and waitForPort() methods for detecting process readiness. Updates Commands API reference and Background Processes guide with practical examples. Related to cloudflare/sandbox-sdk#273 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

whoiskatrin added 2 commits December 3, 2025 21:04

Introduced a new /server endpoint to start a server and return its …

39d6f61

…preview URL; added `waitFor` method to processes, allowing them to wait for specific conditions (log patterns or port availability)

minor fix

b45627e