Skip to content

Conversation

@whoiskatrin
Copy link
Collaborator

@whoiskatrin whoiskatrin commented Dec 3, 2025

fixes this #270

…preview URL; added `waitFor` method to processes, allowing them to wait for specific conditions (log patterns or port availability)
@changeset-bot
Copy link

changeset-bot bot commented Dec 3, 2025

🦋 Changeset detected

Latest commit: 2c70430

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@cloudflare/sandbox Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 3, 2025

Open in StackBlitz

npm i https://pkg.pr.new/cloudflare/sandbox-sdk/@cloudflare/sandbox@273

commit: bb7006a

@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2025

🐳 Docker Images Published

Default (no Python):

FROM cloudflare/sandbox:0.0.0-pr-273-bcab50c

With Python:

FROM cloudflare/sandbox:0.0.0-pr-273-bcab50c-python

Version: 0.0.0-pr-273-bcab50c

Use the -python variant if you need Python code execution.

claude[bot]

This comment was marked as outdated.

claude[bot]

This comment was marked as outdated.

claude[bot]

This comment was marked as outdated.

@whoiskatrin whoiskatrin marked this pull request as ready for review December 3, 2025 21:27
claude[bot]

This comment was marked as outdated.

claude[bot]

This comment was marked as outdated.

Copy link
Member

@ghostwriternr ghostwriternr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I absolutely love the overall DX win here. Before diving deep into the underlying changes, the biggest comment in my head is how many controls we offer the user. I'll use this snippet as reference:

// Pattern 1: Inline readiness - blocks until pattern appears (recommended)
const proc = await sandbox.startProcess('npm start', {
  ready: 'Server listening on port 3000',
  readyTimeout: 30000
});

// Pattern 2: Sequential waits - for multiple conditions
const proc = await sandbox.startProcess('npm start');
await proc.waitFor('Database connected');
await proc.waitFor(3000); // Wait for port 3000 to be available

// Pattern 3: Server shorthand - start, wait, and expose in one call
const { url, process } = await sandbox.serve('npm start', {
  port: 3000,
  hostname: 'example.com'
});

I'm in favour of keeping only pattern 2, and not doing patterns 1 & 3.

  1. I want to avoid 3 because we already have too many methods to run commands on sandbox and I'd prefer to get everything working using the existing paradigms - which this PR already does. We've seen many questions from users already on when to use exec vs execStream vs startProcess, and this might add to the dilemma.
  2. Both 1 & 3 also offer lesser programmatic control overall, because there will certainly be cases where the user might want to look for a PORT to be healthy AND for a specific log to show up. By giving users primarily .waitFor, we can let them chain any number of these (like in your pattern 2 example) and the flow can continue after all the conditions are met.

And that being said, I think we can also be a lot more explicit with the parameters for waitFor - mainly so it becomes very clear and obvious what should be specified. Determining the intent based on the parameter feels flaky. We could perhaps switch to:
a. Explicit waitForPort and waitForLog methods - I like this more
b. waitFor({ports: string[], logs: number[]}) - I like this less because it implicitly encourages adding everything in one go and doesn't give the same flexibility as:

const proc = await sandbox.startProcess('npm start');
await proc.waitForLog('DB startup complete');
await setupConnectionPool();
await proc.waitFor(3000);
startExecuting();

What do you think of this? The TL;DR is wondering whether keeping this change a bit more narrow in terms of what we offer lets us provide a straightforward and more programmable approach by default.

claude[bot]

This comment was marked as outdated.

agents-git-bot bot pushed a commit to cloudflare/cloudflare-docs that referenced this pull request Dec 4, 2025
Adds documentation for waitForLog() and waitForPort() methods that allow
waiting for processes to be ready before proceeding. Updates examples
throughout the docs to use these new simpler patterns instead of manual
delays or streaming logs.

Changes:
- API reference: Document waitForLog() and waitForPort() on Process interface
- Background processes guide: Show new readiness methods as primary pattern
- Expose services guide: Update all examples to use waitForPort()

Synced from cloudflare/sandbox-sdk#273

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
claude[bot]

This comment was marked as outdated.

agents-git-bot bot pushed a commit to cloudflare/cloudflare-docs that referenced this pull request Dec 4, 2025
Add documentation for new waitForLog() and waitForPort() methods added
to the Process interface. These methods allow users to wait for
processes to become ready before continuing execution.

Changes:
- Add Process method reference documentation in commands.mdx
- Update background-processes guide with readiness checking examples
- Show error handling for ProcessReadyTimeoutError
- Demonstrate regex pattern matching for log analysis
- Include timeout configuration examples

Related to cloudflare/sandbox-sdk#273

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Comment on lines 101 to 104
export type {
ProcessExitedBeforeReadyContext,
ProcessReadyTimeoutContext
} from './errors';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do end users need these?

Comment on lines 1489 to 1492
const result = await this.exec(
`bash -c 'echo > /dev/tcp/localhost/${port}' 2>/dev/null`,
{ timeout: execTimeoutMs }
);
Copy link
Member

@ghostwriternr ghostwriternr Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay these 3 lines are a bit deceptive but perhaps force us to think about this design some more. So when developers say waitForPort, they're basically saying - "Block until this port is serving requests properly".

That could mean different things, based on what developers are starting. For example, properly could mean a health endpoint returning 200, or a database is now accepting connections, or some other custom logic too. But the current check essentially just validates if something is bound and listening to the port or if a TCP handshake works, but all of this might happen much before the port is actually ready to be used.

And so we need to improve this check. But also, improve our interface to our now separate waitForPort method. What if, we define an interface like so:

interface WaitForPortOptions {
  // Check mode (default: 'http')
  mode?: 'http' | 'tcp';

  // HTTP mode options
  path?: string;              // default '/' -> this is useful because even for the sandbox sdk, the health check port does not determine if the interpreter is up, due to delayed startup
  status?: number; // default 200-399 -> in case the health check endpoint actually returns 201 or something

  // Timing
  timeout?: number;           // default no timeout
  interval?: number;          // default 500
}

For TCP, we could just do Bun.connect and for HTTP we can do a fetch

AND expose a third method waitForCommand that covers all advanced use-cases:

interface WaitForCommandOptions {
  // this command is 'successful' if it exits with code 0
  command: string;

  // Timing
  timeout?: number;           // default no timeout
  interval?: number;          // default 500
}

useful to do things like redis-cli ping | grep PONG. And we can even run this in the same session.

And most importantly - all of these ideally are implemented within sandbox-container, i.e. on the container side and exposed as new simple API(s) that the DO side polls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR though - we can just do waitForPort and keep waitForCommand as a future enhancement, to prevent scope creep. And might keep the PR clean too, because waitForCommand will use the session (although I don't think it'll be messy at all).

@ghostwriternr
Copy link
Member

and changeset too pls

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

All previous review threads are resolved. The implementation is solid with good test coverage and proper error handling.

One critical issue in the pattern matching debouncing logic needs addressing. The other issues are less urgent but should be fixed.

Critical: Debouncing Race Condition

Pattern matching uses 100ms debouncing (sandbox.ts:1350), but if a pattern spans multiple chunks that arrive within 100ms, the final chunk may never trigger a check if no more data arrives afterward. The pending check at exit (lines 1387-1390) catches this, but only when the stream ends.

Scenario: Chunk 1 "Server listen" arrives, check fails. 50ms later, chunk 2 "ing on port 3000" arrives (pattern now complete), but debounce skips the check. If no more chunks arrive, the pattern won't be detected until stream ends (process exit).

Fix: Schedule a deferred check when debouncing:

if (now - lastCheckTime >= DEBOUNCE_MS) {
  lastCheckTime = now;
  pendingCheck = false;
  const result = checkPattern();
  if (result) return result;
} else if (pendingCheck) {
  // Schedule check for when debounce window expires
  const remaining = DEBOUNCE_MS - (now - lastCheckTime);
  setTimeout(() => {
    if (pendingCheck) {
      const result = checkPattern();
      if (result) /* need async handling here */
    }
  }, remaining);
}

Or simplify by reducing debounce to 50ms and checking more frequently.

Other Issues

TCP port check missing timeout (port-service.ts:439): Bun.connect() has no timeout. If port is in weird state (SYN sent, no SYN-ACK), could hang.

Process status check latency (sandbox.ts:1481): Every 3rd iteration checks process status, but this duration isn't subtracted from sleep time like port check duration is (line 1512).

Terminal status check fragile (sandbox.ts:1483): Manual list 'completed' || 'failed' || 'killed' should use a type guard. Also missing 'error' status?

Global regex not handled (sandbox.ts:1547): User could pass /pattern/g which has stateful issues. Strip global flag or document restriction.

pendingCheck = true;

// Debounce pattern matching - check at most every 100ms
const now = Date.now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: Pattern may be missed if chunks arrive within debounce window

If pattern spans chunks that arrive <100ms apart and no more data follows, the pending check won't run until stream ends. Lines 1387-1390 catch this at exit, but the pattern won't be detected until process exits.

Schedule a deferred check when debouncing:

if (now - lastCheckTime >= DEBOUNCE_MS) {
  lastCheckTime = now;
  pendingCheck = false;
  const result = checkPattern();
  if (result) return result;
} else if (pendingCheck && !deferredCheckScheduled) {
  const remaining = DEBOUNCE_MS - (now - lastCheckTime);
  deferredCheckScheduled = true;
  setTimeout(() => {
    deferredCheckScheduled = false;
    if (pendingCheck) {
      pendingCheck = false;
      const result = checkPattern();
      // Handle async return
    }
  }, remaining);
}


private async checkTcpReady(port: number): Promise<PortCheckResponse> {
try {
const socket = await Bun.connect({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing timeout on TCP connection

Bun.connect() has no timeout. If port is in weird state (e.g., SYN sent but no SYN-ACK response), this could hang indefinitely. The HTTP check has a 5s timeout (line 473), but TCP doesn't.

Add timeout via Promise.race or Bun API if available.


// Check process status less frequently (every 3rd iteration) to reduce latency
if (checkCount % 3 === 0) {
const processInfo = await this.getProcess(processId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Process status check duration not accounted for

Port check duration is subtracted from sleep time (line 1512), but process status check duration isn't. Every 3rd iteration adds 50-200ms latency that isn't compensated.

Track both:

let operationDuration = 0;
if (checkCount % 3 === 0) {
  const start = Date.now();
  const processInfo = await this.getProcess(processId);
  operationDuration += Date.now() - start;
  // ...
}

const checkStart = Date.now();
const result = await this.client.ports.checkPortReady(checkRequest);
operationDuration += Date.now() - checkStart;

const sleepTime = Math.max(0, targetInterval - operationDuration);

if (checkCount % 3 === 0) {
const processInfo = await this.getProcess(processId);
if (
!processInfo ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragile terminal status check

Manual list 'completed' || 'failed' || 'killed' will miss new terminal states. Also, should this include 'error' status?

Create type guard:

function isTerminalStatus(status: ProcessStatus): boolean {
  return status === 'completed' || status === 'failed' || status === 'killed' || status === 'error';
}

}
} else {
// Regex match
const match = text.match(pattern);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global regex flag not handled

User could pass /pattern/g which has stateful issues with lastIndex. While not reused here, it's fragile.

Strip global flag:

const safePattern = new RegExp(pattern.source, pattern.flags.replace('g', ''));
const match = text.match(safePattern);

claude[bot]

This comment was marked as outdated.

The `Process` object now includes methods to detect readiness based on port and log patterns, enhancing process management.
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

All 4 issues from the previous review have been resolved. The implementation is solid:

Fixed issues:

  • ✅ Debounce timing bug resolved (reduced to 50ms with terminal checks)
  • ✅ TCP timeout added (5s matching HTTP)
  • ✅ Process status check duration now accounted for in sleep calculations
  • ✅ Terminal status check uses isTerminalStatus() type guard

Minor observations:

  • Debounce logic could be micro-optimized for very high-throughput scenarios, but current approach prioritizes correctness
  • Edge case exists where process exit detection can lag up to 1.5s (3 iterations × 500ms), but this is acceptable given the performance trade-offs

Test coverage is comprehensive - 770 lines of unit tests + 495 lines of E2E tests covering string/regex patterns, timeouts, exit handling, HTTP/TCP modes, pattern spanning chunks, and stderr detection.

Architecture compliance - follows SDK patterns correctly: proper error classes, types in shared package, no any types, client architecture pattern.

Looks good to merge.

agents-git-bot bot pushed a commit to cloudflare/cloudflare-docs that referenced this pull request Dec 4, 2025
Documents new waitForLog() and waitForPort() methods for detecting
process readiness. Updates Commands API reference and Background
Processes guide with practical examples.

Related to cloudflare/sandbox-sdk#273

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@ghostwriternr ghostwriternr merged commit 8cf6b2f into main Dec 5, 2025
8 of 9 checks passed
@ghostwriternr ghostwriternr deleted the process-readiness branch December 5, 2025 10:49
@github-actions github-actions bot mentioned this pull request Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants