-
Notifications
You must be signed in to change notification settings - Fork 46
process readiness #273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
process readiness #273
Conversation
…preview URL; added `waitFor` method to processes, allowing them to wait for specific conditions (log patterns or port availability)
🦋 Changeset detectedLatest commit: 2c70430 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
commit: |
🐳 Docker Images PublishedDefault (no Python): FROM cloudflare/sandbox:0.0.0-pr-273-bcab50cWith Python: FROM cloudflare/sandbox:0.0.0-pr-273-bcab50c-pythonVersion: Use the |
ghostwriternr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I absolutely love the overall DX win here. Before diving deep into the underlying changes, the biggest comment in my head is how many controls we offer the user. I'll use this snippet as reference:
// Pattern 1: Inline readiness - blocks until pattern appears (recommended)
const proc = await sandbox.startProcess('npm start', {
ready: 'Server listening on port 3000',
readyTimeout: 30000
});
// Pattern 2: Sequential waits - for multiple conditions
const proc = await sandbox.startProcess('npm start');
await proc.waitFor('Database connected');
await proc.waitFor(3000); // Wait for port 3000 to be available
// Pattern 3: Server shorthand - start, wait, and expose in one call
const { url, process } = await sandbox.serve('npm start', {
port: 3000,
hostname: 'example.com'
});I'm in favour of keeping only pattern 2, and not doing patterns 1 & 3.
- I want to avoid 3 because we already have too many methods to run commands on sandbox and I'd prefer to get everything working using the existing paradigms - which this PR already does. We've seen many questions from users already on when to use
execvsexecStreamvsstartProcess, and this might add to the dilemma. - Both 1 & 3 also offer lesser programmatic control overall, because there will certainly be cases where the user might want to look for a PORT to be healthy AND for a specific log to show up. By giving users primarily
.waitFor, we can let them chain any number of these (like in your pattern 2 example) and the flow can continue after all the conditions are met.
And that being said, I think we can also be a lot more explicit with the parameters for waitFor - mainly so it becomes very clear and obvious what should be specified. Determining the intent based on the parameter feels flaky. We could perhaps switch to:
a. Explicit waitForPort and waitForLog methods - I like this more
b. waitFor({ports: string[], logs: number[]}) - I like this less because it implicitly encourages adding everything in one go and doesn't give the same flexibility as:
const proc = await sandbox.startProcess('npm start');
await proc.waitForLog('DB startup complete');
await setupConnectionPool();
await proc.waitFor(3000);
startExecuting();What do you think of this? The TL;DR is wondering whether keeping this change a bit more narrow in terms of what we offer lets us provide a straightforward and more programmable approach by default.
Adds documentation for waitForLog() and waitForPort() methods that allow waiting for processes to be ready before proceeding. Updates examples throughout the docs to use these new simpler patterns instead of manual delays or streaming logs. Changes: - API reference: Document waitForLog() and waitForPort() on Process interface - Background processes guide: Show new readiness methods as primary pattern - Expose services guide: Update all examples to use waitForPort() Synced from cloudflare/sandbox-sdk#273 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Add documentation for new waitForLog() and waitForPort() methods added to the Process interface. These methods allow users to wait for processes to become ready before continuing execution. Changes: - Add Process method reference documentation in commands.mdx - Update background-processes guide with readiness checking examples - Show error handling for ProcessReadyTimeoutError - Demonstrate regex pattern matching for log analysis - Include timeout configuration examples Related to cloudflare/sandbox-sdk#273 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
packages/sandbox/src/index.ts
Outdated
| export type { | ||
| ProcessExitedBeforeReadyContext, | ||
| ProcessReadyTimeoutContext | ||
| } from './errors'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do end users need these?
packages/sandbox/src/sandbox.ts
Outdated
| const result = await this.exec( | ||
| `bash -c 'echo > /dev/tcp/localhost/${port}' 2>/dev/null`, | ||
| { timeout: execTimeoutMs } | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay these 3 lines are a bit deceptive but perhaps force us to think about this design some more. So when developers say waitForPort, they're basically saying - "Block until this port is serving requests properly".
That could mean different things, based on what developers are starting. For example, properly could mean a health endpoint returning 200, or a database is now accepting connections, or some other custom logic too. But the current check essentially just validates if something is bound and listening to the port or if a TCP handshake works, but all of this might happen much before the port is actually ready to be used.
And so we need to improve this check. But also, improve our interface to our now separate waitForPort method. What if, we define an interface like so:
interface WaitForPortOptions {
// Check mode (default: 'http')
mode?: 'http' | 'tcp';
// HTTP mode options
path?: string; // default '/' -> this is useful because even for the sandbox sdk, the health check port does not determine if the interpreter is up, due to delayed startup
status?: number; // default 200-399 -> in case the health check endpoint actually returns 201 or something
// Timing
timeout?: number; // default no timeout
interval?: number; // default 500
}For TCP, we could just do Bun.connect and for HTTP we can do a fetch
AND expose a third method waitForCommand that covers all advanced use-cases:
interface WaitForCommandOptions {
// this command is 'successful' if it exits with code 0
command: string;
// Timing
timeout?: number; // default no timeout
interval?: number; // default 500
}useful to do things like redis-cli ping | grep PONG. And we can even run this in the same session.
And most importantly - all of these ideally are implemented within sandbox-container, i.e. on the container side and exposed as new simple API(s) that the DO side polls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR though - we can just do waitForPort and keep waitForCommand as a future enhancement, to prevent scope creep. And might keep the PR clean too, because waitForCommand will use the session (although I don't think it'll be messy at all).
|
and changeset too pls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Claude Code Review
All previous review threads are resolved. The implementation is solid with good test coverage and proper error handling.
One critical issue in the pattern matching debouncing logic needs addressing. The other issues are less urgent but should be fixed.
Critical: Debouncing Race Condition
Pattern matching uses 100ms debouncing (sandbox.ts:1350), but if a pattern spans multiple chunks that arrive within 100ms, the final chunk may never trigger a check if no more data arrives afterward. The pending check at exit (lines 1387-1390) catches this, but only when the stream ends.
Scenario: Chunk 1 "Server listen" arrives, check fails. 50ms later, chunk 2 "ing on port 3000" arrives (pattern now complete), but debounce skips the check. If no more chunks arrive, the pattern won't be detected until stream ends (process exit).
Fix: Schedule a deferred check when debouncing:
if (now - lastCheckTime >= DEBOUNCE_MS) {
lastCheckTime = now;
pendingCheck = false;
const result = checkPattern();
if (result) return result;
} else if (pendingCheck) {
// Schedule check for when debounce window expires
const remaining = DEBOUNCE_MS - (now - lastCheckTime);
setTimeout(() => {
if (pendingCheck) {
const result = checkPattern();
if (result) /* need async handling here */
}
}, remaining);
}Or simplify by reducing debounce to 50ms and checking more frequently.
Other Issues
TCP port check missing timeout (port-service.ts:439): Bun.connect() has no timeout. If port is in weird state (SYN sent, no SYN-ACK), could hang.
Process status check latency (sandbox.ts:1481): Every 3rd iteration checks process status, but this duration isn't subtracted from sleep time like port check duration is (line 1512).
Terminal status check fragile (sandbox.ts:1483): Manual list 'completed' || 'failed' || 'killed' should use a type guard. Also missing 'error' status?
Global regex not handled (sandbox.ts:1547): User could pass /pattern/g which has stateful issues. Strip global flag or document restriction.
| pendingCheck = true; | ||
|
|
||
| // Debounce pattern matching - check at most every 100ms | ||
| const now = Date.now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Pattern may be missed if chunks arrive within debounce window
If pattern spans chunks that arrive <100ms apart and no more data follows, the pending check won't run until stream ends. Lines 1387-1390 catch this at exit, but the pattern won't be detected until process exits.
Schedule a deferred check when debouncing:
if (now - lastCheckTime >= DEBOUNCE_MS) {
lastCheckTime = now;
pendingCheck = false;
const result = checkPattern();
if (result) return result;
} else if (pendingCheck && !deferredCheckScheduled) {
const remaining = DEBOUNCE_MS - (now - lastCheckTime);
deferredCheckScheduled = true;
setTimeout(() => {
deferredCheckScheduled = false;
if (pendingCheck) {
pendingCheck = false;
const result = checkPattern();
// Handle async return
}
}, remaining);
}|
|
||
| private async checkTcpReady(port: number): Promise<PortCheckResponse> { | ||
| try { | ||
| const socket = await Bun.connect({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing timeout on TCP connection
Bun.connect() has no timeout. If port is in weird state (e.g., SYN sent but no SYN-ACK response), this could hang indefinitely. The HTTP check has a 5s timeout (line 473), but TCP doesn't.
Add timeout via Promise.race or Bun API if available.
|
|
||
| // Check process status less frequently (every 3rd iteration) to reduce latency | ||
| if (checkCount % 3 === 0) { | ||
| const processInfo = await this.getProcess(processId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Process status check duration not accounted for
Port check duration is subtracted from sleep time (line 1512), but process status check duration isn't. Every 3rd iteration adds 50-200ms latency that isn't compensated.
Track both:
let operationDuration = 0;
if (checkCount % 3 === 0) {
const start = Date.now();
const processInfo = await this.getProcess(processId);
operationDuration += Date.now() - start;
// ...
}
const checkStart = Date.now();
const result = await this.client.ports.checkPortReady(checkRequest);
operationDuration += Date.now() - checkStart;
const sleepTime = Math.max(0, targetInterval - operationDuration);
packages/sandbox/src/sandbox.ts
Outdated
| if (checkCount % 3 === 0) { | ||
| const processInfo = await this.getProcess(processId); | ||
| if ( | ||
| !processInfo || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fragile terminal status check
Manual list 'completed' || 'failed' || 'killed' will miss new terminal states. Also, should this include 'error' status?
Create type guard:
function isTerminalStatus(status: ProcessStatus): boolean {
return status === 'completed' || status === 'failed' || status === 'killed' || status === 'error';
}
packages/sandbox/src/sandbox.ts
Outdated
| } | ||
| } else { | ||
| // Regex match | ||
| const match = text.match(pattern); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global regex flag not handled
User could pass /pattern/g which has stateful issues with lastIndex. While not reused here, it's fragile.
Strip global flag:
const safePattern = new RegExp(pattern.source, pattern.flags.replace('g', ''));
const match = text.match(safePattern);The `Process` object now includes methods to detect readiness based on port and log patterns, enhancing process management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Claude Code Review
All 4 issues from the previous review have been resolved. The implementation is solid:
Fixed issues:
- ✅ Debounce timing bug resolved (reduced to 50ms with terminal checks)
- ✅ TCP timeout added (5s matching HTTP)
- ✅ Process status check duration now accounted for in sleep calculations
- ✅ Terminal status check uses
isTerminalStatus()type guard
Minor observations:
- Debounce logic could be micro-optimized for very high-throughput scenarios, but current approach prioritizes correctness
- Edge case exists where process exit detection can lag up to 1.5s (3 iterations × 500ms), but this is acceptable given the performance trade-offs
Test coverage is comprehensive - 770 lines of unit tests + 495 lines of E2E tests covering string/regex patterns, timeouts, exit handling, HTTP/TCP modes, pattern spanning chunks, and stderr detection.
Architecture compliance - follows SDK patterns correctly: proper error classes, types in shared package, no any types, client architecture pattern.
Looks good to merge.
Documents new waitForLog() and waitForPort() methods for detecting process readiness. Updates Commands API reference and Background Processes guide with practical examples. Related to cloudflare/sandbox-sdk#273 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
fixes this #270