feat(backend): add database stress-test seed script | ENG-203#300
feat(backend): add database stress-test seed script | ENG-203#300
Conversation
… docs - Add seed-stress-test.ts with tiered data generation (small/medium/large) - Seed workflows, versions, runs, traces, node I/O, schedules, webhooks, human input requests, artifacts, agent traces, MCP data, secrets, API keys - Ensure temporal consistency across all entity relationships - Include load testing plan and first audit report - Add load-audit skill for automated frontend performance audits Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d1ef31e6c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| // Delete in reverse FK order | ||
| const deletions = [ | ||
| `DELETE FROM agent_trace_events WHERE workflow_run_id IN (SELECT run_id FROM workflow_runs WHERE organization_id = '${ORG_ID}')`, |
There was a problem hiding this comment.
Parameterize organization id in cleanup/update SQL
SEED_ORG_ID is read from the environment and then interpolated directly into raw SQL (for example in the cleanup DELETE statements and the later UPDATE workflows ... organization_id = '${ORG_ID}'), so an org id containing a quote will break the script and a crafted value can broaden the affected rows beyond the intended tenant. This turns a seeding utility into a potential destructive query path; bind parameters (or at least sqlVal) should be used consistently for ORG_ID in all raw SQL strings.
Useful? React with 👍 / 👎.
|
|
||
| for (const run of agentRuns) { | ||
| const agentNodeRefs = run.nodeRefs.slice(0, Math.max(1, Math.floor(run.nodeRefs.length / 3))); | ||
| const agentRunId = `agent_${shortUUID()}`; |
There was a problem hiding this comment.
Generate collision-resistant agent run ids
agentRunId is currently built from shortUUID() (agent_${8-hex}), which only provides ~32 bits of entropy; in medium/large seeds this can collide, and the agent trace API reads rows by agent_run_id alone, so collisions merge events from different workflow runs and return incorrect metadata/transcripts. Using a full UUID (or including the workflow run id) avoids silent cross-run data corruption in load-test datasets.
Useful? React with 👍 / 👎.
Summary
backend/scripts/seed-stress-test.tswith tiered data generation (small/medium/large) for stress testing all major database tablesKey design decisions
randomDate()for related entities)--cleanflag or automatic cleanup before re-seeding prevents data accumulationTest plan
bun backend/scripts/seed-stress-test.ts --tier smalland verify data in DBbun backend/scripts/seed-stress-test.ts --tier small --cleanto verify cleanup