Fix Reassure CI flakiness caused by noisy sub-10ms baselines by abzokhattab · Pull Request #745 · Expensify/react-native-onyx

abzokhattab · 2026-03-01T23:43:51Z

Explanation of Change

Root Cause Analysis

After PR #689 removed unstable_batchedUpdates, several Onyx functions dropped to baselines of roughly 0.3-2ms, some of them sub-millisecond. On shared CI runners (ubuntu-24.04-v4), system jitter alone introduces 10-15ms of variance per measurement. When a 0.5ms function measures at 12ms because of jitter, Reassure's Z-test correctly flags the difference as statistically significant, but it is noise, not a regression.
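To illustrate why "statistically significant" does not mean "regressed" here, the sketch below computes a generic two-sample z statistic for a difference in mean durations. It is not Reassure's actual implementation, and every sample statistic in it is a hypothetical value chosen to match the 0.5ms to 12ms scenario above. It also simplifies by using a normal approximation even for small run counts.

```typescript
// Generic two-sample z statistic for mean durations (illustrative only,
// not Reassure's implementation). All numbers below are hypothetical.
interface SampleStats {
  meanMs: number;
  stdDevMs: number;
  runs: number;
}

function zScore(baseline: SampleStats, current: SampleStats): number {
  // Standard error of the difference between two independent sample means.
  const standardError = Math.sqrt(
    baseline.stdDevMs ** 2 / baseline.runs + current.stdDevMs ** 2 / current.runs,
  );
  return (current.meanMs - baseline.meanMs) / standardError;
}

// A ~0.5ms function with a tight baseline, measured at ~12ms on a noisy runner.
const baseline: SampleStats = { meanMs: 0.5, stdDevMs: 0.1, runs: 10 };
const jittery: SampleStats = { meanMs: 12, stdDevMs: 3, runs: 10 };

const z = zScore(baseline, jittery);
// |z| is far above 1.96 (two-sided 5% critical value), so the test reports a
// significant difference even though no code changed.
console.log(z.toFixed(1)); // prints 12.1
```

The very small baseline variance is what makes the statistic explode: a single noisy run inflates the mean by an amount that is tiny in absolute terms but huge relative to the baseline's spread.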

This manifests as two distinct failure modes:

Failure Mode 1: Delta check thresholds too strict
The stability check thresholds were raised to 20ms/40% in PR #727, but the delta check still used 10ms/20%. On shared CI, ~10-15ms of jitter is normal, so a 10ms absolute threshold catches noise as "regressions."

Failure Mode 2: Boolean('false') bug
IS_VALIDATING_STABILITY was parsed with Boolean(getInputOrEnv(...)). Since getInputOrEnv returns a string, Boolean('false') === true, causing delta check failures to be misreported as stability check failures. This made debugging harder by producing misleading error messages.

Changes

1. Raise delta check thresholds (reassurePerfTests.yml)
ALLOWED_DURATION_DEVIATION: 10 → 20ms
ALLOWED_RELATIVE_DURATION_DEVIATION: 20 → 40%

Matches the stability check thresholds already merged in PR #727. On shared CI, jitter alone accounts for ~10-15ms, so 10ms was too tight for the absolute threshold.
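A minimal sketch of how the threshold change affects the delta check, assuming a measurement is reported only when its deviation exceeds both the absolute threshold (ALLOWED_DURATION_DEVIATION) and the relative threshold (ALLOWED_RELATIVE_DURATION_DEVIATION), with the relative deviation taken against the baseline mean. The function and interface names are hypothetical and do not come from validateReassureOutput.ts.

```typescript
// Hypothetical model of the delta check: report a regression only when the
// deviation exceeds BOTH the absolute and the relative threshold (assumption).
interface DeltaThresholds {
  allowedDurationDeviationMs: number; // ALLOWED_DURATION_DEVIATION
  allowedRelativeDurationDeviationPct: number; // ALLOWED_RELATIVE_DURATION_DEVIATION
}

function isReportedAsRegression(baselineMs: number, currentMs: number, t: DeltaThresholds): boolean {
  const deviationMs = currentMs - baselineMs;
  const relativePct = (deviationMs / baselineMs) * 100;
  return (
    Math.abs(deviationMs) > t.allowedDurationDeviationMs &&
    Math.abs(relativePct) > t.allowedRelativeDurationDeviationPct
  );
}

const before: DeltaThresholds = { allowedDurationDeviationMs: 10, allowedRelativeDurationDeviationPct: 20 };
const after: DeltaThresholds = { allowedDurationDeviationMs: 20, allowedRelativeDurationDeviationPct: 40 };

// Jitter scenario from the root cause analysis: 0.5ms measured as 12ms.
console.log(isReportedAsRegression(0.5, 12, before)); // true: 11.5ms > 10ms and 2300% > 20%
console.log(isReportedAsRegression(0.5, 12, after)); // false: 11.5ms is within 20ms
// A larger slowdown is still caught with the new thresholds.
console.log(isReportedAsRegression(50, 80, after)); // true: 30ms > 20ms and 60% > 40%
```

Under this model the absolute threshold is what filters out jitter on tiny baselines, since their relative deviation is almost always enormous.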

2. Fix Boolean('false') bug (validateReassureOutput.ts)

// Before (always true for any non-empty string):
const isValidatingStability = Boolean(getInputOrEnv('IS_VALIDATING_STABILITY'));

// After:
const isValidatingStability = getInputOrEnv('IS_VALIDATING_STABILITY') === 'true';
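The pitfall is plain JavaScript truthiness: Boolean() returns true for every non-empty string, including 'false'. The sketch below contrasts it with an explicit comparison. parseBooleanFlag is a hypothetical helper, not part of the Onyx codebase, and unlike the PR's strict === 'true' it also tolerates case and surrounding whitespace.

```typescript
// Hypothetical helper: only the string 'true' (case-insensitive, trimmed)
// enables the flag; undefined, '', 'false', or anything else disables it.
function parseBooleanFlag(value: string | undefined): boolean {
  return value?.trim().toLowerCase() === 'true';
}

console.log(Boolean('false')); // true: any non-empty string is truthy
console.log(Boolean('')); // false
console.log(parseBooleanFlag('false')); // false
console.log(parseBooleanFlag(' TRUE ')); // true
console.log(parseBooleanFlag(undefined)); // false
```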

Fixed Issues

$ Expensify/App#80320
PROPOSAL:

Tests

Offline tests

QA Steps

// TODO: These must be filled out, or the issue title must include "[No QA]."

Verify that no errors appear in the JS console

PR Author Checklist

Screenshots/Videos

Android: Native

Android: mWeb Chrome

iOS: Native

iOS: mWeb Safari

MacOS: Chrome / Safari

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53e3ca2a7e


chatgpt-codex-connector · 2026-03-02T00:03:13Z

.github/actions/javascript/validateReassureOutput/validateReassureOutput.ts

+            const effectiveAbsoluteThreshold = baselineDuration < MIN_BASELINE_FOR_DEFAULT_THRESHOLD_MS ? 100 : allowedDurationDeviation;
+
+            const isMeasurementRelevant = Math.abs(durationDeviation) > effectiveAbsoluteThreshold;


Keep sub-10ms checks from bypassing real regressions

Using a hardcoded 100 ms absolute threshold for every benchmark with baseline.meanDuration < 10 causes large slowdowns to be ignored before the relative check runs. In this path, a measurement can regress from single-digit milliseconds to tens of milliseconds (for example 8ms→80ms, +900%) and still be treated as not relevant, so the performance gate no longer protects many of the fastest code paths that recently moved under 10ms.
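A minimal reconstruction of the reviewed (and later reverted) adaptive threshold, based on the diff lines quoted above. MIN_BASELINE_FOR_DEFAULT_THRESHOLD_MS = 10 follows the review's "baseline.meanDuration < 10"; the 20ms default and the wrapping function are assumptions for illustration.

```typescript
// Reconstruction of the reverted adaptive absolute threshold (illustrative).
const MIN_BASELINE_FOR_DEFAULT_THRESHOLD_MS = 10;
const allowedDurationDeviation = 20; // assumption: the raised 20ms default

// Hypothetical wrapper around the two lines from the reviewed diff.
function isMeasurementRelevant(baselineDuration: number, currentDuration: number): boolean {
  const durationDeviation = currentDuration - baselineDuration;
  const effectiveAbsoluteThreshold =
    baselineDuration < MIN_BASELINE_FOR_DEFAULT_THRESHOLD_MS ? 100 : allowedDurationDeviation;
  return Math.abs(durationDeviation) > effectiveAbsoluteThreshold;
}

// The reviewer's example: 8ms -> 80ms is +900%, but the 72ms deviation is
// under the 100ms cap, so the relative check is never reached.
console.log(isMeasurementRelevant(8, 80)); // false
// The same 72ms slowdown on a 10ms baseline uses the default threshold.
console.log(isMeasurementRelevant(10, 82)); // true
```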


Yeah, @abzokhattab, I don't think we should do change 3 (Adaptive absolute threshold for sub-10ms baselines, validateReassureOutput.ts) for now. Changes 1 and 2 make sense to me, however.

@fabioh8010 I just reverted it. However, looking at some of the failed pipelines, I see that some of them were hitting deviations of more than 1000%, so I think that could still occur after merging. What do you think?

| Run ID | Test | Deviation (relative) | Baseline | URL |
|---|---|---|---|---|
| 22388332008 | `doAllCollectionItemsBelongToSameParent` | 22.44ms (1838%) | ~1ms | https://github.com/Expensify/react-native-onyx/actions/runs/22388332008/job/64803831835 |
| 22388332008 | `isValidNonEmptyCollectionForMerge` | 10.84ms (1193%) | ~1ms | same run |
| 22388863959 | `doAllCollectionItemsBelongToSameParent` | 23.46ms (2237%) | ~1ms | https://github.com/Expensify/react-native-onyx/actions/runs/22388863959/job/64805540337 |
| 22388863959 | `isValidNonEmptyCollectionForMerge` | 10.63ms (1171%) | ~1ms | same run |

The issue is that allowing such a large threshold would also mask real regressions if they happened in these functions.
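As a rough check of which of these failures would survive the raised 20ms/40% delta thresholds, the sketch below filters the observed deviations from the table. It assumes a measurement fails only when it exceeds both thresholds; the interface and variable names are hypothetical.

```typescript
// Observed failures from the table (deviation in ms, relative deviation in %).
interface ObservedFailure {
  runId: string;
  test: string;
  deviationMs: number;
  relativePct: number;
}

const failures: ObservedFailure[] = [
  { runId: '22388332008', test: 'doAllCollectionItemsBelongToSameParent', deviationMs: 22.44, relativePct: 1838 },
  { runId: '22388332008', test: 'isValidNonEmptyCollectionForMerge', deviationMs: 10.84, relativePct: 1193 },
  { runId: '22388863959', test: 'doAllCollectionItemsBelongToSameParent', deviationMs: 23.46, relativePct: 2237 },
  { runId: '22388863959', test: 'isValidNonEmptyCollectionForMerge', deviationMs: 10.63, relativePct: 1171 },
];

const ALLOWED_DURATION_DEVIATION = 20; // ms
const ALLOWED_RELATIVE_DURATION_DEVIATION = 40; // %

// Assumption: a measurement fails only when both thresholds are exceeded.
const stillFailing = failures.filter(
  (f) =>
    Math.abs(f.deviationMs) > ALLOWED_DURATION_DEVIATION &&
    Math.abs(f.relativePct) > ALLOWED_RELATIVE_DURATION_DEVIATION,
);

for (const f of stillFailing) console.log(`${f.runId} ${f.test}`);
// 22388332008 doAllCollectionItemsBelongToSameParent
// 22388863959 doAllCollectionItemsBelongToSameParent
```

Under that assumption, the ~10.7ms isValidNonEmptyCollectionForMerge spikes fall within the new absolute threshold, while the ~23ms doAllCollectionItemsBelongToSameParent spikes would still fail, which is consistent with the concern that 1000%+ deviations can still occur after merging.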

fabioh8010 · 2026-03-02T08:44:48Z

@abzokhattab After addressing the comments, could you run npm run gh-actions-build again? I'm getting different output from yours.

Two changes:
1. Raise delta check thresholds from 10ms/20% to 20ms/40% to match the stability check thresholds. Shared CI runners introduce ~10-15ms of jitter, which makes the previous 10ms absolute threshold too strict.
2. Fix Boolean('false') bug in IS_VALIDATING_STABILITY parsing. The string 'false' was coerced to true, causing delta check failures to be misreported as stability check failures.

abzokhattab · 2026-03-03T14:10:20Z

Done

abzokhattab requested a review from a team as a code owner March 1, 2026 23:43

melvin-bot bot assigned abzokhattab Mar 1, 2026

melvin-bot bot requested review from arosiclair and removed request for a team March 1, 2026 23:44

abzokhattab changed the title ~~Fix Reassure CI flakiness caused by noisy sub-10ms baselines~~ WIP: Fix Reassure CI flakiness caused by noisy sub-10ms baselines Mar 1, 2026

abzokhattab marked this pull request as draft March 1, 2026 23:45

abzokhattab force-pushed the abzokhattab/fix-reassure-ci-flakiness branch 2 times, most recently from 9fd08b2 to 53e3ca2 Compare March 1, 2026 23:57

abzokhattab marked this pull request as ready for review March 1, 2026 23:59

abzokhattab changed the title ~~WIP: Fix Reassure CI flakiness caused by noisy sub-10ms baselines~~ Fix Reassure CI flakiness caused by noisy sub-10ms baselines Mar 1, 2026

abzokhattab mentioned this pull request Mar 2, 2026

[$375] [Onyx] Investigate Reassure stability on CI Expensify/App#80320

Open

chatgpt-codex-connector bot reviewed Mar 2, 2026


abzokhattab force-pushed the abzokhattab/fix-reassure-ci-flakiness branch from 53e3ca2 to da22e40 Compare March 3, 2026 00:36

abzokhattab force-pushed the abzokhattab/fix-reassure-ci-flakiness branch from da22e40 to e21e635 Compare March 3, 2026 00:40

fabioh8010 approved these changes Mar 5, 2026


abzokhattab wants to merge 1 commit into Expensify:main from abzokhattab:abzokhattab/fix-reassure-ci-flakiness

