
Conversation

@Andrurachi (Contributor) commented Dec 5, 2025

Summary

Introduces a parallelized path for `HashedPostStateSorted::from_reverts` using rayon, gated behind a new `parallel-from-reverts` feature flag.

This optimization targets CPU bottlenecks caused by deep reorgs or blocks with skewed storage distributions. It applies an account-count threshold (2,500) to ensure no regressions on small or standard blocks.

Motivation

Closes #20049

Processing large state reverts involves two distinct phases:

  1. DB Cursor Walk (I/O Bound): Sequential reading of account/storage changesets.
  2. Sorting & Hashing (CPU Bound): Organizing the raw data into a sorted state.

While the DB walk must remain sequential, the sorting phase becomes a bottleneck when thousands of accounts (or accounts with massive numbers of storage slots) need processing. This PR parallelizes the CPU-bound sorting phase, reducing wall time for heavy blocks.

Implementation Details

  1. Feature flag: Added `parallel-from-reverts` (opt-in).
  2. Strategy: Parallelize the outer loop (iterating over accounts) rather than the inner loop (sorting slots). Benchmarks showed that inner-loop parallelization (`par_sort_unstable`) introduces too much overhead at typical slot counts (< 1,000); see the sketch after this list.
  3. Threshold: 2,500 accounts, established via micro-benchmarking.
    • < 2,500: always sequential (avoids rayon scheduling overhead).
    • >= 2,500: parallel (distributes load, handles skewed accounts).
  4. Logic preservation: Keeps the existing eager hashing logic (hashing keys during collection) to ensure cache locality and consistency with the sequential path.
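
For illustration, here is a minimal sketch of the threshold-gated dispatch; the identifiers (`PARALLEL_THRESHOLD`, `sort_account`, `sort_reverts`) and the per-account work are hypothetical stand-ins, not the actual reth implementation:

```rust
use rayon::prelude::*;

/// Threshold established via micro-benchmarking (hypothetical name).
const PARALLEL_THRESHOLD: usize = 2_500;

/// Placeholder for a (hashed slot key, value) pair.
type Slot = ([u8; 32], [u8; 32]);

/// Illustrative per-account work: sort that account's storage slots by
/// hashed key. The inner sort stays sequential; benchmarks showed
/// `par_sort_unstable` here costs more than it saves below ~1,000 slots.
fn sort_account(slots: &mut Vec<Slot>) {
    slots.sort_unstable_by(|(a, _), (b, _)| a.cmp(b));
}

fn sort_reverts(accounts: &mut [Vec<Slot>]) {
    if accounts.len() < PARALLEL_THRESHOLD {
        // Below the threshold: stay sequential to avoid rayon's
        // scheduling overhead on small inputs.
        accounts.iter_mut().for_each(sort_account);
    } else {
        // At or above the threshold: parallelize the outer (per-account)
        // loop so that a few "whale" accounts with thousands of slots
        // are spread across worker threads.
        accounts.par_iter_mut().for_each(sort_account);
    }
}
```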

Benchmarks

1. Micro-Benchmarks (sorting_par_exp)

Measured purely the sorting/hashing overhead (excluding DB reads).

| Scenario | Accounts | Slots/Acc | Seq Time | Par Time | Speedup |
|---|---|---|---|---|---|
| Acc_Low | 100 | 4 | 8 µs | 124 µs | -15x |
| Acc_Med | 1,000 | 4 | 73 µs | 122 µs | -0.6x |
| Acc_High | 10,000 | 4 | 790 µs | 612 µs | 1.3x |
| Skewed_10k | 10,000 | Mixed | 64 ms | 14 ms | 4.6x |

*Skewed distribution: 95% of accounts have 4 slots, 5% have 2,000 slots.

2. Integration Benchmarks (integration_bench)

Measured full lifecycle: DB Read -> Hashing -> Sorting -> Allocation.

| Scenario | Description | Seq Time | Par Time | Delta |
|---|---|---|---|---|
| Small | 1k accounts (below threshold) | 61.2 ms | 61.0 ms | 0% (no regression) |
| Large_Uniform | 10k accounts (20 slots) | 270 ms | 267 ms | ~1.2% |
| Large_Skewed | 10k accounts (whale dist.) | 1,439 ms | 1,363 ms | ~5.3% |

Since total runtime is dominated by DB I/O, these deltas represent a solid optimization of the available CPU-bound work. The threshold ensures no regression for small blocks.

Checklist

  • Added `rayon` as an optional dependency.
  • Implemented the `parallel-from-reverts` feature flag.
  • Added benchmarks covering uniform, skewed, and threshold scenarios.
  • Verified no regressions for inputs below the 2,500-account threshold.

@mediocregopher (Collaborator) commented:

@yongkangc let's rbc this as well to confirm there's no regression 🙏

@yongkangc (Member) commented Dec 10, 2025

hey @Andrurachi, thanks for the PR. Could you rebase/merge main in so that we can bench this as well?

We recently introduced a breaking change to MerkleChangeSets, so a merge of main is required for benching.

@Andrurachi force-pushed the perf-parallel-reverts-sorting-20049 branch from b2011f6 to 1843db5 on December 10, 2025 at 04:26
@Andrurachi (Contributor, Author) commented:

@yongkangc I've rebased on main to bring in the breaking changes to MerkleChangeSets. The PR compiles and the benchmarks pass. I'm using `cargo bench -p reth-trie-db --bench integration_bench` and `cargo bench -p reth-trie-db --bench integration_bench --features parallel-from-reverts` to run the benchmarks.

Regarding moving the feature flag to std: I tried replacing `parallel-from-reverts` with the standard `default = ["std"]` pattern, but hit a wall of Zepter errors about feature propagation.

To avoid pushing a broken config or making you wait longer than needed, I've kept `parallel-from-reverts` for this push. If required, could you please advise on the correct Cargo.toml setup to satisfy Zepter for this specific crate?

@Andrurachi (Contributor, Author) commented:

Hey @yongkangc, quick question before I push a ready-to-merge commit:

Following the suggestion, I moved the parallelized path behind `std` instead of introducing a new feature flag. In an `std` build, both the sequential and parallel code paths are active depending on the threshold, and the old `parallel-from-reverts` flag has been removed.

One point to confirm: do you want the benchmarks comparing sequential vs parallel included in the final commit, or should they be removed before merge?

Also note: because running benches without `std` is not supported on this crate, I wasn't able to exercise the sequential fallback for >2,500 accounts under a true no_std configuration. The `cfg`-based gating is in place and compiles, but I want to double-check that this is acceptable.
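
For reference, a minimal sketch of the gating described above, assuming the crate's usual `std` feature pattern (function and type names are hypothetical, not the actual reth code):

```rust
type Slot = ([u8; 32], [u8; 32]);

#[cfg(feature = "std")]
fn sort_reverts(accounts: &mut [Vec<Slot>]) {
    use rayon::prelude::*;
    if accounts.len() < 2_500 {
        // Small inputs stay sequential even when std is enabled.
        accounts.iter_mut().for_each(|s| s.sort_unstable());
    } else {
        accounts.par_iter_mut().for_each(|s| s.sort_unstable());
    }
}

#[cfg(not(feature = "std"))]
fn sort_reverts(accounts: &mut [Vec<Slot>]) {
    // no_std builds always take the sequential path, including the
    // >2,500-account case that can't be benched without std.
    accounts.iter_mut().for_each(|s| s.sort_unstable());
}
```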

Happy to push immediately once I have your preference.

@yongkangc (Member) commented:

Hey @Andrurachi, thanks for clarifying. I think they should be removed before merge so we don't have to maintain them.

The results from this PR are good enough!

Parallelizes the hashing/sorting step using Rayon when account count
exceeds a threshold (2500). This alleviates CPU bottlenecks during large
state reverts or deep reorgs.

Closes paradigmxyz#20049
@Andrurachi force-pushed the perf-parallel-reverts-sorting-20049 branch from 1843db5 to ec4334c on December 11, 2025 at 04:59
@Andrurachi (Contributor, Author) commented:

Got it, I’ve removed the benchmarks and moved the parallelized path behind std.
Thanks for the feedback!

Also, I really enjoyed working on this perf issue. If any similar performance-related tasks come up in the future, feel free to ping me. Happy to help :)

@mediocregopher (Collaborator) left a comment:

LGTM 👍

github-project-automation bot moved this from Backlog to In Progress in Reth Tracker, Dec 11, 2025
@yongkangc (Member) commented:

@mediocregopher currently running benchmarks on this for base (r0) and mainnet (r4)

@yongkangc self-assigned this Dec 12, 2025
@yongkangc (Member) commented Dec 12, 2025

@Andrurachi also, in the future, feel free to grab tasks from #17920!

These are experiments, meaning (1) their impact might be good or bad, and (2) they may carry a degree of implementation challenge.

@yongkangc (Member) commented:

Bench Results:

Base: [latency_comparison chart]

Mainnet: [latency_comparison chart]

cc @mediocregopher

@yongkangc (Member) commented Dec 12, 2025

@Andrurachi my current thinking is that the PR probably incurs scheduling overhead from the parallelization, and the tradeoff in complexity might not be worth the result.
However, more investigation could locate the bottleneck and speed things up further; that requires additional profiling and tracing work.

E.g., a random sample of traces on mainnet:

feature: https://till-janet-manga-scientists.trycloudflare.com/trace/da5f76ba44a1a652971b3d47446b4a01

baseline: https://till-janet-manga-scientists.trycloudflare.com/trace/419e46433b6f6acc9fef8950a38a5f49

observations:

  • `Multiproof::run` is faster for the baseline
  • `save_cache` is faster

thoughts? @mediocregopher @mattsse
