
feat: block Pruning #2984

Merged

julienrbrt merged 44 commits into main from pierrick/prunning on Feb 13, 2026

Conversation

Contributor

@pthmas commented Jan 16, 2026

Overview

Closes: #3063
Closes: #2093

  • Adds a height-based pruning mechanism to the ev-node store:
    • Prunes /h/{height}, /d/{height}, /c/{height}, /i/{hash} up to a target height.
    • Also prunes per-height DA metadata (height → DA height mapping) for pruned heights.
    • Leaves /s/{height} (state), /t (current height), and global metadata keys untouched.
    • Tracks progress via a last-pruned-block-height metadata key so pruning is monotonic and idempotent.
  • Integrates pruning into the DA inclusion loop:
    • As DA inclusion advances, at configurable intervals, the submitter computes a prune target and calls store.PruneBlocks.
    • The prune target is capped by both:
      • the DA-included height (DAIncludedHeightKey), so only DA-included blocks are ever pruned, and
      • the current store height, to avoid pruning beyond what we actually have locally.
    • If there are not enough fully DA-included blocks beyond PruningKeepRecent, pruning is skipped (see the sketch after this list).
  • Adds pruning for execution metadata:
    • Execution stores (EVM, ev-abci, etc.) expose PruneExecMeta to delete per-height ExecMeta entries up to the same target height.
    • The executor implements an optional ExecMetaPruner interface, and the submitter calls it from the DA inclusion pruning hook.
    • EV-ABCI PR: feat: pruning ev-abci#332
  • Optimizes header/data tail lookups:
    • The StoreAdapter’s Tail() now uses the last-pruned-block-height metadata to skip fully pruned ranges instead of linearly scanning from genesis.
    • This keeps tail lookups efficient even after long-term pruning.
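
A minimal sketch of the prune-target rule described above (the identifiers here are illustrative, not the actual ones used by the submitter): the target is capped by both the DA-included height and the local store height, and pruning is skipped when fewer than PruningKeepRecent blocks lie beyond that cap.

// Illustrative sketch only; the real code lives in the DA inclusion loop and uses different names.
func pruneTarget(daIncludedHeight, storeHeight, keepRecent uint64) (target uint64, ok bool) {
    limit := min(daIncludedHeight, storeHeight) // never prune past DA inclusion or past local data
    if limit <= keepRecent {
        return 0, false // not enough DA-included blocks beyond the retention window; skip
    }
    return limit - keepRecent, true
}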

Config

New node config fields / flags:

  • --evnode.node.pruning_enabled
  • --evnode.node.pruning_keep_recent
  • --evnode.node.pruning_interval

Pruning actually runs only when all three are set to non-trivial values (enabled, keep_recent > 0, interval > 0).
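
The gating condition effectively reduces to a check like the following sketch (field names are illustrative; the real config struct may differ):

// Sketch of when the pruner actually runs; not the actual config API.
func pruningActive(enabled bool, keepRecent uint64, interval time.Duration) bool {
    return enabled && keepRecent > 0 && interval > 0
}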

Design trade-offs

  • Best-effort pruning:
    If pruning fails (either store or exec metadata), we log the error but do not halt DA inclusion. This avoids stopping the node but means disk usage can grow silently if pruning keeps failing; operators will need to watch logs/metrics.

  • Scope:
    This PR prunes:

    • ev-node’s own block store (headers, data, signatures, hash index, and per-height DA metadata),
    • execution metadata via the ExecMetaPruner hook when available.

Contributor

github-actions bot commented Jan 16, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed
Format: ⏩ skipped
Lint: ✅ passed
Breaking: ✅ passed
Updated (UTC): Feb 13, 2026, 5:21 PM

Contributor

claude bot commented Jan 16, 2026

Claude finished @julienrbrt's task in 2m 38s.


Comprehensive Code Review Complete

I've completed a thorough code review of the block pruning implementation. Here's my detailed analysis:

Summary

This PR successfully implements height-based pruning for ev-node's block store and execution metadata. The overall design is solid and follows Go best practices. However, there are several areas that need attention before merging, particularly around error handling, batch size safety, and test coverage.


Critical Issues

1. Batch Size Safety Concern (block/internal/pruner/pruner.go:168)

The calculateBatchSize() uses min(blocksPerInterval*4, 10000) which provides an upper bound, but there's a potential issue:

return min(blocksPerInterval*4, 10000)

Problem: For very fast block times (e.g., 10ms blocks with a 15-minute interval), blocksPerInterval is ~90,000, so 90,000 * 4 = 360,000 gets capped at 10k. However, if blockTime is 0 the integer division panics, and for very small block times blocksPerInterval can become extremely large or overflow in the 4x multiplication.

Recommendation: Add bounds checking:

if p.blockTime == 0 {
    return 10000 // safe default
}
blocksPerInterval := uint64(p.cfg.Interval.Duration / p.blockTime)
if blocksPerInterval > 10000 { // prevent overflow
    return 10000
}
return min(blocksPerInterval*4, 10000)

Fix this →

2. Error Return Value Issue (block/internal/pruner/pruner.go:147-149)

if p.execPruner != nil {
    if err := p.execPruner.PruneExec(p.ctx, batchEnd); err != nil && !errors.Is(err, ds.ErrNotFound) {
        return err  // ❌ Returns error but block pruning already succeeded
    }
}

Problem: If store.PruneBlocks succeeds (line 141-144) but execPruner.PruneExec fails (line 147-149), the function returns an error even though block data was already pruned. This creates an inconsistent state where blocks are pruned but exec metadata isn't, and the next pruning iteration won't retry those blocks.

Recommendation: Either:

  1. Make pruning atomic (prune exec first, then blocks), or
  2. Log the error but don't return it (matching the PR's "best-effort" design; see the sketch below), or
  3. Track exec pruning failures separately for retry

Fix this →
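
A minimal sketch of option 2, keeping the best-effort semantics (it reuses p.logger and the fields from the snippet above and is not the final implementation):

if p.execPruner != nil {
    if err := p.execPruner.PruneExec(p.ctx, batchEnd); err != nil && !errors.Is(err, ds.ErrNotFound) {
        // best-effort: record the failure but keep the block-pruning progress already made
        p.logger.Error().Err(err).Uint64("height", batchEnd).Msg("failed to prune exec metadata")
    }
}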

3. Inconsistent Metadata Key Heights (block/internal/pruner/pruner.go:236-238)

lastPrunedBlockHeight := binary.LittleEndian.Uint64(lastPrunedBlockHeightBz)
if lastPrunedBlockHeight == 0 {
    return 0, fmt.Errorf("invalid last pruned block height")
}

Problem: This treats height 0 as invalid, but height 0 is a valid "not yet pruned" state. The function correctly returns 0 for ErrNotFound (line 228), but then rejects 0 from stored data.

Recommendation: Remove this check or only validate that the length is correct:

lastPrunedBlockHeight := binary.LittleEndian.Uint64(lastPrunedBlockHeightBz)
return lastPrunedBlockHeight, nil

Fix this →


High Priority Issues

4. Missing Context Usage (pkg/store/store.go:349-434)

The PruneBlocks function takes a context.Context parameter but never checks ctx.Done() during the potentially long-running loop (lines 375-422). For large batch sizes (10k blocks), this could take significant time.

Recommendation: Add periodic context checks:

for h := lastPruned + 1; h <= height; h++ {
    if (h-lastPruned)%100 == 0 { // Check every 100 blocks
        select {
        case <-ctx.Done():
            return fmt.Errorf("pruning interrupted at height %d: %w", h, ctx.Err())
        default:
        }
    }
    // ... existing pruning logic
}

Fix this →

5. Silent Failures in Best-Effort Pruning (block/internal/pruner/pruner.go:92-98)

case config.PruningModeAll:
    if err := p.pruneBlocks(); err != nil {
        p.logger.Error().Err(err).Msg("failed to prune blocks")
    }

Problem: Pruning failures are only logged at Error level. Operators have no metrics, no alerting signals, and no way to know pruning is failing until disk fills up. This violates the "fail loud" principle for operational systems.

Recommendations:

  1. Add a pruning failure metric that operators can alert on (see the sketch below)
  2. Consider exponential backoff to avoid log spam
  3. Add a "time since last successful prune" metric
  4. Log progress/success cases at Info level for visibility

Fix this →
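
A sketch of recommendation 1, assuming Prometheus-style metrics via prometheus/client_golang (the metric names are illustrative, and ev-node may wire metrics differently):

var (
    pruningFailures = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "evnode_pruning_failures_total",
        Help: "Number of pruning passes that ended in an error.",
    })
    lastSuccessfulPrune = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "evnode_pruning_last_success_timestamp_seconds",
        Help: "Unix time of the last successful pruning pass.",
    })
)

// inside the interval loop:
if err := p.pruneBlocks(); err != nil {
    pruningFailures.Inc()
    p.logger.Error().Err(err).Msg("failed to prune blocks")
} else {
    lastSuccessfulPrune.SetToCurrentTime()
}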

6. Race Condition in Height Tracking (block/internal/pruner/pruner.go:189-191)

if lastPrunedBlock, err := p.getLastPrunedBlockHeight(p.ctx); err == nil && lastPrunedBlock > lastPrunedState {
    lastPrunedState = lastPrunedBlock
}

Problem: This reads lastPrunedBlock to initialize lastPrunedState, but if pruneBlocks() and pruneMetadata() run concurrently (different goroutines with same pruner instance), this could lead to inconsistent state. While the current code only runs one mode at a time (L90-99), the code structure doesn't prevent future bugs.

Recommendation: Add documentation or assertions about single-threaded execution, or use a mutex if concurrent pruning is ever needed.
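
If concurrent pruning ever becomes possible, the guard can be as small as this sketch (the Pruner field layout here is assumed, not the actual struct):

type Pruner struct {
    mu sync.Mutex
    // ... existing fields ...
}

func (p *Pruner) runOnce() {
    p.mu.Lock() // only one pass may touch the last-pruned markers at a time
    defer p.mu.Unlock()
    // ... existing mode dispatch: pruneBlocks / pruneMetadata ...
}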

7. Memory Efficiency in Large Prunes (pkg/store/store.go:375-422)

The pruning loop builds a batch with potentially thousands of deletions (10k blocks * 5 keys each = 50k operations) before committing. This could cause memory pressure.

Recommendation: Consider committing in smaller sub-batches:

const batchCommitSize = 1000
for h := lastPruned + 1; h <= height; h++ {
    // ... existing deletion logic ...
    
    if (h-lastPruned)%batchCommitSize == 0 {
        if err := batch.Commit(ctx); err != nil {
            return fmt.Errorf("failed to commit sub-batch at height %d: %w", h, err)
        }
        batch, err = s.db.Batch(ctx)
        if err != nil {
            return fmt.Errorf("failed to create new batch: %w", err)
        }
    }
}

Code Quality & Maintainability

8. Magic Numbers (block/internal/pruner/pruner.go:168)

return min(blocksPerInterval*4, 10000)

Issue: The multiplier 4 and cap 10000 are unexplained magic numbers.

Recommendation: Make these configurable or document as constants:

const (
    // PruningCatchupMultiplier determines how fast we catch up when behind.
    // A value of 4 means we prune 3x faster than blocks are produced.
    PruningCatchupMultiplier = 4
    
    // MaxPruningBatchSize caps batch size to prevent excessive memory/disk usage
    MaxPruningBatchSize = 10_000
)

9. Key Length Assumption (pkg/store/store.go:362-367)

} else if len(meta) == heightLength {
    lastPruned, err = decodeHeight(meta)
    if err != nil {
        return fmt.Errorf("failed to decode last pruned height: %w", err)
    }
}

Good: The code validates len(meta) == heightLength before decoding, which prevents panics.

However: This silently treats invalid-length metadata as "no data" (lastPruned=0). Consider logging a warning for unexpected metadata formats to catch data corruption early.

10. Error Message Quality (pkg/store/store.go:383)

return fmt.Errorf("failed to get header at height %d during pruning: %w", h, err)

Good: Error messages include context (height, operation). This makes debugging much easier.

11. Comment Quality (execution/evm/store.go:148-150)

// PruneExec removes ExecMeta entries up to and including the given height.
// It is safe to call this multiple times with the same or increasing heights;
// previously pruned ranges will be skipped based on the last-pruned marker.

Excellent: Clear documentation of idempotency guarantees and safety properties.


Testing & Coverage

12. Insufficient Test Coverage (29.23% patch coverage per codecov)

Missing test cases:

  1. pruner.go (82 missing lines):

    • No tests for pruneBlocks() function
    • No tests for calculateBatchSize() with edge cases (zero blockTime, overflow)
    • No tests for concurrent pruning attempts
    • No tests for pruning when DAIncludedHeightKey is missing/invalid
    • No tests for partial failure (store succeeds, exec fails)
  2. store.go (38 missing lines):

    • No tests for PruneBlocks() with very large height ranges
    • No tests for context cancellation during pruning (see the sketch below)
    • No tests for batch commit failures mid-pruning
    • No tests verifying metadata pruning (GetHeightToDAHeightHeaderKey)
  3. store_adapter.go (5 missing lines):

    • Edge case in Tail() when pruning metadata exists but is corrupted

Recommendations:

  • Add comprehensive unit tests for pruner.go:pruneBlocks()
  • Add integration tests that prune 10k+ blocks and verify performance
  • Add failure injection tests (DB errors during pruning)
  • Add tests for concurrent access patterns
  • Target at least 70% coverage for the pruning package

Fix this →
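
For example, a context-cancellation test for PruneBlocks could look roughly like the sketch below. The helpers newTestStore and populateBlocks are assumptions, and the assertion presumes the context check recommended in item 4 is added:

func TestPruneBlocks_ContextCancelled(t *testing.T) {
    s := newTestStore(t)           // assumed helper returning the store under test over an in-memory datastore
    populateBlocks(t, s, 1, 5_000) // assumed helper writing headers/data/signatures for heights 1..5000

    ctx, cancel := context.WithCancel(context.Background())
    cancel() // cancel up front so the pruning loop should bail out early

    err := s.PruneBlocks(ctx, 4_000)
    require.ErrorIs(t, err, context.Canceled)
}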


Security Considerations

13. No DOS Protection

Issue: A misconfigured node with pruning_keep_recent=1 and pruning_interval=1s would create excessive disk I/O, potentially DOSing itself.

Recommendation: Add sensible validation in config.go:320-342:

const MinKeepRecent = 100 // Keep at least 100 blocks for safety
const MinPruningInterval = 1 * time.Minute

if c.KeepRecent < MinKeepRecent {
    return fmt.Errorf("pruning_keep_recent must be >= %d for safety", MinKeepRecent)
}
if c.Interval.Duration < MinPruningInterval {
    return fmt.Errorf("pruning_interval must be >= %v to avoid excessive I/O", MinPruningInterval)
}

14. Prune Before DA Inclusion Safety (block/internal/pruner/pruner.go:108-115)

currentDAIncludedBz, err := p.store.GetMetadata(p.ctx, store.DAIncludedHeightKey)
if err == nil && len(currentDAIncludedBz) == 8 {
    currentDAIncluded = binary.LittleEndian.Uint64(currentDAIncludedBz)
} else {
    // if we cannot get the current DA height, we cannot safely prune, so we skip pruning until we can get it.
    return nil
}

Good: The code correctly refuses to prune if DA included height is unavailable. This prevents accidentally pruning blocks that haven't been included in DA yet.

However: The silent return nil means operators won't know why pruning isn't happening. Consider logging at Warn level when this condition occurs.
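
For example, a one-line change in the same logging style used elsewhere in the pruner (a sketch, not the merged code):

// if we cannot get the current DA height, we cannot safely prune, so skip until it is available.
p.logger.Warn().Msg("skipping pruning: DA-included height metadata not available yet")
return nil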


Performance Considerations

15. Tail Lookup Optimization (pkg/store/store_adapter.go:407-414)

Excellent: The PR optimizes Tail() lookups by using LastPrunedBlockHeightKey to skip fully-pruned ranges:

if lastPrunedBlockHeightBz, err := a.store.GetMetadata(ctx, LastPrunedBlockHeightKey); err == nil && len(lastPrunedBlockHeightBz) == heightLength {
    if lastPruned, err := decodeHeight(lastPrunedBlockHeightBz); err == nil {
        if candidate := lastPruned + 1; candidate > startHeight {
            startHeight = candidate
        }
    }
}

This is a smart optimization that prevents O(n) scans after pruning.

16. Batch Size Calculation (block/internal/pruner/pruner.go:156-169)

The 4x multiplier approach is clever - it ensures pruning catches up 3x faster than block production when behind. However, consider making this configurable for operators who want to tune the catch-up speed vs. I/O impact tradeoff.


Documentation

17. Missing Operational Docs

While the PR adds config documentation in docs/learn/config.md:298+, it's missing:

  • Recommended values for different chain speeds
  • Disk space savings expectations
  • Recovery procedures if pruning fails
  • How to monitor pruning health
  • Performance impact guidance

Recommendation: Add an operational guide section covering these topics.

18. Interface Documentation (core/execution/execution.go:165-173)

Excellent: The ExecPruner interface is well-documented:

// ExecPruner is an optional interface that execution clients can implement
// to support height-based pruning of their execution metadata.
type ExecPruner interface {
    // PruneExec should delete execution metadata for all heights up to and
    // including the given height. Implementations should be idempotent and track
    // their own progress so that repeated calls with the same or decreasing
    // heights are cheap no-ops.
    PruneExec(ctx context.Context, height uint64) error
}

Clear idempotency and safety guarantees.
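
For illustration, a minimal in-memory implementation satisfying this contract could look like the sketch below; the real EVM and ev-abci stores persist their last-pruned markers differently.

type memExecMeta struct {
    mu         sync.Mutex
    lastPruned uint64
    meta       map[uint64][]byte // height -> serialized ExecMeta
}

func (m *memExecMeta) PruneExec(ctx context.Context, height uint64) error {
    m.mu.Lock()
    defer m.mu.Unlock()
    if height <= m.lastPruned {
        return nil // repeated or decreasing heights are cheap no-ops
    }
    for h := m.lastPruned + 1; h <= height; h++ {
        delete(m.meta, h)
    }
    m.lastPruned = height
    return nil
}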


Architecture & Design

19. Best-Effort Design Choice

Observation: The PR uses "best-effort" pruning where failures are logged but don't halt the node (per PR description).

Trade-offs:

  • Pro: Node stays operational even if pruning fails
  • Pro: Simpler error handling
  • Con: Silent disk space exhaustion if pruning consistently fails
  • Con: No backpressure to fix pruning issues

Recommendation: This is acceptable for v1, but add metrics/alerts so operators know when pruning is failing.

20. Pruning Modes Design (pkg/config/config.go:302-312)

const (
    PruningModeDisabled = "disabled"
    PruningModeAll      = "all"
    PruningModeMetadata = "metadata"
)

Good: Three clear modes give operators flexibility.

Question: What's the use case for metadata mode (pruning state but keeping blocks)? Consider documenting this in config.go or removing if unused.


Minor Issues

21. Typo in Comment (pkg/store/store_adapter.go:437)

// shoud never happen

Should be "should". While minor, these add up.

22. Inconsistent Error Handling Style

Some places wrap errors with context:

return fmt.Errorf("failed to get store height for pruning: %w", err)

Others don't:

return err

Recommendation: Be consistent - always add context to wrapped errors for better debugging.

23. Key Size Optimization (pkg/store/keys.go:27-31)

Good: The PR shortened metadata keys:

LastPrunedBlockHeightKey = "lst-prnd-b"
LastPrunedStateHeightKey = "lst-prnd-s"

This reduces storage overhead. Good attention to detail.


Positive Highlights

  1. Idempotency: Excellent tracking of last-pruned-height enables safe retries
  2. Batching: Smart use of batched deletes for performance
  3. Safety: Correct check to only prune DA-included blocks
  4. Modularity: Clean separation of concerns (pruner, store, execution)
  5. Testing: Good test structure in pruner_test.go and store_test.go
  6. Interface Design: Optional ExecPruner interface is well-designed
  7. Error Context: Most errors include useful debugging context

Recommendations Summary

Before Merging:

  1. ✅ Fix critical issues #1-3 (batch size safety, error handling, metadata validation)
  2. ✅ Add context cancellation to the PruneBlocks loop (#4)
  3. ✅ Add metrics for pruning success/failure (#5)
  4. ✅ Increase test coverage to at least 70% (#12)
  5. ✅ Add DOS protection validation (#13)

Post-Merge:

  • Add operational documentation (#17)
  • Consider making the batch catch-up multiplier configurable (#16)
  • Add telemetry/tracing for pruning operations
  • Monitor in production and tune batch sizes based on real-world performance

Conclusion

This is a well-structured implementation of block pruning with good architectural choices. The main concerns are around error handling edge cases, test coverage, and operational observability. With the critical issues addressed, this will be a valuable feature for long-running nodes.

Recommendation: Request changes to address critical issues #1-5 and #12 before approval.


@pthmas changed the title from "Block Prunning" to "feat: block Prunning" on Jan 26, 2026
@pthmas changed the title from "feat: block Prunning" to "feat: block Pruning" on Jan 26, 2026
@pthmas force-pushed the pierrick/prunning branch from 3e1e8e0 to ed66fe6 on January 26, 2026 15:01
@pthmas force-pushed the pierrick/prunning branch 7 times, most recently from 26628f5 to 4d65b1e on January 28, 2026 15:41

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 27.53036% with 179 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.25%. Comparing base (6a7c84c) to head (4419111).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
block/internal/pruner/pruner.go 27.11% 78 Missing and 8 partials ⚠️
pkg/store/store.go 22.44% 25 Missing and 13 partials ⚠️
pkg/store/tracing.go 0.00% 21 Missing ⚠️
pkg/config/config.go 31.57% 11 Missing and 2 partials ⚠️
block/components.go 47.05% 5 Missing and 4 partials ⚠️
pkg/store/cached_store.go 0.00% 7 Missing ⚠️
pkg/store/store_adapter.go 54.54% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2984      +/-   ##
==========================================
- Coverage   61.89%   61.25%   -0.64%     
==========================================
  Files         111      112       +1     
  Lines       11113    11355     +242     
==========================================
+ Hits         6878     6956      +78     
- Misses       3496     3630     +134     
- Partials      739      769      +30     
Flag Coverage Δ
combined 61.25% <27.53%> (-0.64%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@pthmas force-pushed the pierrick/prunning branch 4 times, most recently from 7aef549 to 4d31269 on February 2, 2026 17:35
@pthmas marked this pull request as ready for review on February 2, 2026 18:38
@pthmas force-pushed the pierrick/prunning branch from 3e07144 to 26d0327 on February 3, 2026 19:07
@pthmas force-pushed the pierrick/prunning branch 2 times, most recently from 94f37d2 to ddc064a on February 4, 2026 12:31
@julienrbrt self-assigned this on Feb 12, 2026
Copilot AI and others added 8 commits February 12, 2026 15:31
* Initial plan

* Add recovery history retention pruning

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* feat: add configurable recovery history retention

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* Refactor recovery pruning into pruner component

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* Address pruner review feedback

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* Update evm test module dependency

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* Address code review feedback

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* Refine pruner checks and docs

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* Clarify PruneExecMeta comment

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* Rename recovery history setting

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* Adjust pruner interval and defaults

Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>

* fixes

* updates

* updates

* updates

* comment

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: julienrbrt <29894366+julienrbrt@users.noreply.github.com>
Co-authored-by: Julien Robert <julien@rbrt.fr>

When pruning is enabled, the pruner runs at the configured interval and removes data beyond the retention window (`pruning_keep_recent`). The system uses intelligent batching to avoid overwhelming the node:

- **Batch sizes are automatically calculated** based on your `pruning_interval` and `block_time`
Contributor

In testing, was there any degradation to the performance of the node when pruning?

Member

I restored a snapshot and tried against Eden; there was some slowdown only at the beginning.
I am still testing. I can post pprof before/after and system usage, but it won't be a Go benchmark.

Contributor

@tac0turtle left a comment

It looks good, but it would still be good to see benchmarks or some numbers on how (or whether) this slows down node operation.

Contributor

@tac0turtle left a comment

looks good now, thank you

@julienrbrt enabled auto-merge on February 13, 2026 17:25
@julienrbrt added this pull request to the merge queue on Feb 13, 2026
Merged via the queue into main with commit b016bed on Feb 13, 2026
64 of 70 checks passed
@julienrbrt deleted the pierrick/prunning branch on February 13, 2026 19:22
@github-actions
Contributor

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-02-13 19:23 UTC
