138 changes: 0 additions & 138 deletions .github/workflows/benchmark.yml

This file was deleted.

78 changes: 78 additions & 0 deletions .github/workflows/codspeed.yml
@@ -0,0 +1,78 @@
```yaml
name: CodSpeed Benchmarks

on:
  push:
    branches:
      - develop
      - main
  pull_request:
  workflow_dispatch:

# Required for OIDC authentication
permissions:
  contents: read
  actions: read
  id-token: write

jobs:
  benchmarks:
    # Benchmarks are sharded by functionality:
    #   - input-io: model/simulation input file I/O
    #   - output-io: model output file readers
    #   - pre-post: pre/postprocessing, grids, rasters, arrays, export
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard:
          - name: "input-io"
            files: >-
              benchmark_mf6_input.py
              benchmark_mf2005_input.py
          - name: "output-io"
            files: >-
              benchmark_cellbudgetfile.py
              benchmark_zonebudget.py
              benchmark_mf6listbudget.py
              benchmark_headfile.py
              benchmark_headufile.py
              benchmark_formattedfile.py
              benchmark_pathlinefile.py
              benchmark_endpointfile.py
              benchmark_mtlistfile.py
              benchmark_sfroutputfile.py
              benchmark_ucnfile.py
              benchmark_mflistbudget.py
              benchmark_mfusglistbudget.py
          - name: "pre-post"
            files: >-
              benchmark_gridintersect.py
              benchmark_grids.py
              benchmark_rasters.py
              benchmark_arrays.py
              benchmark_export.py
              benchmark_postprocessing.py
    name: "benchmarks (${{ matrix.shard.name }})"
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          cache-dependency-glob: "**/pyproject.toml"

      - name: Install FloPy with test dependencies
        run: uv sync --all-extras

      - name: Run benchmarks with CodSpeed
        uses: CodSpeedHQ/action@v3
        with:
          run: cd autotest/benchmarks && uv run pytest ${{ matrix.shard.files }} --codspeed
        env:
          CODSPEED_SHARD_NAME: ${{ matrix.shard.name }}
```
84 changes: 60 additions & 24 deletions DEVELOPER.md
@@ -370,45 +370,81 @@ To allow optional separation of performance from correctness concerns, performan

#### Benchmarking

FloPy includes a benchmark suite to track performance over time and validate optimization efforts. Benchmarking focuses on I/O operations, data structure manipulations, and utility functions.

**Note**: Benchmarks test FloPy code performance only, not the runtime of the various executables FloPy drives.

Benchmarks use [CodSpeed](https://codspeed.io) to automatically track performance in CI. Benchmarks written using `pytest-benchmark` syntax are compatible. Benchmarks are organized in `autotest/benchmarks/` by functional area.

##### Running Benchmarks

```bash
# Run all benchmarks
pytest autotest/benchmarks --benchmark-only

# Run a specific benchmark file
pytest autotest/benchmarks/benchmark_io_mf6.py --benchmark-only

# Run benchmarks by marker
pytest -m "benchmark and not slow" --benchmark-only

# Save results to a file
pytest autotest/benchmarks --benchmark-only --benchmark-autosave

# Compare against a saved baseline
pytest autotest/benchmarks --benchmark-only --benchmark-compare
```
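By default, `--benchmark-autosave` writes each run as a JSON file under a `.benchmarks/` directory. A minimal sketch of pulling mean runtimes out of a saved run, assuming pytest-benchmark's default JSON layout (a top-level `benchmarks` list whose entries carry `name` and `stats`); `summarize_run` is a hypothetical helper for illustration, not part of FloPy:

```python
import json
from pathlib import Path


def summarize_run(path):
    """Map benchmark name -> mean runtime (seconds) from a saved run file."""
    data = json.loads(Path(path).read_text())
    # Each entry in "benchmarks" has a "name" and a "stats" dict holding
    # summary statistics such as "mean", "min", and "max".
    return {b["name"]: b["stats"]["mean"] for b in data["benchmarks"]}
```

This can be handy for quick ad-hoc comparisons outside of `--benchmark-compare`.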

##### Writing Benchmarks

Any test function can be turned into a benchmark by requesting the `benchmark` fixture:

```python
@pytest.mark.benchmark
def test_model_load_time(benchmark, function_tmpdir):
    """
    Benchmark model loading time.

    Measures the time to load a MODFLOW model from disk.
    """
    model = create_test_model(function_tmpdir)
    model.write_input()

    benchmark(lambda: Modflow.load(f"{model.name}.nam", model_ws=function_tmpdir))
```

**Best Practices:**

- Use descriptive test names (e.g., `test_mf6_sim_load_large`, not `test_load1`)
- Include docstrings explaining what is benchmarked and why
- Use fixtures for setup (setup is not timed)
- Mark all benchmarks with `@pytest.mark.benchmark`
- Mark slow benchmarks with `@pytest.mark.slow`

##### Advanced Usage

Arguments can be provided to benchmark functions:

```python
def test_benchmark_with_args(benchmark):
    benchmark(some_function, arg1, arg2)
```

For fine-grained control over iterations and rounds:

```python
def test_benchmark_controlled(benchmark):
    benchmark.pedantic(some_function, iterations=10, rounds=5)
```
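For intuition, the iterations/rounds semantics can be sketched in plain Python (a simplification for illustration, not pytest-benchmark's actual implementation): each round times `iterations` back-to-back calls, and summary statistics are computed over the per-round samples.

```python
import statistics
import time


def bench(fn, rounds=5, iterations=10):
    # One timing sample per round: the average per-call time over
    # `iterations` back-to-back calls, which amortizes timer overhead
    # for very fast functions.
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(iterations):
            fn()
        samples.append((time.perf_counter() - start) / iterations)
    return {
        "mean": statistics.mean(samples),
        "min": min(samples),
        "max": max(samples),
    }
```

This is why faster functions generally get more repetitions: more calls per round are needed to produce a stable sample.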

Lambda functions are convenient for wrapping complex calls:

```python
def test_complex_benchmark(benchmark):
    result = benchmark(lambda: complex_function(many, different, args))
    assert result is not None
```

##### Configuration

Benchmarking is incompatible with `pytest-xdist` and is disabled automatically when tests are run in parallel. When tests are not run in parallel, benchmarking is enabled by default. Benchmarks can be disabled with the `--benchmark-disable` flag.
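The resulting on/off logic is simple. As a sketch, under the rules just stated (`benchmarks_enabled` is a hypothetical helper for illustration, not FloPy's actual conftest code):

```python
def benchmarks_enabled(numprocesses, benchmark_disable):
    """Decide whether benchmarking runs, per the rules above.

    numprocesses: pytest-xdist worker count (0 or None means a serial run).
    benchmark_disable: True if --benchmark-disable was passed.
    """
    if benchmark_disable:
        return False  # explicitly disabled by flag
    if numprocesses:
        return False  # incompatible with parallel (pytest-xdist) runs
    return True  # enabled by default in serial runs
```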

Empty file added autotest/benchmarks/__init__.py
Empty file.