Skip to content

perf/direct zarr io#92

Open
d-v-b wants to merge 64 commits intoEOPF-Explorer:mainfrom
d-v-b:perf/direct-zarr-io
Open

perf/direct zarr io#92
d-v-b wants to merge 64 commits intoEOPF-Explorer:mainfrom
d-v-b:perf/direct-zarr-io

Conversation

@d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Dec 8, 2025

  • eagerly compute multiscales
  • directly copy chunk bytes and metadata documents

@codecov-commenter
Copy link

codecov-commenter commented Dec 11, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 92.85714% with 19 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/eopf_geozarr/s2_optimization/s2_multiscale.py 88.88% 7 Missing ⚠️
src/eopf_geozarr/zarrio.py 93.54% 6 Missing ⚠️
src/eopf_geozarr/cli.py 0.00% 4 Missing ⚠️
src/eopf_geozarr/s2_optimization/s2_converter.py 97.64% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

Comment on lines +1162 to +1164
s2_parser.add_argument(
"--omit-nodes", help="The names of groups or arrays to skip.", default="", type=str
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this argument solves #81. You would pass --omit-nodes "quality/l2a_quicklook" to omit that group

cc @emmanuelmathot

@d-v-b
Copy link
Contributor Author

d-v-b commented Dec 17, 2025

@emmanuelmathot the redundant multiscales calculation is now fixed, and the chunk sizes / sharding are now consistent with the design goal (use as few objects as possible).

On my local system, using dask for rechunking was much slower than what I am currently doing (plain assignment via zarr python).

@d-v-b
Copy link
Contributor Author

d-v-b commented Dec 17, 2025

since we are re-encoding the zarr groups here, I can also handle the NaN conversion in this branch, unless that's better in a separate branch @emmanuelmathot

@d-v-b
Copy link
Contributor Author

d-v-b commented Dec 17, 2025

with a1375b7 we have an option (defaulting to false) of allowing invalid values (nan and inf) in the output. When set to false (the default), any NaN or inf or -inf values in attributes field are replaced with string equivalents.

@emmanuelmathot
Copy link
Contributor

Just tested the last version of this PR: https://api.explorer.eopf.copernicus.eu/raster/collections/sentinel-2-l2a-staging/items/S2B_MSIL2A_20251115T091139_N0511_R050_T35SLU_20251115T111807/viewer
but multiscales are still missing at /measurements/reflectance group

@d-v-b
Copy link
Contributor Author

d-v-b commented Jan 5, 2026

i'll have a look later today!

@emmanuelmathot
Copy link
Contributor

@d-v-b it seems multiscales are working now but can you tell me wwhat is the fill value? It seems it has been changes (black part that should be transparent)
image

@emmanuelmathot
Copy link
Contributor

seems also that we are missing the fixedscaleoffset codec declaration. I dont understand how xarray resolves correctly the data values.
Example:
zarr.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants