Conversation

@keller-mark
Contributor

@keller-mark keller-mark commented Jul 3, 2025

Fixes #799
The geopandas to_parquet function supports a geometry_encoding parameter. When it is set to geoarrow, geometries are more efficient to read and parse, because the data can stay in its Parquet/Arrow memory layout during downstream usage. Visualization applications will benefit from this, and other applications such as data processing pipelines should too.

@codecov

codecov bot commented Jul 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.20%. Comparing base (0731edd) to head (0637dac).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #951      +/-   ##
==========================================
+ Coverage   92.19%   92.20%   +0.01%     
==========================================
  Files          49       49              
  Lines        7561     7572      +11     
==========================================
+ Hits         6971     6982      +11     
  Misses        590      590              
Files with missing lines                 Coverage Δ
src/spatialdata/__init__.py              100.00% <100.00%> (ø)
src/spatialdata/_core/spatialdata.py      91.93% <100.00%> (ø)
src/spatialdata/_io/io_shapes.py          94.87% <100.00%> (+0.20%) ⬆️
src/spatialdata/config.py                100.00% <100.00%> (ø)
src/spatialdata/models/models.py          88.61% <100.00%> (ø)

@LucaMarconato
Member

Hi @keller-mark, this is ready for review, correct?

@keller-mark keller-mark marked this pull request as ready for review January 5, 2026 13:48
@keller-mark
Contributor Author

Yes, thanks for the reminder. I have just updated the status.

@LucaMarconato
Member

Thanks! I am making some light adjustments to the PR and will push soon. Meanwhile, here is a Python benchmark of read and write operations with the new encoding: https://github.com/giovp/spatialdata-sandbox/blob/main/notebooks/benchmark_geoparquet_encoding.ipynb

Take-home message: write operations are only slightly slower with geoarrow, but read operations are generally faster. The benchmark covers both pure geopandas and spatialdata. spatialdata adds some overhead, which disappears when the data is large.

@LucaMarconato
Member

LucaMarconato commented Jan 5, 2026

Final changes are up. Key points:

  • I added some global settings so that fewer arguments need to be passed to functions.
  • I will keep WKB as the default. We can experiment and possibly change it later.
  • I kept the shapes format unchanged, since the APIs for reading the data back share the same syntax.
  • There is an edge case with geoarrow: mixed geometry types force a coercion to the more general type. In practice, a column mixing polygons and multipolygons is written to disk as a column of multipolygons (see tests). Downstream applications should already expect multipolygons in the shapes layer, including multipolygons that contain a single polygon, so this does not introduce a breaking change.
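
To make the edge case concrete, here is a small standard-library sketch of the coercion, using GeoJSON-style dictionaries rather than the actual geopandas/pyarrow code path. The helper names promote_to_multipolygon and coerce_column are hypothetical and only illustrate the idea that a polygon in a mixed column becomes a multipolygon with a single member.

```python
from typing import Any

Geometry = dict[str, Any]  # GeoJSON-style geometry: {"type": ..., "coordinates": ...}


def promote_to_multipolygon(geom: Geometry) -> Geometry:
    """Wrap a Polygon as a single-member MultiPolygon; leave MultiPolygons as-is."""
    if geom["type"] == "MultiPolygon":
        return geom
    if geom["type"] == "Polygon":
        # A MultiPolygon's coordinates are a list of Polygon coordinate arrays.
        return {"type": "MultiPolygon", "coordinates": [geom["coordinates"]]}
    raise ValueError(f"unsupported geometry type: {geom['type']}")


def coerce_column(geoms: list[Geometry]) -> list[Geometry]:
    """Mimic the coercion: a mixed column becomes all-MultiPolygon."""
    types = {g["type"] for g in geoms}
    if len(types) <= 1:
        return list(geoms)  # homogeneous column: nothing to coerce
    return [promote_to_multipolygon(g) for g in geoms]


square: Geometry = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
}
two_squares: Geometry = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
        [[[2, 2], [3, 2], [3, 3], [2, 3], [2, 2]]],
    ],
}

coerced = coerce_column([square, two_squares])
print([g["type"] for g in coerced])
```

The promoted geometry describes the same area as the original polygon, which is why consumers that already handle single-member multipolygons are unaffected.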

@LucaMarconato LucaMarconato merged commit 2794fb0 into scverse:main Jan 5, 2026
9 checks passed

Development

Successfully merging this pull request may close these issues.

Option to output GeoArrow-encoded parquet from geopandas

2 participants