
DataArray Inheritance from Parent Dataset for Zarr Conventions #888

@emmanuelmathot


Problem

Currently, when a Dataset has Zarr conventions declared at the dataset level, individual DataArrays within that Dataset cannot inherit the conventions. This leads to inconsistent behavior:

import numpy as np
import rioxarray  # registers the .rio accessor
import xarray as xr

data = np.zeros((2, 2))  # placeholder values

# Dataset has conventions at group level
ds = xr.Dataset({"var1": (["y", "x"], data)})
ds.attrs = {
    "zarr_conventions": [{"name": "proj:", ...}],
    "proj:code": "EPSG:4326"
}

print(ds.rio.crs)          # ✅ Works: reads from dataset attrs
print(ds["var1"].rio.crs)  # ❌ Returns None: DataArray doesn't inherit

According to Zarr conventions, DataArrays should inherit group-level metadata when they don't have their own conventions declared.

Root Cause

  • xarray doesn't maintain parent-child relationships between Dataset and DataArray: a DataArray taken from a Dataset holds no reference back to it
  • In #883 (Implement Zarr spatial and proj conventions support), zarr.read_crs(obj) only examines obj.attrs, not the attributes of a potential parent Dataset
  • The XRasterBase.crs property has no context about the inheritance hierarchy
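The first point can be checked with plain xarray, without rioxarray. The snippet below is a minimal sketch that reuses the `proj:code` attribute from the example above and omits the convention list for brevity. It shows that the extracted DataArray carries none of the Dataset's attrs, and that every lookup builds a new DataArray object.

```python
import numpy as np
import xarray as xr

# Dataset-level attributes only; the variable itself has no attrs.
ds = xr.Dataset(
    {"var1": (["y", "x"], np.zeros((2, 2)))},
    attrs={"proj:code": "EPSG:4326"},
)

da = ds["var1"]

# The extracted DataArray carries only the variable's own attrs.
dataset_has_code = "proj:code" in ds.attrs
array_has_code = "proj:code" in da.attrs

# Each lookup constructs a fresh DataArray, so there is no stable child object.
same_object = ds["var1"] is ds["var1"]

print(dataset_has_code, array_has_code, same_object)  # True False False
```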

Proposed Solutions

Option 1: Explicit Parent Registration

Add explicit inheritance support to XRasterBase:

import xarray

class XRasterBase:
    # Class-level default so the attribute exists before set_parent_dataset() is called
    _parent_dataset = None

    def set_parent_dataset(self, parent_dataset: xarray.Dataset) -> 'XRasterBase':
        """Enable inheritance from parent Dataset"""
        self._parent_dataset = parent_dataset
        return self

    def _read_with_inheritance(self, read_func):
        """Try object-level first, then parent inheritance"""
        result = read_func(self._obj)
        if (
            result is None
            and isinstance(self._obj, xarray.DataArray)
            and self._parent_dataset is not None
        ):
            result = read_func(self._parent_dataset)
        return result

# Usage:
da = ds["var1"]
da.rio.set_parent_dataset(ds)  # Enable inheritance
print(da.rio.crs)  # Now inherits EPSG:4326

Pros: Explicit, backward compatible, extensible to all metadata types
Cons: Requires manual setup

Option 2: Enhanced Dataset Accessor

Set up inheritance automatically when a DataArray is accessed through the Dataset accessor (e.g. ds.rio["var1"]):

import xarray

class RasterDataset(XRasterBase):
    def __getitem__(self, key):
        """Auto-enable inheritance for DataArray access via ds.rio[key]"""
        result = self._obj[key]
        if isinstance(result, xarray.DataArray):
            # Relies on set_parent_dataset() from Option 1
            result.rio.set_parent_dataset(self._obj)
        return result

    def get_array_with_inheritance(self, var_name: str) -> 'RasterArray':
        """Convenience method: returns the DataArray's accessor with the parent registered"""
        return self._obj[var_name].rio.set_parent_dataset(self._obj)

Pros: More automatic, better UX
Cons: Requires careful integration with xarray accessors
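One integration detail that affects both options: xarray caches an accessor per object instance, so a parent registered on one DataArray is not visible on a fresh `ds["var1"]` lookup or on the result of an operation. The sketch below uses a throwaway accessor named `parent_demo` (a hypothetical name, chosen so it doesn't clash with the real `rio` accessor) to show this. It is not rioxarray code.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor used only for this demonstration.
@xr.register_dataarray_accessor("parent_demo")
class ParentDemoAccessor:
    def __init__(self, xarray_obj):
        self._obj = xarray_obj
        self.parent = None  # no parent until explicitly registered

    def set_parent_dataset(self, parent):
        self.parent = parent
        return self


ds = xr.Dataset({"var1": (["y", "x"], np.zeros((2, 2)))})

da = ds["var1"]
da.parent_demo.set_parent_dataset(ds)

# The registration survives on the same object...
kept_on_same_object = da.parent_demo.parent is ds
# ...but not on a new lookup or on a derived array, which get new accessors.
lost_on_new_lookup = ds["var1"].parent_demo.parent is None
lost_after_arithmetic = (da + 1).parent_demo.parent is None

print(kept_on_same_object, lost_on_new_lookup, lost_after_arithmetic)  # True True True
```

Whichever option is chosen, the parent reference would probably need to be re-established, or explicitly dropped, whenever xarray returns a new object.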

Implementation Considerations

  1. Inheritance Priority: Array-level attributes take precedence over dataset-level
  2. Convention Consistency: Same inheritance behavior for both CF and Zarr conventions
  3. Metadata Scope: Extend to CRS, transform, and spatial dimensions
  4. Performance: Minimal overhead for non-inheritance cases
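The priority rule in point 1 can be sketched independently of xarray, operating on plain attribute mappings. `read_proj_code` and `read_with_inheritance` are illustrative names, not existing rioxarray functions, and `EPSG:32633` is just an arbitrary second code to show the override.

```python
from typing import Any, Callable, Mapping, Optional

Attrs = Mapping[str, Any]


def read_proj_code(attrs: Attrs) -> Optional[str]:
    """Hypothetical reader: return the `proj:code` attribute if present."""
    return attrs.get("proj:code")


def read_with_inheritance(
    read_func: Callable[[Attrs], Optional[Any]],
    array_attrs: Attrs,
    parent_attrs: Optional[Attrs] = None,
) -> Optional[Any]:
    """Array-level value wins; the parent is consulted only when it is missing."""
    result = read_func(array_attrs)
    if result is None and parent_attrs is not None:
        result = read_func(parent_attrs)
    return result


dataset_attrs = {"proj:code": "EPSG:4326"}

inherited = read_with_inheritance(read_proj_code, {}, dataset_attrs)
overridden = read_with_inheritance(
    read_proj_code, {"proj:code": "EPSG:32633"}, dataset_attrs
)
no_parent = read_with_inheritance(read_proj_code, {}, None)

print(inherited, overridden, no_parent)  # EPSG:4326 EPSG:32633 None
```

When no parent is registered, the only extra cost is a single `None` check, which matches the performance goal in point 4.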

Breaking Changes

None - inheritance would be opt-in and backward compatible.

Alternative Approaches Considered

  • Stack inspection for parent detection
  • Global context managers
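For comparison, here is one plausible reading of the "global context manager" alternative, sketched with the standard library's `contextvars`. The issue doesn't spell out the design, so the names `inherit_from` and `read_code` and the overall shape are assumptions.

```python
import contextlib
import contextvars

# Hypothetical module-level state: attrs of the "current" parent Dataset.
_parent_attrs = contextvars.ContextVar("parent_attrs", default=None)


@contextlib.contextmanager
def inherit_from(parent_attrs):
    """Make parent_attrs the fallback source inside the with-block."""
    token = _parent_attrs.set(parent_attrs)
    try:
        yield
    finally:
        _parent_attrs.reset(token)  # restore the previous state on exit


def read_code(array_attrs):
    """Read `proj:code` from the array, falling back to the active parent."""
    code = array_attrs.get("proj:code")
    if code is None:
        parent = _parent_attrs.get()
        if parent is not None:
            code = parent.get("proj:code")
    return code


outside = read_code({})
with inherit_from({"proj:code": "EPSG:4326"}):
    inside = read_code({})
after = read_code({})

print(outside, inside, after)  # None EPSG:4326 None
```

The result of a read then depends on ambient state rather than on the object itself, which is harder to reason about than the explicit registration in Option 1.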

Metadata


Labels: proposal (Idea for a new feature.)
