Skip to content

Comments

Repair fix_file#2993

Open
bouweandela wants to merge 7 commits intomainfrom
fix-fix-file
Open

Repair fix_file#2993
bouweandela wants to merge 7 commits intomainfrom
fix-fix-file

Conversation

@bouweandela
Copy link
Member

@bouweandela bouweandela commented Feb 19, 2026

Description

The preprocessor function fix_file is currently broken if it actually fixes a file, it works fine if no fix is implemented. This pull request:

  • Changes the preprocessor function fix_file so its return type is compatible with what is expected by the function esmvalcore.preprocessor.preprocess. In practice, the 4 currently implemented fix_file methods all return paths.
  • Updated the code to set the attributes property on the original input file for recording provenance if fixes have been applied.
  • Tested with recipe ref/recipe_ref_scatterplot.yml.

Follow up work containing examples of a fix_file method that returns cubes in #2999

Closes #3001


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@codecov
Copy link

codecov bot commented Feb 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.69%. Comparing base (f64427e) to head (25db589).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2993   +/-   ##
=======================================
  Coverage   95.68%   95.69%           
=======================================
  Files         267      267           
  Lines       15726    15742   +16     
=======================================
+ Hits        15048    15064   +16     
  Misses        678      678           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bouweandela bouweandela added the bug Something isn't working label Feb 19, 2026
@schlunma schlunma added this to the v2.14.0 milestone Feb 20, 2026
@bouweandela bouweandela marked this pull request as ready for review February 23, 2026 14:10
@bouweandela bouweandela requested a review from schlunma February 23, 2026 14:10
operations that are more efficient with other packages (e.g., loading files
with lots of variables is much faster with Xarray than Iris).

Warning
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this as users cannot do anything about this, it seems more like a note for developers.

@schlunma
Copy link
Contributor

schlunma commented Feb 23, 2026

I am trying to understand why this change is necessary. Has this been broken since #2579?

I am also not too happy to drop support for fix_file to return xarray or ncdata obejcts. This kind of makes it pointless that load supports them.

Maybe the clearer solution would be to not treat fix_file and load (and maybe also save) as ordinary preprocessor functions that can be used within esmvalcore.preprocessor.preprocess? All other preprocessor functions take one or multiple cubes as arguments and return one or multiple cubes.

EDIT: I forgot about the multi-model preprocessor functions. They may ingest and return PreprocessorFile objects in addition to cubes. This should not be problematic though.

Copy link
Contributor

@schlunma schlunma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more quick comment

@bouweandela
Copy link
Member Author

I am trying to understand why this change is necessary. Has this been broken since #2579?

Partly: returning an xarray or ncdata dataset has never worked because esmvalcore.preprocessor.preprocess expects a PreprocessorFile, Cube, or DataElement or a sequence thereof (was a PreprocessorFile, Cube, str or Path at the time that was implemented). However, we never noticed the problem because all fix_file methods that are currently implemented return a pathlib.Path.

When I changed the thing that gets passed into esmvalcore.preprocessor.fix_file in #2765 from always a Path to a DataElement (which is a Path when it is a esmvalcore.io.local.LocalFile, but not when it is an esmvalcore.io.intake_esgf.IntakeESGFDataset), things broke for datasets that have a fix_file fix implemented. We did not notice at the time because hardly any datasets need this, only CMIP6 CESM2 and CESM2-WACCM cloud variables with malformed parametric vertical coordinates use this feature (and EMAC and IPSL native model readers, but that data is not publicly available, so I can't test it).

I am also not too happy to drop support for fix_file to return xarray or ncdata obejcts. This kind of makes it pointless that load supports them.

Allowing fix_file to return all kinds of objects, which are subsequently normalized again to a (list of) cubes by load seems more complicated than allowing a smaller list of return types. We can then do the conversion inside the fix_file method using the functionality implemented in #2579 and avoid having to support these extra types in esmvalcore.preprocessor. So not at all pointless, but just used in a different place. For example: https://github.com/ESMValGroup/ESMValCore/pull/2999/changes#diff-4a658faa0d3947ea3ce4a96b9389cb5a6cc3803252c76e7c1104bba9a4f02882

Maybe the clearer solution would be to not treat fix_file and load (and maybe also save) as ordinary preprocessor functions that can be used within esmvalcore.preprocessor.preprocess? All other preprocessor functions take one or multiple cubes as arguments and return one or multiple cubes.

I agree that we need a better solution, but let's look at that for the next release. I thought I would keep the amount of changes here to a minimum, therefore this only reinstates support for using fix_file with esmvalcore.io.local.LocalFiles and intake-esgf users will have to wait.

@schlunma
Copy link
Contributor

All right, that makes sense. I will test this Thursday once Levante is back again!

Regarding load: Similar to the return types fix_files, it probably would have also made sense to limit the argument types (i.e., remove xr.Dataset | ncdata.NcData). But changing this would be backwards-incompatible.

Looking forward to a clean solution to this for the next release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handling a PosixPath in preprocess

2 participants