add download from ploughshare #192

Open
andrpie wants to merge 1 commit into master from ploughshare_dijets_download

Conversation


@andrpie andrpie commented Mar 2, 2026

Addresses pinefarm#102, i.e. adds a script that downloads and converts the full color grids from ploughshare for each of the 5 dijet datasets.

The scripts work well on my device (macOS) but, of course, haven't been tested elsewhere.

Thank you @achiefa for the CMS 13TEV script!

@andrpie andrpie requested review from achiefa and scarlehoff March 2, 2026 13:50

achiefa commented Mar 3, 2026

Given that we also need to download the grids for single-inclusive jets, I can take advantage of this PR and add the corresponding scripts for the single-jet cases. I'll start with NNPDF/nnpdf#2407.

@scarlehoff

Thank you very much for this. I have just two (global) comments:

RE the download itself. Would it be possible to separate it from the per-grid script? In particular I am thinking you could add the list of links in something like a .yaml or .json (or .txt for all I care) file, in such a way that you don't need to repeat

FILENAME="applfast-atlas-dijets-v2-fc-fnlo-arxiv-1711.02692"
wget "https://ploughshare.web.cern.ch/ploughshare/db/applfast/$FILENAME/${FILENAME}.tgz"

in every script.
Then do the download in pinefarm as a small Python downloader, instead of doing it in bash (note that there's, e.g., no wget on macOS).
Downloading should not need conda or anything fancy. In this particular case the grids are applgrids, but in other cases they will be pineappl grids directly, so I think it is better if the download and the conversion are separated.
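For concreteness, a stdlib-only downloader along these lines could be as small as the sketch below (the helper names are illustrative, not an existing pinefarm API; only the one-URL-per-line `ploughshare_links.txt` idea comes from this thread):

```python
# Sketch of the suggested approach: keep one ploughshare URL per line in a
# plain text file and fetch each archive with the standard library, so
# neither wget nor conda is needed. Helper names are made up for this
# illustration.
from pathlib import Path
from urllib.request import urlretrieve

def read_links(links_file: Path) -> list[str]:
    """Return non-empty, non-comment lines from the links file."""
    lines = links_file.read_text().splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.strip().startswith("#")]

def download_all(links_file: Path, dest: Path) -> list[Path]:
    """Download every listed archive into dest, returning the local paths."""
    dest.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in read_links(links_file):
        target = dest / url.rsplit("/", 1)[-1]  # e.g. <FILENAME>.tgz
        urlretrieve(url, target)
        fetched.append(target)
    return fetched
```

With something like this, each dataset only ships its own links file, and any future tweak (say, a custom user agent) lives in one place instead of being repeated per script.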

RE the conversion: I'm ok with this part. However, please remove the conda dependent piece from the script. The installation of pineappl should've been taken care of by pineappl. Leave only the part that is actually needed so that someone looking at the script in the future doesn't need to break it down to understand it.
Also rename the scripts to postrun.sh so that they are in sync with all the other postrun scripts in the repository.

Btw, in particular I'm very much against semi-inclusive checks like this:

if [ -z "$CONDA_BASE" ]; then
    for candidate in \
        "$HOME/miniforge3" \
        "$HOME/mambaforge" \
        "$HOME/miniconda3" \
        "$HOME/anaconda3" \
        "/opt/conda" \
        "/opt/miniforge3" \
        "/opt/miniconda3" \
        "/opt/anaconda3" \
        "/usr/local/miniconda3" \
        "/usr/local/anaconda3"
    do
        if [ -f "$candidate/etc/profile.d/conda.sh" ]; then
            CONDA_BASE="$candidate"
            break
        fi
    done
fi
if [ -z "$CONDA_BASE" ]; then
    echo "Error: Could not find conda installation" >&2
    exit 1
fi

It is practically impossible to be inclusive of every possible conda installation, so it is better not even to try. This might even activate the wrong conda installation, creating chaos on the target computer!


achiefa commented Mar 4, 2026

Hi @scarlehoff, thanks for your comment. Indeed we discussed this yesterday during the code meeting. I agree that the conda part is horrible, to say the least, but it was something I wrote in my own bash script, which was meant to be temporary and local. So I agree with you that the conda part must be removed altogether.

Downloading should not need conda or anything fancy and, for this particular case, they are applgrids but in other cases they will be directly pineappl grids so I think it is better if the download and the conversion are separated.

I see your point. However, I don't want to fall into the rabbit hole of building complex abstractions for such simple problems. In the end, this is meant to be a simple script that downloads and converts the grids, with some renaming conventions which must be set case by case. @andrpie and I will look into this, but at the moment it isn't my highest priority.

RE the conversion: I'm ok with this part. However, please remove the conda dependent piece from the script. The installation of pineappl should've been taken care of by pineappl. Leave only the part that is actually needed so that someone looking at the script in the future doesn't need to break it down to understand it.
Also rename the scripts to postrun.sh so that they are in sync with all the other postrun scripts in the repository.

RE this, I just want to ask for clarification. It's not clear to me how these scripts will be run. Are they meant to be run individually by hand, or do they enter an automated workflow that runs all the scripts for which grids should be "produced"? Honestly, I hadn't thought about this when I wrote the script because it wasn't clear to me. The goal was just to have somewhere to log the steps so that I didn't forget them.


scarlehoff commented Mar 4, 2026

I see your point. However, I don't want to fall into the rabbit hole of building complex abstractions for such simple problems

I hate complex abstractions; I'm happy if you do a simple one :_)
Also because if something changes in ploughshare (e.g., you need to set a Mozilla user agent in wget/curl to avoid being mistaken for an LLM bot), you would need to change it in every script.

RE this, I just want to ask for clarification. It's not clear to me how these scripts will be run. Are they meant to be run individually by hand, or do they enter an automated workflow that runs all the scripts for which grids should be "produced"? Honestly, I hadn't thought about this when I wrote the script because it wasn't clear to me. The goal was just to have somewhere to log the steps so that I didn't forget them.

In the ideal world I was thinking the following:

I download pinefarm and pinecards. Then I go and do

pinefarm run ATLAS_2JET_13TEV_DIF_MJJ-Y <and some theory file I guess...>

then pinefarm will read ploughshare_links.txt, which is just a txt file with one link per line.
When pinefarm sees there is a ploughshare_links.txt file in the pinecard, it does the download: it will loop over these links and download them using only Python primitives, or at worst curl.

Then it will run the pineappl import if necessary.

Then, after the download has finished it automatically runs the postrun.sh script, which will do all the complicated crap reorganization and conversion.

After that the metadata.txt will be burned into the grids.


So this is why I'd like to separate download and re-organization. The pineappl import part I'd put with the download because it seems to be quite painless as well. But it'd be just a call to subprocess, so it doesn't make much of a difference to leave it as part of postrun.sh.

This is just the picture I had in mind. The leading order thing for me is not to repeat the wget piece in every script tbh.
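Under this picture, the post-download step could look something like the following sketch. It is illustrative only: the `postprocess` helper, the `.appl` suffix convention, and the choice of PDF set are assumptions, not pinefarm's actual code; the `pineappl import <input> <output> <pdfset>` CLI form is what the pineappl tool provides for converting applgrids.

```python
# Illustrative sketch of the conversion/hand-off step: convert any
# applgrids with `pineappl import`, then run the pinecard's postrun.sh,
# which does the dataset-specific reorganization. All names here are
# assumptions for illustration.
import subprocess
from pathlib import Path

def postprocess(workdir: Path, downloaded: list[Path]) -> None:
    """Convert applgrids to pineappl format, then run the postrun script."""
    for grid in downloaded:
        if grid.suffix == ".appl":  # assumed marker for an applgrid file
            out = grid.with_suffix(".pineappl.lz4")
            # `pineappl import` converts the grid and cross-checks it
            # against the given PDF set
            subprocess.run(
                ["pineappl", "import", str(grid), str(out),
                 "NNPDF40_nnlo_as_01180"],
                check=True,
            )
    postrun = workdir / "postrun.sh"
    if postrun.exists():  # per-dataset reorganization lives here
        subprocess.run(["bash", "postrun.sh"], cwd=workdir, check=True)
```

The point of the split is visible here: the downloader never needs to know about conversion or renaming, and the postrun script never needs to know where the grids came from.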
