docs: revise best practices and contributing guide#293

Merged
a-frantz merged 8 commits into main from docs/best-practices
Feb 6, 2026
Conversation

Member

@a-frantz a-frantz commented Feb 3, 2026

Describe the problem or feature in addition to a link to the issues.

closes #277

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • The code passes all CI tests without any errors or warnings.
  • You have added tests (when appropriate).
  • You have added an entry in any relevant CHANGELOGs (when appropriate).
  • If you have made any changes to the scripts/ or docker/ directories, please ensure any image versions have been incremented accordingly!
  • You have updated the README or other documentation to account for these changes (when appropriate).

@a-frantz a-frantz self-assigned this Feb 3, 2026
@a-frantz a-frantz requested a review from adthrasher February 3, 2026 20:59
@a-frantz a-frantz marked this pull request as ready for review February 3, 2026 21:59
@a-frantz a-frantz changed the title from WIP[docs: revise best practices and contributing guide] to docs: revise best practices and contributing guide Feb 3, 2026
Comment on lines +38 to +39
- See `template/common-parameter-meta.txt` for common description strings.
- If applicable, use the same parameter name and help text as the underlying tool called by the task.
Member

Not at all related to this PR, but I wonder if there is a way to tell Copilot about this file. It'd be great if its autocomplete suggestions would consider that file as a set of options when writing parameter_meta.

Member

I'm skimming this now, but there are instructions for doing something that sounds like this feature: https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions?tool=vscode

- Contributors can mix and match the available templates, copy and pasting subsections as appropriate.
- It is allowed to have one resource allocated dynamically, and another allocated statically in the same task.
- Multi-core tasks should *always* follow the conventions laid out in the `use_all_cores_task` example (see `template/task-examples.wdl`).
- This is catering to cloud users, who may be allocated a machine with more cores than are specified by the `ncpu` parameter.
Member

We may want to revisit this paradigm in the future. With Planetary + AKS, the container only gets allocated the requested number of cores within the VM.

Member Author

A question you may not have the answer to: how does nproc work in those environments? I think that nproc on the cluster is "smart" and only returns what was allocated.

Member

It does. With Planetary + AKS, the container is allocated n CPUs, and it can only see that many.

I'm not actually confident that the idea behind nproc is sound. As long as the task is running in a container, the Docker engine will have provided some number of cores to the container, and presumably whatever engine is queueing the container passes the requested core count to Docker as its cpu argument. The use_all_cores idea is probably better implemented at the engine level: if the engine allocates one VM per task, then it should use_all_cores; if it is capable of sharing VMs between tasks, then it shouldn't.

Member Author

the point of use_all_cores is so that you can tell called tools what level of parallelism should be used. Most bioinformatics tools will only spin up as many threads as you ask for (i.e. without specifying some kind of --threads <n> option, everything will run in one thread regardless of how many cores the task/engine/backend/VM allocates).
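For readers outside this thread, the pattern under discussion might be sketched roughly like this. Names, the default values, and the tool invocation are all illustrative assumptions; the canonical version is the `use_all_cores_task` example in `template/task-examples.wdl`:

```wdl
version 1.1

task use_all_cores_sketch {
    input {
        Int ncpu = 4
        Boolean use_all_cores = false
    }

    command <<<
        n_cores=~{ncpu}
        if ~{use_all_cores}; then
            # `nproc` reports the cores actually visible to the container,
            # which on some cloud backends exceeds the requested `ncpu`
            n_cores=$(nproc)
        fi
        # hypothetical tool: without an explicit thread count, most
        # bioinformatics tools run in a single thread
        some_tool --threads "$n_cores" input.bam
    >>>

    runtime {
        cpu: ncpu
    }
}
```

The key point is that the resolved core count must be passed to the tool explicitly; the allocation alone changes nothing about the tool's parallelism.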

Member

Yeah, I meant more that the engine, itself, should set use_all_cores if it isn't capable of scheduling multiple tasks to a node.

Given our experience with LSF, though, I'm curious how other HPC systems function (e.g. Slurm). LSF does something so that the underlying task only sees the allocated cores. If the scheduling systems reliably did that, then we could just always use_all_cores.

Member Author

Yeah, I meant more that the engine, itself, should set use_all_cores if it isn't capable of scheduling multiple tasks to a node.

this stumped me for a minute. I don't like the idea of the engine messing around in the inputs section, but that wouldn't be needed. 1.3 exposes both hints and requirements to command blocks, which is a paradigm shift I haven't adjusted to yet. I can see something analogous to use_all_cores being a hint that can be accessed in the command block in order to determine whether nproc should (or should not) be called

Comment on lines +29 to +31
- The non-empty qualifier (`+`) of arrays and maps should be avoided.
- TODO this is really just our opinion after trying to use it and finding the resulting WDL ugly. I'm not sure it can really be called a "best practice".
- This is because the non-empty qualifier can be cumbersome to deal with in WDL source code.
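For context, the qualifier being discussed looks like this (an illustrative sketch, not code from the repo):

```wdl
version 1.1

workflow non_empty_sketch {
    input {
        # `+` marks the array as non-empty: supplying `[]` is an error.
        # The thread below debates whether this guarantee is worth the
        # friction the qualifier adds elsewhere in the WDL source.
        Array[File]+ bams
    }
}
```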
Member

I don't actually recall what led to writing this. I'm not sure we should recommend skipping use of a WDL language feature though as a best practice. If anything, the spec should be improved around the feature to improve the user experience.

Member Author

We briefly had Array[*]+ on main before we hit some sprocket+miniwdl compatibility issue that made us revert them all (and write this entry), though I can't remember the specifics now. It should be in our git history; it will just take some digging to find.

Member Author

removed for now

If anything, the spec should be improved around the feature to improve the user experience.

this is true. I'm going to leave this thread unresolved; we should go find that git history, share the pain points we ran into, and file issues in OpenWDL with ways to improve the spec

Co-authored-by: Andrew Thrasher <adthrasher@gmail.com>
a-frantz and others added 3 commits February 6, 2026 09:45
Co-authored-by: Andrew Thrasher <adthrasher@gmail.com>
@a-frantz a-frantz merged commit 21b2fd9 into main Feb 6, 2026
19 of 21 checks passed
@a-frantz a-frantz deleted the docs/best-practices branch February 6, 2026 16:24


Development

Successfully merging this pull request may close these issues.

revisit best-practices.md

2 participants