crimson converts non-standard bioinformatics tool outputs to JSON or YAML.
Currently it can convert outputs of the following tools:
- FastQC (fastqc)
- FusionCatcher (fusioncatcher)
- samtools flagstat (flagstat)
- Picard metrics tools (picard)
- STAR log file (star)
- STAR-Fusion hits table (star-fusion)
- Variant Effect Predictor plain text output (vep)
For each conversion, there are two execution options: as a command-line tool or as a Python
library function. The first runs crimson directly from the shell. The second
requires importing the crimson library in your program.
crimson is available on the Python Package Index
and you can install it via pip:
$ pip install crimson
It is also available on BioConda, both through the conda package manager and as a
Docker container.
For Docker execution, you may also use the GitHub Docker registry. This registry hosts the latest version, but does not host versions 1.1.0 or earlier.
$ docker pull ghcr.io/bow/crimson
The general command is crimson {tool_name}. By default, the output is written to
stdout. For example, to use the picard parser, you would execute:
$ crimson picard /path/to/a/picard.metrics
You can also write the output to a file by specifying a file name. The following
command writes the output to a file named converted.json:
$ crimson picard /path/to/a/picard.metrics converted.json
Some parsers may accept additional input formats. The FastQC parser, for example, also accepts a path to a FastQC output directory as its input:
$ crimson fastqc /path/to/a/fastqc/dir
It also accepts a path to a zipped result:
$ crimson fastqc /path/to/a/fastqc_result.zip
When in doubt, use the --help flag:
$ crimson --help # for the general help
$ crimson fastqc --help # for the parser-specific help, in this case FastQC
The specific function to import is generally located at crimson.{tool_name}.parse. So to
use the picard parser in your program, you can do:
from crimson import picard
# You can specify the input file name as a string or path-like object...
parsed = picard.parse("/path/to/a/picard.metrics")
# ... or a file handle
with open("/path/to/a/picard.metrics") as src:
    parsed = picard.parse(src)
- Not enough tools use standard output formats.
- Writing and re-writing the same parsers across different scripts is not a productive way to spend the day.
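Returning to the library interface shown earlier: a common next step is to save the parsed result yourself, as the command-line tool does when given an output file name. The following is a minimal sketch, not part of crimson. The helper name convert_to_json is hypothetical, and the sketch assumes that the value returned by a parse function such as picard.parse can be serialized by Python's json module.

```python
import json
from pathlib import Path
from typing import Any, Callable, Union

PathLike = Union[str, Path]


def convert_to_json(
    parse: Callable[[PathLike], Any],
    src: PathLike,
    dst: PathLike,
) -> Any:
    """Parse `src` with the given parse function and write the result to `dst` as JSON.

    `convert_to_json` is a hypothetical helper, not a crimson function.
    Assumption: whatever `parse` returns is JSON-serializable.
    """
    parsed = parse(src)
    # Indent and sort keys so the output is stable and easy to diff.
    text = json.dumps(parsed, indent=2, sort_keys=True)
    Path(dst).write_text(text, encoding="utf-8")
    return parsed
```

With crimson installed, a call would look like convert_to_json(picard.parse, "/path/to/a/picard.metrics", "converted.json") after from crimson import picard. Passing the parse function as an argument lets the same helper work with any of the parsers.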
Setting up a local development environment requires that you set up all of the supported Python versions. We use pyenv for this.
# Clone the repository and cd into it.
$ git clone https://github.com/bow/crimson
$ cd crimson
# Create your local development environment. This command also installs
# all supported Python versions using `pyenv`.
$ make dev
# Run the test and linter suite to verify the setup.
$ make lint test
# When in doubt, just run `make` without any arguments.
$ make
If you are interested, crimson accepts the following types of contribution:
- Documentation updates / tweaks (if anything seems unclear, feel free to open an issue)
- Bug reports
- Support for tools' outputs which can be converted to JSON or YAML
For any of these, feel free to open an issue in the issue tracker or submit a pull request.
crimson is BSD-licensed. Refer to the LICENSE file for the full license.