Parxy provides a powerful and flexible command-line interface (CLI) that allows you to parse documents, convert them to Markdown, and manage configuration files directly from your terminal — without writing any Python code.
Once installed, you can run the CLI via the parxy command.
The Parxy CLI provides the following commands:
| Command | Description |
|---|---|
| parxy parse | Extract text content from documents with customizable detail levels and output formats. Process files or folders with multiple drivers. |
| parxy preview | Interactive document viewer with metadata, table of contents, and scrollable content preview |
| parxy markdown | Convert documents to Markdown files, with support for multiple drivers and folder processing |
| parxy pdf:merge | Merge multiple PDF files into one, with support for page ranges |
| parxy pdf:split | Split a PDF file into individual pages |
| parxy drivers | List available document processing drivers |
| parxy env | Generate a default .env configuration file |
| parxy docker | Create a Docker Compose configuration for running Parxy-related services |
The parse command is a powerful tool for extracting text from documents with extensive customization options. It supports processing individual files or entire folders, multiple output formats, and can use multiple drivers for comparison.
Parse a single document using the default settings (PyMuPDF driver, json output):
parxy parse document.pdf
This creates a pymupdf-document.json file in the same directory as the source file. Parxy always prefixes the output file with the driver name.
Parse multiple files at once:
parxy parse doc1.pdf doc2.pdf doc3.pdf
Process all PDFs in a folder (recursively):
parxy parse /path/to/folder
Mix files and folders:
parxy parse document.pdf /path/to/folder
Control the output format with the --mode (-m) option:
# Markdown format
parxy parse document.pdf -m markdown
# Plain text
parxy parse document.pdf -m plain
# JSON (full document structure, default)
parxy parse document.pdf -m json
The file extension is automatically set based on the output mode (.md, .txt, or .json).
Specify where to save the output files with --output (-o):
parxy parse document.pdf -o output/
If not specified, files are saved in the same directory as the source files.
Adjust the extraction level with the --level (-l) option:
parxy parse --level line document.pdf
Supported levels (depending on the driver):
- page
- block (default)
- line
- span
- character
Specify a driver with the --driver (-d) option:
parxy parse --driver llamaparse document.pdf
# output will be saved as llamaparse-document.json
Parse the same document(s) with multiple drivers by specifying --driver (or -d for short) multiple times:
parxy parse document.pdf -d pymupdf -d llamaparse
When using multiple drivers, Parxy prepends the driver name to each output filename, e.g. pymupdf-document.json and llamaparse-document.json. This is particularly useful for comparing extraction quality across different parsers.
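The naming convention described above (driver prefix plus a mode-dependent extension) can be sketched in a few lines of Python, which may be handy when a script needs to locate the files that parxy parse produced. This is only an illustration based on the behavior stated in this guide: expected_output_path and MODE_EXTENSIONS are hypothetical names, not part of Parxy.

```python
from pathlib import Path

# Extensions this guide lists for each --mode value.
MODE_EXTENSIONS = {"markdown": ".md", "plain": ".txt", "json": ".json"}


def expected_output_path(source, driver="pymupdf", mode="json", output_dir=None):
    """Hypothetical helper: predict where `parxy parse` writes its result."""
    source = Path(source)
    # Without --output, files are saved next to the source file.
    target_dir = Path(output_dir) if output_dir is not None else source.parent
    # Output files are named <driver>-<source stem><extension>.
    return target_dir / f"{driver}-{source.stem}{MODE_EXTENSIONS[mode]}"


print(expected_output_path("document.pdf"))
print(expected_output_path("reports/document.pdf", driver="llamaparse",
                           mode="markdown", output_dir="output"))
```

Calling the helper once per driver gives the list of files to compare after a multi-driver run.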
When processing multiple files, Parxy displays a progress bar showing:
- Files being processed
- Driver being used
- Output file location
- Number of pages extracted
Process all PDFs in a folder with two drivers, output as JSON, and save to a specific directory:
parxy parse /path/to/pdfs -d pymupdf -d llamaparse -m json -o output/
The preview command provides an interactive document viewer that displays:
- Document metadata (title, author, creation date, etc.)
- Table of contents extracted from headings
- Document content rendered as markdown
This is useful for quickly inspecting a document's structure and content without creating output files.
parxy preview document.pdf
The preview is displayed in a scrollable three-panel layout.
Specify a driver:
parxy preview document.pdf --driver llamaparse
Adjust extraction level:
parxy preview document.pdf --level line
The preview uses your system's default pager (similar to less on Unix systems), allowing you to:
- Scroll up and down
- Search for text
- Exit the preview
This is ideal for quick document inspection before running a full parsing operation.
The markdown command converts documents to Markdown format, preserving structure such as headings and lists. It follows the same conventions as the parse command: output files are prefixed with the driver name and saved next to the source file by default.
parxy markdown document.pdf
This creates a pymupdf-document.md file in the same directory as the source file.
# Parse multiple files
parxy markdown doc1.pdf doc2.pdf doc3.pdf
# Parse all PDFs in a folder (non-recursive by default)
parxy markdown /path/to/folder
# Parse recursively
parxy markdown /path/to/folder --recursive
# Limit recursion depth
parxy markdown /path/to/folder --recursive --max-depth 2
# Save the output to a specific directory
parxy markdown document.pdf -o output/
Run the same documents through multiple drivers for comparison:
parxy markdown document.pdf -d pymupdf -d llamaparse
This produces pymupdf-document.md and llamaparse-document.md.
Use --inline with a single file to print markdown directly to stdout with a YAML frontmatter header — useful for shell pipelines:
parxy markdown document.pdf --inline
parxy markdown document.pdf --inline | your-tool
Output format:
---
file: "document.pdf"
pages: 10
---
# Document heading
...
Parxy provides two commands for PDF manipulation: merging multiple PDFs into one and splitting a single PDF into multiple files.
The pdf:merge command combines multiple PDF files into a single output file. You can merge entire files, specific page ranges, or folders of PDFs.
Basic merge:
parxy pdf:merge file1.pdf file2.pdf -o merged.pdf
Merge with page ranges:
parxy pdf:merge doc1.pdf[1:5] doc2.pdf[3:7] -o combined.pdf
Page range syntax (1-based indexing):
- file.pdf[1] - Single page (page 1)
- file.pdf[1:5] - Pages 1 through 5
- file.pdf[:3] - First 3 pages
- file.pdf[5:] - From page 5 to the end
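To make the page range syntax above concrete, here is a small Python sketch that splits a specification such as file.pdf[1:5] into a path and an inclusive, 1-based page range. It only mirrors the four forms listed in this guide. parse_page_spec is a hypothetical helper written for illustration, not Parxy's own parser, and returning (1, None) for a bare filename is an assumption meaning "the whole file".

```python
def parse_page_spec(spec):
    """Hypothetical helper: split 'file.pdf[1:5]' into (path, first_page, last_page).

    Pages are 1-based and inclusive; last_page is None for an open-ended range.
    """
    if not spec.endswith("]") or "[" not in spec:
        # No range given: assume the whole file.
        return spec, 1, None
    path, _, inner = spec[:-1].rpartition("[")
    if ":" in inner:
        start_text, _, end_text = inner.partition(":")
        first = int(start_text) if start_text else 1    # "[:3]" starts at page 1
        last = int(end_text) if end_text else None      # "[5:]" runs to the end
    else:
        first = last = int(inner)                       # "[1]" is a single page
    return path, first, last


for spec in ["file.pdf[1]", "file.pdf[1:5]", "file.pdf[:3]", "file.pdf[5:]", "file.pdf"]:
    print(spec, "->", parse_page_spec(spec))
```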
Merge entire folders:
parxy pdf:merge /path/to/pdfs -o combined.pdf
Mix files, folders, and page ranges:
parxy pdf:merge cover.pdf /chapters doc.pdf[10:20] appendix.pdf -o book.pdf
The pdf:split command divides a PDF file into individual pages, with each page becoming a separate PDF file.
Split into individual pages:
parxy pdf:split document.pdf
This creates a document_split/ folder containing document_page_1.pdf, document_page_2.pdf, etc.
Specify output directory and prefix:
parxy pdf:split report.pdf -o ./pages -p page
This creates page_1.pdf, page_2.pdf, etc. in the ./pages directory.
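The two examples above suggest a naming pattern of the form prefix_N.pdf, which the following Python sketch reproduces so that a script can predict the files to expect after a split. The defaults are inferred from those examples and are assumptions: the output folder is taken to be the source name plus _split, placed next to the source file, and the prefix is taken to be the source name plus _page. expected_split_files is a hypothetical helper, not a Parxy function.

```python
from pathlib import Path


def expected_split_files(source, page_count, output_dir=None, prefix=None):
    """Hypothetical helper: list the files `parxy pdf:split` is expected to create."""
    source = Path(source)
    # Assumed defaults, inferred from the examples in this guide.
    if output_dir is not None:
        target_dir = Path(output_dir)
    else:
        target_dir = source.parent / f"{source.stem}_split"
    name_prefix = prefix if prefix is not None else f"{source.stem}_page"
    # Page numbers in the filenames start at 1.
    return [target_dir / f"{name_prefix}_{n}.pdf" for n in range(1, page_count + 1)]


for path in expected_split_files("document.pdf", page_count=3):
    print(path)
```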
For more detailed examples and use cases, see the PDF Manipulation How-to Guide.
To view the list of supported document parsing drivers:
parxy drivers
This displays all available backends (e.g., pymupdf, pdfact, llamaparse).
To create a default .env configuration file for Parxy:
parxy env
If a .env file already exists, you'll be prompted before overwriting it.
You can then edit this file to adjust driver settings, API keys, or other environment variables.
Parxy can generate a ready-to-use Docker Compose configuration for self-hosted services (e.g., parsers available via an HTTP-based API):
parxy docker
This creates a compose.yaml file in your working directory.
To start the services, run:
docker compose pull
docker compose up -d
Run the following to see all available commands and options:
parxy --help
Each command also supports --help for detailed usage, for example:
parxy parse --help
With the CLI, you can use Parxy as a standalone document parsing tool, ideal for quick experiments, batch conversions, or integrations in shell-based pipelines.
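For batch conversions driven from a script, one possible approach is to assemble the parxy parse argument list in Python and hand it to subprocess. The sketch below uses only the options documented in this guide (-d, -m, -o). build_parse_command and run_parse are hypothetical helper names, and actually running the command assumes Parxy is installed and on your PATH.

```python
import shlex
import subprocess


def build_parse_command(inputs, drivers=(), mode=None, output_dir=None):
    """Hypothetical helper: assemble a `parxy parse` argument list."""
    command = ["parxy", "parse", *inputs]
    for driver in drivers:
        command += ["-d", driver]      # -d may be repeated, once per driver
    if mode is not None:
        command += ["-m", mode]
    if output_dir is not None:
        command += ["-o", output_dir]
    return command


def run_parse(*args, **kwargs):
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_parse_command(*args, **kwargs), check=True)


command = build_parse_command(["/path/to/pdfs"], drivers=["pymupdf", "llamaparse"],
                              mode="json", output_dir="output/")
print(shlex.join(command))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.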
| Command | Purpose |
|---|---|
| parxy parse | Extract text from documents with multiple formats & drivers |
| parxy preview | Interactive document viewer with metadata and TOC |
| parxy markdown | Generate Markdown files with driver prefix naming |
| parxy pdf:merge | Merge multiple PDF files with page range support |
| parxy pdf:split | Split PDF files into individual pages |
| parxy drivers | List supported drivers |
| parxy env | Create default configuration file |
| parxy docker | Generate Docker Compose setup |