pest to chumsky migration by gerau · Pull Request #185 · BlockstreamResearch/SimplicityHL

gerau · 2025-12-18T11:12:21Z

No description provided.

apoelstra · 2025-12-18T13:21:06Z

cc @canndrew may want to keep an eye on progress here

gerau · 2026-01-12T13:16:25Z

Right now there is a working parser using the chumsky crate which replicates the behavior of the pest parser in terms of building a correct parse tree -- it should produce the same Simplicity program. This implementation also fixes #79.

Error reporting is currently broken because we need to replace the logic of parse::ParseFromStr to return multiple errors or handle recoverable errors differently, and error recovery is proving to be more overwhelming than I estimated it would be.

The code will be refactored because some parts are only half-finished (such as adding Spanned for certain names) and there are better ways to use parser combinators. However, I want to show this progress before implementing error recovery.

gerau · 2026-01-12T13:16:48Z

cc @canndrew

src/lib.rs

src/error.rs

canndrew · 2026-01-16T08:19:09Z

It's weird that the lexer is treating all our built-in macro/function/etc names as being keywords. I realize that's how the compiler currently works, so it's okay to land this PR as-is to keep the changes small. But obviously we'd want to eventually treat these as just being identifiers.
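A minimal sketch of the direction described here: only genuine reserved words become keyword tokens, and built-in names such as `u1` lex as plain identifiers. The `Token` enum, the `KEYWORDS` list and the whitespace-only tokenization are assumptions for illustration. This is not the PR's actual chumsky lexer.

```rust
/// Tokens produced by a hypothetical, heavily simplified lexer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token<'src> {
    Keyword(&'src str),
    Ident(&'src str),
}

/// Assumed set of reserved words; the real SimplicityHL keyword list may differ.
const KEYWORDS: &[&str] = &["fn", "let", "match", "type"];

/// Classify a single word. Built-in type and function names such as `u1`
/// are deliberately *not* keywords: they lex as identifiers, and a later
/// stage decides whether an identifier refers to a built-in.
pub fn classify(word: &str) -> Token<'_> {
    if KEYWORDS.iter().any(|k| *k == word) {
        Token::Keyword(word)
    } else {
        Token::Ident(word)
    }
}

/// Split on whitespace and classify each word (no punctuation handling).
pub fn lex_words(src: &str) -> Vec<Token<'_>> {
    src.split_whitespace().map(classify).collect()
}
```

With this split, a program that declares a variable named `u1` lexes without conflict, and resolving built-ins becomes a name-resolution concern rather than a lexing one.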

gerau · 2026-01-21T13:32:34Z

I would like to provide more context on a few points:

Some of the parsers try to recover to some "default" values so that parsing can continue and report an error. If I understand correctly, most parsers implement this by adding error states to their parse structures, so that the analysis stage of the compiler can handle these cases correctly. I haven't done this in this PR because it requires changing the analysis code as well. Right now, compilation does not progress to the analysis stage if there is a parsing error.
I changed the lexer not to parse built-in types and functions as keywords, because doing so creates behavior that was not in the original pest parser (e.g. u1 was treated as UnsignedType even when it is defined as a variable). This also does not require significant changes to the parser itself, so I think we should keep this change here.
I didn't change the errors or their printing much, but I think we should consider refactoring errors and using ariadne to collect and print them. It seems to pair fairly well with chumsky, and it would produce prettier errors than we currently have.
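The error-state approach from the first point could look roughly like this: the parser inserts an explicit placeholder node when it recovers, and analysis skips that node while still counting it. `Expr`, `Expr::Error` and `analyze` are hypothetical and do not reflect SimplicityHL's real AST or analysis code.

```rust
/// A hypothetical expression AST with an explicit error state.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(u64),
    Add(Box<Expr>, Box<Expr>),
    /// Placeholder inserted by the parser after recovering from a syntax error.
    Error,
}

/// Toy analysis pass: evaluates what it can and counts the error placeholders
/// it encounters, instead of refusing to run on a partially broken tree.
pub fn analyze(expr: &Expr, errors_seen: &mut usize) -> Option<u64> {
    match expr {
        Expr::Number(n) => Some(*n),
        Expr::Add(lhs, rhs) => {
            // Visit both sides first so that every error node is counted.
            let l = analyze(lhs, errors_seen);
            let r = analyze(rhs, errors_seen);
            l?.checked_add(r?)
        }
        Expr::Error => {
            *errors_seen += 1;
            None
        }
    }
}
```

The design point is that the analysis stage receives a tree even when parsing failed, so it can report its own diagnostics for the well-formed parts.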

gerau · 2026-01-21T13:40:01Z

Also a note about performance: chumsky seems faster in general than the pest parser. For example, on my machine, for a large .simf file generated by simplicity-bn254, chumsky parses 10 times faster than pest. The trade-off is slower compilation times and lag in rust-analyzer, because chumsky is type-driven.

It would be nice if we could move the parser to a different crate, so it would not affect compile time too much, and the SimplicityHL parser could be used separately from the compiler.

gerau · 2026-01-21T13:41:31Z

cc @canndrew @KyrylR

src/error.rs

apoelstra · 2026-01-23T18:47:34Z

It would be nice if we could move the parser to a different crate, so it would not affect compile time too much, and the SimplicityHL parser could be used separately from the compiler.

Strongly agreed. If we had a public somewhat-standard AST type that the parser would produce, this would also let people implement some kinds of linters and/or formatters without needing support from us. (We will likely get some pressure to preserve whitespace and comments to help with this. Maybe we actually want two AST types, one that has whitespace and comments and one that's reduced somehow.)

In any case, this is all separate from this PR.

KyrylR · 2026-01-29T13:24:09Z

Please rebase onto master

src/error.rs

src/lexer.rs

src/lib.rs

src/main.rs

src/parse.rs

KyrylR · 2026-02-02T16:28:05Z

Overall, it looks good to me. I’m still concerned about whether the Options will remain the same. Could you please take the time to add regression tests using Options, DCD, Option Offer, and Lending Contract?

gerau · 2026-02-03T11:49:50Z

Overall, it looks good to me. I’m still concerned about whether the Options will remain the same. Could you please take the time to add regression tests using Options, DCD, Option Offer, and Lending Contract?

I found a bug in my code, so thank you for pushing me to test it. I've run the tests in a clone of the simplicity-contracts repo; you can run them yourself to verify the results. It simply runs all the tests in that repo using the new parser and checks, in the load_program function, that the resulting bytecode is identical.
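The check described above is a differential test: feed the same inputs to the old and new pipelines and flag every input whose output differs. A minimal sketch, assuming both pipelines can be wrapped as plain functions that return bytecode; `differential_check` is a hypothetical helper, not the harness used in the simplicity-contracts clone.

```rust
/// Run two compiler pipelines on the same inputs and return every input
/// whose result differs. `old` and `new` stand in for hypothetical
/// "compile with pest" and "compile with chumsky" functions.
pub fn differential_check<F, G>(inputs: &[&str], old: F, new: G) -> Vec<String>
where
    F: Fn(&str) -> Result<Vec<u8>, String>,
    G: Fn(&str) -> Result<Vec<u8>, String>,
{
    inputs
        .iter()
        .copied()
        // Compare whole `Result`s, so accept/reject mismatches are caught too.
        .filter(|&src| old(src) != new(src))
        .map(String::from)
        .collect()
}
```

Comparing the full `Result` rather than only `Ok` values also flags inputs that one parser accepts and the other rejects.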

KyrylR · 2026-02-03T13:06:47Z

Cool, thanks

ACK 60bd05a

apoelstra · 2026-02-03T16:46:04Z

In d9d2b1b:

There are tons of lockfile changes in this commit which have nothing to do with the chumsky addition.

As an aside, when adding dependencies alongside nontrivial code changes, it's easier for me to review when the Cargo.toml/Cargo.lock changes are in their own commit, and the "real" code is in a subsequent commit.

apoelstra · 2026-02-03T16:50:17Z

src/lexer.rs

@@ -0,0 +1,319 @@
+use chumsky::prelude::*;


In d9d2b1b:

In general I am very suspicious of wildcard imports. They make it very hard to tell where symbols are coming from, especially words like any and just. If there is only one of them in the file I guess that's okay.

In a later PR we can remove this (and add the clippy lint to eliminate all the wildcards).

apoelstra · 2026-02-03T17:59:24Z

Cargo.toml

 arbitrary = { version = "1", optional = true, features = ["derive"] }
 clap = "4.5.37"
 chumsky = "0.11.2"
+line-index = "0.1.2"


In 1ae8b0d:

This crate seems super sketchy. It uses u32 to represent lengths all over the place and deals with the resulting mismatch through undocumented panics (and your own code uses integer type casts, which are a code smell and occur almost nowhere else in this codebase).

It seems like this is used exclusively for a single Display impl? Let's just drop this dependency tree and manually count lines.
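Dropping `line-index` means computing line and column positions by hand. A minimal sketch, assuming spans are byte offsets into the source (as in the new `Span`) and that columns should count characters rather than bytes; `line_col` is a hypothetical name. Using `usize` throughout avoids the `u32` casts criticized above.

```rust
/// Convert a byte offset into a 1-based (line, column) pair by scanning the
/// source directly, with no external dependency. Columns count `char`s.
/// Returns `None` if `offset` is past the end or not on a char boundary.
pub fn line_col(src: &str, offset: usize) -> Option<(usize, usize)> {
    if offset > src.len() || !src.is_char_boundary(offset) {
        return None;
    }
    let before = &src[..offset];
    // Each newline before the offset moves us one line down.
    let line = before.matches('\n').count() + 1;
    // The current line starts right after the last newline (or at 0).
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    Some((line, column))
}
```

This is linear in the offset, which is fine for formatting a handful of error messages in a `Display` impl.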

apoelstra · 2026-02-03T18:10:15Z

src/parse.rs

-impl PestParse for Item {
-    const RULE: Rule = Rule::item;
+        let Some(tokens) = tokens else {
+            return Err(lex_errs.first().cloned().unwrap_or(RichError::new(


In 1ae8b0d:

.first().cloned() can be replaced with .swap_remove(0) or even just .pop() because it probably doesn't matter which error we're exclusively reporting.

I also kinda feel that this RichError::new construction should be replaced with a dedicated constructor on RichError. And "unknown reason" should probably be "empty parse".

Anyway these are just nits, you can ignore them.
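A sketch of both nits together, using a simplified, hypothetical `RichError` (the real type carries more information): a dedicated `empty_parse` constructor, and `pop()` on an owned error vector in place of `.first().cloned()`. Using `unwrap_or_else` also means the fallback error is only built when the vector is empty.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for the PR's `RichError`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RichError {
    pub message: String,
    /// Byte range in the source, mirroring the byte-offset `Span`.
    pub span: Range<usize>,
}

impl RichError {
    /// Dedicated constructor for "the lexer produced no tokens", replacing
    /// an inline `RichError::new(..)` with a generic message.
    pub fn empty_parse(span: Range<usize>) -> Self {
        RichError { message: "empty parse".to_string(), span }
    }
}

/// Pick the error to report when lexing yields no tokens. Owning the vector
/// lets us `pop()` instead of cloning; which error is reported hardly matters.
pub fn report_lex_failure(mut lex_errs: Vec<RichError>, src_len: usize) -> RichError {
    lex_errs
        .pop()
        .unwrap_or_else(|| RichError::empty_parse(0..src_len))
}
```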

apoelstra · 2026-02-03T18:15:16Z

src/parse.rs

+                (Token::LBrace, Token::RBrace),
+                (Token::LAngle, Token::RAngle),
+            ],
+            move |_| fallback.clone(),


In 1ae8b0d:

This clone seems totally unnecessary; we can just move the fallback into this closure. Better, we could take the closure as an argument, and that would avoid executing the .clone() calls used to construct this fallback on almost every single call to delimited_with_recovery.
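The closure-as-argument suggestion can be illustrated without chumsky. In this sketch, `with_recovery` is a hypothetical stand-in for `delimited_with_recovery` that accepts a fallback *builder*, so nothing is constructed or cloned until recovery actually happens. The real helper returns a chumsky parser and its fallback receives a span; both are simplified away here.

```rust
/// Hypothetical stand-in for a `delimited_with_recovery`-style helper.
/// It returns a "parser" (here just a closure over a `Result`) that calls
/// `fallback` only when recovery is needed. Because `fallback` is itself a
/// closure, building the helper does not eagerly build a fallback value.
pub fn with_recovery<T, F>(fallback: F) -> impl Fn(Result<T, String>) -> T
where
    F: Fn() -> T,
{
    move |result: Result<T, String>| match result {
        Ok(value) => value,
        // Only on failure do we pay for constructing the fallback.
        Err(_) => fallback(),
    }
}
```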

apoelstra · 2026-02-03T18:27:46Z

src/parse.rs

+                I: ValueInput<'tokens, Token = Token<'src>, Span = Span>,
+            {
+                select! {
+                    Token::Ident(ident) => Self::from_str_unchecked(ident)


In 1ae8b0d:

Is this single-option select! construction really idiomatic?

It is, because select! is essentially a wrapper around a filter with a match statement, which also handles error reporting. I'm not a fan of using it either, but the alternative forces us to manually write the filtering and error reporting (which are done just fine by this macro).
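Conceptually, a one-arm `select!` behaves like the function below: match a single token pattern, map it on success, and otherwise emit an "expected ... found ..." error. This is not chumsky's actual macro expansion; `Token`, `ExpectedFound` and `ident` are hypothetical types used only to show the behaviour described in the reply.

```rust
/// Hypothetical token type for this illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token<'src> {
    Ident(&'src str),
    LBrace,
    RBrace,
}

/// Hypothetical "expected X, found Y" error.
#[derive(Debug, PartialEq, Eq)]
pub struct ExpectedFound {
    pub expected: &'static str,
    pub found: String,
}

/// Roughly what `select! { Token::Ident(ident) => ... }` does for one token:
/// filter on a single pattern, map the match, and report anything else.
pub fn ident<'src>(tok: &Token<'src>) -> Result<&'src str, ExpectedFound> {
    match tok {
        Token::Ident(name) => Ok(*name),
        other => Err(ExpectedFound {
            expected: "identifier",
            found: format!("{:?}", other),
        }),
    }
}
```

Writing this by hand for every terminal is exactly the boilerplate the macro removes, which is the argument for keeping the single-arm form.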

apoelstra · 2026-02-03T18:33:29Z

src/error.rs

+    }
+}
+
+impl From<ErrorCollector> for String {


In 156da82:

I really don't like this. It means that you can use ? and silently convert a real error into a string. When we go to clean this up it'll be a big mess finding all the call sites and estimating how much work is left to do.

We should add an ErrorCollector::into_string method and then everywhere we're using ? to stringify this (in the next commit, 3670300 there are a few) we instead use .map_err(ErrorCollector::into_string)? which is much easier to grep for.
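A minimal sketch of the suggested pattern, with a simplified, hypothetical `ErrorCollector` and `parse_program`. Because there is no `From<ErrorCollector> for String`, a bare `?` in a `String`-returning function fails to compile, and every conversion site is a greppable `map_err(ErrorCollector::into_string)`.

```rust
/// Hypothetical, simplified `ErrorCollector` that just accumulates messages.
#[derive(Debug, Default)]
pub struct ErrorCollector {
    errors: Vec<String>,
}

impl ErrorCollector {
    pub fn push(&mut self, msg: impl Into<String>) {
        self.errors.push(msg.into());
    }

    /// Explicit conversion, used instead of `impl From<ErrorCollector> for String`.
    pub fn into_string(self) -> String {
        self.errors.join("\n")
    }
}

/// Hypothetical parse step that reports failures through a collector.
fn parse_program(src: &str) -> Result<usize, ErrorCollector> {
    let mut errs = ErrorCollector::default();
    if src.trim().is_empty() {
        errs.push("empty program");
        return Err(errs);
    }
    Ok(src.lines().count())
}

/// A `String`-returning caller must convert explicitly at each call site.
pub fn compile(src: &str) -> Result<usize, String> {
    let lines = parse_program(src).map_err(ErrorCollector::into_string)?;
    Ok(lines)
}
```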

apoelstra · 2026-02-03T18:37:21Z

Done reviewing 60bd05a. My comments are basically just nits and cleanups and I'm happy to deal with them later except I'd like to reduce the lockfile changes and I definitely want to get rid of the bad line-index dep.

gerau · 2026-02-04T13:38:31Z

In d9d2b1b:

There are tons of lockfile changes in this commit which have nothing to do with the chumsky addition.

Sorry about that; it seems that at some point I removed the Cargo.lock, and cargo generated a new one with updated versions.

This commit introduces multiple changes, because it is a full rewrite of parsing and errors.

Changes in `error.rs`:
- Change `Span` to use byte offsets in place of the old `Position`
- Change the `Display` impl to calculate lines and columns with an inner function
- Change the `RichError` implementation to use the new `Span` structure
- Implement `chumsky` error traits, so `RichError` can be used in the error reporting of parsers
- Add an `expected..found` error
- Remove the unused `cmr` function for `Span` and unused error messages

Changes in `parse.rs`:
- Fully rewrite the `pest` parsers as `chumsky` parsers
- Change the `ParseFromStr` trait to use the new parsers

This adds tests to ensure that the compiler using the `chumsky` parser produces the same Simplicity program as when using the `pest` parser for the default examples. The programs were compiled with an old `simc` version with debug symbols into .json files, which are located in the `test-data/` folder.

gerau · 2026-02-04T14:27:08Z

Fixed; also addressed nitpicks.

gerau mentioned this pull request Dec 26, 2025

Refactor parsing and analysis for better tooling support #191 (open)

gerau force-pushed the simc/chumsky-migration branch from 6db55db to 1b1e751 on January 12, 2026 13:01

uncomputable reviewed Jan 12, 2026

src/lib.rs (resolved)

gerau force-pushed the simc/chumsky-migration branch from 1b1e751 to 1e7c61b on January 14, 2026 15:10

canndrew reviewed Jan 16, 2026

src/error.rs (outdated, resolved)

canndrew reviewed Jan 16, 2026

src/error.rs (outdated, resolved)

canndrew reviewed Jan 16, 2026

src/error.rs (outdated, resolved)

gerau force-pushed the simc/chumsky-migration branch 3 times, most recently from bd5c30f to 24a6bc6 on January 21, 2026 13:08

KyrylR reviewed Jan 22, 2026

src/error.rs (outdated, resolved)

src/error.rs (outdated, resolved)

src/error.rs (outdated, resolved)

src/error.rs (outdated, resolved)

gerau force-pushed the simc/chumsky-migration branch from 24a6bc6 to b200640 on January 27, 2026 11:50

KyrylR reviewed Jan 29, 2026

gerau force-pushed the simc/chumsky-migration branch 2 times, most recently from b88ef62 to 4bf6252 on January 29, 2026 13:52

KyrylR reviewed Jan 29, 2026

src/lib.rs (outdated, resolved)

src/lib.rs (resolved)

src/lib.rs (resolved)

src/lib.rs (resolved)

src/lib.rs (outdated, resolved)

src/lib.rs (resolved)

KyrylR reviewed Jan 29, 2026

src/main.rs (resolved)

KyrylR reviewed Jan 29, 2026

gerau force-pushed the simc/chumsky-migration branch 2 times, most recently from 6a5bc25 to 7e5a257 on January 30, 2026 13:07

This was referenced Jan 30, 2026

Implement error states in parser #205 (open)

Formatter support #206 (open)

Error recovery in analysis #207 (open)

gerau force-pushed the simc/chumsky-migration branch from b99a455 to 8270f13 on February 2, 2026 16:11

gerau mentioned this pull request Feb 2, 2026

Tests for parser #209 (open)

gerau force-pushed the simc/chumsky-migration branch from 8270f13 to 1833a7a on February 3, 2026 11:28

gerau force-pushed the simc/chumsky-migration branch from 1833a7a to 60bd05a on February 3, 2026 12:53

apoelstra reviewed Feb 3, 2026

gerau force-pushed the simc/chumsky-migration branch from 60bd05a to 04d3ae2 on February 4, 2026 13:34

gerau added 8 commits February 4, 2026 15:51

add chumsky dependency

e75cb5e

add lexer

79561a4

The lexer parses incoming code into tokens, which makes it simpler to process using `chumsky`.

add ErrorCollector

e8b9c0e

add handling for multiple errors

b7eb350

This adds `ParseFromStrWithErrors`, which takes an `ErrorCollector` and returns an `Option` of the AST. It also changes `TemplateProgram` to use the new trait with the collector.
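A sketch of the trait shape this commit message describes, assuming the collector is passed as `&mut` and `None` signals that no AST could be built. The simplified `ErrorCollector`, the `Numbers` toy AST and the method name `parse_from_str_with_errors` are hypothetical; the PR's actual signatures may differ.

```rust
/// Hypothetical collector; the real one stores rich errors with spans.
#[derive(Debug, Default)]
pub struct ErrorCollector {
    pub errors: Vec<String>,
}

/// Parsing pushes every error it finds into the collector and returns
/// `None` if no usable AST could be produced.
pub trait ParseFromStrWithErrors: Sized {
    fn parse_from_str_with_errors(s: &str, errors: &mut ErrorCollector) -> Option<Self>;
}

/// Toy AST: a comma-separated list of numbers.
#[derive(Debug, PartialEq, Eq)]
pub struct Numbers(pub Vec<u64>);

impl ParseFromStrWithErrors for Numbers {
    fn parse_from_str_with_errors(s: &str, errors: &mut ErrorCollector) -> Option<Self> {
        let before = errors.errors.len();
        let mut out = Vec::new();
        for part in s.split(',') {
            match part.trim().parse::<u64>() {
                Ok(n) => out.push(n),
                // Keep going so that every bad item is reported, not just the first.
                Err(e) => errors.errors.push(format!("{:?}: {}", part.trim(), e)),
            }
        }
        // Only hand back an AST if this call added no new errors.
        (errors.errors.len() == before).then_some(Numbers(out))
    }
}
```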

remove #[ignore] above fuzz_slow_unit_1()

d23e7f6

it's not slow anymore

remove pest dependency and minimal.pest grammar file

6026de1

We no longer need this, as we are no longer using the `pest` parser.

gerau force-pushed the simc/chumsky-migration branch from 04d3ae2 to 6026de1 on February 4, 2026 13:53
