Conversation
|
cc @canndrew may want to keep an eye on progress here |
6db55db to
1b1e751
Compare
|
Right now there is a working parser using the Error reporting is currently broken because we need to replace the logic of The code will be refactored because some parts are only half-finished (such as adding |
|
cc @canndrew |
1b1e751 to
1e7c61b
Compare
|
It's weird that the lexer is treating all our built-in macro/function/etc names as being keywords. I realize that's how the compiler currently works, so it's okay to land this PR as-is to keep the changes small. But obviously we'd want to eventually treat these as just being identifiers. |
bd5c30f to
24a6bc6
Compare
|
I would like to provide more context on a few points:
|
|
Also a note about performance: It would be nice if we could move the parser to a different crate, so it would not affect compile time too much, and the SimplicityHL parser could be used separately from the compiler. |
Strongly agreed. If we had a public somewhat-standard AST type that the parser would produce, this would also let people implement some kinds of linters and/or formatters without needing support from us. (We will likely get some pressure to preserve whitespace and comments to help with this. Maybe we actually want two AST types, one that has whitespace and comments and one that's reduced somehow.) In any case, this is all separate from this PR. |
24a6bc6 to
b200640
Compare
|
Please rebase onto master |
b88ef62 to
4bf6252
Compare
6a5bc25 to
7e5a257
Compare
b99a455 to
8270f13
Compare
|
Overall, it looks good to me. I’m still concerned about whether the Options will remain the same. Could you please take the time to add regression tests using Options, DCD, Option Offer, and Lending Contract? |
8270f13 to
1833a7a
Compare
I found a bug in my code, so thank you for pushing me to test it. I've run the tests within the clone of simplicity-contracts repo; you can run them yourself to verify the results. It simply runs all the tests in this repo using the new parser and checks if the resulting bytecode is identical in the |
1833a7a to
60bd05a
Compare
Cool, thanks ACK 60bd05a |
|
In d9d2b1b: There are tons of lockfile changes in this commit which have nothing to do with the chumsky addition. As an aside, when adding dependencies alongside nontrivial code changes, it's easier for me to review when the Cargo.toml/Cargo.lock changes are in their own commit, and the "real" code is in a subsequent commit. |
src/lexer.rs
Outdated
| @@ -0,0 +1,319 @@ | |||
| use chumsky::prelude::*; | |||
There was a problem hiding this comment.
In d9d2b1b:
In general I am very suspicious of wildcard imports. They make it very hard to tell where symbols are coming from, especially words like any and just. If there is only one of them in the file I guess that's okay.
There was a problem hiding this comment.
In a later PR we can remove this (and add the clippy lint to eliminate all the wildcards).
Cargo.toml
Outdated
| arbitrary = { version = "1", optional = true, features = ["derive"] } | ||
| clap = "4.5.37" | ||
| chumsky = "0.11.2" | ||
| line-index = "0.1.2" |
There was a problem hiding this comment.
In 1ae8b0d:
This crate seems super sketchy. It uses u32 to represent lengths all over the place, deals with this mismatch with undocumented panics (your own code uses integer type casts, which are a code smell and which occur almost nowhere else in this codebase).
It seems like this is used exclusively for a single Display impl? Let's just drop this dependency tree and manually count lines.
src/parse.rs
Outdated
| impl PestParse for Item { | ||
| const RULE: Rule = Rule::item; | ||
| let Some(tokens) = tokens else { | ||
| return Err(lex_errs.first().cloned().unwrap_or(RichError::new( |
There was a problem hiding this comment.
In 1ae8b0d:
.first().cloned() can be replaced with .swap_remove(0) or even just .pop() because it probably doesn't matter which error we're exclusively reporting.
I also kinda feel that this RichError::new construction should be replaced with a dedicated constructor on RichError. And "unknown reason" should probably be "empty parse".
Anyway these are just nits, you can ignore them.
src/parse.rs
Outdated
| (Token::LBrace, Token::RBrace), | ||
| (Token::LAngle, Token::RAngle), | ||
| ], | ||
| move |_| fallback.clone(), |
There was a problem hiding this comment.
In 1ae8b0d:
This clone seems totally unnecessary; we can just move the fallback into this closure. Better, we could take the closure as an argument, and that would avoid executing the .clone() calls used to construct this fallback on almost every single call to delimited_with_recovery.
| I: ValueInput<'tokens, Token = Token<'src>, Span = Span>, | ||
| { | ||
| select! { | ||
| Token::Ident(ident) => Self::from_str_unchecked(ident) |
There was a problem hiding this comment.
In 1ae8b0d:
Is this single-option select! construction really idiomatic?
There was a problem hiding this comment.
It is, because select! is essentially a wrapper around a filter with a match statement, which also handles error reporting. I'm not a fan of using it either, but the alternative forces us to manually write the filtering and error reporting (which are done just fine by this macro).
src/error.rs
Outdated
| } | ||
| } | ||
|
|
||
| impl From<ErrorCollector> for String { |
There was a problem hiding this comment.
In 156da82:
I really don't like this. It means that you can use ? and silently convert a real error into a string. When we go to clean this up it'll be a big mess finding all the call sites and estimating how much work is left to done.
We should add a ErrorCollector::into_string method and then everywhere we're using ? to stringify this (in the next commit, 3670300 there are a few) we instead use .map_err(ErrorCollector::into_string)? which is much easier to grep for.
|
Done reviewing 60bd05a. My comments are basically just nits and cleanups and I'm happy to deal with them later except I'd like to reduce the lockfile changes and I definitely want to get rid of the bad line-index dep. |
60bd05a to
04d3ae2
Compare
Sorry about that, it seems that at some point I removed the |
This commit introduce multiple changes, because it full rewrite of parsing and error Changes in `error.rs`: - Change `Span` to use byte offsets in place of old `Position` - Change `Display` function to calculate line and columns with inner function - Change `RichError` implementation to use new `Span` structure - Implement `chumsky` error traits, so it can be used in error reporting of parsers - add `expected..found` error - remove unused `cmr` function for `Span` and unused error messages Changes in `parse.rs`: - Fully rewrite `pest` parsers to `chumsky` parsers. - Change `ParseFromStr` trait to use this change.
This adds `ParseFromStrWithErrors`, which would take `ErrorCollector` and return an `Option` of AST. Also changes `TemplateProgram` to use new trait with collector
it's not slow anymore
This adds tests to ensure that the compiler using the `chumsky` parser produces the same Simplicity program as when using the `pest` parser for the default examples. The programs were compiled using an old `simc` version with debug symbols into .json files, and located in `test-data/` folder.
We are no longer need this as we are no longer using the `pest` parser.
04d3ae2 to
6026de1
Compare
|
Fixed; also addressed nitpicks. |
No description provided.