-
-
Notifications
You must be signed in to change notification settings - Fork 17
Add versification warnings for invalid chapter or verse numbers in USFM #381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #381 +/- ##
==========================================
+ Coverage 72.73% 72.79% +0.06%
==========================================
Files 423 423
Lines 36029 36089 +60
Branches 4969 4974 +5
==========================================
+ Hits 26205 26271 +66
+ Misses 8733 8727 -6
Partials 1091 1091 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
39cf6ef to
5ca0162
Compare
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warning messages are a bit awkward. How difficult would it be to generate messages that are more appropriate for the new warnings?
@ddaspit reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @Enkidu93).
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warning messages are a bit awkward. How difficult would it be to generate messages that are more appropriate for the new warnings?
Good idea. That requires a modification to Serval - see sillsdev/serval/#871
@pmachapman made 1 comment.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @Enkidu93).
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 reviewed 2 files and all commit messages, and made 5 comments.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @pmachapman).
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 49 at r1 (raw file):
} public UsfmVersificationError(
Rather than adding another constructor, is there way to just check the error type in CheckError() itself?
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 94 at r1 (raw file):
return true; } if (!_verseRef.Value.Valid)
I'm surprised that these kinds of errors (especially the bad verse numbers) aren't captured in the verse ref's valid status somehow 🤔. It is what it is, but it'd certainly be nice if it were haha.
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 128 at r1 (raw file):
Type == UsfmVersificationErrorType.ExtraVerse || Type == UsfmVersificationErrorType.InvalidChapterNumber || Type == UsfmVersificationErrorType.InvalidVerseNumber
I wonder if we could supply an expected ref. We should have some idea of what verse or chapter is expected next while parsing. If we can make a reasonable guess, I think we should add it for consistency with our attempts for other types of errors, but if we can't no worries. If the bad chapter number is something like \c 1^, that's pretty easy to find without an expected ref, but what about something like \c text: Is that possible? That would not be quite so easy to track down.
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 292 at r1 (raw file):
// See whether the chapter number is invalid VerseRef verseRef = state.VerseRef.Clone();
Why do we need to do this? Does the number get changed when the state is updated so we need the raw value itself? It would be nice if we could just pass the bad verse ref to the existing constructor and check for an error in CheckError() - same with invalid verses. If we have to do it, we have to do it, but it'd be nice to have the error checking all in one place.
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 332 at r1 (raw file):
{ _errors.Add(versificationError); verseInError = true;
What's the particular duplicate error situation you're trying to avoid?
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmachapman made 5 comments.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Enkidu93).
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 49 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Rather than adding another constructor, is there way to just check the error type in
CheckError()itself?
I did that originally, but because we don't have an expected verse, and when I passed a verseRef it met the other conditions in CheckError(), it got confused with the other warnings. I figured another constructor made sense, as I also needed the actual string value to display to the user so they could resolve it. (If we don't show the actual string value to the user, the verse ref will only store "-1" for an invalid chapter, which isn't helpful to return in an error message)
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 94 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I'm surprised that these kinds of errors (especially the bad verse numbers) aren't captured in the verse ref's valid status somehow 🤔. It is what it is, but it'd certainly be nice if it were haha.
_verseRef.Value.Valid will return false, but as it is not clear just what the cause is (i.e. there is no comparable enum to UsfmVersificationErrorType returned by VerseRef), I deal with the invalid chapter/verse error just above this logic.
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 128 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I wonder if we could supply an expected ref. We should have some idea of what verse or chapter is expected next while parsing. If we can make a reasonable guess, I think we should add it for consistency with our attempts for other types of errors, but if we can't no worries. If the bad chapter number is something like
\c 1^, that's pretty easy to find without an expected ref, but what about something like\c text: Is that possible? That would not be quite so easy to track down.
I tried that, but it is hard to tell what it should have been. Chapters can be skipped in a USFM file. If we were to parse the string and guess, then we might as well not throw an error and use the parsed string?
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 292 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Why do we need to do this? Does the
numberget changed when the state is updated so we need the raw value itself? It would be nice if we could just pass the bad verse ref to the existing constructor and check for an error inCheckError()- same with invalid verses. If we have to do it, we have to do it, but it'd be nice to have the error checking all in one place.
I get the raw value and pass it into the error object so we can return it in the error message, so that the error message doesn't just say something like "invalid chapter number -1" but will say the actual value that caused the error. (the VerseRef struct does not record what the invalid chapter number value is).
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 332 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
What's the particular duplicate error situation you're trying to avoid?
A missing verse error
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddaspit made 1 comment.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Enkidu93).
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 made 5 comments and resolved 4 discussions.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman).
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 49 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
I did that originally, but because we don't have an expected verse, and when I passed a verseRef it met the other conditions in
CheckError(), it got confused with the other warnings. I figured another constructor made sense, as I also needed the actual string value to display to the user so they could resolve it. (If we don't show the actual string value to the user, the verse ref will only store "-1" for an invalid chapter, which isn't helpful to return in an error message)
I see, and you couldn't easily add logic there to differentiate between the cases?
That's interesting. For verses, you can still access the bad verse value via vref.Verse (even though vref.VerseNum doesn't give you any clues), but not so with chapters? It's strange to me that they are handled differently.
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 94 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
_verseRef.Value.Validwill return false, but as it is not clear just what the cause is (i.e. there is no comparable enum toUsfmVersificationErrorTypereturned byVerseRef), I deal with the invalid chapter/verse error just above this logic.
I see. So even if you grab the verse's ValidStatus enum, it doesn't clue you in? If you're getting vref.Valid == false, then ValidStatus must be set to something other than ValidStatus.Valid - I wonder what it is 🤔
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 128 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
I tried that, but it is hard to tell what it should have been. Chapters can be skipped in a USFM file. If we were to parse the string and guess, then we might as well not throw an error and use the parsed string?
Well, that is true - that could be 2.0 😆. The USFM could skip chapters or verses, but if it is skipping, then that's the sort of thing we'd want to be catching anyways since this checking should all use the versification as the gold standard. I was just thinking that we could use some of the functions in ScrVers to see what the next expected reference is (e.g. something similar to what's done in ScrVers.FirstIncludedVerse() but only looking at subsequent verses). If it's impossible to have totally uninterpretable verses/chapters like \c text or \v word123word, then I'm fine with not having an expected ref, but if it is possible, I think we ought to include something like that since I believe we can make a reasonable guess and there would be no other way for the error to be actionable without running Ctrl+F in every USFM file.
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 292 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
I get the raw value and pass it into the error object so we can return it in the error message, so that the error message doesn't just say something like "invalid chapter number -1" but will say the actual value that caused the error. (the VerseRef struct does not record what the invalid chapter number value is).
I see. Makes sense! Like I said above, it'd odd to me that the bad chapter value can't be accessed like the verses can, but we'll make do!
src/SIL.Machine/Corpora/UsfmVersificationErrorDetector.cs line 332 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
A missing verse error
OK, yeah, I see. I was just wondering if it were worth allowing both errors to register since they're capturing different problems. And if a user got two warnings like Missing MAT 1:2 and Invalid verse MAT 1:schmutz, that may be helpful in locating/understanding the problem.
Fixes: #372
Example USFM:
Example warnings:
This change is