Description
I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.
- Yes
The bug
OCR returns text boxes in the correct X order, but in reversed Y order (bottom to top).
Consider the sample:
I expect the output text to be:
1. First line
2. Second line 3. Horizontal space
4. Bottom line
Or, in the ocr_search table, it should be:
1. First line 2. Second line 3. Horizontal space 4. Bottom line
But instead, in the ocr_search table we see:
4. Bottom line 2. Second line 3. Horizontal space 1. First line
We see that 2 and 3 are placed correctly (they share the same Y and differ only in X), but 1 and 4 are swapped.
The problem is that this reduces the quality of text search, especially when the query is multiple words long: we lose trigrams near word boundaries.
As a side effect, selecting and copying all the text from the boxes in the web app produces unusable, shuffled text:
4. Bottom line2. Second line3. Horizontal space1. First line
The OS that Immich Server is running on
Ubuntu 22.04
Version of Immich Server
v2.3.1
Version of Immich Mobile App
n/a
Platform with the issue
- Server
- Web
- Mobile
Device make and model
No response
Your docker-compose.yml content
n/a
Your .env content
n/a
Reproduction steps
- Prepare an asset with multiple lines of text
- Invoke OCR on this photo
- Enable the OCR overlay in the web app, then select and copy all the text across all boxes
- Paste the copied text into any text editor
- See that the line order is reversed
Additionally:
- Get the uuid of the asset from previous steps
- Find this asset in the "ocr_search" table:
  SELECT * FROM "ocr_search" WHERE "assetId" = '<your_asset_id>';
- See the joined text with the line order reversed
Relevant log output
Additional information
Here are the boxes for the asset:
| x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 | boxScore | textScore | text |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.13658537 | 0.81594205 | 0.3804878 | 0.81594205 | 0.3804878 | 0.8898551 | 0.13658537 | 0.8898551 | 0.7851095 | 0.97339 | 4. Bottom line |
| 0.13658537 | 0.46231884 | 0.3902439 | 0.46231884 | 0.3902439 | 0.5347826 | 0.13658537 | 0.5347826 | 0.80898404 | 0.98849 | 2. Second line |
| 0.4601626 | 0.45362318 | 0.800813 | 0.46811596 | 0.799187 | 0.5478261 | 0.4593496 | 0.53333336 | 0.8289639 | 0.99861 | 3. Horizontal space |
| 0.1406504 | 0.10724638 | 0.33252034 | 0.10724638 | 0.33252034 | 0.17826086 | 0.1406504 | 0.17826086 | 0.83194405 | 0.99205 | 1. First line |
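
For illustration, here is a minimal sketch of how the boxes above could be put into the expected reading order (top to bottom, then left to right within a line). This is not Immich's actual implementation; the `Box` class, `make_box`, and `reading_order` are hypothetical names, and the rule for grouping boxes into a line (a box joins the current line if its vertical center falls inside the vertical extent of that line's first box) is just one simple heuristic I am assuming here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    xs: tuple  # (x1, x2, x3, x4), normalized to 0..1
    ys: tuple  # (y1, y2, y3, y4), normalized to 0..1
    text: str

    @property
    def top(self):
        return min(self.ys)

    @property
    def bottom(self):
        return max(self.ys)

    @property
    def left(self):
        return min(self.xs)

    @property
    def center_y(self):
        return (self.top + self.bottom) / 2


def make_box(row, text):
    # row is (x1, y1, x2, y2, x3, y3, x4, y4), as in the table above
    return Box(xs=tuple(row[0::2]), ys=tuple(row[1::2]), text=text)


def reading_order(boxes):
    # Walk the boxes from top to bottom and group them into lines.
    lines = []
    for box in sorted(boxes, key=lambda b: b.center_y):
        first = lines[-1][0] if lines else None
        # Assumed heuristic: same line if the center lies within
        # the vertical extent of the first box of the current line.
        if first is not None and first.top <= box.center_y <= first.bottom:
            lines[-1].append(box)
        else:
            lines.append([box])
    # Within each line, order the boxes from left to right.
    return [b for line in lines for b in sorted(line, key=lambda b: b.left)]


# The four boxes from this issue, in the order OCR returned them.
boxes = [
    make_box((0.13658537, 0.81594205, 0.3804878, 0.81594205,
              0.3804878, 0.8898551, 0.13658537, 0.8898551), "4. Bottom line"),
    make_box((0.13658537, 0.46231884, 0.3902439, 0.46231884,
              0.3902439, 0.5347826, 0.13658537, 0.5347826), "2. Second line"),
    make_box((0.4601626, 0.45362318, 0.800813, 0.46811596,
              0.799187, 0.5478261, 0.4593496, 0.53333336), "3. Horizontal space"),
    make_box((0.1406504, 0.10724638, 0.33252034, 0.10724638,
              0.33252034, 0.17826086, 0.1406504, 0.17826086), "1. First line"),
]

ordered = reading_order(boxes)
joined = " ".join(b.text for b in ordered)
print(joined)
# 1. First line 2. Second line 3. Horizontal space 4. Bottom line
```

Note that box 3 is slightly rotated, so a plain sort on `y1` alone would put it before box 2; grouping by line first avoids that.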