OCR lines order is reversed #24340

@wndenis

I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.

  • Yes

The bug

OCR returns text boxes in the correct X order, but in reversed Y order (bottom to top).

Consider the sample:

Image

I expect the output text to be:

1. First line
2. Second line 3. Horizontal space
4. Bottom line

Or, in the ocr_search table, it should be:

1. First line 2. Second line 3. Horizontal space 4. Bottom line

But instead, the ocr_search table contains:

Image
4. Bottom line 2. Second line 3. Horizontal space 1. First line

Boxes 2 and 3 are placed correctly (they share the same Y and differ only in X), but boxes 1 and 4 are swapped.

This reduces the quality of text search, especially for multi-word queries: the trigrams that span the boundary between one line and the next are lost.

As a side effect, selecting and copying all the text from the boxes in the web UI produces unusable, shuffled text:

4. Bottom line2. Second line3. Horizontal space1. First line
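
The trigram loss described above can be illustrated with a minimal sketch. It assumes a plain sliding-window trigram over the lowercased text, which is a simplification and not necessarily how the ocr_search index tokenizes text; the helper names and the sample query are hypothetical.

```python
def trigrams(text: str) -> set[str]:
    """Character trigrams from a plain sliding window (no padding)."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def missing_trigrams(query: str, indexed: str) -> set[str]:
    """Trigrams of the query that do not occur in the indexed text."""
    return trigrams(query) - trigrams(indexed)


# Joined text as expected, and as currently stored (both taken from this report).
expected = "1. First line 2. Second line 3. Horizontal space 4. Bottom line"
actual = "4. Bottom line 2. Second line 3. Horizontal space 1. First line"

# Hypothetical query that spans the boundary between boxes 3 and 4.
query = "space 4. Bottom"

missing_expected = missing_trigrams(query, expected)
missing_actual = missing_trigrams(query, actual)

print(sorted(missing_expected))  # nothing is missing when the order is correct
print(sorted(missing_actual))    # the boundary trigrams are missing
```

With the reversed order, the only trigrams that disappear are the ones straddling the boundary between "space" and "4.", which is exactly where a multi-word query needs them.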

The OS that Immich Server is running on

Ubuntu 22.04

Version of Immich Server

v2.3.1

Version of Immich Mobile App

n/a

Platform with the issue

  • Server
  • Web
  • Mobile

Device make and model

No response

Your docker-compose.yml content

n/a

Your .env content

n/a

Reproduction steps

  1. Prepare an asset with multiple lines of text
  2. Invoke OCR on this photo
  3. Enable the OCR overlay in web and try to select and copy all the text across all boxes
  4. Paste the copied text into any text editor
  5. See that the line order is reversed

Additionally:

  1. Get the UUID of the asset from the previous steps
  2. Find this asset in the "ocr_search" table
    SELECT * FROM "ocr_search" WHERE "assetId" = '<your_asset_id>'
  3. See the joined text with the line order reversed

Relevant log output

Additional information

Here are the boxes for the asset:

Image
| x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 | boxScore | textScore | text |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.13658537 | 0.81594205 | 0.3804878 | 0.81594205 | 0.3804878 | 0.8898551 | 0.13658537 | 0.8898551 | 0.7851095 | 0.97339 | 4. Bottom line |
| 0.13658537 | 0.46231884 | 0.3902439 | 0.46231884 | 0.3902439 | 0.5347826 | 0.13658537 | 0.5347826 | 0.80898404 | 0.98849 | 2. Second line |
| 0.4601626 | 0.45362318 | 0.800813 | 0.46811596 | 0.799187 | 0.5478261 | 0.4593496 | 0.53333336 | 0.8289639 | 0.99861 | 3. Horizontal space |
| 0.1406504 | 0.10724638 | 0.33252034 | 0.10724638 | 0.33252034 | 0.17826086 | 0.1406504 | 0.17826086 | 0.83194405 | 0.99205 | 1. First line |
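
For illustration, here is a minimal sketch of one way to put the boxes listed above into reading order: sort by vertical center, group boxes whose centers are close into the same line, then sort each line by its left edge. This is not Immich's actual code; `Box`, `reading_order`, and the `tolerance` parameter are hypothetical names, and the grouping rule is an assumption that may need tuning for rotated or multi-column text.

```python
from dataclasses import dataclass


@dataclass
class Box:
    xs: tuple[float, float, float, float]  # x1..x4 (normalized)
    ys: tuple[float, float, float, float]  # y1..y4 (normalized)
    text: str

    @property
    def center_y(self) -> float:
        return sum(self.ys) / 4

    @property
    def height(self) -> float:
        return max(self.ys) - min(self.ys)

    @property
    def left(self) -> float:
        return min(self.xs)


def reading_order(boxes: list[Box], tolerance: float = 0.5) -> list[Box]:
    """Order boxes top to bottom, then left to right within a line."""
    lines: list[list[Box]] = []
    for box in sorted(boxes, key=lambda b: b.center_y):
        if lines:
            ref = lines[-1][0]  # first box of the current line
            limit = tolerance * min(box.height, ref.height)
            if abs(box.center_y - ref.center_y) <= limit:
                lines[-1].append(box)
                continue
        lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b.left)]


# Boxes in the order they appear in this report.
boxes = [
    Box((0.13658537, 0.3804878, 0.3804878, 0.13658537),
        (0.81594205, 0.81594205, 0.8898551, 0.8898551), "4. Bottom line"),
    Box((0.13658537, 0.3902439, 0.3902439, 0.13658537),
        (0.46231884, 0.46231884, 0.5347826, 0.5347826), "2. Second line"),
    Box((0.4601626, 0.800813, 0.799187, 0.4593496),
        (0.45362318, 0.46811596, 0.5478261, 0.53333336), "3. Horizontal space"),
    Box((0.1406504, 0.33252034, 0.33252034, 0.1406504),
        (0.10724638, 0.10724638, 0.17826086, 0.17826086), "1. First line"),
]

ordered = reading_order(boxes)
print(" ".join(b.text for b in ordered))
```

Grouping by center distance rather than sorting on y1 alone keeps boxes 2 and 3 on the same line even though box 3 is slightly tilted and starts a little higher.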
