GNOME Bugzilla – Bug 583809
allow only showing OCR'ed text layer instead of the image background layer
Last modified: 2018-05-22 13:33:13 UTC
Certain PDFs have horrible text quality/resolution because they have been scanned and then OCR'ed. Selecting the text makes it much more legible, showing that evince actually has access to a hidden "layer" of OCR'ed, computer-readable text. I would really love to be able to tell evince to show just this layer. I don't give a darn about the original "image" layer of text; it makes things unpleasant to read.
Created attachment 135329 (test case): This document has both "text as image" (shown by default) and computer/eye-friendly text (revealed when selected).
Created attachment 135330 (screenshot): Comparing selected text (much better) to the text you see when not selected (horrible).
They are not layers; the thing is that poppler uses different methods to render text and selected text.
But then why does acroread render the same "ugly" version of the text?
It's because it does things properly and strictly follows the spec, using the embedded, not very nice fonts :)
Again, sorry for being so clueless about this, but that doesn't sound quite right or match what I'm experiencing; I don't understand how embedded fonts can possibly look so horrible. Besides, I did a new experiment. I right-clicked the first page of that document and had an option to save the image. I saved it to PNG, and it looks exactly like the crappy default output that we see. It really looks like it's using an image instead of actual text. And if there weren't an "image", I wouldn't have this option to save it in the popup menu anyway.
Created attachment 135341 (saved background image from the first page)
Created attachment 135343 (saved background image from the first page): Whoops, wrong page. Here's the actual first page.
Hm, we might be wrong indeed. There might be an image with the text below it. But this case is so specific and hard to define that I really wonder what we should do to help here. The only option might be a "Try to fix this broken ABBYY Finereader crap" tool item.
Just making sure, did you mean:
- add a togglable option to ignore the image layer and show the text data?
- "try to fix ABBYY Finereader"? If that's the case, it's not possible; it's proprietary.
Created attachment 147567 (jstor pdf): Drag the background to a new evince and it looks better. The reporter posted a PDF with a bad-quality image in the background. But even if the background image is good quality, it still looks bad until you save it. All jstor.org (scholarship archive site) PDFs seem to have this problem. I have also posted this image to https://bugs.freedesktop.org/show_bug.cgi?id=5589 A workaround would be greatly appreciated until poppler is fixed (if ever; it is an old bug). I see this problem quite often.
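One possible stopgap, pending a fix in evince or poppler, is to dump the OCR'ed text to a plain file and read that instead of the rendered page. The sketch below assumes the `pdftotext` command from poppler-utils is installed and uses its `-layout`, `-f` and `-l` options; the function names `build_pdftotext_command` and `extract_text_layer` are illustrative only and are not part of evince or poppler.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, first_page=None, last_page=None):
    """Build the argument list for poppler-utils' pdftotext (hypothetical helper)."""
    # -layout asks pdftotext to keep the physical layout of the text.
    cmd = ["pdftotext", "-layout"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def extract_text_layer(pdf_path, txt_path=None, first_page=None, last_page=None):
    """Write the PDF's extractable (e.g. OCR'ed) text to a .txt file and return its path."""
    # Assumption: pdftotext is available on PATH.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) was not found on PATH")
    pdf_path = Path(pdf_path)
    # Default output: same name as the PDF, with a .txt suffix.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    cmd = build_pdftotext_command(pdf_path, txt_path, first_page, last_page)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return txt_path
```

Calling `extract_text_layer("scan.pdf")` would write `scan.txt` next to the PDF. This only sidesteps the rendering problem: the page images, figures and exact typography are lost, and the result is only as good as the OCR text embedded in the file.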
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/evince/issues/89.