GNOME Bugzilla – Bug 669022
Highlighting text changes some characters.
Last modified: 2018-05-22 14:30:03 UTC
Evince works great, but rendering gets badly messed up when highlighting certain characters: they change into completely different characters. This affects PDF documents. It shouldn't be hard to reproduce; just download some PDFs containing special characters, such as programming code, foreign languages, or mathematical symbols. I noticed an improvement going from 3.0 to 3.2, but some characters are still rendered incorrectly when highlighted.
Please, could you share a PDF with a test case?
Created attachment 225421 [details] This document has a lot of rendering issues when highlighting. Try the caption. The A disappears and the E gets lifted. A lot of symbols are totally messed up.
This was a rendering issue in Poppler (still there are others), but the document attached renders fine (even when the text is highlighted) with 3.6.0. I am closing this bug as fixed in stable. If you reproduce the bug with 3.6.0, please feel free to re-open the bug adding more information.
I can still mess up some characters in the attached document with 3.6.1. The caption gets lifted, for instance. A lot has been fixed since 3.4, though. I'll keep this report open until it's fully resolved, right?
(In reply to comment #4) > I can still mess up some characters in the attached document. The caption gets > lifted for instance. A lot was fixed from 3.4 though. Could you be more specific? Which characters, which page, and at what position? A screenshot showing the issue would help as well.
Created attachment 236719 [details] Screenshot depicting different rendering when selecting symbols
I can confirm this bug with evince 3.6.0 and the sample document above.
Additional information:
evince: Installed: 3.4.0-0ubuntu1.4, Candidate: 3.4.0-0ubuntu1.4
poppler-utils: Installed: 0.18.4-1ubuntu3, Candidate: 0.18.4-1ubuntu3
LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise
Created attachment 236720 [details] Symbol rendering differences in second test case I also have a test case where this bug is even more pronounced (also see attached screenshot): http://dropcanvas.com/#Q5Th37m67M6u59 I don't know if I am allowed to link this document here for educational purposes. If not, please feel free to delete this post.
I'm reading lots of PDFs currently, and the majority of them render totally messed up when highlighting. Fedora 18, latest update. Download any academic paper and you'll see lots of glitches when highlighting.
(In reply to comment #8) > I'm reading lots of PDF's currently and the majority of them renders totally > messed up when highlighting. Fedora 18, latest update. Download any academic > paper and you'll see lots of glitches when highlighting. This kind of comment is not helpful at all. Rendering problems can come from different issues, for instance, all documents with non-embedded Type 3 fonts, which is already reported in Poppler's bugzilla. Different documents can trigger different bugs. Some of them are even related to broken PDFs. Please, if you want to see a bug fixed, be as specific as possible, as Florian is.
(In reply to comment #6) > Created an attachment (id=236719) [details] > Screenshot depicting different rendering when selecting symbols > > I can confirm this bug with evince 3.6.0 and the sample document above. Thanks Florian. Indeed, I can reproduce the bug in this part of the document with Evince master. It seems a problem in poppler.
I know one solution to this bug. Google Chrome renders all PDFs correctly when highlighting, because it never actually re-renders the text: it just draws a blended rectangle over the PDF. I noticed that Evince instead draws an opaque rectangle and then draws the digitally stored characters on top. That only works if the "hidden digital characters" actually match what the PDF shows. A lot of old science papers seem to have been scanned and processed by AI: the visible PDF is an image, but there are hidden digital characters that can be highlighted. Especially in math formulas, I noticed that the PDF very often differs from the digital characters. You could do as Google does and just blend the PDF a little instead of actually redrawing the text.
Created attachment 238654 [details] Evince vs Chrome The PDF may be broken but Chrome renders it correctly anyhow. Evince fails miserably.
Created attachment 245380 [details] Sample PDF This PDF (page 3+) gets totally messed up when highlighting in Document Viewer. Chrome highlights it perfectly. What about doing the same kind of highlight as Chrome?
(In reply to comment #15) > Created an attachment (id=245380) [details] > Sample PDF > > This PDF (page 3+) gets totally messed up when highlighting in Document Viewer. > Chrome highlights it perfectly. Type3 fonts. This bug is in Poppler (the PDF library).
Yes I understand that Document Viewer is only the UI, but in the end the experience is still presented through Document Viewer. As many of my examples show, Chrome has a very good method for highlighting text. Document Viewer is good but could be made significantly better by using a better and more general highlighting method.
(In reply to comment #11) > Google Chrome renders all PDF's exactly correct when highlighting. That's > because they never actually highlight anything. They just draw a blended > rectangle over the PDF. I noticed that Evince actually draws an opaque > rectangle and then draws digitally stored characters over. That only works if > the "hidden digital characters" actually matches the PDF. I see several issues with the PDF in comment #15. The most obvious is that selection regions get drawn over each other, hiding most of the text. However, I have no idea what you mean by "hidden digital characters". Have you actually gone through the PDF and poppler source code and come to that conclusion, or is that just a guess? > You could do as Google and just blend the PDF a little instead of actually > redrawing the text. I will agree with this. I have a rough patch to evince which does this. The only disadvantage I see is that text color would no longer change when selected and I'm not sure if that is a trade off the devs would like to make.
(In reply to comment #18) > (In reply to comment #11) [...] > > You could do as Google and just blend the PDF a little instead of actually > > redrawing the text. > > I will agree with this. I have a rough patch to evince which does this. The > only disadvantage I see is that text color would no longer change when selected > and I'm not sure if that is a trade off the devs would like to make. Could this be done only when there are Type 3 fonts present in the document? AFAIU, those are the problematic ones and harder (or may have less priority?) to fix in Poppler.
Chrome just draws a blended rectangle over text so there is no change in text color, yes. It still looks very nice and works well for every PDF I have tried. (In reply to comment #18) > I have no idea what you mean by "hidden digital characters". Have you actually > gone through the PDF and poppler source code and come to that conclusion, or is > that just a guess? It seems that some of the old documents I read have been scanned in as pictures, and then the corresponding text has been analyzed with artificial intelligence and stored as hidden "digital" text in the document, which can be copied to the clipboard. I suspect this because the document looks like a picture scan, yet I can copy text from it (and the text differs in mathematical formulas, which were obviously too tough for the AI to analyze). Otherwise, I'm just guessing. I haven't read a single line of code in poppler (but that doesn't make my assumptions any less true).
Created attachment 245450 [details] [review] Evince draws selection highlight instead of poppler This patch has evince draw the selection highlight instead of relying on poppler. More copy and paste than I would like, but it works. (In reply to comment #19) > Could this be done only when there is Type 3 fonts present in the document? > AFAIU, those are the problematic ones and harder (or may have less priority?) > to fix in Poppler. Yes. I think it is possible to detect type 3 fonts and change the highlighting method, but I think it is a bad idea for evince to have different behavior for different documents for non-obvious reasons.
I also think there should be priority on consistency. Blend for all document types is my vote.
Something like this is what Jonas is doing in djvu backend, see bug https://bugzilla.gnome.org/show_bug.cgi?id=448739. I'm not sure it's possible to use this code for all backends in the view, what happens if the backend doesn't implement get_selection_region or get_text_mapping?
(In reply to comment #23) > Something like this is what Jonas is doing in djvu backend, see bug > https://bugzilla.gnome.org/show_bug.cgi?id=448739 I hadn't noticed that. Guess I need to pay more attention to the mailing list. I thought of doing it in a similar way, but I didn't because EvSelectionInterface->render_selection only provides the selection surface as an argument. I see Jonas's patch adds the page surface as well. > I'm not sure it's possible to use this code for all backends in the > view, what happens if the backend doesn't implement get_selection_region > or get_text_mapping? If get_selection_region is not implemented, then selection->covered_region will be null, nothing will be added to the path, and no highlight will be drawn. If get_text_mapping is not implemented, the only difference I see is that the cursor will not change to the text selection bar, which is the same behavior as before. I can rewrite it a bit to make it more clear what's going on. In particular, I see a find_selection_for_page function, which looks like a better fit than what I'm doing now with ev_pixbuf_cache_get_selection_list. Or if you prefer, I can do it in the pdf backend, similar to how Jonas is doing it. I don't have a preference either way.
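A minimal sketch of the fallback behavior described above, assuming simplified stand-in types instead of cairo_region_t and a cairo path. Rect, Region, Path and add_region_to_path are hypothetical names and are not the code from the patch. It illustrates why a backend that does not implement get_selection_region simply ends up with no highlight: a NULL covered region adds nothing to the path.

```c
#include <assert.h>
#include <stddef.h>

#define HL_PATH_CAPACITY 16

typedef struct { int x, y, width, height; } Rect;

/* Hypothetical stand-in for cairo_region_t: a list of rectangles. */
typedef struct { const Rect *rects; size_t n_rects; } Region;

/* Hypothetical stand-in for the path that is later filled with the
 * selection color. */
typedef struct { Rect rects[HL_PATH_CAPACITY]; size_t n_rects; } Path;

/* Append every rectangle of the covered region to the path and return
 * how many were added. A NULL region (backend without
 * get_selection_region) adds nothing, so no highlight is drawn. */
static size_t add_region_to_path(Path *path, const Region *covered_region)
{
    size_t added = 0;

    assert(path != NULL);
    if (covered_region == NULL)
        return 0;

    for (size_t i = 0;
         i < covered_region->n_rects && path->n_rects < HL_PATH_CAPACITY;
         i++) {
        path->rects[path->n_rects++] = covered_region->rects[i];
        added++;
    }
    return added;
}
```

In this model the caller does not need a separate "is selection supported" check; an empty path is a valid result and filling it draws nothing.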
Doing it in the view would allow us to share the code between backends implementing get_selection_region, and we don't need to change the libpdfdocument API like Jonas's patch does. It looks good to me as a fallback for backends that can implement get_selection_region but not render_selection, like DJVU, but for PDF I still find the current selection rendering better and more consistent with most other applications.
Created attachment 246497 [details] [review] Draw selection hilight from region when selection_surface is NULL Attached is an updated patch. One thing I noticed is that just setting EvSelectionInterface->render_selection to NULL will result in a lot of extra redraws as ev_pixbuf_cache_get_selection_surface tries to render the selection surface each time it is called. So I changed function new_selection_surface_needed to only request a redraw if the scale has changed.
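A minimal sketch of the redraw check described above, assuming a simplified cache entry. SelectionCacheEntry, selection_refresh_needed and selection_mark_fresh are hypothetical names; the real new_selection_surface_needed in Evince may look different. The point is that when no backend renderer exists the surface stays NULL forever, so a check based on "surface is missing" would request work on every call, while a check based only on the scale does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cache entry for one page's selection. */
typedef struct {
    void  *selection_surface; /* stays NULL if the backend cannot render it */
    double selection_scale;   /* scale used for the last refresh, -1 if none */
} SelectionCacheEntry;

/* Request a refresh only when the scale changed since the last one.
 * A NULL surface alone is not a reason to refresh (assumption based on
 * the behavior described in the comment above). */
static bool selection_refresh_needed(const SelectionCacheEntry *entry,
                                     double scale)
{
    assert(entry != NULL);
    return entry->selection_scale != scale;
}

/* Record that the selection data is up to date for this scale. */
static void selection_mark_fresh(SelectionCacheEntry *entry, double scale)
{
    assert(entry != NULL);
    entry->selection_scale = scale;
}
```

The exact floating-point comparison is intentional in this sketch: it only needs to detect that the stored value differs from the requested one, not to compare computed quantities.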
Review of attachment 246497 [details] [review]: I think we could try to split ev_pixbuf_cache_get_selection_surface into two different methods, one to get the surface and another to get the region. At the moment it's used in two places:
- draw_one_page: we are passing NULL for the region since what we want is the surface to render.
- merge_selection_region: we are ignoring the returned surface since what we want is the selected region, in order to invalidate it.
::: libview/ev-view.c
@@ +4560,3 @@
+ cairo_region_get_rectangle (region, i_rect, &rect);
+ cairo_rectangle (cr, rect.x, rect.y, rect.width, rect.height);
+ }
You could use gdk_cairo_region() here.
*** Bug 683787 has been marked as a duplicate of this bug. ***
Created attachment 246904 [details] [review] Draw selection highlight from region (In reply to comment #27) > I think we could try to split ev_pixbuf_cache_get_selection_surface in two > different methods, one to get the surface and another to get the region. In > this moment it's used in two places: > > - draw_one_page: we are passing NULL to the region since what we want is the > surface to render. > - merge_selection_region: we are ignoring the returned surface since what we > want is the region selected to invalidate it. > > ::: libview/ev-view.c > @@ +4560,3 @@ > + cairo_region_get_rectangle (region, i_rect, &rect); > + cairo_rectangle (cr, rect.x, rect.y, rect.width, rect.height); > + } > > You could use gdk_cairo_region() here. Submitting this as two patches. First is an update to the "Draw selection highlight from region" patch.
Created attachment 246905 [details] [review] Split ev_pixbuf_cache_get_selection_surface into two functions This patch splits ev_pixbuf_cache_get_selection_surface into two functions. One to return the surface, one to return the region.
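A minimal sketch of the split described above, assuming a simplified cache. Surface, Region, SelectionCache and the selection_cache_* functions are hypothetical stand-ins and are not the functions added by the patch. It shows one way two getters can share a single update step, so that a caller who only wants the region no longer has to receive and ignore a surface, and vice versa.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; } Surface;     /* stand-in for cairo_surface_t */
typedef struct { int n_rects; } Region; /* stand-in for cairo_region_t */

/* Hypothetical cache holding both results of one selection update. */
typedef struct {
    Surface *surface;
    Region  *region;
    int      stale;           /* nonzero when the selection must be recomputed */
    int      recompute_count; /* how often the shared update actually ran */
} SelectionCache;

/* Shared update step; the real code would re-render or re-query here. */
static void selection_cache_update(SelectionCache *cache)
{
    assert(cache != NULL);
    if (!cache->stale)
        return;
    cache->recompute_count++;
    cache->stale = 0;
}

/* For callers that draw, such as a page drawing routine. */
static Surface *selection_cache_get_surface(SelectionCache *cache)
{
    selection_cache_update(cache);
    return cache->surface;
}

/* For callers that only invalidate the selected area. */
static Region *selection_cache_get_region(SelectionCache *cache)
{
    selection_cache_update(cache);
    return cache->region;
}
```

Calling both getters in a row triggers the shared update only once in this model, which keeps the split from doubling the work.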
Review of attachment 246904 [details] [review]: I've added a private function to get the selection colors that is shared between the view and the pixbuf cache and pushed the patch with some other minor cosmetic changes. Thank you very much.
Review of attachment 246905 [details] [review]: Rebased and pushed to git master too. Thanks!
Is this supposed to be fixed? Highlighting text in the PDF attached in bug 683787 still changes characters for me.
It's not fixed. I thought blending was the solution but it's still opaque.
(In reply to comment #33) > Is this supposed to be fixed? Highlighting text in the PDF attached in bug > 683787 still changes characters for me. Which page of the document are you referring to? I tried different pages, and the selected text looks fine to me (evince master/poppler 0.26.1).
*** Bug 715075 has been marked as a duplicate of this bug. ***
(In reply to Debarshi Ray from comment #33) > Is this supposed to be fixed? Highlighting text in the PDF attached in bug > 683787 still changes characters for me. I tried once again with Fedora 25 (evince-3.22.1, poppler-0.45.0) and everything is fine. So, WORKSFORME! Thanks for the fixes.
I can say that it is now much better than it was years ago, BUT not perfect: today, instead of a wrong character, I see a blank space when I highlight some text. I attach two screenshots.
Created attachment 360369 [details] no-highlight
Created attachment 360370 [details] with-highlight
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/evince/issues/267.