Bug 325189 - text selection doesn't follow columns
Status: RESOLVED NOTGNOME
Product: evince
Classification: Core
Component: general
Version: 2.24.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Evince Maintainers
QA Contact: Evince Maintainers
Duplicates: 325457 333967 360722 372908 481825 494078 500352 507523 514150 526379 582415 588476
Depends on:
Blocks:
 
 
Reported: 2005-12-29 00:54 UTC by wim
Modified: 2015-06-23 22:36 UTC
See Also:
GNOME target: ---
GNOME version: 2.23/2.24


Attachments
A Pdf with Hebrew on Two Columns. (48.73 KB, application/pdf)
2012-02-06 11:33 UTC, oz
Article with 2 columns (114.93 KB, text/unknown)
2012-02-06 17:23 UTC, oz

Description wim 2005-12-29 00:54:24 UTC
Please describe the problem:
When selecting text in a PDF document with columns, text is selected in both
columns simultaneously.



Steps to reproduce:
1. Open a pdf document which contains columns (like
http://www.csr-asia.com/upload/csrasiaweeklyvol1week48a.pdf)
2. Try to select a piece of text from the left column



Actual results:
Text is selected in the left _and_ the right column

Expected results:
Only select text in the left column

Does this happen every time?
yes

Other information:
example pdf: http://www.csr-asia.com/upload/csrasiaweeklyvol1week48a.pdf
Comment 1 Nickolay V. Shmyrev 2005-12-29 16:22:41 UTC
Thanks, really a problem. It's certainly not easy to fix, but let's hope someone will work it out.
Comment 2 Nickolay V. Shmyrev 2006-03-09 07:28:52 UTC
*** Bug 333967 has been marked as a duplicate of this bug. ***
Comment 3 Carlos Garcia Campos 2006-11-09 14:51:35 UTC
*** Bug 372908 has been marked as a duplicate of this bug. ***
Comment 4 Carlos Garcia Campos 2006-11-09 14:56:22 UTC
*** Bug 325457 has been marked as a duplicate of this bug. ***
Comment 5 Carlos Garcia Campos 2006-11-09 15:15:21 UTC
*** Bug 360722 has been marked as a duplicate of this bug. ***
Comment 6 Carlos Garcia Campos 2007-09-30 10:47:05 UTC
*** Bug 481825 has been marked as a duplicate of this bug. ***
Comment 7 Carlos Garcia Campos 2007-11-06 10:30:09 UTC
*** Bug 494078 has been marked as a duplicate of this bug. ***
Comment 8 Carlos Garcia Campos 2007-11-29 09:17:37 UTC
*** Bug 500352 has been marked as a duplicate of this bug. ***
Comment 9 Carlos Garcia Campos 2008-01-13 09:32:01 UTC
*** Bug 507523 has been marked as a duplicate of this bug. ***
Comment 10 Nickolay V. Shmyrev 2008-02-04 06:38:24 UTC
*** Bug 514150 has been marked as a duplicate of this bug. ***
Comment 11 Jean-François Fortin Tam 2008-02-04 13:54:31 UTC
Which is the upstream bug? This one? https://bugs.freedesktop.org/show_bug.cgi?id=3188 and depending on bug #165155 ?
Comment 12 Carlos Garcia Campos 2008-04-06 13:31:20 UTC
*** Bug 526379 has been marked as a duplicate of this bug. ***
Comment 13 tomas.kloucek 2008-04-09 19:44:44 UTC
I have the same problem, for example with this PDF file: http://www.dehn.de/www_DE/PDF/blitzplaner08_e/Chapters/BBP_E_Chapter_07.pdf

 ... but it is almost the rule: in almost every PDF file that is divided into multiple columns, text is not marked properly :(
Comment 14 Kouzinopoulos Charis 2008-11-03 09:12:28 UTC
Does this bug still exist?
Comment 15 Nickolay V. Shmyrev 2008-11-03 09:39:19 UTC
Sure, and it's pretty annoying, it's a poppler bug though.
Comment 16 Chris Sherlock 2008-11-03 12:48:09 UTC
Ah. Should we be relogging it somewhere else?
Comment 17 Nickolay V. Shmyrev 2008-11-04 21:16:53 UTC
I think it's already in poppler bugzilla.

https://bugs.freedesktop.org/show_bug.cgi?id=4006
Comment 18 Teppo Turtiainen 2008-12-27 17:30:40 UTC
Still occurs with 2.24.1.
Comment 19 Brian Ewins 2009-04-12 00:38:47 UTC
I've posted a patch for this upstream in https://bugs.freedesktop.org/show_bug.cgi?id=3188 - I can't get the jhbuilt-evince to open pdfs though (even before the patch) and had to test with epdfview. It'd be great if someone could test if the bug fix works in evince too?
Comment 20 Nickolay V. Shmyrev 2009-05-17 14:06:03 UTC
*** Bug 582415 has been marked as a duplicate of this bug. ***
Comment 21 Carlos Garcia Campos 2009-08-08 14:56:20 UTC
*** Bug 588476 has been marked as a duplicate of this bug. ***
Comment 22 Johan Brannlund 2009-11-07 00:24:23 UTC
Brian, I just tested your patch against poppler 0.12. Selecting text in evince and pasting into a text editor more or less works, but the actual selection in evince sometimes behaves a little weird. 

The effect is hard to describe, but when I move the mouse, the highlight showing the selected text sometimes behaves in ways that are unexpected to me at least.

I tested with the HE-News-Winter-2009.pdf document attached at 
https://launchpad.net/ubuntu/+source/poppler/+bug/33288.
Comment 23 Brian Ewins 2009-11-07 10:22:46 UTC
The patch makes the selection follow reading order. What's happened with that document is that reading order has been badly misidentified (so /any/ text extraction from that document with poppler will look odd). The reading order it has inferred for page 1 is this:

Col 1: para 1, 2, 3
Col 2: para 1
Col 3: whole column, in correct order
Col 4: para 6 5 3 4 1 2 (!)
Col 2: para 2
Col 1: para 4
Remainder of Col 2, Remainder of Col 1.

You can confirm this by starting a selection in the first para and moving the mouse into each of those paragraphs - you'll see previous paras in that list remain selected.
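
To make that concrete, here is a minimal Python sketch of selection that follows an inferred reading order. It is only an illustration and not poppler's code: the function name and the block labels are made up, and the list simply mirrors the order given above.

```python
def select_in_reading_order(order, start, end):
    """Return every block between start and end (inclusive) in reading order.

    `order` is the inferred reading order; `start` and `end` are the blocks
    under the mouse-down and mouse-up positions. All names are hypothetical.
    """
    i, j = order.index(start), order.index(end)
    lo, hi = min(i, j), max(i, j)
    return order[lo:hi + 1]


# The (wrong) reading order inferred for page 1, using illustrative labels.
inferred = [
    "col1:p1-3", "col2:p1", "col3:all", "col4:all",
    "col2:p2", "col1:p4", "col2:rest", "col1:rest",
]

# Dragging from the first paragraph into "Col 2: para 2" also pulls in
# everything that precedes it in the inferred order.
selected = select_in_reading_order(inferred, "col1:p1-3", "col2:p2")
```

With a misidentified order, a drag from column 1 into the second paragraph of column 2 also selects all of columns 3 and 4, which matches the odd highlighting described in comment 22.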

This is obviously nonsense, but it is a separate bug from not being able to select in reading order at all. IIRC the issue with this particular document is the very ragged right justification: poppler is attempting to identify columns line by line, and the varying column gap triggers this bad behaviour. The suggestion is that we take a hint from ocropus and identify gutters first; it's a more robust way of finding columns.

NB there will always be pathological document examples. If we attempt to use rectangular gutters, documents that flow text around circular inclusions will not work well, for example.
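
A rough Python sketch of the gutters-first idea, for illustration only. It assumes each word is reduced to its horizontal extent (x0, x1) and looks for x-ranges that no word on the page covers. This is a simplified projection, not what ocropus or poppler actually implement, and the names and the min_width parameter are hypothetical.

```python
def find_gutters(boxes, min_width):
    """Find vertical gutters from the horizontal extents of all words.

    boxes: iterable of (x0, x1) pairs, one per word on the page.
    Returns a list of (left, right) x-ranges that no word covers and that
    are at least min_width wide.
    """
    gutters = []
    covered_to = None  # right edge of the words merged so far
    for x0, x1 in sorted(boxes):
        if covered_to is not None and x0 - covered_to >= min_width:
            gutters.append((covered_to, x0))
        covered_to = x1 if covered_to is None else max(covered_to, x1)
    return gutters


def column_of(box, gutters):
    """Column index of a word: the number of gutters entirely to its left."""
    x0, _ = box
    return sum(1 for _, right in gutters if right <= x0)


# Two columns with a ragged right edge in the first one (lines end at
# x = 60, 95 and 98), so the gap to the second column varies per line.
words = [(10, 40), (45, 95), (12, 60), (65, 98),
         (120, 150), (155, 200), (121, 170)]
gutters = find_gutters(words, min_width=15)
```

Because the whole page is projected at once, the ragged line endings do not matter, only the space that stays empty on every line. The same property is the weakness: a full-width heading, or text flowing around a circular inclusion, hides the gutter entirely.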
Comment 24 Johan Brannlund 2009-11-07 23:19:51 UTC
Oh, I see. Another thing I just noticed is that the patch makes Evince segfault when loading pdf files with no text.
Comment 25 Brian Ewins 2009-11-09 01:19:44 UTC
Thanks for spotting that. Code was missing an 'if (!flows) return;' at the start of TextPage::visitSelection. I'll follow up with a replacement patch at freedesktop.org
Comment 26 Brian Ewins 2009-11-13 10:17:34 UTC
I've uploaded an updated patch series to https://bugs.freedesktop.org/show_bug.cgi?id=3188 , with corrections to selection and reading order. Those of you who can apply these and rebuild evince might want to give this a go? Comments over there please! For me it fixes up selection for most (but not all) of the documents on the various dupes of this bug (including the one Johan mentions above).

Caveats: doesn't cover RTL or documents with rotated text.

BTW is there a pool of test documents for evince? Poppler doesn't seem to have any unit testing going on at all, I could do with seeing some RTL docs.
Comment 27 Arand 2010-02-20 03:26:03 UTC
I've built and tested the patches on poppler_0.12.0-0ubuntu2.1 and it is
definitely an improvement.
(deb packages for ubuntu 9.10 are available from 
https://launchpad.net/~arand/+archive/poppler, (nil safety included))

For more info see the freedesktop bug linked by Brian Ewins above.
Comment 28 oz 2012-02-06 11:33:20 UTC
Created attachment 206874 [details]
A Pdf with Hebrew on Two Columns. 

The text is Free and it can be added to the test Pool
Comment 29 oz 2012-02-06 11:36:50 UTC
Please make this bug high priority. It has been here for a long time, and I am afraid that anyone working with scientific articles who bumps into this bug will be frustrated with GNOME.
This little issue has prevented me from delivering a Live CD to colleagues of mine, because once they get a really bad impression of GNOME, it becomes hard to convince them that Free Software can do a good job.

Cheers, Oz
Comment 30 oz 2012-02-06 17:16:13 UTC
Ok, it seems like with evince 3.2 the situation is a bit better (libpoppler 0.16.7 on Debian). However, there are still some old PDFs where text selection is not done correctly (I can do the selection properly with Okular, Foxitreader and Acroread on the same system). 
See for example the article I attached.
Comment 31 José Aliste 2012-02-06 17:20:52 UTC
This is a poppler bug, and you can see it in Okular too (just checked... you need to use the text selection tool). Unfortunately, there is not much we can do, as the code in question is just a heuristic (the PDF spec does not properly cover text copy and text selection).
Comment 32 oz 2012-02-06 17:23:51 UTC
Created attachment 206916 [details]
Article with 2 columns

Here is an article which is currently not working correctly with Evince (v.3.2).
Comment 33 oz 2012-02-06 17:35:27 UTC
@Jose, 
Thanks for the reply. Yep, you are right. The text selection tool in Okular is as dumb as Evince's. However, until now, I didn't even know it existed. I always use the "Area select tool", which does not have this buggy behavior.

With that tool I can select the text properly in Okular (Okular
Version 0.12.5 Using KDE Development Platform 4.6.5 (4.6.5)).

Yes, I already realized that this is a bug in poppler. However, the selection mechanism in Okular is different. And it seems that the guys behind Foxit and Adobe know something we don't, because their text selection tool is working.

Until this bug is fixed in Poppler, can we have an area selection tool in Evince like the one in Okular?
Comment 34 Luke Hutchison 2012-02-06 20:16:09 UTC
(In reply to comment #31)
> this is a poppler bug, and you can see it in Okular too (just checked... you
> need to use the text selection tool). Unfortunately, there is not so much we
> can do as the code in question is just an heuristic (as the pdf spec does not
> involve text copy and text selection properly)

How does the current heuristic work?

For a simple method that should work in most cases, I would suggest building a
histogram of inter-word spacings, and using the Otsu binary thresholding method
(extremely fast and parameter-free) to separate the inter-word spacing into two
peaks -- then look to see if the between-peak variance is significantly higher
than the sum of within-peak variances, and if it is, then you have N columns
with approximately equal spacing between them, and where the spacing is much
greater than the word size.
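
A small Python sketch of this suggestion, only to show the shape of the idea. The function names, the bin count and the ratio of 4.0 are assumptions chosen for illustration, and none of this comes from poppler.

```python
from statistics import mean, pvariance


def otsu_threshold(values, bins=32):
    """Otsu's method on a 1-D histogram: pick the split that maximises
    the between-class variance of the two resulting groups."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width
    return best_t


def has_column_gaps(values, threshold, ratio=4.0):
    """True if the two groups are well separated: between-group variance
    clearly exceeds the summed within-group variances. The ratio is an
    arbitrary illustrative cut-off."""
    small = [v for v in values if v <= threshold]
    large = [v for v in values if v > threshold]
    if not small or not large:
        return False
    n = len(values)
    between = (len(small) / n) * (len(large) / n) * (mean(small) - mean(large)) ** 2
    within = pvariance(small) + pvariance(large)
    return between > ratio * within


# Horizontal gaps between consecutive words on each line (made-up numbers):
# mostly ordinary word spacing, plus a few much larger column gaps.
gaps = [3, 4, 3, 5, 4, 3, 4, 30, 4, 3, 5, 28, 4, 3, 32, 4]
t = otsu_threshold(gaps)
column_gaps = [g for g in gaps if g > t]
```

On this made-up data the threshold lands between the word spacing and the column gaps, and has_column_gaps(gaps, t) is true. On a single-column page, where all gaps are similar, the variance check is what stops the page from being split.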

Another heuristic to apply as a last resort is to simply only select words
within the user's dragged box, and nothing outside it. That would at least let
a user select one column at a time without words from adjacent columns being
pulled in. I don't think it's unreasonable to expect a user to drag a box over
the entire text region they want to select.
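
The last-resort heuristic is simple enough to sketch in a few lines of Python. This assumes each word comes with a bounding box in page coordinates with y growing downwards. The Word type and the function name are hypothetical, and a word counts as inside the box when its centre is.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def words_in_box(words, drag_start, drag_end):
    """Return the words whose centre lies inside the dragged rectangle.

    drag_start and drag_end are (x, y) points; the rectangle is normalised
    so that dragging in any direction works. Results are sorted top to
    bottom, then left to right.
    """
    left, right = sorted((drag_start[0], drag_end[0]))
    top, bottom = sorted((drag_start[1], drag_end[1]))
    hits = [
        w for w in words
        if left <= (w.x0 + w.x1) / 2 <= right
        and top <= (w.y0 + w.y1) / 2 <= bottom
    ]
    return sorted(hits, key=lambda w: (w.y0, w.x0))


page = [
    Word("alpha", 10, 10, 40, 20), Word("beta", 45, 10, 70, 20),
    Word("gamma", 120, 10, 150, 20), Word("delta", 155, 10, 190, 20),
    Word("epsilon", 10, 25, 60, 35), Word("zeta", 120, 25, 150, 35),
]

# Drag from bottom-right to top-left over the left column only.
picked = [w.text for w in words_in_box(page, (80, 40), (5, 5))]
```

Nothing outside the rectangle is ever selected, so words on the same lines in the adjacent column stay out, at the cost of requiring the user to cover the whole region they want.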
Comment 35 Frédéric Parrenin 2015-06-23 02:55:27 UTC
I confirm this issue is still present with the document provided in comment 32, using evince 3.10 on Ubuntu 14.04.
When one tries to select a piece of text in the first column, the corresponding lines in the second column are also selected.
Indeed, foxit reader performs far better on this file so there should be a better solution than the current one.
Comment 36 Germán Poo-Caamaño 2015-06-23 22:36:51 UTC
As has been mentioned before, this is a bug in poppler.

Closing this one as NOTGNOME.