Bug 696030 - Handle spaces and hyphenation when search pdf
Status: RESOLVED OBSOLETE
Product: evince
Classification: Core
Component: PDF
Version: 3.6.x
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Evince Maintainers
QA Contact: Evince Maintainers
Duplicates: 686045 750579
Depends on:
Blocks:
Reported: 2013-03-18 05:09 UTC by Sven
Modified: 2018-05-22 15:00 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Sven 2013-03-18 05:09:38 UTC
Hi,

This is similar to Bug 598759, but I would like to sketch out something more complex.

First of all, I have no idea what happens in the background. For example, I don't know how evince works out that there is some sort of space between two letters or words. I imagine that PDF files don't contain any information like "there's a space here" and that some (bad) heuristic is at work.
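The reporter's guess is broadly right: PDF text is drawn as individually positioned glyphs, and a document is not required to include explicit space characters, so text extractors typically infer word breaks from the horizontal gaps between glyphs. The sketch below illustrates such a gap heuristic under stated assumptions; it is not Evince's or poppler's actual code, and the `Glyph` record and the 0.15 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One positioned glyph on a text line (hypothetical layout record)."""
    char: str
    x: float          # left edge, in PDF user-space units
    width: float      # advance width, same units
    font_size: float


# Hypothetical threshold: a gap wider than 15% of the font size is a word break.
SPACE_GAP_RATIO = 0.15


def glyphs_to_text(glyphs: list[Glyph], ratio: float = SPACE_GAP_RATIO) -> str:
    """Join the glyphs of one line, inserting a space wherever the horizontal
    gap to the previous glyph exceeds `ratio` times the font size."""
    out: list[str] = []
    prev = None
    for g in glyphs:
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > ratio * max(prev.font_size, g.font_size):
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


# Normal word spacing: the 3-unit gap before "b" exceeds 1.5, so a space is added.
line = [Glyph("t", 0, 5, 10), Glyph("o", 5, 5, 10), Glyph("b", 13, 5, 10), Glyph("e", 18, 5, 10)]
print(glyphs_to_text(line))  # to be

# Letter-spaced text: a 2-unit gap inside a word also crosses the threshold.
spaced = [Glyph("w", 0, 5, 10), Glyph("o", 7, 5, 10), Glyph("r", 12, 5, 10), Glyph("d", 17, 5, 10)]
print(glyphs_to_text(spaced))  # w ord
```

The second example shows how a fixed threshold can produce exactly the symptom described next: generous letter spacing or kerning inside a word is mistaken for a word break, so the extracted text contains a space the reader never sees.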

Anyhow:

1) Searching for a single word:

When searching for a single word, evince often fails to find it. I don't know why, but copying and pasting the word from the PDF revealed that evince thinks there are spaces between some of the letters. Searching for the same word with spaces added here and there makes evince find it. But, as you can imagine, I don't want to guess where the spaces are in order to find a word.
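One way to make single-word search robust against spurious spaces is to search a whitespace-free copy of the text while remembering where each character came from, so matches can still be highlighted in the original. A minimal sketch, assuming plain extracted page text as input; `find_ignoring_spaces` is a hypothetical helper, not an Evince or poppler API, and it is case-sensitive.

```python
def find_ignoring_spaces(text: str, word: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of `text` that spell `word` once all
    whitespace is ignored. Offsets refer to the original text."""
    # Whitespace-free copy of the text, plus each kept character's original index.
    kept: list[str] = []
    index_map: list[int] = []
    for i, ch in enumerate(text):
        if not ch.isspace():
            kept.append(ch)
            index_map.append(i)
    compact = "".join(kept)
    needle = "".join(word.split())  # drop any spaces typed in the query too
    if not needle:
        return []

    spans = []
    start = compact.find(needle)
    while start != -1:
        end = start + len(needle)
        # Map the first and last matched characters back to the original text.
        spans.append((index_map[start], index_map[end - 1] + 1))
        start = compact.find(needle, start + 1)
    return spans


text = "the k ing is dead"
for s, e in find_ignoring_spaces(text, "king"):
    print(repr(text[s:e]))  # 'k ing'
```

The trade-off is false positives across real word boundaries: with this approach "scat" matches "this cat". A real implementation would probably only ignore gaps that the layout heuristic considered borderline.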

2) Searching across lines:

Evince could ignore the hyphen at the end of a line when a word has been hyphenated.
Of course, it could also be a compound word like "in-between" that has been split into "in-" and "between", so simply stripping every hyphen at the end of a line won't do.
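A search-time approach sidesteps the question of which line-end hyphens are soft: compile the query into a regular expression that tolerates an optional "hyphen plus line break" between any two characters, and lets a hyphen typed in the query be followed by a line break. Under these assumptions both "inbetween" and "in-between" find the text "in-" / "between" split across two lines. `hyphen_tolerant_pattern` is hypothetical, and the sketch assumes the extracted text separates lines with `\n`.

```python
import re

# Optional "soft" break: a hyphen, trailing blanks, a newline, then indentation.
LINE_END_HYPHEN = r"(?:-[ \t]*\n\s*)?"


def hyphen_tolerant_pattern(query: str) -> re.Pattern:
    """Compile `query` so it also matches when the PDF text splits it across
    two lines with a hyphen at the end of the first line."""
    parts = []
    for ch in query:
        if ch == "-":
            # A hyphen typed by the user may itself sit at a line end.
            parts.append(r"-\s*")
        else:
            parts.append(re.escape(ch))
    # Between any two query characters, allow an optional line-end hyphen.
    return re.compile(LINE_END_HYPHEN.join(parts))


page = "words that fall in-\nbetween and a com-\n  plex case"
print(bool(hyphen_tolerant_pattern("inbetween").search(page)))   # True
print(bool(hyphen_tolerant_pattern("in-between").search(page)))  # True
print(bool(hyphen_tolerant_pattern("complex").search(page)))     # True
```

Because the hyphen is only skipped when it is immediately followed by a line break, "inbetween" still does not match an ordinary "in-between" written on a single line.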

3) Searching multiple words:

When I enter "the king is dead", I guess evince searches for that exact string in the PDF. If the phrase is spread across multiple lines, evince won't find it. If the PDF reports two spaces between "the" and "king", evince won't find it either.
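For multi-word queries, one option is to split the query on whitespace and join the escaped words with `\s+`, so any run of spaces, tabs, or a line break between words still matches. `phrase_pattern` is a hypothetical helper; as written it is case-sensitive and does not deal with hyphenation.

```python
import re


def phrase_pattern(query: str) -> re.Pattern:
    """Compile a multi-word query so that any run of whitespace in the PDF text
    (several spaces, tabs, or a line break) separates consecutive words."""
    words = query.split()
    return re.compile(r"\s+".join(re.escape(w) for w in words))


page = "Long live the\nking? No: the  king is\n dead."
m = phrase_pattern("the king is dead").search(page)
print(repr(m.group()))  # 'the  king is\n dead'
```

Using `\s+` rather than `\s*` keeps "the kingis dead" from matching, at the cost of missing phrases where the extractor dropped a real word break.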


Well, Adobe Reader implements all of the above and probably much more.
Comment 1 José Aliste 2013-03-18 15:40:29 UTC
Hi, 

1) is certainly a clear bug that we should fix. Please attach a PDF and steps to reproduce.

About 2) and 3): it seems to me they are the same issue, no? If so, there are other bugs about it, and we should discuss it there.
Comment 2 Germán Poo-Caamaño 2013-06-15 08:18:07 UTC
*** Bug 686045 has been marked as a duplicate of this bug. ***
Comment 3 Daniel Kahn Gillmor 2015-08-11 16:51:59 UTC
evince actually knows enough to collapse the hyphenation -- at least in some parts of the display.

I'm looking at: http://www.andrew.cmu.edu/user/danupam/dtd-pets15.pdf in evince 3.16.1.

If I click the search (magnifying glass) button and type "big" in the search box, I see three hits in the left-hand pane:

   of bigotry. Given the pervasive… 14
   President, "Big data: Seizing o… 16
   docs/big_data_privacy_report… 16

But if I add an "o" to the end of my search term (making it "bigo"), evince removes all hits and says "Not found". It's weird to see the match shown correctly in the left-hand pane and then watch it disappear as I make the search term more specific.

This would be a useful improvement!
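The behaviour described in this comment suggests that the results pane and the matcher see different text: the snippet shows "bigotry" joined (presumably the word is hyphenated across a line break in the PDF), while the search itself runs on the raw text. A minimal sketch of keeping the two consistent, assuming plain page text with `\n` line breaks: dehyphenate once, keep a map back to the original offsets, and search the joined text. `dehyphenate` and `find_in_page` are hypothetical names, the sample line is made up to resemble the first hit, and the rule drops every line-end hyphen between word characters, which reintroduces the compound-word problem from point 2).

```python
import re

# Hypothetical rule: word character, hyphen, newline, word character.
SOFT_BREAK = re.compile(r"(?<=\w)-\n(?=\w)")


def dehyphenate(text: str) -> tuple[str, list[int]]:
    """Join words hyphenated across lines. Returns the joined text and, for
    each of its characters, the index of that character in `text`."""
    joined: list[str] = []
    index_map: list[int] = []
    pos = 0
    for m in SOFT_BREAK.finditer(text):
        # Copy everything up to the hyphen, then skip the "-\n".
        for i in range(pos, m.start()):
            joined.append(text[i])
            index_map.append(i)
        pos = m.end()
    for i in range(pos, len(text)):
        joined.append(text[i])
        index_map.append(i)
    return "".join(joined), index_map


def find_in_page(text: str, query: str) -> list[tuple[int, int]]:
    """Search the same dehyphenated text a results pane would display, and
    return spans as offsets into the original text for highlighting."""
    if not query:
        return []
    joined, index_map = dehyphenate(text)
    spans = []
    start = joined.find(query)
    while start != -1:
        end = start + len(query)
        spans.append((index_map[start], index_map[end - 1] + 1))
        start = joined.find(query, start + 1)
    return spans


page = "pervasive nature of big-\notry. Given the"
print(dehyphenate(page)[0])        # pervasive nature of bigotry. Given the
print(find_in_page(page, "bigo"))  # [(20, 26)]
```

The returned span covers "big-", the line break, and "o" in the original text, so a viewer could still highlight the match on both lines.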
Comment 4 Germán Poo-Caamaño 2015-11-14 16:17:30 UTC
*** Bug 750579 has been marked as a duplicate of this bug. ***
Comment 5 GNOME Infrastructure Team 2018-05-22 15:00:51 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/evince/issues/333.