Bug 166285 - Disable text search for PDFs without searchable text
Status: RESOLVED DUPLICATE of bug 596888
Product: evince
Classification: Core
Component: PDF
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Evince Maintainers
QA Contact: Evince Maintainers
Duplicates: 321177
Depends on:
Blocks:
 
 
Reported: 2005-02-04 15:02 UTC by Vincent Noel
Modified: 2009-11-14 10:26 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
libgnomeprint generated PDF (45.42 KB, application/binary)
2006-02-22 21:05 UTC, Pablo Rodríguez

Description Vincent Noel 2005-02-04 15:02:03 UTC
Search does not work for a few PDFs that I have, most of them work-related.
I'm trying to find a suitable PDF that I could attach as an example.

I found that search does not work either on the first attachment of bug 112506
(attachment #16336), but I don't know whether it's for the same reason (the PDFs
I have where search doesn't work do not use Type 3 fonts).
Comment 1 Bryan W Clark 2005-02-08 00:49:16 UTC
Seems like a pattern of documents that are generated from "dvips(k) 5.86
Copyright 1999 Radical Eye Software"

Comment 2 Pablo Rodríguez 2005-02-09 17:45:07 UTC
Search and text selection don't work at all in documents with type3 fonts. Even
acroread is not able to search/select text. It is not a bug, it's a feature ;-).

Sorry if I'm saying something obvious.
Comment 3 Vincent Noel 2005-02-09 18:06:25 UTC
The PDF I'm trying to search has been created (in MS Word on windows) by
PScript5.dll and produced by GNU Ghostscript 7.06 (PDF v.1.3). Fonts are
built-in TrueType. Note that I cannot search the PDF with acroread either, so it
might indeed be a feature ;)

I cannot attach it on the website, but I could send it to someone if it helps.
Comment 4 Pablo Rodríguez 2005-02-09 23:12:22 UTC
Ghostscript versions prior to 8 do embed both TrueType and Type1 fonts, but
text set in TT fonts carries no text information. It works as expected and it
is also a nice feature ;-)

To avoid this, you should use either Ghostscript version 8.50 (AFPL) or 8.15
(GNU) to get searchable text with embedded TrueType fonts. Integration work for
GNU Ghostscript 8.15 into the ESP code (http://www.cups.org/espgs/index.php) has
begun, and I guess the release could be ready by the end of this month (although
this is probably only an expression of a personal wish).
Comment 5 Vincent Noel 2005-02-09 23:16:45 UTC
Evince guys: would it be possible to detect such a situation and disable the
search? It could be annoying for the user, but it's better than letting him
believe the search is working when it is not.
Comment 6 Vincent Noel 2005-02-09 23:17:04 UTC
Pablo: thanks for the info!
Comment 7 Pablo Rodríguez 2005-02-10 00:00:44 UTC
Vincent: I'm glad to read that it helped.

Your report raises a very interesting point, which I hope to state clearly.

Many PDF documents out there have type3 fonts, and it is not always easy or
even possible to get the tex or dvi source. As far as I remember, acroread
rendered type3 fonts horribly before version 7 (or 6, I'm not sure).

Type1 fonts are interesting not only for better display, but mainly for
searching and selecting text in the PDF document. And this is sometimes
essential for some PDF documents.

gpdf has implemented a display mode for some PDF documents that use the Computer
Modern type3 font, which renders the text using the standard TrueType font.

This would be great to implement in evince (although Martin will know the
problem better), along with the possibility of generating a copy of the file
that uses CM type1 fonts instead of type3 fonts.
Comment 8 Vincent Noel 2005-03-04 18:13:52 UTC
I'm changing the bug title, as it appears this is due to the way a PDF is created.
Comment 9 Marco Pesenti Gritti 2005-05-09 10:01:05 UTC
Hmm, so if I get this correctly, some PDFs don't have text information within.
I'm not sure disabling the find menu/control would be clearer than what we
have now. Also, I think checking this would be equivalent to doing a search, so
it could slow things down a bit.

What about displaying "The document has no text" or something like that in the
search status bar when trying to search?
Comment 10 Pablo Rodríguez 2005-05-09 10:16:57 UTC
[Sorry for repeating what I have already written.] The problem with Ghostscript
versions prior to 8 was that they had trouble handling text information when
using TrueType fonts (this was fixed in version 8). So what you get when you
copy/paste text is garbage. Adobe Reader keeps search/copy text enabled for
these documents (and it doesn't seem to be problematic).

What seems more interesting to me is a feature that (I think) I saw in
previous versions of gpdf (Martin surely knows about this) that rendered a type3
document using type1 or truetype fonts. This would be very interesting, not so
much for displaying the characters as for handling the text information more
properly.
Comment 11 Bryan W Clark 2005-05-16 19:51:52 UTC
I like Marco's idea better than disabling the item.

> What about displaying "The document has no text" or something like that in
> the search status bar when trying to search?

I'd go with something like "The document text cannot be searched"

Of course it would be better if it worked instead of not working :)
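
The behaviour proposed in comments 9 and 11 could be sketched roughly as follows. This is only an illustration, not Evince code: it assumes the per-page text has already been extracted by some PDF backend and is passed in as plain strings, and the names `has_searchable_text` and `search_status`, as well as the "N found" / "Not found" wording, are hypothetical. Only the "cannot be searched" message comes from this thread.

```python
from typing import Iterable, List

# Message wording suggested in comment 11.
NO_TEXT_MESSAGE = "The document text cannot be searched"


def has_searchable_text(page_texts: Iterable[str]) -> bool:
    """Return True as soon as any page yields non-whitespace text.

    any() stops at the first page with text, so a normal document
    costs little; only a text-less document is scanned to the end.
    """
    return any(text.strip() for text in page_texts)


def search_status(page_texts: List[str], query: str) -> str:
    """Build the status-bar string for a case-insensitive search."""
    if not query:
        return ""
    if not has_searchable_text(page_texts):
        return NO_TEXT_MESSAGE
    needle = query.lower()
    hits = sum(text.lower().count(needle) for text in page_texts)
    # Hypothetical wording for the ordinary cases.
    return f"{hits} found" if hits else "Not found"
```

Note that this check only catches documents with no text at all. Documents whose text layer extracts as garbage (the Ghostscript < 8 case described in comment 10) would still look searchable to it.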
Comment 12 Nickolay V. Shmyrev 2005-11-10 20:55:40 UTC
*** Bug 321177 has been marked as a duplicate of this bug. ***
Comment 13 Nickolay V. Shmyrev 2005-11-10 20:57:24 UTC
Note that broken documents that evince can't search properly should also display
this message. See the attachment to bug 321177 for an example of such a document.
Comment 14 Federico Mena Quintero 2005-11-10 21:10:19 UTC
Can we identify documents produced by a buggy Ghostscript?  In that case, is
there some magic we can do to make the documents searchable?

Would documents produced by later Ghostscripts be searchable?

Why are some documents not searchable - do they store glyph IDs within a font
and there's no way to get back the characters?

If this all sounds very naive, it's because I don't know how PDF works :)
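
As for the first question, one possible heuristic is to look at the Producer string stored in the PDF metadata (comment 3 quotes "GNU Ghostscript 7.06" for the affected file). The sketch below is an assumption-laden illustration, not something Evince does: the function names are hypothetical, the Producer string is assumed to have been read elsewhere, and the "major version below 8" threshold is taken from Pablo's comments rather than from any Ghostscript documentation.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "GNU Ghostscript 7.06" or "ESP Ghostscript 8.15.1".
_GS_VERSION = re.compile(r"Ghostscript\s+(\d+)\.(\d+)")


def ghostscript_version(producer: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a Producer string, or None."""
    match = _GS_VERSION.search(producer)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def may_lack_searchable_text(producer: str) -> bool:
    """Hint only: True if the file was produced by Ghostscript < 8."""
    version = ghostscript_version(producer)
    return version is not None and version[0] < 8
```

For example, `may_lack_searchable_text("GNU Ghostscript 7.06")` is true, while a Producer string that does not mention Ghostscript yields no hint at all. The result is at best a hint: a later comment in this thread reports a non-searchable PDF produced with ESP Ghostscript 8.15.1, which this check would not flag.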
Comment 15 Pablo Rodríguez 2006-02-22 21:05:47 UTC
Created attachment 59956
libgnomeprint generated PDF
Comment 16 Pablo Rodríguez 2006-02-22 21:06:13 UTC
Federico, I don't know what is wrong with GS < 8 and the embedding of non-standard TrueType fonts.

But GS > 8 only produces PDF documents with searchable text on Windows. In GNOME (with ESP Ghostscript 8.15.1), using “Create PDF document” from the Print dialog of gedit generates a PDF document with no searchable text (see the attached gedit-output.pdf).

I thought it was an ESP GS bug and I filed a bug (http://www.cups.org/espgs/str.php?L1325+P0+S-2+C0+I10+E0+Q). I was asked to provide the commands used to generate the PDF file. Since it is libgnomeprint that invokes GS, I can't provide them.

Is there any way to find out how libgnomeprint invokes GS, in order to report this to the ESP GS developers and check whether ESP GS or libgnomeprint is the buggy application?

Thanks,


Pablo
Comment 17 Emmanuel Fleury 2009-11-14 10:26:29 UTC

*** This bug has been marked as a duplicate of bug 596888 ***