Bug 742559 - Printing djvu file to PS file produces an awkward PS file
Status: RESOLVED OBSOLETE
Product: evince
Classification: Core
Component: printing
Version: 3.14.x
Hardware: Other
OS: Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Evince Maintainers
QA Contact: Evince Maintainers
Depends on:
Blocks:
Reported: 2015-01-07 21:59 UTC by madbiologist
Modified: 2018-05-22 16:03 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description madbiologist 2015-01-07 21:59:34 UTC
Printing the attached file to a physical printer on Evince 3.14.1-0ubuntu1 takes 1 hour and 20 minutes.

Printing to a PS file only takes a few seconds.

However, the conversion of this PS file to PDF via either Evince or ps2pdf takes 1 hour and 17 minutes.

In Windows Vista, printing the resulting PDF to XPS via Adobe Reader X takes about 10 minutes.

If one opens the resulting PDF in Evince and prints to file (PDF), the "Preparing to print" message takes minutes to advance by a single page. However, the job ultimately finishes within 10 minutes.

Overall, given that the resulting PDF is huge for a PDF (over 100 MB), it's not surprising that it takes so long to go through all these different printing methods.

Running the "top" command while ps2pdf is running shows gs, owned by user ubuntu, at the top of the list using 100% of one of my four CPU cores (pressing "1" while top is running toggles between a combined CPU line and separate per-CPU lines). Shared memory (SHR) stayed fixed at 3776 KiB, while virtual memory size (VIRT) and resident memory size (RES) each increased steadily to over 100000 KiB after 10 minutes, and kept increasing to over 200000 KiB before the conversion completed.

Sending the PostScript file directly to a native PostScript printer, for example using the command

nc -w1 <IP address of printer> 9100 < Grammar\ 4.ps

produces a printout in reasonable time (2-3 pages in quick succession, then a 5-second pause, then 2-3 more pages, and so on).
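For reference, the direct-to-printer test with nc can also be reproduced from Python. This is a minimal sketch, assuming the printer accepts raw PostScript on TCP port 9100 (the port used in the nc command) and that no spooler processing is wanted; the function name send_raw_postscript is hypothetical. It streams the file unchanged, so any delay observed this way comes from the printer's own PostScript interpreter rather than from a Ghostscript conversion step.

```python
import socket


def send_raw_postscript(path: str, host: str, port: int = 9100,
                        timeout: float = 30.0, chunk_size: int = 65536) -> int:
    """Stream a PostScript file unchanged to a printer's raw TCP port.

    Hypothetical helper; port 9100 is the conventional raw printing port.
    Returns the number of bytes sent.
    """
    sent = 0
    with open(path, "rb") as f, \
            socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)  # blocks until the whole chunk is written
            sent += len(chunk)
        # Half-close the connection to signal end of job to the printer.
        sock.shutdown(socket.SHUT_WR)
    return sent
```

Usage would look like send_raw_postscript("Grammar 4.ps", "192.0.2.10"), where the address is a placeholder for the printer's IP.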

It would seem that Evince generates an awkward PostScript file AND that Ghostscript is really TOO slow, meaning that there is room for improvement in both Evince and Ghostscript.

Is it possible to make the DjVu software used by Evince generate better, easier-to-process PostScript?

I will file a separate bug for the alternative solution of Evince being able to convert DjVu directly into PDF. After all, PDF is THE standard format for printable documents on all operating systems.
Comment 1 madbiologist 2015-01-07 22:29:25 UTC
Bug 742561 has been filed for the alternative solution of Evince being able to convert DjVu directly into PDF.
Comment 2 madbiologist 2015-01-13 14:17:29 UTC
Analysis by Ken Sharp, a ghostscript developer:

The PostScript is pretty nearly a pathological case for pdfwrite. It seems that for every page a new (type 3) font is created, and at the end of the page specifically discarded (this is almost unheard of in PostScript programming). The page, which seems to be originally a bitmap, is then reconstructed by drawing each 'glyph' (in reality a bitmap). This includes all the page 'furniture' such as boxes and images, as well as actual text.

The glyphs are shown using the 'glyphshow' operator, which is ordinarily a rarely used operator (though this is the second Linux application which makes extensive use of it that I've seen). Basically this is laziness on the part of the PostScript producer. Rather than produce properly encoded fonts and use the various show operators, they just pull glyphs directly from a huge font.

Now for PostScript that's fine, and although it's lazy and ugly it will work. The problem for PDF is that there is *NO* equivalent to glyphshow in PDF. This means comparisons against PostScript rendering aren't useful.

The basic problem is that fonts in PDF *must* be accessed by an Encoding, which limits them to 255 glyphs, while the glyphshow operator can use arbitrarily large fonts. So we need to create multiple PDF fonts to reproduce the PostScript usage. I see approximately 1500 fonts being created for 150 pages.
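The font arithmetic in this analysis can be illustrated with a small sketch. It assumes, as the analysis states, that each PDF font can address at most 255 glyphs through its Encoding, and it simply hands out (font, code) slots in order of first use. Since the PostScript discards its font at the end of every page, the function would be called once per page. The function name is hypothetical; this is not how pdfwrite is actually implemented.

```python
from typing import Dict, Iterable, Tuple

# Per-font glyph limit quoted in the analysis above.
MAX_GLYPHS_PER_FONT = 255


def assign_glyphs_to_fonts(glyph_names: Iterable[str],
                           max_per_font: int = MAX_GLYPHS_PER_FONT
                           ) -> Tuple[Dict[str, Tuple[int, int]], int]:
    """Map each distinct glyph name to a (font_index, code) pair.

    Hypothetical illustration: glyphs get codes in order of first use,
    and a new font is started whenever the current one is full.
    Returns the mapping and the number of fonts needed.
    """
    mapping: Dict[str, Tuple[int, int]] = {}
    for name in glyph_names:
        if name in mapping:
            continue  # already encoded; reuse its existing font and code
        n = len(mapping)
        mapping[name] = (n // max_per_font, n % max_per_font)
    # Ceiling division: a partially filled last font still counts.
    fonts_needed = (len(mapping) + max_per_font - 1) // max_per_font
    return mapping, fonts_needed
```

For example, a page that draws 2,500 distinct glyph bitmaps (an illustrative number, not measured from the attached file) would need 10 fonts, which is in line with the roughly 1500 fonts for 150 pages reported in the analysis.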
Comment 3 madbiologist 2015-01-13 14:19:27 UTC
And from Ray Johnston:

Since djvu renders text with glyphshow, rather than doing OCR, producing a valid encoding and then using show, the text in the resulting PDF is totally unsearchable, and copy/paste from the PDF will produce garbage.

I don't know if this djvu file has the optional "OCR" layer, but if so, it isn't making it into the PostScript, since that would have to use show and define the appropriate Encoding.
Comment 4 José Aliste 2015-01-13 14:30:03 UTC
Where is the djvu file?
Comment 5 madbiologist 2015-01-13 15:28:34 UTC
Oops, sorry.  I didn't notice that the attachment had failed due to the file being too large.  Trying to compress it with gzip made no difference in size and trying to compress it with bzip2 actually increased the size.  The file is available from https://bugs.launchpad.net/ubuntu/+source/evince/+bug/525161/+attachment/4284552/+files/Grammar%204.djvu

If that link is unsuitable, do you have an ftp server that I can upload the file to?
Comment 6 James Cloos 2015-01-13 23:19:17 UTC
That djvu, according to djvudump(1), appears to be just one bilevel JB2 image per page.

Converting to ps ought to convert each image to a CCITTFax image.

It may grow compared to JB2, but should not grow too much.
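The one-image-per-page approach suggested here can be sketched as follows. This is a minimal sketch under stated assumptions: the page has already been decoded from JB2 into a packed 1-bit bitmap (decoding JB2 is not shown), and it uses PostScript's RunLengthDecode filter as a simpler stand-in for the CCITTFaxDecode compression proposed above, because a CCITT Group 4 encoder would not fit in a short example. Both function names are hypothetical and are not part of Evince, DjVuLibre, or cairo. The page is painted with a single imagemask instead of thousands of glyphshow calls.

```python
from typing import List


def run_length_encode(data: bytes) -> bytes:
    """Encoder compatible with the PostScript RunLengthDecode filter.

    Length byte 0..127: copy the next length+1 bytes literally.
    Length byte 129..255: repeat the next byte 257-length times.
    Byte 128 marks end of data (EOD).
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Length of the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out += bytes((257 - run, data[i]))
            i += run
            continue
        # Otherwise collect literal bytes until a repeat begins (max 128).
        start = i
        i += 1
        while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
            i += 1
        out.append(i - start - 1)
        out += data[start:i]
    out.append(128)  # EOD marker
    return bytes(out)


def bilevel_page_to_postscript(width: int, height: int, rows: List[bytes],
                               page_w_pt: float, page_h_pt: float) -> str:
    """Emit one bilevel page as a single PostScript imagemask (hypothetical).

    rows: `height` rows, top row first, each packed 8 pixels per byte
    (most significant bit first), where a 1 bit means ink.
    RunLengthDecode is used here instead of CCITTFaxDecode for brevity.
    """
    row_bytes = (width + 7) // 8
    if len(rows) != height or any(len(r) != row_bytes for r in rows):
        raise ValueError("rows do not match width/height")
    hexdata = run_length_encode(b"".join(rows)).hex()
    hex_lines = [hexdata[k:k + 72] for k in range(0, len(hexdata), 72)]
    return "\n".join([
        "gsave",
        f"{page_w_pt} {page_h_pt} scale",  # unit square -> whole page
        "0 setgray",
        # The procedure is scanned completely before exec runs it, so the
        # filters start reading currentfile right after the 'exec' line.
        "{ /Hex currentfile /ASCIIHexDecode filter def",
        "  /RL Hex /RunLengthDecode filter def",
        f"  {width} {height} true [{width} 0 0 -{height} 0 {height}] RL imagemask",
        "  RL flushfile Hex flushfile } exec",  # consume up to both EOD markers
        *hex_lines,
        ">",
        "grestore",
    ]) + "\n"
```

A real implementation would follow each page with showpage and would prefer CCITTFaxDecode (or, for PDF output, JBIG2) for much better compression of scanned text.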
Comment 7 Germán Poo-Caamaño 2015-01-13 23:29:59 UTC
(In reply to comment #6)
> That djvu, according to djvudump(1), appears to be just one bilevel JB2 image
> per page.
> 
> Converting to ps ought to convert each image to a CCITFax image.

If this is the case, then this recent discussion (with patches) about JBIG2 support in CairoOutputDev might help:
http://lists.freedesktop.org/archives/poppler/2015-January/011204.html
(although that is specific to Poppler)
Comment 8 madbiologist 2015-01-14 15:12:36 UTC
Thanks, Germán. That actually looks relevant to the other bug I filed for the alternative solution of Evince being able to convert DjVu directly into PDF: bug 742561.
Comment 9 James Cloos 2015-01-14 16:40:14 UTC
Unfortunately, after I sent that, and after some further digging, I found that djvups creates the same glyphshow-style output, and that DjVu’s JB2 is not (quite) JBIG2. JB2 was the original proposal for JBIG2, but the final JBIG2 standard differs from it.

Since Evince needs to decompress the page to view it, it would be best if it did the same decompression and then re-compressed the image when using cairo to generate PDF or PS, at least for bi-level DjVu pages.

On the other hand, the current PS is fine for printing. Perhaps the above should be done only when converting to PDF?
Comment 10 GNOME Infrastructure Team 2018-05-22 16:03:13 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue on our GitLab instance: https://gitlab.gnome.org/GNOME/evince/issues/548.