Bug 504913 - Memory usage on certain PDFs makes Evince unusable
Status: RESOLVED FIXED
Product: evince
Classification: Core
Component: general
Version: git master
Hardware: Other All
Importance: Urgent critical
Target Milestone: ---
Assigned To: Evince Maintainers
QA Contact: Evince Maintainers
Duplicates: 469235 474477 475612 487223 512438 516067 521036
Depends on:
Blocks:
 
 
Reported: 2007-12-21 18:30 UTC by Alon Zakai (kripken)
Modified: 2009-04-28 14:12 UTC
See Also:
GNOME target: ---
GNOME version: 2.19/2.20


Attachments
Patch to cache unseen pages adaptively, depending on free memory (10.28 KB, patch)
2007-12-21 18:32 UTC, Alon Zakai (kripken)

Description Alon Zakai (kripken) 2007-12-21 18:30:32 UTC
Please describe the problem:
On certain PDFs, in particular PDFs with scanned pages or images, Evince's memory usage spikes sharply while rendering, to the order of hundreds of megabytes. After rendering it settles at a lower but still high figure, also in the hundreds of megabytes.

Steps to reproduce:
I would attach an example PDF, but sadly they are copyrighted (academic journal papers). From discussion, it appears that many other people have seen this, so I hope such a file is not needed. If that is not the case, contact me, and I will make an effort to find a file.

1. Load a PDF with many scanned pages.
2. View memory usage.


Actual results:
200+ MB of RAM can be used on even small PDFs (under 1MB in size).

Expected results:
I would hope to see something more similar to the memory usage of xPDF or Acrobat. Both do ok on these types of files. In fact I am forced to use them now because Evince simply will not run without using up all my memory (I have 512MB).

Does this happen every time?
On the same file, yes.

Other information:
I have talked to Carlos Garcia Campos about this, and it appears caching is the issue. Evince currently caches 2 pages forward and back, making it so you have at least 5 pages rendered in memory at all times. If each page is ~50-100MB, this can add up to quite a lot.

Consequently I tried to write a patch to address this issue. I will attach it below.

The patch just makes the current system adaptive. It doesn't implement
a new LRU caching method or anything like that. What it does, is let
you pick 'adaptive caching' or not. If not, then the old method is
used. If you do pick adaptivity, then there are a number of slots for
caching ahead and back. These will only be used if memory allows. When
memory is low, few or none of these slots will be used. The method
will also prioritize them, that is, if you have 4 cached pages
backwards (you are scrolling forwards), then it will prefer to remove
a page 4 backwards in order to allocate a new one forward. It will
also dynamically free cached pages if memory is very tight. (Details
appear in comments in the code.)
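
For readers without the attachment, the slot limit and eviction priority described above can be sketched roughly as follows. This is not the attached patch; the names usable_slots and pick_victim, and the idea of passing an estimated per-page cost, are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical helpers, not Evince's actual API. */

/* How many of max_slots cache slots may be used, given the free memory and
 * an estimated per-page cost (an assumed, caller-supplied constant). */
static size_t usable_slots(size_t free_bytes, size_t page_cost, size_t max_slots)
{
    size_t fit;

    if (page_cost == 0)
        return max_slots;
    fit = free_bytes / page_cost;
    return fit < max_slots ? fit : max_slots;
}

/* Choose which cached page to drop. Pages behind the scroll direction are
 * preferred, farthest first; if none are behind, the farthest page ahead is
 * chosen. The visible page is never chosen. direction is +1 when scrolling
 * forward and -1 when scrolling backward. Returns an index into pages, or
 * -1 if nothing can be evicted. */
static int pick_victim(const int *pages, size_t n, int current, int direction)
{
    int best = -1, best_behind = 0, best_dist = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int d = (pages[i] - current) * direction;
        int behind, dist;

        if (d == 0)
            continue; /* the page being shown */
        behind = d < 0;
        dist = behind ? -d : d;
        if (best < 0 || behind > best_behind ||
            (behind == best_behind && dist > best_dist)) {
            best = (int)i;
            best_behind = behind;
            best_dist = dist;
        }
    }
    return best;
}
```

With page 10 on screen, pages 6 to 9 and 11 to 12 cached, and forward scrolling, pick_victim selects page 6, the page 4 backwards, which matches the behaviour described above.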

There are some things that should be improved. One is that figuring
out how much memory is needed is very hard. This is a hardcoded
constant right now in the code.

Importantly, this patch, even if accepted, is only partial help. There is still an issue if rendering a page of a 1MB PDF takes 50-100MB.
Comment 1 Alon Zakai (kripken) 2007-12-21 18:32:19 UTC
Created attachment 101431
Patch to cache unseen pages adaptively, depending on free memory
Comment 2 Alon Zakai (kripken) 2008-01-03 17:55:38 UTC
Ok, I believe I have found the underlying cause of this bug. (The patch above helps, but only to a factor of 2-3 or so. The real issue causes memory usage of 10 times the normal amount, or more.)

When a page is loaded, a few data structures are created. One of them is the 'image mapping', which is a list of rectangles and Gtk.Images appearing in them. This is used when the user right-clicks; if the right-click is on an image, then it can be copied to the clipboard. The relevant code is in ev-image.c and ev-view.c (search for image_mapping). For PDFs specifically, the code generating the image mapping is in ev-poppler.cc (but the issue is relevant to all document types, I think).

Now, the Images are all created beforehand, when the page is loaded, near the end of ev_job_render_run() in ev-jobs.c. Thus, they take up memory even if the user doesn't right-click on the page. This is IMO suboptimal, since right-clicks would be the rare occurrence, not the frequent one. Memory is being used that is, most of the time, of no importance.

But the issue becomes even worse in certain types of PDFs. In a PDF with scanned pages, each page contains an image with the page's bitmapped data. These images are of the size of the _original scanned page_; for example, I have a PDF here whose images are about 4000x4000 pixels. That means that each page rendered in Evince takes 64 MB (16M pixels, 4 channels per pixel). Cache 2 pages ahead and back in addition to the currently shown one, and Evince is using over 300MB of RAM.
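
The arithmetic above can be checked with a few lines of C. This is only a worked example; the function names are made up, and it assumes one byte per channel, which the 64 MB figure implies.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one uncompressed image: width x height x bytes per pixel. */
static size_t image_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Total for the page being shown plus the pages cached ahead and behind. */
static size_t cached_bytes(size_t per_page, size_t ahead, size_t behind)
{
    return per_page * (1 + ahead + behind);
}
```

image_bytes(4000, 4000, 4) is 64,000,000 bytes, and cached_bytes of that with 2 pages ahead and 2 behind is 320,000,000 bytes, i.e. the "over 300MB" mentioned above.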

My recommendation: images should not be cached. When a page is loaded, a list should be made of the rectangles containing images, _without_ the image itself; only after an actual request is made to copy the image to the clipboard should the image be actually generated. The only disadvantage to this is a slightly slower copy operation.
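
A rough sketch of this recommendation in C, keeping only the rectangle at page load and fetching pixels on the first copy request. All names here (ImageEntry, image_at, image_pixels, demo_loader) are hypothetical and do not correspond to Evince or Poppler functions; the loader is a stand-in for whatever the backend would provide.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { double x1, y1, x2, y2; } Rect;

/* One entry of a hypothetical lightweight image mapping. */
typedef struct {
    Rect area;             /* where the image sits on the page */
    int image_id;          /* enough to request the image later */
    unsigned char *pixels; /* NULL until a copy is requested */
    size_t n_bytes;
} ImageEntry;

/* Assumed backend hook that produces the pixel data for one image. */
typedef unsigned char *(*ImageLoader)(int image_id, size_t *n_bytes);

/* Return the entry under the point (x, y), or NULL if there is none. */
static ImageEntry *image_at(ImageEntry *entries, size_t n, double x, double y)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const Rect *r = &entries[i].area;
        if (x >= r->x1 && x <= r->x2 && y >= r->y1 && y <= r->y2)
            return &entries[i];
    }
    return NULL;
}

/* Load the pixels on first use only; later calls reuse them. */
static const unsigned char *image_pixels(ImageEntry *e, ImageLoader load)
{
    if (e->pixels == NULL)
        e->pixels = load(e->image_id, &e->n_bytes);
    return e->pixels;
}

/* Stand-in loader for the demo: allocates a tiny buffer and counts calls. */
static int loader_calls = 0;

static unsigned char *demo_loader(int image_id, size_t *n_bytes)
{
    (void)image_id;
    loader_calls++;
    *n_bytes = 4;
    return calloc(*n_bytes, 1);
}

/* Usage: two copy requests on the same image trigger a single load. */
static int demo_lazy_copy(void)
{
    ImageEntry entries[] = { { {10, 10, 110, 110}, 7, NULL, 0 } };
    ImageEntry *hit = image_at(entries, 1, 50.0, 50.0);

    if (hit == NULL)
        return -1;
    image_pixels(hit, demo_loader);
    image_pixels(hit, demo_loader);
    free(hit->pixels);
    return loader_calls;
}
```

The cost is the one named above: the first copy of an image is slower, because that is when the pixels are actually produced.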

Note: Currently, Poppler only knows how to return a list containing rectangles _and_ images. Given that, it would still be better to discard the images from the list when a page is loaded; while memory would spike as the list is created in Poppler, it would be immediately freed, and not accumulate to e.g. 300MB+ as currently occurs. Then the list can later be re-created when a copy operation is made. It would be nicer, of course, if Poppler could return either a list with just rectangles or a list with both rectangles and images.


If there is agreement as to this solution, I would be glad to help code it.
Comment 3 Carlos Garcia Campos 2008-01-03 20:31:15 UTC
Yes, that's a problem, I already thought about it when I wrote the images stuff, but I don't think it's the most important problem. Anyway, I agree we should fix it. Extracting an image is not a common action so we could do it on demand instead. We can also fix the poppler side, but I wouldn't like to break the API again.
Comment 4 Reinier Heeres 2008-01-06 14:49:38 UTC
For OLPC (the $100 laptop) we are using a slightly modified version of evince, so that it can be embedded in a python app. The large memory usage is a pretty big problem there, since the resources are limited. An example of a file that crashes the system (since there is no swap) is at http://dev.laptop.org/~rwh/Coquille.pdf

On my gutsy system I can get up to about 400MB mem usage for this 4MB file; quite a lot! We are trying to see if we can find a way to fix this; if someone has ideas/pointers, that would be very helpful!
Comment 5 Alon Zakai (kripken) 2008-01-06 15:23:25 UTC
Hello Reinier,

Based on what I said above, there is a simple temporary option: disabling the image copying capability. I'm copy&pasting a patch at the end of this comment - I'm not attaching it as a normal bugzilla patch because I do not expect the Evince project to consider it - this is only meant for you.

With patched Evince, I get 23MB of usage on your file, of which 11MB is shared (so, even less than 23MB, effectively), instead of your 400MB.

The downside to this patch is that image copying is disabled. That is, you can view PDFs normally, but you cannot right-click and copy to the clipboard an image from inside a PDF (well, actually the patch disables it in all Evince documents; would be easy to make it PDF-specific, though, if you want).

I am not an Evince dev, so I have no idea what the situation is with actually resolving this bug. But I hope this patch is helpful in the meantime. Let me know if I can be of any other assistance.



FEATURE-DISABLING PATCH:

Index: ev-jobs.c
===================================================================
--- ev-jobs.c	(revision 2793)
+++ ev-jobs.c	(working copy)
@@ -404,10 +404,10 @@
 			job->form_field_mapping =
 				ev_document_forms_get_form_fields (EV_DOCUMENT_FORMS (EV_JOB(job)->document),
 								   job->rc->page);
-		if (job->include_images && EV_IS_DOCUMENT_IMAGES (EV_JOB (job)->document))
+/*		if (job->include_images && EV_IS_DOCUMENT_IMAGES (EV_JOB (job)->document))
 			job->image_mapping =
 				ev_document_images_get_images (EV_DOCUMENT_IMAGES (EV_JOB (job)->document),
-							       job->rc->page);
+							       job->rc->page);                                          */
 		EV_JOB (job)->finished = TRUE;
 	}
 
Comment 6 Reinier Heeres 2008-01-06 19:15:15 UTC
Thanks! Applying this fixes part of the issue. There's still the problem that when zooming in the whole image is scaled instead of just the visible part, resulting in 270MB being used when zooming to 400%. This is of course less than ideal :) However, it uses about 3 times less memory than before applying this!
Comment 7 Alon Zakai (kripken) 2008-01-07 07:12:01 UTC
Reinier,

Yes, high memory usage when zooming in is an issue. I think the Evince project intends to eventually implement 'tiled' rendering - only the visually-accessible parts - but I have no idea when or at what priority.
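
To illustrate what 'tiled' rendering means here, a small C sketch that works out which fixed-size tiles overlap the visible area, so only those would need to be rendered. This is a generic illustration, not Evince's design; the names and the fixed tile size are assumptions, and coordinates are assumed non-negative with positive sizes.

```c
#include <assert.h>

/* Inclusive range of tile columns and rows that overlap the viewport. */
typedef struct { int first_col, last_col, first_row, last_row; } TileRange;

/* The viewport is given in pixels of the zoomed page; tile_size is the edge
 * length of a square tile in the same units. */
static TileRange visible_tiles(int view_x, int view_y, int view_w, int view_h,
                               int tile_size)
{
    TileRange r;

    assert(tile_size > 0 && view_w > 0 && view_h > 0);
    r.first_col = view_x / tile_size;
    r.first_row = view_y / tile_size;
    r.last_col = (view_x + view_w - 1) / tile_size;
    r.last_row = (view_y + view_h - 1) / tile_size;
    return r;
}

/* Number of tiles in the range. */
static int tile_count(TileRange r)
{
    return (r.last_col - r.first_col + 1) * (r.last_row - r.first_row + 1);
}
```

For an 800x600 viewport at offset (300, 500) with 256-pixel tiles this gives 16 tiles, about 4 MB at 4 bytes per pixel, regardless of how large the zoomed page is.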

Meanwhile, you can perhaps not use PDFs for documents that contain just images, like the PDF you gave before. What I mean is, for example, to convert such a PDF to an image format like PNG (using a conversion utility), then view the image in a standard image viewing program. Of course this is hardly a good solution, but if automated in some way (when detecting excessive memory usage in Evince or a system crash due to it) it might be a temporary workaround of sorts. Sorry to not have better advice here.
Comment 8 Carlos Garcia Campos 2008-01-19 17:24:17 UTC
I've just fixed the images problem in svn trunk. Now, images are rendered on demand instead of during page rendering. I had to change poppler too, so it requires poppler from git master. 
Comment 9 Akhil Laddha 2008-05-06 04:46:21 UTC
*** Bug 475612 has been marked as a duplicate of this bug. ***
Comment 10 Alexander Kojevnikov 2008-05-06 04:58:03 UTC
These bugs are probably duplicates:
 * bug 474477
 * bug 487223
 * bug 516067
 * bug 521036
Comment 11 Akhil Laddha 2008-05-06 05:01:27 UTC
*** Bug 516067 has been marked as a duplicate of this bug. ***
Comment 12 Akhil Laddha 2008-05-06 05:01:59 UTC
*** Bug 521036 has been marked as a duplicate of this bug. ***
Comment 13 Maciej (Matthew) Piechotka 2008-05-06 14:39:11 UTC
*** Bug 474477 has been marked as a duplicate of this bug. ***
Comment 14 Akhil Laddha 2008-05-08 14:12:16 UTC
*** Bug 487223 has been marked as a duplicate of this bug. ***
Comment 15 Carlos Garcia Campos 2008-07-20 11:07:43 UTC
This is already fixed, it requires poppler >= 0.8. 

Thanks. 
Comment 16 Miguel Martinez 2008-07-20 18:23:37 UTC
Does the bugfix just require to update poppler to 0.8.4 (current as of today)? Or does it also need patches presented here? It's a shame if a full poppler 0.8 is required, since this will probably leave Ubuntu Hardy users in the dust. Oh, well, there's always xpdf.
Comment 17 Carlos Garcia Campos 2008-07-20 19:09:00 UTC
(In reply to comment #16)
> Does the bugfix just require to update poppler to 0.8.4 (current as of today)?

poppler 0.8.0 is enough, even poppler 0.7.x (not sure what x)

> Or does it also need patches presented here?

The evince patches were already committed, so you only need to upgrade poppler. 

> It's a shame if a full poppler 0.8
> is required, since this will probably leave Ubuntu Hardy users in the dust.

I know, I asked ubuntu guys to include poppler 0.8 instead of 0.6 in hardy, but it seems it was too late and they didn't want to update every package depending on poppler. I'm sorry, but I can't do anything else. 

> Oh,
> well, there's always xpdf.
> 

You can build poppler from sources or even change your distro :-P I think other distros like fedora or mandriva are shipping poppler 0.8 in their latest stable releases. 
Comment 18 Miguel Martinez 2008-07-20 19:30:35 UTC
Thanks for the notes, Carlos, and don't worry about Ubuntu, it's just that I'm pretty happy with how it works on my laptop. On the other hand, the Debian Testing PC I have at university does have Poppler 0.8.4, so I will be able to confirm the fix tomorrow.
Comment 19 Miguel Martinez 2008-07-22 09:30:45 UTC
Testing an old scanned PRL paper, it seems to work fine now on Debian Lenny (evince 2.22.2 and poppler 0.8.4). Thank you very much for the bugfix, Carlos.
Comment 20 Alexander Kojevnikov 2009-04-28 13:57:36 UTC
*** Bug 512438 has been marked as a duplicate of this bug. ***
Comment 21 Alexander Kojevnikov 2009-04-28 14:12:52 UTC
*** Bug 469235 has been marked as a duplicate of this bug. ***