Bug 738704 - Huge spike in CPU and memory usage by tracker extractor due to rogue file
Status: RESOLVED NOTGNOME
Product: tracker
Classification: Core
Component: Extractor
Version: 1.2.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-extractor
Reported: 2014-10-17 17:45 UTC by Atri
Modified: 2014-10-19 11:45 UTC

Attachments
The other suspected file (tarballed because of size constraints) causing tracker to go rogue (412.47 KB, application/x-xz)
2014-10-17 17:49 UTC, Atri

Description Atri 2014-10-17 17:45:09 UTC
The original bug report was opened against tracker 1.0.x here
https://bugzilla.opensuse.org/show_bug.cgi?id=898323

However, the issue is also confirmed with tracker 1.2.2. The presence of a rogue file (most likely one of the two attached files) causes a huge memory spike in tracker-extractor (run /usr/lib/tracker-extractor to test) until it finally complains "Out of Memory" when run from the terminal. The same happens whenever tracker-extractor is used, e.g., when searching in the shell overview.
In addition, tracker-control -F reports:
---------------------------
Store:
17 Oct 2014, 10:34:56:  ✓     Store                 - Idle 

Miners:
17 Oct 2014, 10:34:56:  ✓     File System           - Idle 
17 Oct 2014, 10:34:56:  ✓     Applications          - Idle 
17 Oct 2014, 10:34:56:  ✓     Userguides            - Idle 
17 Oct 2014, 10:34:56:  ✗     Extractor             - Not running or is a disabled plugin
---------------------------


After removing these two files from the indexed directories and running tracker-control -s, tracker-control -F reports:
---------------------------
Store:
17 Oct 2014, 10:29:08:  ✓     Store                 - Idle 

Miners:
17 Oct 2014, 10:29:08:  ✓     File System           - Idle 
17 Oct 2014, 10:29:08:  ✓     Applications          - Idle 
17 Oct 2014, 10:29:08:  ✓     Userguides            - Idle 
17 Oct 2014, 10:29:08:  ✓     Extractor             - Idle 
Press Ctrl+C to stop
---------------------------
and the memory leak issue also stops completely. Restoring the files brings the issue back.
Comment 1 Atri 2014-10-17 17:49:45 UTC
Created attachment 288776
The other suspected file (tarballed because of size constraints) causing tracker to go rogue
Comment 2 Atri 2014-10-17 17:59:40 UTC
The first suspected file causing tracker issues (cannot attach 2 MiB pdf, so posting a link to file instead)
https://web-dc1.spideroak.com/share/MJQWI43IMFUDIMBQL5ZWQYLSMUYQ/Bugs/home/badshah/SpiderOak%20Hive/Misc/Bug%20Reports/patt_su3_40.pdf
Comment 3 Atri 2014-10-19 09:36:14 UTC
(In reply to comment #2)
> The first suspected file causing tracker issues (cannot attach 2 MiB pdf, so
> posting a link to file instead)
> https://web-dc1.spideroak.com/share/MJQWI43IMFUDIMBQL5ZWQYLSMUYQ/Bugs/home/badshah/SpiderOak%20Hive/Misc/Bug%20Reports/patt_su3_40.pdf

Please get the offending pdf file from here
https://drive.google.com/file/d/0B-1aTg5_gkVTSG1FMUhKdlRsWjQ/view?usp=sharing
instead. I wish to remove it completely from my system.

Did you know that on opening Documents, evince-thumbnailer now tries to thumbnail this file and all hell breaks loose? It eats memory until my system freezes completely (it clocked a whopping 1.8 GiB before I killed it).

I don't know what the matter is with this file. It is admittedly a very dense plot, but acroread opens it without trouble, whereas opening it in evince/Documents triggers the behaviour above. Tracker apparently runs into the same problem when running its extractor over this file.
Comment 4 Martyn Russell 2014-10-19 11:44:39 UTC
I can confirm this bug, but it's not a Tracker bug as far as I can see. We call:

  text = poppler_page_get_text (page);

That call runs out of memory, and it takes an age to return.

Reassigning to poppler.
https://bugs.freedesktop.org/show_bug.cgi?id=85196
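
For readers who want to guard against this kind of runaway call on the caller's side, here is a minimal sketch of one generic POSIX approach: run the extraction step in a forked child whose address space is capped with setrlimit(RLIMIT_AS). It is not taken from Tracker's or poppler's source and does not fix the underlying poppler behaviour. The names run_with_memory_limit, extract_fn, greedy_extract and small_extract are hypothetical, and greedy_extract is only a stand-in for a call such as poppler_page_get_text() on the offending file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type standing in for one per-file extraction step. */
typedef int (*extract_fn)(void *user_data);

/* Run fn(user_data) in a forked child whose address space is capped at
 * limit_bytes. Returns the child's exit code (0 on success, 1 if fn failed,
 * 2 if the limit could not be set), or -1 if the child could not be started
 * or was terminated by a signal. */
int run_with_memory_limit(extract_fn fn, void *user_data, size_t limit_bytes)
{
    assert(fn != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: lower both the soft and hard limits before doing any work. */
        struct rlimit lim;
        lim.rlim_cur = (rlim_t)limit_bytes;
        lim.rlim_max = (rlim_t)limit_bytes;
        if (setrlimit(RLIMIT_AS, &lim) != 0)
            _exit(2);
        _exit(fn(user_data) == 0 ? 0 : 1);
    }

    /* Parent: wait for the child and translate its status. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Stand-in for a runaway extraction: allocates and touches 1 MiB blocks
 * (at most 1024 of them) and reports failure once malloc is refused. */
int greedy_extract(void *user_data)
{
    (void)user_data;
    for (int i = 0; i < 1024; i++) {
        char *block = malloc(1 << 20);
        if (block == NULL)
            return -1;              /* the address-space limit was reached */
        memset(block, 0xAB, 1 << 20);
    }
    return 0;
}

/* Stand-in for a well-behaved extraction that needs very little memory. */
int small_extract(void *user_data)
{
    (void)user_data;
    char *buf = malloc(4096);
    if (buf == NULL)
        return -1;
    memset(buf, 0, 4096);
    free(buf);
    return 0;
}
```

Calling run_with_memory_limit(greedy_extract, NULL, 256u * 1024 * 1024) lets the child fail on its own while the parent stays responsive. The 256 MiB cap is an arbitrary illustrative value. A caller would still need to decide what to do with a file whose extraction fails, for example skipping it on later runs.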
Comment 5 Martyn Russell 2014-10-19 11:45:53 UTC
Thanks for taking the time to report this bug.
However, this application does not track its bugs in the GNOME Bugzilla.