Bug 730229 - tracker-extract crashing when indexing specific PDF
Status: RESOLVED NOTGNOME
Product: tracker
Classification: Core
Component: Extractor
Version: 1.0.x
Hardware: Other Linux
Importance: Normal blocker
Target Milestone: ---
Assigned To: Martyn Russell
Depends on:
Blocks:
 
 
Reported: 2014-05-16 03:33 UTC by Michael Gratton
Modified: 2014-05-28 08:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
File causing the crash (464.30 KB, application/pdf)
2014-05-16 03:33 UTC, Michael Gratton
Patch to stop crash, but not a real solution (1.41 KB, patch)
2014-05-16 09:48 UTC, Martyn Russell

Description Michael Gratton 2014-05-16 03:33:58 UTC
Created attachment 276649
File causing the crash

tracker-extract 1.01 is reliably crashing when extracting a specific PDF. It prevents extract from continuing its run when started.

Running "/usr/lib/tracker/tracker-extract -v 3", the following is reported just before the crash:

> Tracker-Message: Extracting metadata for 'file:///home/redacted/IJCAI13-512.pdf'
> Tracker-Message: MIME type passed to us as 'application/pdf'
> (tracker-extract:26961): Tracker-DEBUG: Using /usr/lib/x86_64-linux-gnu/tracker-1.0/extract-modules/libextract-pdf.so...
> Syntax Error (475118): Illegal character <7e> in hex string
> Syntax Error (475119): Illegal character <c7> in hex string
> Syntax Error (475121): Illegal character <7b> in hex string
> Syntax Error: Couldn't find trailer dictionary
> Syntax Error: Invalid XRef entry
> Syntax Error: Invalid XRef entry
> Syntax Error: Missing 'endstream' or incorrect stream length
> Segmentation fault (core dumped)

This is using the debs from GNOME3 PPA on Ubuntu 14.04 LTS 64-bit. Bug on lp: https://bugs.launchpad.net/ubuntu-gnome/+bug/1320055
Comment 1 Michael Gratton 2014-05-16 03:34:56 UTC
Err, version 1.0.1, of course (tracker-extract 1.0.1-2ubuntu1~trusty1).
Comment 2 Martyn Russell 2014-05-16 09:42:23 UTC
Thanks for the bug report.

The PDF seems quite broken. Even Evince gives a bunch of errors and warnings in the console.

I did some digging, and it seems to be related to this call:

	g_object_get (document,
	              "title", &pd.title,
	              "author", &pd.author,
	              "subject", &pd.subject,
	              "keywords", &pd.keywords,
	              "creation-date", &creation_date,
	              "metadata", &xml,
	              NULL);

The "metadata" request is what crashes here.
If you remove that, the creation-date returned is bogus (0xffffffff) and there is then a crash when freeing it, but that is easily remedied.
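
A minimal sketch of the kind of guard that would avoid acting on a bogus creation date like the one above. This is not the attached patch and not Tracker or poppler code: the helper names `creation_date_is_usable` and `format_creation_date` are hypothetical, and it assumes the date arrives as an integer count of seconds since the epoch, with 0xffffffff acting as an "invalid" marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not Tracker or poppler API): rejects values that
 * cannot be a real creation date, including the 0xffffffff seen here.
 * Assumption: the date is a count of seconds since the Unix epoch. */
bool
creation_date_is_usable (int64_t raw)
{
	return raw > 0 && raw != (int64_t) UINT32_MAX;
}

/* Writes an ISO 8601 UTC timestamp into buf. Returns false and leaves
 * buf as an empty string when the date is unusable, so the caller never
 * has to free or print something derived from a bogus value. */
bool
format_creation_date (int64_t raw, char *buf, size_t len)
{
	const struct tm *tm;
	time_t t;

	assert (buf != NULL && len > 0);
	buf[0] = '\0';

	if (!creation_date_is_usable (raw))
		return false;

	t = (time_t) raw;
	tm = gmtime (&t);
	if (tm == NULL)
		return false;

	if (strftime (buf, len, "%Y-%m-%dT%H:%M:%SZ", tm) == 0) {
		/* Buffer too small: contents are unspecified, so reset them. */
		buf[0] = '\0';
		return false;
	}

	return true;
}
```

A caller would pass a buffer of at least 21 bytes and skip the creation-date field whenever the function returns false. Writing into a caller-owned buffer rather than returning an allocated string means there is nothing to free on the failure path.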

Anyway, after checking with upstream and on IRC, it is thought that this bug doesn't happen in later versions of poppler. Adrian Perez on IRC said that he would test this bug later and get back to us.

It doesn't seem to be a Tracker bug though.
Thanks for reporting.
Comment 3 Martyn Russell 2014-05-16 09:43:35 UTC
CCing Adrian.
Comment 4 Martyn Russell 2014-05-16 09:48:14 UTC
Created attachment 276658
Patch to stop crash, but not a real solution

This patch stops the crash, but there is still a lot of crap and errors given along the way. Hope this helps Adrian!
Comment 5 Michael Gratton 2014-05-17 01:57:08 UTC
Thanks for following this up. I'm running libpoppler 0.24.5 (0.24.5-2ubuntu4), if that helps.
Comment 6 Martyn Russell 2014-05-28 08:42:20 UTC
Adrian, did you have a chance to look into this further to confirm your suspicions?

Mike, we suspect this is fixed in later versions (0.26) of Poppler. It certainly isn't a Tracker bug.

I am going to mark this as NOT_GNOME for now.
Adrian, if there is anything we can do to fix or improve this in Tracker, please let me know. The incriminating line is:

https://git.gnome.org/browse/tracker/tree/src/tracker-extract/tracker-extract-pdf.c#n407

Thanks,