Bug 659954 – GStreamer extractor very slow on certain media files



Summary:	GStreamer extractor very slow on certain media files


Status:	RESOLVED OBSOLETE

Product:	tracker
Classification:	Core
Component:	Extractor
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	tracker-extractor
QA Contact:	Jamie McCracken

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-09-23 15:42 UTC by Sam Thursfield
Modified:	2021-05-26 22:25 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Sam Thursfield 2011-09-23 15:42:20 UTC

I've noticed the GStreamer extractor taking upwards of 20 seconds on a single music file. I've not had time to investigate yet (my initial suspicion is that embedded album art extraction is causing the slowdown).

The Discoverer backend is also noticeably slower than decodebin2. This is perhaps unavoidable, but maybe we should consider whether we really need Discoverer for all media types?

Related: getting tagreadbin into GStreamer (https://bugzilla.gnome.org/show_bug.cgi?id=564749), which I'm sure will give extraction a performance boost.

Comment 1 Justin 2012-03-19 13:47:42 UTC

This happens even with PDF files. I do get this error for most of my files. The files are on a mounted NTFS partition on the primary hard disk.

Comment 2 Sam Thursfield 2012-03-27 17:46:20 UTC

I'm starting to think that there is no way around the fact that some files will be huge and in weird formats that are IO-intensive to extract from.

Is there anything we can do to make the initial crawl for a user who has a ~ full of huge videos and PDFs not destroy their IO performance for 3 hours?

Should we consider postponing extraction on such files during the initial crawl and letting the user trigger it at a later date?
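
To make the postponement idea concrete, here is a minimal sketch of how a crawler might split files into "extract now" and "defer" groups by size. This is not Tracker code: the function name, the threshold value, and the choice of file size as the only criterion are all assumptions made for illustration.

```python
import os

# Hypothetical cutoff; nothing in this report proposes a specific value.
DEFER_THRESHOLD_BYTES = 100 * 1024 * 1024


def split_for_initial_crawl(paths, threshold=DEFER_THRESHOLD_BYTES):
    """Return (extract_now, deferred); files at or above threshold are deferred."""
    extract_now, deferred = [], []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError:
            # Unreadable or vanished files are skipped rather than queued.
            continue
        if size >= threshold:
            deferred.append(path)
        else:
            extract_now.append(path)
    return extract_now, deferred
```

Size alone is a crude proxy for extraction cost; a real heuristic would probably also need to look at the file type.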

Comment 3 Martyn Russell 2012-03-28 15:30:25 UTC

(In reply to comment #1)
> This happens even with PDF files. I do get this error for most of my files. The
> files are on a mounted NTFS partition on the primary hard disk.

Are you able to do the same thing on a local or Linux file system to see if the file system is causing any large delays here?

PDFs are notoriously bad for indexing and we're always getting reports about how slow it is for specific cases.

(In reply to comment #2)
> I'm starting to think that there is no way around the fact that some files will
> be huge and in weird formats that are IO-intensive to extract from.

There isn't. There are so many factors in play: the file system, the disk, other processes using those resources too, the complexity of the extraction (e.g. PDFs), etc.
 
> Is there anything we can do to make the initial crawl for a user who has a ~
> full of huge videos and PDFs not destroy their IO performance for 3 hours?

With Tracker 0.12, IIRC, we added an option to use SCHED_IDLE when the user doesn't want their performance impacted. tracker-preferences allows control of this (see the "index Content in the background" label). I think the default is to use SCHED_IDLE on the first index only and not at other times.
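
For readers unfamiliar with SCHED_IDLE, the following is a minimal sketch of a process switching itself to that scheduling policy on Linux. It uses Python's standard `os` module rather than Tracker's own C code, and the helper name `enter_idle_scheduling` is made up for this example.

```python
import os


def enter_idle_scheduling(pid=0):
    """Try to move process `pid` (0 = the caller) to SCHED_IDLE.

    Returns True on success, False if the policy is unavailable or refused.
    """
    policy = getattr(os, "SCHED_IDLE", None)
    if policy is None or not hasattr(os, "sched_setscheduler"):
        # Not Linux, or this Python build lacks the scheduler API.
        return False
    try:
        # SCHED_IDLE requires a static priority of 0.
        os.sched_setscheduler(pid, policy, os.sched_param(0))
    except OSError:
        return False
    return True
```

Note that SCHED_IDLE is a CPU scheduling policy. Disk I/O priority is a separate kernel setting, so this alone may not fully address the IO concern raised in comment #2.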
 
> Should we consider postponing extraction on such files during the initial crawl
> and letting the user trigger it at a later date?

That's also been proposed. It would require some work in the miner-fs to know when the user has not been at the computer for some time and do the work then. Also, there is the chance that the miner-fs dies or the computer is shut down in the meantime and those files are not indexed. So some persistent list would be needed for that too.
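
As a rough illustration of the persistent list mentioned above, here is a sketch of a deferred-file list that survives a crash or shutdown. It is an assumption-laden example and not how miner-fs works: the `DeferredList` class, the JSON file format, and the storage location are all hypothetical.

```python
import json
import os
import tempfile


class DeferredList:
    """Hypothetical on-disk list of files whose extraction was postponed."""

    def __init__(self, store_path):
        self.store_path = store_path
        self.paths = self._load()

    def _load(self):
        try:
            with open(self.store_path, "r", encoding="utf-8") as f:
                return list(json.load(f))
        except (FileNotFoundError, json.JSONDecodeError):
            # No list yet, or an unreadable one: start empty.
            return []

    def _save(self):
        # Write to a temporary file, then rename it over the old one, so a
        # crash leaves either the previous list or the new list on disk.
        directory = os.path.dirname(os.path.abspath(self.store_path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.paths, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.store_path)

    def add(self, path):
        if path not in self.paths:
            self.paths.append(path)
            self._save()

    def mark_done(self, path):
        if path in self.paths:
            self.paths.remove(path)
            self._save()
```

On restart, a new `DeferredList` pointed at the same file picks up whatever was still pending. Detecting when the user is idle is a separate problem that this sketch does not attempt.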

Comment 4 Sam Thursfield 2021-05-26 22:25:53 UTC

GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new enhancement request ticket at
  https://gitlab.gnome.org/GNOME/tracker/-/issues/

Thank you for your understanding and your help.