GNOME Bugzilla – Bug 659954
GStreamer extractor very slow on certain media files
Last modified: 2021-05-26 22:25:53 UTC
I've noticed the GStreamer extractor taking upwards of 20 seconds on a single music file. I've not had time to investigate yet (my initial suspicion is that embedded album art extraction is causing the slowdown). The Discoverer backend is also noticeably slower than decodebin2; this is perhaps unavoidable, but maybe we should consider whether we really need Discoverer for all media types? Related is getting tagreadbin into GStreamer (https://bugzilla.gnome.org/show_bug.cgi?id=564749), which I'm sure will give extraction a performance boost.
This happens even with PDF files. I get this error for most of my files. The files are on a mounted NTFS partition on the primary hard disk.
I'm starting to think that there is no way around the fact that some files will be huge and in weird formats that are IO-intensive to extract from. Is there anything we can do to make the initial crawl for a user who has a ~ full of huge videos and PDFs not destroy their IO performance for 3 hours? Should we consider postponing extraction on such files during the initial crawl and letting the user trigger it at a later date?
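To make the postponement idea concrete, here is a minimal sketch of splitting a crawl list into files to extract right away and files to defer. It is not Tracker code: the function name `split_for_initial_crawl` and the 50 MiB cutoff are assumptions made for illustration, and a real policy would probably also look at the MIME type, not only the size.

```python
import os
from typing import Iterable, List, Tuple

# Hypothetical cutoff; nothing in this report names an actual threshold.
DEFER_THRESHOLD_BYTES = 50 * 1024 * 1024


def split_for_initial_crawl(
    paths: Iterable[str],
    threshold: int = DEFER_THRESHOLD_BYTES,
) -> Tuple[List[str], List[str]]:
    """Return (extract_now, deferred) based on file size alone."""
    extract_now: List[str] = []
    deferred: List[str] = []
    for path in paths:
        try:
            size = os.stat(path).st_size
        except OSError:
            # The file vanished or is unreadable; skip it in this sketch.
            continue
        if size > threshold:
            deferred.append(path)
        else:
            extract_now.append(path)
    return extract_now, deferred
```

The crawler would extract the first list during the initial crawl and keep the second for a later pass that the user triggers.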
(In reply to comment #1)
> This happens even with pdf files. I do get this error for most of my files. The
> files are on a mounted NTFS partition on the primary harddisk.

Are you able to do the same thing on a local or Linux file system to see if the file system is causing any large delays here? PDFs are notoriously bad for indexing and we're always getting reports about how slow it is for specific cases.

(In reply to comment #2)
> I'm starting to think that there is no way around the fact that some files will
> be huge and in weird formats that are IO-intensive to extract from.

There isn't. There are so many factors in play: the file system, the disk, other processes using those resources too, the complexity of the extraction (i.e. PDFs), etc.

> Is there anything we can do to make the initial crawl for a user who has a ~
> full of huge videos and PDF's not destroy their IO performance for 3 hours?

With Tracker 0.12, IIRC, we added an option to use SCHED_IDLE when the user doesn't want their performance impacted. tracker-preferences allows control of this (see the "index Content in the background" label). I think the default is to use SCHED_IDLE on the first index only and not at other times.

> Should we consider postponing extraction on such files during the initial crawl
> and letting the user trigger it at a later date ?

That's also been proposed. It would require some work in the miner-fs to know when the user has not been at the computer for some time and do the work then. There is also the chance that the miner-fs dies or the computer is shut down in the meantime, leaving those files unindexed, so some persistent list would be needed for that too.
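The two ideas in this reply can be sketched roughly as follows. This is an illustration and not how miner-fs implements either one: `enter_idle_scheduling` and `DeferredList` are hypothetical names, and the JSON file format is an assumption. Note that SCHED_IDLE is a CPU scheduling policy on Linux; I/O priority is a separate setting that this sketch does not touch.

```python
import json
import os
import tempfile
from typing import List


def enter_idle_scheduling() -> bool:
    """Try to switch this process to SCHED_IDLE; return True on success."""
    if not hasattr(os, "SCHED_IDLE"):
        return False  # Not Linux, or the policy is not exposed here.
    try:
        # SCHED_IDLE requires a static priority of 0.
        os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
        return True
    except OSError:
        return False


class DeferredList:
    """Paths whose extraction was postponed, persisted across restarts."""

    def __init__(self, store_path: str) -> None:
        self.store_path = store_path
        self.paths: List[str] = self._load()

    def _load(self) -> List[str]:
        try:
            with open(self.store_path, "r", encoding="utf-8") as fh:
                return list(json.load(fh))
        except (FileNotFoundError, json.JSONDecodeError):
            return []

    def _save(self) -> None:
        # Write to a temporary file and rename it over the store, so a
        # crash or shutdown mid-write leaves the previous list intact.
        directory = os.path.dirname(os.path.abspath(self.store_path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.paths, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.store_path)

    def add(self, path: str) -> None:
        if path not in self.paths:
            self.paths.append(path)
            self._save()

    def mark_done(self, path: str) -> None:
        if path in self.paths:
            self.paths.remove(path)
            self._save()
```

A miner could call `add()` whenever it skips a file during the initial crawl and `mark_done()` once the file has been extracted. After a crash or reboot, constructing `DeferredList` with the same store path recovers whatever is still pending.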
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new enhancement request ticket at https://gitlab.gnome.org/GNOME/tracker/-/issues/ Thank you for your understanding and your help.