Bug 620182 - Indexing stalls with tracker-store using 100% CPU
Status: RESOLVED NOTABUG
Product: tracker
Classification: Core
Component: General
Version: 0.9.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-general
QA Contact: Jamie McCracken
Depends on:
Blocks:
 
 
Reported: 2010-05-31 20:49 UTC by Alexander Hunziker
Modified: 2010-06-01 13:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Example of ASCII file triggering the bug (937.00 KB, application/octet-stream)
2010-06-01 08:16 UTC, Alexander Hunziker

Description Alexander Hunziker 2010-05-31 20:49:47 UTC
During indexing of my home directory with tracker 0.9.5, tracker-store always starts using 100% CPU at 17% and indexing does not progress from there, or at least not for a fairly long while (I waited about five minutes). A backtrace from tracker-store:

(gdb) thread apply all bt

Thread 2 (Thread 0xb7653b70 (LWP 4907))

#0  IA__g_utf8_offset_to_pointer at /build/buildd/glib2.0-2.24.1/glib/gutf8.c:327
#1  parser_next at tracker-parser-glib.c:309
#2  tracker_parser_next at tracker-parser-glib.c:616
#3  buildTerms at tracker-fts.c:4848
#4  tracker_fts_update_text at tracker-fts.c:7885
#5  tracker_data_resource_buffer_flush at tracker-data-update.c:803
#6  tracker_data_update_buffer_flush at tracker-data-update.c:844
#7  tracker_sparql_query_execute_insert_or_delete at tracker-sparql-query.c:3132
#8  tracker_sparql_query_execute_update at tracker-sparql-query.c:2136
#9  update_sparql at tracker-data-update.c:2313
#10 pool_dispatch_cb at tracker-store.c:467
#11 g_thread_pool_thread_proxy at /build/buildd/glib2.0-2.24.1/glib/gthreadpool.c:315
#12 g_thread_create_proxy at /build/buildd/glib2.0-2.24.1/glib/gthread.c:1893
#13 start_thread at pthread_create.c:300
#14 clone at ../sysdeps/unix/sysv/linux/i386/clone.S:130

I can see in the tracker-store log that the file it chokes on is an ASCII file of about 1 MB full of numbers (the output of a number-crunching program of mine). I can attach logs on request, but they don't seem to contain anything of interest.
Comment 1 Aleksander Morgado 2010-06-01 07:05:34 UTC
Could you possibly attach that given ASCII file in the bug report?
Comment 2 Alexander Hunziker 2010-06-01 08:16:57 UTC
Created attachment 162436
Example of ASCII file triggering the bug

This is the file on which it hung last time. I'm not positive it always hangs on the very same one, but I've observed it hang in the same directory (which contains several of those data files) at least three times.
Comment 3 Aleksander Morgado 2010-06-01 10:59:37 UTC
Hi Alexander,

It seems the glib/pango parser doesn't perform very well if the file contains only numbers. As you say, it took around 5 minutes to parse the whole contents of the 1 MB file using the glib/pango parser. On the other hand, with the libunistring or libicu based parsers the results for the same file are very different: 0.023s for the libunistring parser and 0.053s for libicu.

I would suggest using either the libunistring or the libicu parser instead of the glib/pango one, via the --with-unicode-support=[libunistring|libicu] configure option. If available, the libunistring parser generally performs better than the libicu one.
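
As a rough sketch of that suggestion, a build from source might look like the following. It assumes an unpacked tracker 0.9.x source tree with the usual autotools `configure` script and the libunistring development files installed; `UNICODE_BACKEND` is only an illustrative variable name, not something tracker defines.

```shell
#!/bin/sh
# Parser backend to build with; the report names libunistring and libicu
# as the two alternatives to the glib/pango parser.
UNICODE_BACKEND=libunistring

# Assumption: this is run from the top of an unpacked tracker source tree.
if [ -x ./configure ]; then
    # Select the Unicode parser at configure time, then build.
    ./configure --with-unicode-support="$UNICODE_BACKEND" && make
else
    echo "no ./configure here; run this from the tracker source tree" >&2
fi
```

Setting `UNICODE_BACKEND=libicu` instead would select the libicu parser, which was slightly slower in the timings above.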

Cheers.
Comment 4 Alexander Hunziker 2010-06-01 13:56:05 UTC
Turns out this is a problem with the glib/pango text extraction module. The libunistring based one performs much better, and it is also given preference if it is available. Packagers should therefore ensure it is used if the system provides it.