Bug 737663 - tracker-extract is SIGABRTing due to setrlimits() - crash on start in Continuous
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: General
Version: unspecified
Hardware/OS: Other / Linux
Priority: Normal   Severity: normal
Target Milestone: ---
Assigned To: tracker-general
Duplicates: 712142 712602 739762 742826 754365 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2014-09-30 14:27 UTC by Colin Walters
Modified: 2015-08-31 18:20 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Colin Walters 2014-09-30 14:27:44 UTC
PID: 666 (tracker-extract)
           UID: 1000 (walters)
           GID: 1000 (walters)
        Signal: 6 (ABRT)
     Timestamp: Tue 2014-08-26 14:09:22 EDT (1 month 4 days ago)
  Command Line: /usr/libexec/tracker-extract
    Executable: /usr/libexec/tracker-extract
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (walters)
       Boot ID: 0ca4fb2b5724479bb60977245fa5c8fe
    Machine ID: 45bb3b96146aa94f299b9eb43646eb35
      Hostname: qemux86-64
       Message: Process 666 (tracker-extract) of user 1000 dumped core.
                
                Stack trace of thread 749:
                #0  0x00007f2e381b62e5 __GI_raise (libc.so.6)
                #1  0x00007f2e381b9488 __GI_abort (libc.so.6)
                #2  0x00007f2e38a9b6ab _g_log_abort (libglib-2.0.so.0)
                #3  0x00007f2e38a9b732 g_log (libglib-2.0.so.0)
                #4  0x00007f2e38a9a0c4 g_malloc0 (libglib-2.0.so.0)
                #5  0x00007f2e38a66cbf thread_memory_from_self (libglib-2.0.so.0)
                #6  0x00007f2e38aafcd5 thread_memory_from_self (libglib-2.0.so.0)
                #7  0x00007f2e38ab3b53 g_string_sized_new (libglib-2.0.so.0)
                #8  0x00007f2e38a9adfc g_log_default_handler (libglib-2.0.so.0)
                #9  0x00007f2e3b760033 tracker_log_handler (libtracker-common.so.0)
                #10 0x00007f2e38a9b4a1 g_logv (libglib-2.0.so.0)
                #11 0x00007f2e38a9b732 g_log (libglib-2.0.so.0)
                #12 0x00007f2e38a9a0c4 g_malloc0 (libglib-2.0.so.0)
                #13 0x00007f2e38a66cbf thread_memory_from_self (libglib-2.0.so.0)
                #14 0x00007f2e38ab01c9 thread_memory_from_self (libglib-2.0.so.0)
                #15 0x00007f2e38aa4a63 g_queue_pop_tail (libglib-2.0.so.0)
                #16 0x00007f2e38a6af9a g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
                #17 0x00007f2e38aba356 g_thread_pool_wait_for_new_task (libglib-2.0.so.0)
                #18 0x00007f2e38ab99f5 g_thread_proxy (libglib-2.0.so.0)
                #19 0x00007f2e38538f9b start_thread (libpthread.so.0)
                #20 0x00007f2e3826c54d __clone (libc.so.6)
                
                Stack trace of thread 670:
                #0  0x00007f2e3826015d poll (libc.so.6)
                #1  0x00007f2e38a94a14 g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007f2e38a94b1c g_main_context_iteration (libglib-2.0.so.0)
                #3  0x00007f2e3396e06d dconf_gdbus_worker_thread (libdconfsettings.so)
                #4  0x00007f2e38ab99f5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007f2e38538f9b start_thread (libpthread.so.0)
                #6  0x00007f2e3826c54d __clone (libc.so.6)
                
                Stack trace of thread 672:
                #0  0x00007f2e3826015d poll (libc.so.6)
                #1  0x00007f2e38a94a14 g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007f2e38a94d3a g_main_loop_run (libglib-2.0.so.0)
                #3  0x00007f2e394c0886 gdbus_shared_thread_func (libgio-2.0.so.0)
                #4  0x00007f2e38ab99f5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007f2e38538f9b start_thread (libpthread.so.0)
                #6  0x00007f2e3826c54d __clone (libc.so.6)
                
                Stack trace of thread 666:
                #0  0x00007f2e3826015d poll (libc.so.6)
                #1  0x00007f2e38a94a14 g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007f2e38a94d3a g_main_loop_run (libglib-2.0.so.0)
                #3  0x00000000004096cf main (tracker-extract)
                #4  0x00007f2e381a2a35 __libc_start_main (libc.so.6)
                #5  0x0000000000409759 _start (tracker-extract)
                
                Stack trace of thread 675:
                #0  0x00007f2e3826015d poll (libc.so.6)
                #1  0x00007f2e38a94a14 g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007f2e38a94b1c g_main_context_iteration (libglib-2.0.so.0)
                #3  0x00007f2e38a94b59 glib_worker_main (libglib-2.0.so.0)
                #4  0x00007f2e38ab99f5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007f2e38538f9b start_thread (libpthread.so.0)
                #6  0x00007f2e3826c54d __clone (libc.so.6)
                
                Stack trace of thread 714:
                #0  0x00007f2e38267e39 syscall (libc.so.6)
                #1  0x00007f2e38ad65dc g_cond_wait (libglib-2.0.so.0)
                #2  0x00007f2e38a6b02b g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
                #3  0x00007f2e38aba3bd g_thread_pool_wait_for_new_task (libglib-2.0.so.0)
                #4  0x00007f2e38ab99f5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007f2e38538f9b start_thread (libpthread.so.0)
                #6  0x00007f2e3826c54d __clone (libc.so.6)
                
                Stack trace of thread 715:
                #0  0x00007f2e38267e39 syscall (libc.so.6)
                #1  0x00007f2e38ad65dc g_cond_wait (libglib-2.0.so.0)
                #2  0x00007f2e38a6b02b g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
                #3  0x00007f2e38aba3bd g_thread_pool_wait_for_new_task (libglib-2.0.so.0)
                #4  0x00007f2e38ab99f5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007f2e38538f9b start_thread (libpthread.so.0)
                #6  0x00007f2e3826c54d __clone (libc.so.6)
                
                Stack trace of thread 716:
                #0  0x00007f2e38267e39 syscall (libc.so.6)
                #1  0x00007f2e38ad65dc g_cond_wait (libglib-2.0.so.0)
                #2  0x00007f2e38a6b02b g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
                #3  0x00007f2e38aba3bd g_thread_pool_wait_for_new_task (libglib-2.0.so.0)
                #4  0x00007f2e38ab99f5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007f2e38538f9b

           PID: 666 (tracker-extract)
           UID: 1000 (walters)
           GID: 1000 (walters)
        Signal: 6 (ABRT)
     Timestamp: Thu 2014-09-25 14:52:09 EDT (4 days ago)
  Command Line: /usr/libexec/tracker-extract
    Executable: /usr/libexec/tracker-extract
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (walters)
       Boot ID: aed09284a1bb4f3b83dba40dfedf6d26
    Machine ID: 45bb3b96146aa94f299b9eb43646eb35
      Hostname: qemux86-64
       Message: Process 666 (tracker-extract) of user 1000 dumped core.
                
                Stack trace of thread 710:
                #0  0x00007fa23920e1b9 __GI_raise (libc.so.6)
                #1  0x00007fa2392112d8 __GI_abort (libc.so.6)
                #2  0x00007fa239af365b _g_log_abort (libglib-2.0.so.0)
                #3  0x00007fa239af36e2 g_log (libglib-2.0.so.0)
                #4  0x00007fa239af2074 g_malloc0 (libglib-2.0.so.0)
                #5  0x00007fa239abec8f thread_memory_from_self (libglib-2.0.so.0)
                #6  0x00007fa239b07c85 thread_memory_from_self (libglib-2.0.so.0)
                #7  0x00007fa239b0b943 g_string_sized_new (libglib-2.0.so.0)
                #8  0x00007fa239af2dac g_log_default_handler (libglib-2.0.so.0)
                #9  0x00007fa23c7ab0c3 tracker_log_handler (libtracker-common.so.0)
                #10 0x00007fa239af3451 g_logv (libglib-2.0.so.0)
                #11 0x00007fa239af36e2 g_log (libglib-2.0.so.0)
                #12 0x00007fa239af2074 g_malloc0 (libglib-2.0.so.0)
                #13 0x00007fa239abec8f thread_memory_from_self (libglib-2.0.so.0)
                #14 0x00007fa239b08179 thread_memory_from_self (libglib-2.0.so.0)
                #15 0x00007fa239afca13 g_queue_pop_tail (libglib-2.0.so.0)
                #16 0x00007fa239ac2f6a g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
                #17 0x00007fa239b12146 g_thread_pool_wait_for_new_task (libglib-2.0.so.0)
                #18 0x00007fa239b117e5 g_thread_proxy (libglib-2.0.so.0)
                #19 0x00007fa239590f41 start_thread (libpthread.so.0)
                #20 0x00007fa2392c118d __clone (libc.so.6)
                
                Stack trace of thread 683:
                #0  0x00007fa2392b518d poll (libc.so.6)
                #1  0x00007fa239aec9c4 g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007fa239aeccea g_main_loop_run (libglib-2.0.so.0)
                #3  0x00007fa23a50fd06 gdbus_shared_thread_func (libgio-2.0.so.0)
                #4  0x00007fa239b117e5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007fa239590f41 start_thread (libpthread.so.0)
                #6  0x00007fa2392c118d __clone (libc.so.6)
                
                Stack trace of thread 682:
                #0  0x00007fa2392b518d poll (libc.so.6)
                #1  0x00007fa239aec9c4 g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007fa239aecacc g_main_context_iteration (libglib-2.0.so.0)
                #3  0x00007fa2349c506d dconf_gdbus_worker_thread (libdconfsettings.so)
                #4  0x00007fa239b117e5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007fa239590f41 start_thread (libpthread.so.0)
                #6  0x00007fa2392c118d __clone (libc.so.6)
                
                Stack trace of thread 684:
                #0  0x00007fa2392b518d poll (libc.so.6)
                #1  0x00007fa239aec9c4 g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007fa239aecacc g_main_context_iteration (libglib-2.0.so.0)
                #3  0x00007fa239aecb09 glib_worker_main (libglib-2.0.so.0)
                #4  0x00007fa239b117e5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007fa239590f41 start_thread (libpthread.so.0)
                #6  0x00007fa2392c118d __clone (libc.so.6)
                
                Stack trace of thread 696:
                #0  0x00007fa2392bcb99 syscall (libc.so.6)
                #1  0x00007fa239b2e3cc g_cond_wait (libglib-2.0.so.0)
                #2  0x00007fa239ac2ffb g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
                #3  0x00007fa239b121ad g_thread_pool_wait_for_new_task (libglib-2.0.so.0)
                #4  0x00007fa239b117e5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007fa239590f41 start_thread (libpthread.so.0)
                #6  0x00007fa2392c118d __clone (libc.so.6)
                
                Stack trace of thread 697:
                #0  0x00007fa2392bcb99 syscall (libc.so.6)
                #1  0x00007fa239b2e3cc g_cond_wait (libglib-2.0.so.0)
                #2  0x00007fa239ac2ffb g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
                #3  0x00007fa239b121ad g_thread_pool_wait_for_new_task (libglib-2.0.so.0)
                #4  0x00007fa239b117e5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007fa239590f41 start_thread (libpthread.so.0)
                #6  0x00007fa2392c118d __clone (libc.so.6)
                
                Stack trace of thread 698:
                #0  0x00007fa2392bcb99 syscall (libc.so.6)
                #1  0x00007fa239b2e3cc g_cond_wait (libglib-2.0.so.0)
                #2  0x00007fa239ac2ffb g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
                #3  0x00007fa239b121ad g_thread_pool_wait_for_new_task (libglib-2.0.so.0)
                #4  0x00007fa239b117e5 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007fa239590f41 start_thread (libpthread.so.0)
                #6  0x00007fa2392c118d __clone (libc.so.6)
                
                Stack trace of thread 699:
                #0  0x00007fa2392bcb99 syscall (libc.so.6)
                #1  0x00007fa239b2e3cc g_cond_wait (libglib-2.0.so.0)
                #2  0x00007fa239ac2ffb g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
                #3  0x00007fa239b121ad g_thread_pool_wait_for_new_task (libg
Comment 1 Martyn Russell 2014-10-01 08:29:39 UTC
Hi, this is a known issue. We're currently investigating how to fix it.

Can you tell me how much memory you have on the system you're testing with?

The SIGABRT occurs because we use setrlimit() to set a maximum RSS:

  /* Clamp memory between 50% of total and MAXLONG (2GB on 32-bit) */
  limit = CLAMP (total_halfed, MEM_LIMIT_MIN, G_MAXLONG);
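For illustration, a minimal self-contained sketch of this approach (the function name, the MEM_LIMIT_MIN value, and the sysinfo() lookup are assumptions for the example, not Tracker's exact code):

  #include <glib.h>
  #include <sys/resource.h>
  #include <sys/sysinfo.h>

  /* Assumed floor for the clamp; Tracker's real constant may differ. */
  #define MEM_LIMIT_MIN (256L * 1024 * 1024)

  static gboolean
  set_memory_limit (void)
  {
    struct sysinfo info;
    struct rlimit rl;
    guint64 total_halfed;
    glong limit;

    /* Total physical memory; setrlimit() needs an absolute number. */
    if (sysinfo (&info) != 0)
      return FALSE;

    total_halfed = ((guint64) info.totalram * info.mem_unit) / 2;

    /* Clamp memory between 50% of total and G_MAXLONG (2GB on 32-bit). */
    limit = (glong) CLAMP (total_halfed, (guint64) MEM_LIMIT_MIN,
                           (guint64) G_MAXLONG);

    rl.rlim_cur = limit;
    rl.rlim_max = limit;

    /* RLIMIT_AS and RLIMIT_DATA are the limits Tracker actually sets
     * (see comment 7); RLIMIT_RSS is a no-op on modern kernels. */
    return setrlimit (RLIMIT_AS, &rl) == 0 &&
           setrlimit (RLIMIT_DATA, &rl) == 0;
  }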

Ubuntu has been seeing these for a while.

Why do we do this? Well, to protect the system against rogue extractors, rogue libraries being used by extractors, and spiralling memory use in general. This is where most "shit" happens, so we have these safeguards in place. However, we're seeing more false positives than we should right now.

We've considered:

1. Removing it entirely.
2. Catching SIGABRT to gracefully exit.
3. Changing the limit itself.

Note, we've done #3 more than once, and it's actually quite hard to guess a value that works for all systems.
Comment 2 Philip Van Hoof 2014-10-01 08:51:49 UTC
Please also investigate (and keep monitoring whenever the code changes) why tracker-extract is consuming so much memory. I understood from Carlos that a lot of GFile instances are currently being kept alive needlessly, and a solution would be to query a bit more instead of persisting these instances in memory.

We should not use setrlimit(), but we should also NOT use that much memory at all.
Comment 3 Martyn Russell 2014-10-01 08:54:06 UTC
(In reply to comment #2)
> Please also investigate (and keep monitoring this whenever code has changed)
> why tracker-extract is consuming so much memory. I understood from Carlos that
> a lot of GFile instances are right now being kept alive needlessly and a
> solution would be to query a bit more instead of persisting these instances in
> memory.
> 
> We should not use setrlimits but we should also NOT use that much memory, at
> all.

I am undecided on removing it, but it is true that it's a relic from the old days of embedded systems needing to keep memory use low.
Comment 4 Philip Van Hoof 2014-10-01 09:11:22 UTC
(In reply to comment #3)

> > We should not use setrlimits but we should also NOT use that much memory, at
> > all.
> 
> I am undecided on removing it, but it is true that it's a relic from the old
> days of embedded systems needing to keep memory use to a low.

We're still being used on embedded systems. But letting setrlimit() crash us is not the solution: finding the memory fragmentation, leaks, or excessive usage, and not causing it in the first place, is the only correct solution here.
Comment 5 Martyn Russell 2014-10-01 09:16:20 UTC
(In reply to comment #4)
> We're still being used on embedded systems. But letting setrlimits crash us is
> not the solution: finding the memory fragmenting, leaking or excessive usage
> and not causing it is the only correct solution here.

Well, the crash is due to us not handling SIGABRT, as far as I understand. We could handle that signal and act gracefully.
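A minimal sketch of what that could look like (illustrative only; this is not what was merged — the limit was removed instead, see comment 8):

  #include <signal.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  static void
  on_sigabrt (int signum)
  {
    /* Only async-signal-safe functions may be used in a handler. */
    static const char msg[] = "tracker-extract: hit memory limit, exiting\n";

    (void) signum;
    write (STDERR_FILENO, msg, sizeof (msg) - 1);
    _exit (EXIT_FAILURE);
  }

  static void
  install_abort_handler (void)
  {
    struct sigaction sa;

    memset (&sa, 0, sizeof (sa));
    sa.sa_handler = on_sigabrt;
    sigemptyset (&sa.sa_mask);
    sigaction (SIGABRT, &sa, NULL);
  }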
Comment 6 Colin Walters 2014-10-01 12:20:17 UTC
(In reply to comment #1)
> Hi, this is a known issue. We're currently investigating how to fix it.
> 
> Can you tell me how much memory you have on the system you're testing with?

12 GiB.

> Why do we do this? Well, to protect the system against rogue extractors, rogue
> libraries being used by extractors and spiralling memory use in general. This
> is where most "shit" happens so we have these safe guards in place. However,
> we're seeing more false positives than we should be right now.

Hm, the man page for RLIMIT_RSS says:
This limit has effect only in Linux 2.4.x, x < 30, and there affects only calls to madvise(2) specifying MADV_WILLNEED.

(RHEL 7 man-pages)

What we likely want to do instead is use cgroups.
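For illustration, here is how such a cgroup cap might look as a systemd drop-in (the unit name and limit are hypothetical; tracker-extract was not managed as a systemd service at the time of this report):

  # /etc/systemd/system/tracker-extract.service.d/memory.conf (hypothetical)
  [Service]
  # cgroup v2 (systemd >= 231); on cgroup v1 the equivalent is MemoryLimit=
  MemoryMax=2G

Unlike setrlimit(), a cgroup limit is enforced by the kernel against the whole process, triggering reclaim and ultimately the OOM killer, rather than making individual allocations fail mid-flight.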
Comment 7 Carlos Garnacho 2014-10-01 13:19:52 UTC
(In reply to comment #2)
> Please also investigate (and keep monitoring this whenever code has changed)
> why tracker-extract is consuming so much memory. I understood from Carlos that
> a lot of GFile instances are right now being kept alive needlessly and a
> solution would be to query a bit more instead of persisting these instances in
> memory.

This is only potential behavior: tracker-extract keeps a GFile for the instances that were added to the store through tracker-miner-fs and haven't yet been tagged by tracker-extract as "extracted". If miner-fs inserts millions of files in one go, tracker-extract will allocate as many, flushing them as processing happens, but I'm not sure that is the case here.
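Roughly the pattern being described, as an illustrative sketch (the names and the plain GQueue are hypothetical, not Tracker's actual data structures):

  #include <gio/gio.h>

  /* One reference is held per file awaiting extraction, so memory
   * grows with whatever backlog miner-fs creates. */
  static GQueue pending = G_QUEUE_INIT;

  static void
  queue_for_extraction (GFile *file)
  {
    g_queue_push_tail (&pending, g_object_ref (file));
  }

  static void
  mark_extracted (GFile *file)
  {
    /* Drop the reference once the file is tagged "extracted". */
    if (g_queue_remove (&pending, file))
      g_object_unref (file);
  }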

> 
> We should not use setrlimits but we should also NOT use that much memory, at
> all.

I agree, and I agree that leaks/CPU churn should be considered bugs wherever they originate.

(In reply to comment #6)
> (In reply to comment #1)
> > Hi, this is a known issue. We're currently investigating how to fix it.
> > 
> > Can you tell me how much memory you have on the system you're testing with?
> 
> 12 GiB.

And how many files in the homedir are we talking about? What does tracker-stats say there?

Also... could you tell roughly how much memory the process was using? Maybe it's just that something is very off.

> 
> > Why do we do this? Well, to protect the system against rogue extractors, rogue
> > libraries being used by extractors and spiralling memory use in general. This
> > is where most "shit" happens so we have these safe guards in place. However,
> > we're seeing more false positives than we should be right now.
> 
> Hm, the man page for RLIMIT_RSS says:
> This limit has effect only in Linux 2.4.x, x < 30, and there affects only calls
> to madvise(2) specifying MADV_WILLNEED.

Tracker sets RLIMIT_AS and RLIMIT_DATA, though... both should be set to 4 GB in your case.
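To double-check what is actually in effect for a running process, one can read /proc/<pid>/limits, or query from code; a standalone sketch:

  #include <stdio.h>
  #include <sys/resource.h>

  int
  main (void)
  {
    struct rlimit as, data;

    /* Print the soft and hard limits for address space and data segment. */
    if (getrlimit (RLIMIT_AS, &as) == 0)
      printf ("RLIMIT_AS:   cur=%llu max=%llu\n",
              (unsigned long long) as.rlim_cur,
              (unsigned long long) as.rlim_max);
    if (getrlimit (RLIMIT_DATA, &data) == 0)
      printf ("RLIMIT_DATA: cur=%llu max=%llu\n",
              (unsigned long long) data.rlim_cur,
              (unsigned long long) data.rlim_max);
    return 0;
  }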

> 
> (RHEL 7 man-pages)
> 
> What we likely want to do instead is use cgroups.

I agree with that, and vote to remove the setrlimit() calls. If most crash reports from tracker-extract are caused by the limit itself, that seems like a good hint that we should stop doing this; it also means we conceal every "tracker-extract eats memory" bug we could potentially get.
Comment 8 Martyn Russell 2014-10-13 11:12:32 UTC
I've removed the call to setrlimit() in the libtracker-common-cleanup branch here:

  https://git.gnome.org/browse/tracker/commit/?h=libtracker-common-cleanup&id=3a7625fa5cfe795e5b35c3c32b609488ac8ce5f0
Comment 9 Debarshi Ray 2014-10-14 14:01:49 UTC
*** Bug 712142 has been marked as a duplicate of this bug. ***
Comment 10 Debarshi Ray 2014-10-14 14:04:55 UTC
(In reply to comment #2)
> Please also investigate (and keep monitoring this whenever code has changed)
> why tracker-extract is consuming so much memory. I understood from Carlos that
> a lot of GFile instances are right now being kept alive needlessly and a
> solution would be to query a bit more instead of persisting these instances in
> memory.
> 
> We should not use setrlimits but we should also NOT use that much memory, at
> all.

In bug 712142 we had tracker-extract crashing when indexing a 64 KB PDF [1] because poppler was extracting the entire file, with all its images, into memory.

[1] /usr/share/gnome-documents/getting-started/C/gnome-documents-getting-started.pdf
Comment 11 Martyn Russell 2015-01-13 15:05:52 UTC
This has been fixed now.
Comment 12 Martyn Russell 2015-01-13 15:06:10 UTC
*** Bug 712602 has been marked as a duplicate of this bug. ***
Comment 13 Martyn Russell 2015-01-13 15:06:28 UTC
*** Bug 739762 has been marked as a duplicate of this bug. ***
Comment 14 Martyn Russell 2015-01-13 15:06:37 UTC
*** Bug 742826 has been marked as a duplicate of this bug. ***
Comment 15 Xavier Claessens 2015-08-31 18:20:35 UTC
*** Bug 754365 has been marked as a duplicate of this bug. ***