GNOME Bugzilla – Bug 630293
SEGV in magazine_cache_trim()
Last modified: 2014-03-07 10:04:44 UTC
Program received signal SIGSEGV, Segmentation fault.
+ Trace 223848
Thread 140735164679952 (LWP 3485)
(gdb) t a a bt
+ Trace 223849
Thread 2559 (Thread 0x7fffa5dd5710 (LWP 3879))
Thread 2500 (Thread 0x7fff757f8710 (LWP 3485))
Thread 1 (Thread 0x7ffff4d96940 (LWP 28720))
I suspect this one is probably related...
Program received signal SIGSEGV, Segmentation fault.
+ Trace 223856
Thread 140734917236496 (LWP 22528)
$1 = (ChunkLink **) 0x7fffcbe62810
(gdb) p *magazine_chunks
$2 = (ChunkLink *) 0x9350f0
(gdb) p chunk
$3 = (ChunkLink *) 0x7470697263736564
(gdb) p (char *)&chunk
$4 = 0x9350f8 "descript\a"
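The value gdb prints for chunk is not a plausible heap address; its bytes are printable ASCII. A minimal sketch of that decoding, assuming little-endian byte order as on x86-64 (pointer_bits_to_text is a hypothetical helper written for this illustration, not part of GLib or gdb):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write the eight bytes of a 64-bit value into out,
 * least significant byte first (the order they occupy in memory on a
 * little-endian machine), and NUL-terminate the result. */
void pointer_bits_to_text (uint64_t bits, char out[9])
{
  assert (out != NULL);
  for (size_t i = 0; i < 8; i++)
    out[i] = (char) ((bits >> (8 * i)) & 0xff);
  out[8] = '\0';
}
```

Passing 0x7470697263736564 produces the text "descript", matching the string gdb shows at that location. That suggests the free-list link was overwritten with string data rather than being a stray pointer.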
I recall a similar issue which was, if I remember correctly, caused by freeing GSlice-allocated memory with g_free. The above looks to me like just a random crash, so do you have any steps to reproduce this, please? Maybe valgrind with G_SLICE=always-malloc will help here.
I've no idea how to reproduce, I'm afraid -- apart from the fact that in the Red Hat abrt bug, it seems it happened when I hit Ctrl-R to reply to a message. It seems very random. I've been running in valgrind a lot and haven't seen anything that seems related. And wouldn't valgrind complain about using g_free on GSlice allocations? Or does using G_SLICE=always-malloc make that work out OK from valgrind's point of view?
To be honest, I do not know. Valgrind usually complains when you try to free something "in the middle" of an allocated memory block (where I suppose a GSlice slab is one large memory block), so you might be right that using G_SLICE=always-malloc can be counterproductive for such issues, though it fits perfectly for most others.
Just seen it again with current master. Had just started evolution and was reading mail.
Program received signal SIGSEGV, Segmentation fault.
+ Trace 224057
Thread 140736181815056 (LWP 14337)
This looks like it's the same as bugs 624081, 618128, 623822, 623246, and maybe 621314.
Came down this morning and found evolution sitting at the same crash again. I don't think there's any special trick to reproducing it.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdddf5710 (LWP 31367)]
magazine_chain_pop_head (mem_size=16) at gslice.c:492
492	  (*magazine_chunks)->data = chunk->next;
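For readers unfamiliar with why a pop from a free list can fault like this, here is a simplified model. It is not GLib's magazine code: FreeChunk, freelist_push, freelist_pop and stale_write are hypothetical names for this sketch. The point is that a free chunk stores its link inside the freed memory itself, so a late write through a stale pointer silently replaces the link, and the damage only shows up at a later pop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical intrusive free list: the link lives in the freed chunk. */
typedef struct FreeChunk { struct FreeChunk *next; } FreeChunk;

void freelist_push (FreeChunk **head, void *mem)
{
  assert (head != NULL && mem != NULL);
  FreeChunk *c = mem;
  c->next = *head;
  *head = c;
}

void *freelist_pop (FreeChunk **head)
{
  FreeChunk *c = *head;
  if (c)
    *head = c->next;   /* trusts whatever bytes are in the chunk now */
  return c;
}

/* Simulates a use-after-free: the previous owner writes text into a chunk
 * that is already back on the free list, clobbering the link. */
void stale_write (void *freed_chunk, const char *text)
{
  size_t n = strlen (text);
  if (n > sizeof (FreeChunk))
    n = sizeof (FreeChunk);
  memcpy (freed_chunk, text, n);
}
```

After a stale_write, the next freelist_pop still succeeds, but the list head then holds text bytes instead of an address. Dereferencing it on the following pop is where a SIGSEGV would occur, far away from the code that did the bad write.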
What we should really do is put the appropriate valgrind client calls into gslice.c -- see
http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools
Then we wouldn't need to use G_SLICE=always-malloc to get sane results out of valgrind, and our use of valgrind wouldn't hide this problem.

But I'm far too lazy to do that today for both allocators, so I'm using a dirty hack to ensure that using g_free() on GSlice memory will upset valgrind even with G_SLICE=always-malloc:

--- gslice.c~	2010-09-13 16:57:51.000000000 +0100
+++ gslice.c	2010-10-12 11:08:21.000000000 +0100
@@ -839,7 +839,7 @@ g_slice_alloc (gsize mem_size)
       g_mutex_unlock (allocator->slab_mutex);
     }
   else /* delegate to system malloc */
-    mem = g_malloc (mem_size);
+    mem = g_malloc (mem_size + 8) + 8;
   if (G_UNLIKELY (allocator->config.debug_blocks))
     smc_notify_alloc (mem, mem_size);
@@ -904,7 +904,7 @@ g_slice_free1 (gsize mem_size,
     {
       if (G_UNLIKELY (g_mem_gc_friendly))
         memset (mem_block, 0, mem_size);
-      g_free (mem_block);
+      g_free (mem_block - 8);
     }
   TRACE (GLIB_SLICE_FREE((void*)mem_block, mem_size));
 }
@@ -980,7 +980,7 @@ g_slice_free_chain_with_offset (gsize
             abort();
           if (G_UNLIKELY (g_mem_gc_friendly))
             memset (current, 0, mem_size);
-          g_free (current);
+          g_free (current - 8);
         }
     }
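The idea behind the hack can be shown outside GLib. The following is a standalone sketch using plain malloc()/free() rather than g_malloc()/g_free(); offset_alloc, offset_free and SLICE_OFFSET are hypothetical names for this illustration. Because callers receive a pointer 8 bytes past the start of the real block, passing that pointer straight to free() is an invalid free that valgrind reports, while the matching offset_free() undoes the shift first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLICE_OFFSET 8   /* same shift as in the patch above */

/* Allocate mem_size usable bytes, returning a pointer that is offset
 * from the start of the underlying malloc() block. */
void *offset_alloc (size_t mem_size)
{
  assert (mem_size <= SIZE_MAX - SLICE_OFFSET);   /* avoid size overflow */
  char *base = malloc (mem_size + SLICE_OFFSET);
  return base ? base + SLICE_OFFSET : NULL;
}

/* Release a block from offset_alloc() by stepping back to the real start. */
void offset_free (void *mem_block)
{
  if (mem_block)
    free ((char *) mem_block - SLICE_OFFSET);
}
```

One side effect worth keeping in mind: the returned pointer is only guaranteed to be aligned to 8 bytes, not to whatever larger alignment malloc() itself provides, so this is a debugging aid rather than something to ship.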
Created attachment 172216
valgrind annotations for gslice

This adds annotations to gslice, although it's not quite right because it still keeps track of the pages allocated with memalign(). So although it mostly seems to give the right complaints, it gives the wrong allocation trace:

==2251== 1 errors in context 1 of 2:
==2251== Invalid free() / delete / delete[]
==2251==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==2251==    by 0x400588: main (in /home/dwmw2/slice)
==2251==  Address 0x4f38020 is 32 bytes inside a block of size 496 alloc'd
==2251==    at 0x4A04360: memalign (vg_replace_malloc.c:532)
==2251==    by 0x4A043B9: posix_memalign (vg_replace_malloc.c:660)
==2251==    by 0x4C683EB: slab_allocator_alloc_chunk (gslice.c:1164)
==2251==    by 0x4C69970: g_slice_alloc (gslice.c:682)
==2251==    by 0x400565: main (in /home/dwmw2/slice)

In this case, 0x4f38020 was *also* allocated by a call to g_slice_alloc()...
(In reply to comment #10)
> valgrind annotations for gslice

Well, you should rather open a bug against glib and offer them this change. They will certainly not look for glib patches in evolution bugs. On the other hand, if this patch fixes the initial issue, then this bug can safely be moved to glib instead of filing a new bug.
The patch doesn't fix anything -- it just makes valgrind work a little better with gslice, so we don't have to use G_SLICE=always-malloc, and running in valgrind should actually catch this bug. But it isn't finished yet; I'll submit it when I've got a response to http://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg02045.html and fixed the remaining issues.
See bug 335126 for gslice/valgrind discussion. Since implementing the offset allocation as described in comment 9 above, I still haven't managed to trigger this.
I still haven't managed to reproduce this... until last night, when I tried to connect with 2.32 when I wasn't on the VPN. (Yes, it should *know* that it needs a VPN connection, and it should ask NetworkManager to make one before that particular account comes online, but that's about three separate RFEs for another day). This wasn't the box with my debugging version of glib, but it does make me strongly suspect that this is a manifestation of bug 631290 and bug 632212.
(In reply to comment #14)
> This wasn't the box with my debugging version of glib, but it does make me
> strongly suspect that this is a manifestation of bug 631290 and bug 632212.

Do you propose to close this one in favour of one of these bugs? Maybe it is also part of bug #631804? I saw a strange crash in mail_msg_free, but after a workaround for it, it didn't crash for a day (on another user's machine).
Dunno. In comment 7 I said this looked very similar to a number of other bugs, and I'm not sure that *all* of those are going to be the same as the imapx connect failure one. I suspect there may be a few memory corruptors, and we'll really need the gslice/valgrind stuff working to properly get to the bottom of all of them.
Comment on attachment 172216
valgrind annotations for gslice

Patch "isn't finished yet" according to comment 12 --> needs-work
I'm closing this as obsolete.