GNOME Bugzilla – Bug 335126
Add support for valgrind hints to glib (GSlice)
Last modified: 2018-05-24 10:42:37 UTC
Attaching draft patch that adds support for valgrind hints to gslice. As recommended by valgrind docs, the two needed header files are copied into glib/glib/valgrind. The patch itself is pretty simple, it marks memory chunks that gslice allocates as unaccessible and in g_slice_alloc/alloc0/free marks the slices appropriately. It also uses the residual of size (chunk_size - mem_size) at the end of slices as red zones. The runtime overhead is negligible. Should be harmless to have on all the time, but we probably want to disable them with debug=off (only).
Created attachment 61546 [details] [review] gslice patch The patch
Created attachment 61547 [details] the glib/glib/valgrind tarball The included headers the Makefile.am. Extract it into glib/glib.
Created attachment 64020 [details] [review] gmem patch This adds a new public variable g_mem_in_checker that applications like ORBit can use to better behave under memory checkers. The patch initializes it to TRUE if running under valgrind.
Created attachment 64021 [details] [review] GArray patch This adds valgrind hints to GArray. Not tested beyond running tests/array-test under valgrind. It also modifies GArray implementation to shrink the allocation too. That's done very conservatively such that the average time for operations remains constant. A similar patch for GPtrArray is needed.
Behdad, wouldn't it make sense to rework your patch to hint valgrind only if g_mem_in_checker==TRUE? and, maybe g_mem_in_memcheck; would be a slightly better name for that variable... and last but not least, i really wonder whether it's worth adding a subdir for two headers. it's probably better to either: - put them under glib/glib/ directly, or - add a directory glib/memcheck/ or maybe even glib/contrib/ where future header files and similar things could go. (e.g. API definitions like ladspa.h or similar things from XDG.)
Created attachment 64031 [details] [review] correct GArray patch Correct GArray patch.
Tim, The valgrind manual says the hints are lightweight enough to leave in, but yes, wrapping in G_UNLIKELY (g_mem_in_memcheck) is even better. Ok, now that seems you are not against adding these, I'll apply your comments all and submit another patch. :)
For variable names g_mem_in_checker is ok, better than g_mem_in_memcheck IMO, personally I prefer g_mem_checker_running since it conveys a state and is more obvious to me when reading it.
I'd just like to point out that Behdad is a total stud for writing this.
Can this be turned off at runtime? Having the program change behaviour when valrinded ("valground"?) is probably desireable as a default, but is going to make address-dependent bugs very hard to find.
We can add a G_DEBUG option to set/unset g_mem_in_memcheck (or whatever that will be called.)
Morten, i don't understand your comment. what behaviour chanegs are you talking about? Behdad, you're still working on the new patch, right?
timj: Behdad's suggested patches would make the program do different things when run under valgrind compared to when run normally. That's the point of them and it is normally a good thing, but it is going to drive you nuts when hunting for address dependent bugs. Thus I would like the ability to somehow override.
Created attachment 65644 [details] [review] complete revised patch Ok, here is a much revised and almost complete, tested patch. The missing part is docs for g_mem_checker_running and updated docs for G_DEBUG. Here is a breakdown of what it does: - Adds valgrind headers into glib/contrib. Previously I was using valgrind HEAD headers, but apparently they don't work with older valgrinds, so now I'm using headers available on FC5. - gutils.c: Make g_parse_debug_string() match '-' and '_', and also accept space and comma as separators too (Fed's faves). - gmem.[ch]: Define new public variable g_mem_checker_running, that is automatically set to TRUE if running inside valgrind. Also define new G_DEBUG flags no-gc-friendly, checker-running, and no-checker-running. - garray.c: Like before, changes g_array to shrink allocation too, and adds valgrind hints all over it. This version does it for g_ptr_array too. Search in tests/array-test.c for valgrind to find a line to uncomment, to make the valgrind hints catch a memory error that valgrind cannot catch otherwise (no matter g_slice is used or not, or gc-friendly, or anything.) - gslice.c: This is where most of the work is done. All memory that is not allocated by the user are marked unaccessible. This includes the chunk links and slab info structs too; those structs are marked as readable on demand, and marked back as unaccessible when done. This makes it possible to detect any invalid memory access, like it will be with stock malloc. Search for valgrind in tests/slice-test.c for a line to uncomment to create a memory error that valgrind can now detect thanks to the valgrind hints. Some notes: o I've tested this using slice-test, both for magazine allocation and slab allocation. All work, but it does hit an assertion in leak detection logic of valgrind (FC5). I don't think it's my fault. o I had to move SlabInfo structs from the end of allocated pages to the front, to make sure no g_slice_alloc'ed block has the same base address of the page we alloc; otherwise valgrind will be confused. o There's an exception to the unaccessibility thing: I don't make the magazine layer's four specially made ChunkLinks unaccessible while the magazine is in use. That can be done for sure, but I thought this is a good compromise; I'm already a bit worried about the overhead of the hints (which are all in if (G_UNLIKELY(.)) of course.) That's it for now.
Morten, i'm sorry, i still don't get your point. what are those "different things" that you could possibly care about? the valgrind memory region tagging opertions amount to NOPs when not run under valgrind. and what do you mean by "adress dependent bugs", please?
I think all Morten was asking for was a way to turn the hints off. G_DEBUG=no-checker-running does that in my latest patch.
Behdad, i understand that Morten asked for a way to turn hinting off, but i don't think adding that functionality makes sense, so i'm trying to understand his reasoning.
for the record, section 3.2.2. of [Bonwick94] as referenced in gslice.c: http://citeseer.ist.psu.edu/bonwick94slab.html contains the reason on why the slab info was placed at the *end* of pages, quoting: The freelist linkage resides at the end of the buffer, rather than the beginning, to facilitate debugging. This is driven by the empirical observation that the beginning of a data structure is typically more active than the end. If a buffer is modified after being freed, the problem is easier to diagnose if the heap structure (freelist linkage) is still intact.
As for Morten's case, I think he's been more thinking along the lines of regular memory debuggers that alter the allocations, making some bugs hide under the debugger. However, this is not the case with valgrind, so no worries there. Being able to turn the hints off is helpful in debugging the hints themselves, or to get better stack traces.
Oops, to get better stack traces of memory errors one has to turn G_SLICE to malloc, so the hints are irrelevant there.
Behdad, i'm also a bit worried about the overhead of this. your patch simply touches and shuffles too many code portions of the allocator that have been carefully benchmarked and tested before. also, it has a bunch of coding style issues and in some places does unneccessary renames which doesn't make it easier to read. in one case, it even introduces a subtle bug (in gslice.c: -if (!sinfo->n_allocated) +else if (!sinfo->n_allocated) ) and i think in garray.c it breaks the g_mem_gc_friendly cleaning in some places. also, could you please extend on the reasoning for moving the slab info (which albeit has to be done a bit differently if done at all)? so far you said: o I had to move SlabInfo structs from the end of allocated pages to the front, to make sure no g_slice_alloc'ed block has the same base address of the page we alloc; otherwise valgrind will be confused. can you please explain what a confused valgrind is, and why we should shuffle allocator structures for this instead of simply filing a bug against valgrind? a similar question arises for the assertion you're talking about, we should probably check with the valgrind people what's up with this. and you said you're including older valgrind headers, i don't think that's a good idea either. it's bad enough that we have to copy valgrind headers, and it's unlikely that they'll be updated very frequently in glib, so i'd rather pick the newest version available. people running valgrind will update their versions automatically anyway, and the newer headers are needed for maximum portability. and finally, i don't quite understand your comment before _G_SLICE_VALGRIND_MALLOCLIKE_BLOCK(), it might be better to discuss this on IRC though.
Created attachment 65802 [details] [review] Patch with updated docs and removed SlabInfo hints Ok, another patch with updated docs and removed SlabInfo valgrind hints.
Thanks Tim for the comments. I've attached a new patch that addresses most of your issues: (In reply to comment #21) > Behdad, i'm also a bit worried about the overhead of this. your patch simply > touches and shuffles too many code portions of the allocator that have been > carefully benchmarked and tested before. Ok, I removed the hints around SlabInfo structs, so they are readable all the time. Of course this means that we don't catch problems when user code corrupts the slab info, but that's quite unusual anyway. The current patch do make ChunkLink structs unreadable, and that's very important IMO. Without these, the first eight bytes of g_slice_free'd memory will be read/writable and invalid accesses to freed memory can go unnoticed. Hopefully this doesn't look too bad this time. > also, it has a bunch of coding style issues and in some places does > unneccessary renames which doesn't make it easier to read. Ok, fixed all. The patch to gslice.c is way shorter now. I'll be happy to fix any style issues. > in one case, it even introduces a subtle bug (in gslice.c: > -if (!sinfo->n_allocated) > +else if (!sinfo->n_allocated) > ) Not sure if that's a bug. The first conditional handles the case that an slab was empty, ie. has no free chunks (I'll call that full btw), and the second conditional handles slabs that are not used, ie. have no allocated chunks. Given that a single chunk is freed in this function and there are at least 8 chunks per slab, I don't think both conditions can be true at the same time and so and "else" doesn't change semantics. > and i think in garray.c it breaks the g_mem_gc_friendly cleaning in some > places. Rechecked. I think it's fine. I've moved a bunch of g_mem_gc_friendly cleaning into a new function g_array_shrink(). > also, could you please extend on the reasoning for moving the slab info (which > albeit has to be done a bit differently if done at all)? so far you said: > o I had to move SlabInfo structs from the end of allocated pages to the > front, to make sure no g_slice_alloc'ed block has the same base address of > the > page we alloc; otherwise valgrind will be confused. > can you please explain what a confused valgrind is, and why we should shuffle > allocator structures for this instead of simply filing a bug against valgrind? When the slab info is at the end, and for chunk_size = 8 we get a perfect slicing into chunks, and so color == 0. Which means that the first slice of the page has the same address as the page itself. Now the page has been malloc'ed, and the slice has been malloclike'd. Later the slice is freelike'd, and later on the page may be free'd. What valgrind sees may be something like that: malloc (120) -> 0x12345600 // malloc/memalign/etc internally done by gslice malloclike (8) -> 0x12345600 // g_slice_alloc() freelike (0x12345600) // g_slice_free1() free (0x12345600) // freeing the page internally used by gslice and so valgrind errs on the second free, as it looks like a double free and also a mismatched free (memory allocated using malloclike, freed using free()). To fix this, valgrind has got to keep a list of allocations for each address, but that doesn't quite solve the problem. Imagine a user code that wrongly uses free() to free a gslice. What valgrind will see is: malloc (120) -> 0x12345600 // malloc/memalign/etc internally done by gslice malloclike (8) -> 0x12345600 // g_slice_alloc() free (0x12345600) // user error This is a very unfotunate problem: glibc cannot detect this wrong free(), and valgrind cannot either, unless valgrind keeps a stack (instead of a list) of allocations per address, and make sure free/freelike operations pop off the top of the stack. But such an assumption is quite arbitrary IMO. So, if there's no major problem with moving the slab info to the front, that's a lot easier. > a similar question arises for the assertion you're talking about, we should > probably check with the valgrind people what's up with this. That one I have to try with valgrind head and see what's wrong. > and you said you're including older valgrind headers, i don't think that's a > good idea either. it's bad enough that we have to copy valgrind headers, and > it's unlikely that they'll be updated very frequently in glib, so i'd rather > pick the newest version available. people running valgrind will update their > versions automatically anyway, and the newer headers are needed for maximum > portability. Agreed, specially if the latest valgrind fixes the assertion. Then the hints will simply don't do anything when run with older buggy valgrind. I also added a like in gmem.c to print log line about glib valgrind hints being used. > and finally, i don't quite understand your comment before > _G_SLICE_VALGRIND_MALLOCLIKE_BLOCK(), it might be better to discuss this on IRC > though. Ok, this is what's happenning: valgrind has a hint named VALGRIND_MALLOCLIKE_BLOCK(mem, mem_size, red_zone_size, zeroed). What it does is to mark [mem, mem+mem_size) as accessible memory (writable if !zeroed, readable if zeroed), and mark [mem - red_zone_size) and [mem + mem_size, mem + mem_size + red_zone_size) as unaccessible. Now in gslice, red_zone_size has got to be zero, as we don't have any redzones. But there is this wasted area after allocated memory and the next chunk if mem_size < chunk_size. I want to mark that as red zone. I'll discuss why that's a good idea later. To mark that area as redzone, I do VALGRIND_MALLOCLIKE_BLOCK(mem+mem_size+(chunk_size-mem_size)>>1, 0, (chunk_size-mem_size)>>1, 0), ie, I allocate a memory block of length 0, with total red zone around it filling the wasted area: Allocated slice of size 5 bytes red zone, 2bytes |========================================|<------><------>........| Chunk of size 8 bytes In this chunk of size 8, the first 5 bytes are allocated for user, the next two bytes are marked as red zone, and the final byte is not changed by this allocation. The final byte is still unaccessible and valgrind detects a memory error if it's accessed, but the error message returned says that it's (say) 15 bytes into a block of 128 allocated when this slab was malloced by gslice. The two red zone bytes however, if accessed, cause a memory error that is reported to be inside a block of size 2 allocated when the g_slice_alloc call was called, making the error message way more useful. An alternative is to MALLOCLIKE the entire chunk, and mark the final (chunk_size - mem_size) bytes as unaccessible.
thanks Behdad for the new patch and the very good explanation in the last comment. due to a bugzilla outage, the new patch was already partly discussed on irc. i'll reiterate the issues discussed: - intermixing valgrind macros: <rambokid> behdad: are you sure that _READABLE and _WRITABLE maybe freely intermixed with MALLOCLIKE and FREELIKE? <behdad> rambokid: yeah, I'm sure. <behdad> there's in fact a test case in valgrind source code, showing how to allocate a large chunk and give away slices. does pretty much what I do... <behdad> rambokid: http://valgrind.cvs.sourceforge.net/valgrind/valgrind/memcheck/tests/custom_alloc.c?view=markup - the GSlice pages allocated via memalign() should be marked NOACCESS. in turn, REDZONE marking should be removed from the patch. - the slab_info structure doesn't strongly need protection, so protection code for it can be left out if that simplifies the patch. - adding an "else" to the GSlice logic: > -if (!sinfo->n_allocated) > +else if (!sinfo->n_allocated) changes semantics in testcases which have 1 chunk per slab. so this needs to be backed out again. - a slight optimization will be: -if (G_UNLIKELY (g_mem_checker_running && (_cond))) +if (G_UNLIKELY (g_mem_checker_running) && (_cond)) since that enables the compiler to branch out earlier. - stamps will have to be changed throughout the code from guint to gint. that eliminates the need for this change: - while (ABS (stamp - magazine_chain_uint_stamp (current)) >= allocator->config.working_set_msecs) + while (ABS ((gint)stamp - (gint)magazine_chain_uint_stamp (current)) >= allocator->config.working_set_msecs) - allow ';' as debug-key seperator, so: + q = strpbrk (p, ":, "); should become: + q = strpbrk (p, ":;, "); - the GSlice MAKE_* macros should be removed and manually expanded. - some other macros need to be renamed: s/_G_SLICE_VALGRIND_MAKE/CHECKER_HINT/
Created attachment 66163 [details] [review] Updated patch Updated patch. All Tim's comments applied. Includes valgrind headers from svn head and works with such valgrind. I spent a few hours debugging the assertion failure but didn't achieve much. The minimal test case to hit it is a program that does g_slice_alloc(1);
Seems like I just wasted half a day while the assertion failure is indeed what my first guess was when I first saw it... It's a memcheck sanity check, checking that malloc/calloc/memalign/new/new[]/malloclike'd blocks do not overlap. This works for their sample custom allocator that uses mmap(), but not for gslice that uses memalign to obtain its slab memory. Reporting to valgrind developers...
Mail composed to valgrind-user list, contains a workaround patch for valgrind: http://sourceforge.net/mailarchive/forum.php?thread_id=10501096&forum_id=32038
Seems like giving away using MALLOCLIKE from memory that is malloc/memalign'ed is not supported by valgrind... If we don't do MALLOCLIKE/FREELIKE, we cannot catch leaks. Tim, what do you think about switching to always-malloc when running on valgrind? The GArray fixes are still valuable no matter what we do in GSlice.
The valgrind developers are actually working on adding a new hint that makes it work for us...
re comment #28, i much prefer a solution that is less intrusive than switching allocators (allocation patterns, memory initialization) automagically when running valgrind. users can always switch with G_SLICE=always-malloc themselves. discussing this with the valgrind folks upstream and have them fix the assertions for us is a much better approach, thanks for spotting and reporting Behdad.
Just a quick update here. Graydon Hoare of Mozilla fame has recently sent patches to valgrind to do leak checking on mempool allocations, and made leakcheck deal with that. So, all we need here is to use the mempool client requests instead of malloclike. I still like to see a client request to get valgrind version, to be able to check whether a good enough valgrind is available and not misleadingly act like gslice allocations are being checked while they are not really because of an old valgrind...
(In reply to comment #31) > Just a quick update here. Graydon Hoare of Mozilla fame has recently sent > patches to valgrind to do leak checking on mempool allocations, and made > leakcheck deal with that. So, all we need here is to use the mempool client > requests instead of malloclike. does this also fix the valgrind bug where valgrind gets confused if memory regions are tagged as client mempool chunks, eventhough the same memory is tagged as malloclike? (since gslice allocates its mempools through malloc() or memalign()). > I still like to see a client request to get valgrind version, to be able to > check whether a good enough valgrind is available and not misleadingly act like > gslice allocations are being checked while they are not really because of an > old valgrind... what kind of checking/acting do you have in mind here specifically? in general, changing behaviour in the presence of a (memory) profiler or debugger is a bad idea.
(In reply to comment #32) > (In reply to comment #31) > > Just a quick update here. Graydon Hoare of Mozilla fame has recently sent > > patches to valgrind to do leak checking on mempool allocations, and made > > leakcheck deal with that. So, all we need here is to use the mempool client > > requests instead of malloclike. > > does this also fix the valgrind bug where valgrind gets confused if memory > regions are tagged as client mempool chunks, eventhough the same memory is > tagged as malloclike? (since gslice allocates its mempools through malloc() or > memalign()). Yes, that fixes it for mempools by removing any malloc/memalign'ed block that intersects with any pool allocation from the leak-check working set. > > I still like to see a client request to get valgrind version, to be able to > > check whether a good enough valgrind is available and not misleadingly act like > > gslice allocations are being checked while they are not really because of an > > old valgrind... > > what kind of checking/acting do you have in mind here specifically? The leak-checking! With older valgrinds, mempools are not leakchecked, with HEAD, they are. I don't want glib to pretend gslice is being leakchecked (by printing out the like that glib valgrind hints are enabled) when that's not true because the valgrind version is too old. > in general, changing behaviour in the presence of a (memory) profiler or > debugger is a bad idea.
(In reply to comment #33) > (In reply to comment #32) > > > I still like to see a client request to get valgrind version, to be able to > > > check whether a good enough valgrind is available and not misleadingly act like > > > gslice allocations are being checked while they are not really because of an > > > old valgrind... > > > > what kind of checking/acting do you have in mind here specifically? > > The leak-checking! With older valgrinds, mempools are not leakchecked, with > HEAD, they are. I don't want glib to pretend gslice is being leakchecked (by > printing out the like that glib valgrind hints are enabled) when that's not > true because the valgrind version is too old. cleared this up on irc, Behdad refers to the following two lines here: + if (G_UNLIKELY (g_mem_checker_running)) + VALGRIND_PRINTF ("GLib valgrind hints enabled"); i think that can simply be left out as long as we can't figure the valgrind version. speaking of which, Behdad can you please point out the specific valgrind version that is required for your patch now? and also please update the latest attachment to this report (attachment id=66163) which still has a couple issues: - the enum values declared in gmem.c (gc_friendly, etc.) should be all uppercase, according to glib coding style. - the IF_CHECKER_RUNNING and CHECKER_HINT_* macros should vanish and VALGRIND_* macros should be called directly in the code. for me, the current indirection just makes the code harder to read. - i think during one of our conversations, Behdad and me agreed that moving the slab_info structure to the beginning of a page in allocator_add_slab() and slab_allocator_free_chunk() doesn't really provide a significant benefit (and some timings during gslice development showed that this can decrease performance), so this should be backed out. - adding LIKELYness hints anywhere in allocator_memalign() is simply overkill, considering its call frequency or CPU consumption. - parts of the patch conflict with current CVS (debug_key_matches() etc.) - the patch adds contrib/memcheck.h from valgrind. it was mentioned that, according to valgrind policy, headers shouldn't be copied over but be included by third-party programs. i'm not sure what the resolution on this one was... - tests/slice-test.c contains some pretty nasty hackery, such as + g_malloc(0); which relies on implementation details such as g_malloc0() checking g_mem_initialized before n_bytes (at the very least a comment is missing here to describe the intention of this line). if this file really has to be changed at all, i'd rather see the patch be reduced to contain the *minimum* change set in order to trigger the valgrind error mentioned in a comment if that is still possible/required.
*** Bug 353328 has been marked as a duplicate of this bug. ***
Behdad: ping. is a submitting a new/cleaned up patch still on your TODO list?
Sure it is. In the mean time, valgrind has added some new hints that should be enough for us. I'll update the patch soon.
I don't wish to be a jerk, seeing the incredible amount of work poured into this patch, but in my experience the best thing would just be to have G_SLICE=always-malloc behavior if valgrind is detected. I've found that to be always very effective in practice, but many people forget to do it, and valgrind becomes less effective for them. If you are running under valgrind you will get different results out of malloc(), so a lot of the concerns are a little confusing here.
Created attachment 88577 [details] [review] always-malloc under valgrind In my experience, there is almost never a time you want to use gslice under valgrind, so this patch disables it.
Comment on attachment 88577 [details] [review] always-malloc under valgrind (In reply to comment #38) > I don't wish to be a jerk, seeing the incredible amount of work poured into > this patch, but in my experience the best thing would just be to have > G_SLICE=always-malloc behavior if valgrind is detected. I've found that to be > always very effective in practice, but many people forget to do it, and > valgrind becomes less effective for them. that might actually put a nice end to this bug, especially since it doesn't seem to be moving forward any on the other fronts. (In reply to comment #39) >--- glib/gslice.c 2007-03-08 20:01:39.000000000 -0800 >+++ glib/gslice.c 2007-05-21 20:24:20.000000000 -0700 >@@ -37,6 +37,12 @@ > #ifdef HAVE_UNISTD_H > #include <unistd.h> /* sysconf() */ > #endif >+#ifdef HAVE_VALGRIND_H /* hypothetical (plausibly given in CFLAGS) */ >+#include <valgrind.h> >+#endif >+#ifdef HAVE_VALGRIND_VALGRIND_H /* typical linux install location */ >+#include <valgrind/valgrind.h> >+#endif this has to be changed, if we attempt to detect valgrind, we need to always do it. we can accomplish that by copying valgrind.h into glib, including it here, and by periodically updating it. alternatively, if the valgrind detection can be considered reasonably stable, we can just cut-and-paste the relevant snippet here. > #ifdef G_OS_WIN32 > #include <windows.h> > #include <process.h> >@@ -275,6 +281,12 @@ > { > config->always_malloc = TRUE; > } >+#if defined(HAVE_VALGRIND_H) || defined(HAVE_VALGRIND_VALGRIND_H) >+ if (RUNNING_ON_VALGRIND) >+ { >+ config->always_malloc = TRUE; this should definitely not occour unverbose. it's too easy to confuse people to silently change behavior under the hood. so we should have something here like: fputs ("\n***MEMORY-MESSAGE***: ", stderr); fprintf (stderr, "%s[%u]: GSlice: valgrind detected, forcing G_SLICE=always-malloc mode", prgname, pid); (see _g_slice_thread_init_nomessage() for similar messaging code) >+ } >+#endif > } > > static void >--- configure.in 2007-03-08 20:01:50.000000000 -0800 >+++ configure.in 2007-05-21 20:27:42.000000000 -0700 >@@ -794,6 +794,13 @@ > AC_CHECK_HEADERS([sys/time.h sys/times.h sys/wait.h unistd.h values.h]) > AC_CHECK_HEADERS([sys/select.h sys/types.h stdint.h sched.h malloc.h]) > >+# check for valgrind headers (to disable gslice) >+# afaik, only valgrind/valgrind.h is used. >+# but valgrind.h seems like an idea for good measure. >+# (we could add a more sophisticated check to ensure RUNNING_ON_VALGRIND >+# is actually defined) >+AC_CHECK_HEADERS([valgrind.h valgrind/valgrind.h]) >+ this check is not needed then.
I have a new patch which copies valgrind.h into glib and includes it #include "valgrind.h" unconditionally. I added a guard "#if !defined(NVALGRIND) && defined(RUNNING_ON_VALGRIND)" as it wasn't clear to me that RUNNING_ON_VALGRIND is defined on unsupported architectures. A careful study of valgrind.h could probably determine if the guard is necessary, but it probably doesn't hurt.
Created attachment 88607 [details] [review] use always-malloc under valgrind This patch includes a local copy of valgrind.h, which will in included in the dist tarball. When valgrind is detected, it prints a message and drops to always-malloc behavior.
: This patch includes a local copy of valgrind.h Sorry, but are you sure that included copy of valgrind.h, which seems to be from valgrind-3.x, is and will be compatible with valgrind itself on the target host? For example with valgrind from the FreeBSD ports (which is analog of valgrind-2.x accordingly to the http://valgrind.org/info/platforms.html)?
(In reply to comment #43) > : This patch includes a local copy of valgrind.h > > Sorry, but are you sure that included copy of valgrind.h, which seems to be > from valgrind-3.x, is and will be compatible with valgrind itself on the target > host? > > For example with valgrind from the FreeBSD ports (which is analog of > valgrind-2.x accordingly to the http://valgrind.org/info/platforms.html)? > I don't precisely understand your concerns. I cannot test on every platform, I'm not a valgrind dev, etc, so i really am not sure. If you are concerned about this causing build problems, the header conditionalizes itself pretty strictly. So hopefully that'll be ok. If you are concerned about this client api not matching the valgrind host version, the header contains comments about how the opcodes used must not change for compatibility, so i think that this very conservative API isn't changing much, even between valgrind versions. On the other hand, there's nothing to do about the fact that valgrind will be ported to new systems in the future, and so the copy of valgrind.h will need periodic updating. In the meantime, valgrind will be undetected on those platforms. In general, I would assume that the later header would include portability fixes etc, perhaps even the one from svn would be good. (I just used whatever was on my system, actually)
: If you are concerned about this client api not matching the valgrind host : version, the header contains comments about how the opcodes used must not : change for compatibility, so i think that this very conservative API isn't : changing much, even between valgrind versions. Yes, I worry about header version vs. host valgrind version. I will check it using header from your patch and valgrind from my FreeBSD host...
1. "new" version of the RUNNING_ON_VALGRIND macro unable to find "old" valgrind core (returns zero in all cases, under valgrind or not). 2. [just a note]: RUNNING_ON_VALGRIND is macro without parameters, therefore, gcc-3.4.6 forbids following construction (given from the patch) if (RUNNING_ON_VALGRIND ()) { /* .... */ } with "called object is not a function" error. Conclusion: while using of the "new" header doesn't produce a harm (what was my fear), it gives no advantage under FreeBSD and, I suspect, under another systems except Linux. Therefore, I would to prefer using host's valgrind.h if it is found at the configure time.
(In reply to comment #46) > Conclusion: > while using of the "new" header doesn't produce a harm (what was my fear), it > gives no advantage under FreeBSD and, I suspect, under another systems except > Linux. Therefore, I would to prefer using host's valgrind.h if it is found at > the configure time. that is really not a suitable option unfotunately. because for the vast majority of glib deployments, the build host doesn't match the runtime host, so the build hosts don't need to have valgrind.h when it'll be needed during runtime. these are the practical options we have: a) let glib have a hard dependency on valgrind (fairly absurd); b) include valgrind.h in glib (seems to run into valgrind.h version incompats); c) cut-and-paste valgrind detection macros from multiple valgrind.h versions (is tedious to maintain); d) close this bug WONTFIX and don't special case valgrind runtime environment. so, (a) and (b) are ruled out for the mentioned reasons. if someone volounteers to take care of constructing and maintaining (c), fine. otherwise i'll WONTFIX this bug as described in (d).
I really don't understand the technical problem with a conditional dependency. It doesn't seem like an optional build dependency is unusual...
Would you mind moving your patch to another bugzilla entry? I do intend to finish this patch at some point (or someone else is more than welcome to do that). I don't want a FIXED over this instead when it's not. Also, for printing something I suggest something like: VALGRIND_PRINTF ("GLib valgrind hints enabled");
(In reply to comment #49) > Would you mind moving your patch to another bugzilla entry? > I do intend to finish this patch at some point (or someone else is more than > welcome to do that). I don't want a FIXED over this instead when it's not. Well, to be honest there are actually quite a few advantages to this approach besides its simplicity. Namely, valgrind spaces out the allocations and avoids reusing addressses so help flush out stale pointer uses and overruns and underruns. By definition, these don't mesh with allocators designed to benefit from caching, or with space-compact allocators. In any event, your patch will have the same distribution problems that Tim's quibbling over, so we may as well resolve those issues here.
(In reply to comment #48) > I really don't understand the technical problem with a conditional dependency. > > It doesn't seem like an optional build dependency is unusual... well, glib is a fairly low level component compared to other libraries/apps which is why most users and also developers get to use it as a pre-built package. everyone who will want to do a debug run with glib (i.e. even users actively filing bugs) in valgrind will need the above valgrind special casing. but if the availability of that is dependent on the random presence of valgrind.h on the package builder's system, you effectively randomize allocator behavior under valgrind which is exactly *not* what people need for debugging scenarios. so the net usability results of optionally supporting valgrind.h are simply unaccaptable. (optional header support is not a _technical_ problem if that answers your question.)
(In reply to comment #49) > Would you mind moving your patch to another bugzilla entry? > I do intend to finish this patch at some point (or someone else is more than > welcome to do that). I don't want a FIXED over this instead when it's not. Hey Behdad. with GSlice supporting G_SLICE=debug-blocks now to check proper use of the g_slice_* API (and does so much faster than using valgrind), and with the ability to run a valgrind session with G_SLICE=always-malloc to have valgrind do all its memory block checks also on slices, i really do see fairly little use in supporting extended valgrind hinting in GSlice code now. the downsides i see are: - adding valgrind hinting affects the common code GSlice code path; - the valgrind hinting patches we discussed need fairly complex changes; - there is no clear benefit (to me) in using complex valgrind hinting with slices over simply running valgrind with G_SLICE=always-malloc; - we'll need to depend on fairly new valgrind versions for the support. and from my GSlice experience so far, i can say that all slice bug cases not caught by always-malloc + valgrind are now taken care of by debug-blocks. so, could you please weight in your pros here, so we can reestimate the situation? > Also, for printing something I suggest something like: > VALGRIND_PRINTF ("GLib valgrind hints enabled"); i don't see how that's better than if (VALGRIND) { [...]; fprintf (stderr,""); } could you please elaborate?
(In reply to comment #51) > (In reply to comment #48) > > I really don't understand the technical problem with a conditional dependency. > > > > It doesn't seem like an optional build dependency is unusual... > > well, glib is a fairly low level component compared to other libraries/apps > which is why most users and also developers get to use it as a pre-built > package. > everyone who will want to do a debug run with glib (i.e. even users actively > filing bugs) in valgrind will need the above valgrind special casing. but if > the availability of that is dependent on the random presence of valgrind.h on > the package builder's system, you effectively randomize allocator behavior > under valgrind which is exactly *not* what people need for debugging scenarios. > so the net usability results of optionally supporting valgrind.h are simply > unaccaptable. (optional header support is not a _technical_ problem if that > answers your question.) > Well, as you requested, it prints out its behavior so I think most people will be aware. I think "simply unacceptable" is pretty overstated-- the effectiveness of valgrind will magically (but not quietly) improve if the build env supported valgrind, it will remain as it is if it didn't have the header... so what? Vast majority of people use packaged glib anyways and packagers for distros can add build-dependencies. I guess I am assuming that in the common case, people will simply run "valgrind PROGRAM" and expect things to work as well as possible. The difference is quite graphic between G_SLICE=always-malloc and not, but i would think that's pretty cryptic for anyone not involved in glib internals.
(In reply to comment #52) > (In reply to comment #49) > > Would you mind moving your patch to another bugzilla entry? > > I do intend to finish this patch at some point (or someone else is more than > > welcome to do that). I don't want a FIXED over this instead when it's not. > > Hey Behdad. with GSlice supporting G_SLICE=debug-blocks now to check proper use > of the g_slice_* API (and does so much faster than using valgrind), and with > the ability to run a valgrind session with G_SLICE=always-malloc to have > valgrind do all its memory block checks also on slices, i really do see fairly > little use in supporting extended valgrind hinting in GSlice code now. the > downsides i see are: > - adding valgrind hinting affects the common code GSlice code path; > - the valgrind hinting patches we discussed need fairly complex changes; > - there is no clear benefit (to me) in using complex valgrind hinting with > slices over simply running valgrind with G_SLICE=always-malloc; > - we'll need to depend on fairly new valgrind versions for the support. > > and from my GSlice experience so far, i can say that all slice bug cases not > caught by always-malloc + valgrind are now taken care of by debug-blocks. so, > could you please weight in your pros here, so we can reestimate the situation? There is at least parts of my patch adding hints to GArray. Those need to be applied still. - there is no clear benefit (to me) in using complex valgrind hinting with slices over simply running valgrind with G_SLICE=always-malloc; > > Also, for printing something I suggest something like: > > VALGRIND_PRINTF ("GLib valgrind hints enabled"); > > i don't see how that's better than if (VALGRIND) { [...]; fprintf (stderr,""); > } could you please elaborate? Well, fprintf from a library is not a good idea. It should either do VALGRIND_PRINTF or g_message. I prefer VALGRIND_PRINTF because the message will end up with other valgrind messages (default to stderr, but can be redirected to a log file).
(In reply to comment #54) > (In reply to comment #52) > There is at least parts of my patch adding hints to GArray. Those need to be > applied still. ok, agreed. that of course also bears the issue of which valgrind header to take, however i'd hope using the newest valgrind.h from SVN is good enough for the GArray hinting. > - there is no clear benefit (to me) in using complex valgrind hinting with > slices over simply running valgrind with G_SLICE=always-malloc; ok, depending on whether we can reliably and portably detect valgrind across multiple versions then, we can automate this or not - or at least printout a warning if we suspect a valgrind session and are not configured for always-malloc. > > > Also, for printing something I suggest something like: > > > VALGRIND_PRINTF ("GLib valgrind hints enabled"); > > > > i don't see how that's better than if (VALGRIND) { [...]; fprintf (stderr,""); > > } could you please elaborate? > > Well, fprintf from a library is not a good idea. It should either do > VALGRIND_PRINTF or g_message. I prefer VALGRIND_PRINTF because the message > will end up with other valgrind messages (default to stderr, but can be > redirected to a log file). ah, didn't know that that macro is a valgrind facility. about g_message/g_warning, yes usually libraries should use that, gslice.c is just a special case and has to use fprintf or fputs in places because it's an allocator and can't rely on other parts of the library. tbf/bratsche, do you think you could rework the last patch from Behdad to include just the garray.c changes + a newest valgrind header and make sure it applies to trunk, give it some testing, and then repost here for final review?
I might look at it tomorrow morning.
Created attachment 88812 [details] [review] Updated patch for trunk This is an updated patch that can apply against trunk. I haven't done any testing with it yet, but I might have time to do that later tonight unless tbf wants to.
btw, that patch contains valgrind.h header from valgrind's trunk.
What's the current state of this patch? Has it been applied, or not yet?
Tim, Cody, can we make some progress here? There's a few things involved. - My original patch had valgrind annotations for garray. That should go in intact. - There's new API to know if running on a checker. That can go in too. - Re how to handle gslice, I still like my original approach of not changing code path. My approach allows debugging gslice issues, but doesn't handle the need for always-malloc behavior. Alternative approach is to switch to always-malloc if checker running. I'm fine with that too. Lets please get it in. Perhaps add a new option to gslice, for always-gslice, which inhibits automatic switching?
The problem with always-malloc is that when code allocates with gslice and then frees with g_free(), using always-malloc makes that work and valgrind doesn't even complain. In order to track down some code I suspect of doing exactly that in Evolution (bug 630293), I'm currently running a version of gslice which uses always-malloc but allocates an extra 8 bytes and adds 8 bytes to the pointer it returns -- so calling g_free() on that will always cause a problem. I attempted to provide hinting for gslice, but ran into a Valgrind bug with the mempool tracking: https://bugs.kde.org/show_bug.cgi?id=254420
(In reply to comment #61) > The problem with always-malloc is that when code allocates with gslice and then > frees with g_free(), using always-malloc makes that work and valgrind doesn't > even complain. always-malloc isn't useful for GSlice allocation debugging, G_SLICE=debug-blocks is, see this: http://blogs.gnome.org/timj/2007/01/02/28122006-g_slicedebug-blocks/
Will G_SLICE=debug-blocks catch the case of a program allocating with GSlice and then freeing with g_free() ?
(In reply to comment #63) > Will G_SLICE=debug-blocks catch the case of a program allocating with GSlice > and then freeing with g_free() ? GSlice can't catch that, since by default g_free() is just free(), so it's not under GSlice control. However MALLOC_CHECK_=2 will catch this, see the glibc manual page on malloc(3).
Thanks. That may be useful in future, but I've mostly got used to how slow Evolution runs under Valgrind, so I may as well stick with it for now :) Valgrind does useful things like leak detection which debug-blocks presumably can't do either, so I think we do still want to make the valgrind annotations work.
To get proper coverage, you should do different runs with different debug settings -- that is especially true to the extent that you can automate things. valgrind with G_SLICE=always-malloc is good. Plain with G_SLICE=debug-blocks catches less, but occasionally things that valgrind does not (==cannot) see.
David, have you checked my patch?
Re: Comment 51 (timj's rebuttal of conditional build-dep on valgrind) I just want to continue to press for support for using the user's valgrind.h instead of an included valgrind.h. Tim, your chief argument against the conditional dep is that it introduces uncertainty. But at this point, that's moot, b/c obviously some glib 2 installs will have valgrind support, and some won't-- the idea has been on the table 4 years now -- and it'll probably confuse someone-- so the randomization effect is necessary for progress. Furthermore, most packagers could trivially add a build dependency on valgrind which make this pretty reliable. Still, this simple patch would probably have saved countless novice hours debugging .. i have first-hand experiences dealing with novices :) it would be great to get something committed. It is possible to make a bug to deal with remaining issues once something concrete is committed.
I do really like the idea of a debugging-style always-malloc that allocates N extra bytes and returns N bytes into the total allocation -- but unfortunately it reduces the power of valgrind's underrun detection so i'd be hesistant to actually do it. It might also confuse users who calculate/guess allocation sizes based on "allocated N bytes in M blocks" messages (since it would add an unexpected padding).
*** Bug 608947 has been marked as a duplicate of this bug. ***
I guess we should re-evaluate this bug and patches, now that we do include valgrind.h inside GLib.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/42.