GNOME Bugzilla – Bug 698716
Use of g_mem_set_vtable() breaks after gobject automatic init ctor
Last modified: 2013-05-22 00:31:31 UTC
Steps to reproduce: Download a Glib 2.36.1 tarball and ./configure it with GCC 4.8.0 and CFLAGS="-Wall". Then config.log contains configure:21628: checking if malloc() and friends prototypes are gmem.h compatible configure:21654: gcc -c -Wall -Werror conftest.c >&5 conftest.c: In function 'main': conftest.c:66:13: error: variable 'my_realloc_p' set but not used [-Werror=unused-but-set-variable] void* (*my_realloc_p) (void*, size_t) = realloc; ^ conftest.c:65:13: error: variable 'my_free_p' set but not used [-Werror=unused-but-set-variable] void (*my_free_p) (void*) = free; ^ conftest.c:64:13: error: variable 'my_malloc_p' set but not used [-Werror=unused-but-set-variable] void* (*my_malloc_p) (size_t) = malloc; ^ conftest.c:63:13: error: variable 'my_calloc_p' set but not used [-Werror=unused-but-set-variable] void* (*my_calloc_p) (size_t, size_t) = calloc; ^ cc1: all warnings being treated as errors configure:21654: $? = 1 configure: failed program was: ... instead of the correct (output of ./configure with default CFLAGS) configure:21628: checking if malloc() and friends prototypes are gmem.h compatible configure:21654: gcc -c -g -O2 -Werror conftest.c >&5 configure:21654: $? = 0 configure:21663: result: yes As a result, the macro SANE_MALLOC_PROTOS gets incorrectly undefined in config.h.
Yeah, this check sucks. It dates back over 12 years to https://git.gnome.org/browse/glib/commit/?id=782a8e2e7c69c3d98bd69bcfdbb65ded520576f4 GLib uses -Wall by default, so you can work around this by simply not passing it. But maybe we should try detecting known platforms (glibc e.g.) and assuming it's OK.
This is just checking that malloc(), realloc(), calloc(), and free() have the prototypes specified by C90... I think we can probably just ditch the check.
Note that I just wish to point out that enabling or disabling compiler warnings should not change the behavior of the software (thus, -Werror isn't good). There's little benefit in enabling -Wall when just building the software for production use, of course. But this bug report comes from an anecdotal situation, where Glib 2.36.1 was crashing in Arch Linux and someone observed that simply rebuilding it with "-Wall" fixes the crash. The real difference was SANE_MALLOC_PROTOS being defined or not :-) Reference: https://bugs.archlinux.org/task/34630
Created attachment 242572 [details] [review] 0001-configure-Assume-C90-compatible-malloc-prototype.patch
Comment on attachment 242572 [details] [review] 0001-configure-Assume-C90-compatible-malloc-prototype.patch > * REALLOC_0_WORKS is defined if g_realloc (NULL, x) works. oh, I guess we don't need this any more either. C90 says "If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size." >+#define standard_malloc malloc so we can get rid of all the standard_* defines too
Created attachment 242615 [details] [review] 0001-configure-Assume-C90-compatible-malloc-prototype.patch Now with even more code deleted! \o/
Comment on attachment 242615 [details] [review] 0001-configure-Assume-C90-compatible-malloc-prototype.patch great
https://git.gnome.org/browse/glib/commit/?id=518e3104bf6cdb5d8e6b43d3b721805db5951139
This commit reintroduces the bug from Archlinux Andrey mentioned above. How should this bug be fixed?
The relevant comment appears to be this one: https://bugs.archlinux.org/task/34630#comment108896 I'll have to think about this.
what flags does arch compile glib with?
@Colin: It seems like using g_malloc() from __attribute__((constructor)) is a bad idea in the first place because of g_mem_set_vtable(). The documentation for g_mem_set_vtable() says, "This function must be called before using any other GLib functions." But an __attribute__((constructor)) function gets called even before main(), so in order to use g_mem_set_vtable() safely, an application would have refrain from linking to libgobject.so and instead dlopen() it after calling g_mem_set_vtable().
@Dan: None, until they added -Wall as a workaround for this bug. https://projects.archlinux.org/svntogit/community.git/plain/trunk/PKGBUILD?h=packages/lib32-glib2
Jonh: Arch packages are compiled using the default values from /etc/makepkg.conf. As of now, for x86-64 these are CPPFLAGS="-D_FORTIFY_SOURCE=2" CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4" CXXFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4" LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro"
Comment #12 sounds right; so https://git.gnome.org/browse/glib/commit/?id=7c42ab23b55c43ab96d0ac2124b550bf1f49c1ec from 2.36 broke users of g_mem_set_vtable that were also linking to libgobject. What to do about it? I'm not sure we could just ignore calls to g_mem_set_vtable because the application could assume its own malloc functions were in place, and thus might freely intermix g_free()/myapp_free(). If we can't solve this glib side, we should at least strongly deprecate g_mem_set_vtable(); while theoretically there could be apps that only link to libglib that use it, in practice I doubt their existence. Ryan, any opinions here?
The only way that I think we can fix this is by advertising the name of a symbol (like "glib_get_mem_config" that you can have in your executable. If we dlsym() and find it, we let you change the memory config using it. Otherwise, we probably do want to deprecate it. I really want to avoid the idea that it's valid to call g_malloc() without calling g_free() (or vice versa). In the future, in context of our valgrind-friendly bug, I want to be able to switch on malloc/free counting so we can run our testcases outside of valgrind and still catch leaks.
Ok, per: https://bugs.archlinux.org/task/34630#comment108896 it sounds like Skype is not using g_mem_set_vtable(); it's likely using LD_PRELOAD to override malloc(), or including a definition in its main DSO. "As it happens, this bug only affects the pointer to malloc(), not the wrapper function; that's why the crash goes away if SANE_MALLOC_PROTOS is not defined." That indeed sounds very bizarre; I suspect it's something like the dynamic linker runs over data sections before text and processing constructors. So when the malloc pointer happens to live in a static structure, it gets processed first. Regardless though, this seems like a generic incompatibility between GLib's use of a constructor and applications that want to override malloc() - both via symbol overrides and g_mem_set_vtable().
> it sounds like Skype is not using g_mem_set_vtable(); it's likely using > LD_PRELOAD to override malloc(), or including a definition in its main DSO. More specifically, the Skype binary contains an invalid PLT, which is initialized by some startup code in Skype after shared library initialization. (This behavior seems to be unique to Skype; a test application that I linked with GCC 4.8.0 produced a valid PLT and did not modify it during startup.) ld-linux.so apparently assumes that an application's PLT is always valid and that it's okay to use it to resolve symbols in shared libraries also. Anyway, the reason it works without SANE_MALLOC_PROTOS is simply that GCC can optimize out the PLT relocation for a direct function call but not for a stored function pointer.
(In reply to comment #18) > More specifically, the Skype binary contains an invalid PLT, which is > initialized by some startup code in Skype after shared library initialization. > (This behavior seems to be unique to Skype; Thanks for debugging and explaning this, John! So there really are two separate bugs here. First is what we should do about g_mem_set_vtable() - I jumped to the conclusion Skype was using it, but apparently not. Second is whether we should carry a workaround for the Skype PLT breakage GLib side. I'd lean towards no. Unless there were multiple applications affected (say a known broken toolchain).
(In reply to comment #19) > Thanks for debugging and explaning this, John! So there really are two > separate bugs here. First is what we should do about g_mem_set_vtable() This is more-or-less the same as bug 687763.
Interesting bug. Has someone reported the issue to Skype? Do you have a recommendation for distros to work around it?
A Skype developer friend reported it internally with issue ID LINUX-1675. I don't think there's a way for Joe Public to view these though.
(In reply to comment #22) > A Skype developer friend reported it internally with issue ID LINUX-1675. I > don't think there's a way for Joe Public to view these though. Knowing exactly what it is that's going wrong, and if/when they expect to fix it would help us figure out how/if we should try to work around it on the glib side.
(In reply to comment #23) > Knowing exactly what it is that's going wrong, and if/when they expect to fix > it would help us figure out how/if we should try to work around it on the glib > side. Sure. They have a link to this bug so might reply directly. If I hear anything then I'll reply too.
(In reply to comment #21) > Interesting bug. Has someone reported the issue to Skype? Do you have a > recommendation for distros to work around it? In the 2.36 branch, the easiest workaround I know of is to patch gmem.c changing change "#ifdef SANE_MALLOC_PROTOS" to "#if 0". I don't know of any workaround for glib master since https://git.gnome.org/browse/glib/commit/glib/gmem.c?id=518e3104bf6cdb5d8e6b43d3b721805db5951139.
Skype for Linux 4.2 has been released: http://blogs.skype.com/2013/05/20/skype-for-linux-4-2/ The new Skype version seems to work fine with the current glib 2.37.
Skype 4.2 works for me also.
Ok, closing here as I can't think of a sane workaround GLib side, and it's hopefully easy enough for people to update Skype. If someone comes up with a not-too-gross workaround for GLib, feel free to reopen.