GNOME Bugzilla – Bug 791622
Disable strict-aliasing in GLib
Last modified: 2018-01-08 14:03:52 UTC
The culprit of the problem is the following commit: commit 88ad0dab214799f17f0ddc463d10f44c00587dbf Author: Philip Withnall <withnall@endlessm.com> Date: Fri Apr 28 12:05:42 2017 +0100 gdbusconnection: Add some comments about object ownership Some annotations I made while trying to debug bug #781847. They introduce no behavioural changes. Signed-off-by: Philip Withnall <withnall@endlessm.com> The test case is the following: #include <gio/gio.h> int main(int argc, char **argv) { GError *error = NULL; GDBusConnection *bus_connection; GCancellable *cancellable; cancellable = g_cancellable_new(); bus_connection = g_bus_get_sync(G_BUS_TYPE_SYSTEM, cancellable, &error); if (bus_connection == NULL) { g_error("Got an error: %s", error->message); return 1; } g_message("Got bus connection"); return 0; } /* ex:set ts=4 et: */ I double checked and it is definitely that commit: ea586b47a306a1e9b5edd6fc94eb872342a87024 <- is ok 88ad0dab214799f17f0ddc463d10f44c00587dbf <- not ok To reproduce the issue I built glib with mock and jhbuild inside of mock. For this I used epel-6-x86_64. Let me know if you need more info.
I can’t reproduce this with GLib master or 2.54.2. You said that removing the g_steal_pointer() change from that commit fixes the issue for you — could you please run the test under gdb, and step through the last few lines of g_dbus_connection_send_message_with_reply_unlocked() to try and work out why that single change is causing/fixing the failure?
This is the bt of the hang, tomorrow I will have a look at stepping into that method: (gdb) t a a bt
+ Trace 238264
I really do not get it, this is the steps when going with gdb on teh buggy code: Breakpoint 1, g_dbus_connection_send_message_with_reply_unlocked (connection=0x617050, message=0x60fcf0, flags=(unknown: 2147483648), timeout_msec=-1, out_serial=0x0, cancellable=0x0, callback=0x7ffff7b035a0 <send_message_with_reply_sync_cb>, user_data=0x7fffffffe1f0) at gdbusconnection.c:1895 warning: Source file is more recent than executable. 1895 { Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 libselinux-2.5-11.el7.x86_64 pcre-8.32-17.el7.x86_64 (gdb) n 1901 if (out_serial == NULL) (gdb) 1895 { (gdb) 1907 data = g_slice_new0 (SendMessageData); (gdb) 1895 { (gdb) 1901 if (out_serial == NULL) (gdb) 1904 if (timeout_msec == -1) (gdb) 1895 { (gdb) 1898 GError *error = NULL; (gdb) 1907 data = g_slice_new0 (SendMessageData); (gdb) 1908 task = g_task_new (connection, cancellable, callback, user_data); (gdb) 1907 data = g_slice_new0 (SendMessageData); (gdb) 1908 task = g_task_new (connection, cancellable, callback, user_data); (gdb) 1909 g_task_set_source_tag (task, (gdb) 1908 task = g_task_new (connection, cancellable, callback, user_data); (gdb) 1909 g_task_set_source_tag (task, (gdb) 1911 g_task_set_task_data (task, data, (GDestroyNotify) send_message_data_free); (gdb) 1913 if (g_task_return_error_if_cancelled (task)) (gdb) 1919 if (!g_dbus_connection_send_message_unlocked (connection, message, flags, out_serial, &error)) (gdb) 1925 data->serial = *out_serial; (gdb) 1927 if (cancellable != NULL) (gdb) 1925 data->serial = *out_serial; (gdb) 1927 if (cancellable != NULL) (gdb) 1935 if (timeout_msec != G_MAXINT) (gdb) 1937 data->timeout_source = g_timeout_source_new (timeout_msec); (gdb) 1938 g_task_attach_source (task, data->timeout_source, (gdb) 1937 data->timeout_source = g_timeout_source_new (timeout_msec); (gdb) 1938 g_task_attach_source (task, data->timeout_source, (gdb) 1940 g_source_unref (data->timeout_source); (gdb) 1944 GUINT_TO_POINTER (*out_serial), (gdb) 1943 g_hash_table_insert (connection->map_method_serial_to_task, (gdb) 1945 g_steal_pointer (&task)); (gdb) 1943 g_hash_table_insert (connection->map_method_serial_to_task, (gdb) 1946 } (gdb) g_dbus_connection_send_message_with_reply (connection=0x617050, message=0x60fcf0, flags=(unknown: 2147483648), timeout_msec=-1, out_serial=0x0, cancellable=0x0, callback=0x7ffff7b035a0 <send_message_with_reply_sync_cb>, user_data=0x7fffffffe1f0) at gdbusconnection.c:2016 2016 CONNECTION_UNLOCK (connection); (gdb) 2017 } (gdb) 2016 CONNECTION_UNLOCK (connection); (gdb) 0x00007ffff7a70c78 in g_mutex_unlock@plt () from bins/lib/libgio-2.0.so.0 (gdb) Single stepping until exit from function g_mutex_unlock@plt, which has no line number information. g_mutex_unlock (mutex=0x617068) at gthread-posix.c:230 warning: Source file is more recent than executable. 230 { (gdb) 233 if G_UNLIKELY ((status = pthread_mutex_unlock (g_mutex_get_impl (mutex))) != 0) (gdb) 235 } (gdb) g_dbus_connection_send_message_with_reply_sync (connection=0x617050, message=0x60fcf0, flags=(unknown: 2147483648), timeout_msec=-1, out_serial=0x0, cancellable=0x0, error=0x7fffffffe278) at gdbusconnection.c:2151 2151 g_main_loop_run (data.loop); (gdb) 2152 reply = g_dbus_connection_send_message_with_reply_finish (connection, (gdb) 2156 g_main_context_pop_thread_default (data.context); (gdb) 2152 reply = g_dbus_connection_send_message_with_reply_finish (connection, (gdb) 2156 g_main_context_pop_thread_default (data.context); (gdb) 2158 g_main_context_unref (data.context); (gdb) 2159 g_main_loop_unref (data.loop); (gdb) 2160 if (data.res) (gdb) 2161 g_object_unref (data.res); (gdb) 2164 } (gdb) g_dbus_connection_call_sync_internal (connection=<optimized out>, bus_name=0x7ffff7b7f67b "org.freedesktop.DBus", object_path=0x7ffff7b7f690 "/org/freedesktop/DBus", interface_name=0x7ffff7b7f67b "org.freedesktop.DBus", method_name=0x7ffff7b860cb "Hello", parameters=0x0, reply_type=0x7ffff7b765eb, flags=(unknown: 2147483648), timeout_msec=-1, fd_list=0x0, out_fd_list=0x0, cancellable=0x0, error=0x6170b8) at gdbusconnection.c:5958 5958 if (G_UNLIKELY (_g_dbus_debug_call ())) (gdb) 5979 if (reply == NULL) (gdb) 5981 if (error != NULL) (gdb) 5982 *error = local_error;
And this ones the steps when removing the g_steal_pointer: Breakpoint 1, g_dbus_connection_send_message_with_reply_unlocked (connection=0x617050, message=0x60fcf0, flags=(unknown: 2147483648), timeout_msec=-1, out_serial=0x0, cancellable=0x0, callback=0x7ffff7b035a0 <send_message_with_reply_sync_cb>, user_data=0x7fffffffe1f0) at gdbusconnection.c:1895 warning: Source file is more recent than executable. 1895 { Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 libselinux-2.5-11.el7.x86_64 pcre-8.32-17.el7.x86_64 (gdb) n 1901 if (out_serial == NULL) (gdb) 1895 { (gdb) 1907 data = g_slice_new0 (SendMessageData); (gdb) 1895 { (gdb) 1901 if (out_serial == NULL) (gdb) 1904 if (timeout_msec == -1) (gdb) 1895 { (gdb) 1898 GError *error = NULL; (gdb) 1907 data = g_slice_new0 (SendMessageData); (gdb) 1908 task = g_task_new (connection, cancellable, callback, user_data); (gdb) 1907 data = g_slice_new0 (SendMessageData); (gdb) 1908 task = g_task_new (connection, cancellable, callback, user_data); (gdb) 1909 g_task_set_source_tag (task, (gdb) 1908 task = g_task_new (connection, cancellable, callback, user_data); (gdb) 1909 g_task_set_source_tag (task, (gdb) 1911 g_task_set_task_data (task, data, (GDestroyNotify) send_message_data_free); (gdb) 1913 if (g_task_return_error_if_cancelled (task)) (gdb) 1919 if (!g_dbus_connection_send_message_unlocked (connection, message, flags, out_serial, &error)) (gdb) 1925 data->serial = *out_serial; (gdb) 1927 if (cancellable != NULL) (gdb) 1925 data->serial = *out_serial; (gdb) 1927 if (cancellable != NULL) (gdb) 1935 if (timeout_msec != G_MAXINT) (gdb) 1937 data->timeout_source = g_timeout_source_new (timeout_msec); (gdb) 1938 g_task_attach_source (task, data->timeout_source, (gdb) 1937 data->timeout_source = g_timeout_source_new (timeout_msec); (gdb) 1938 g_task_attach_source (task, data->timeout_source, (gdb) 1940 g_source_unref (data->timeout_source); (gdb) 1944 GUINT_TO_POINTER (*out_serial), (gdb) 1943 g_hash_table_insert (connection->map_method_serial_to_task, (gdb) 1946 } (gdb) g_dbus_connection_send_message_with_reply (connection=0x617050, message=0x60fcf0, flags=(unknown: 2147483648), timeout_msec=-1, out_serial=0x0, cancellable=0x0, callback=0x7ffff7b035a0 <send_message_with_reply_sync_cb>, user_data=0x7fffffffe1f0) at gdbusconnection.c:2016 2016 CONNECTION_UNLOCK (connection); (gdb) 2017 } (gdb) 2016 CONNECTION_UNLOCK (connection); (gdb) 0x00007ffff7a70c78 in g_mutex_unlock@plt () from bins/lib/libgio-2.0.so.0 (gdb) Single stepping until exit from function g_mutex_unlock@plt, which has no line number information. g_mutex_unlock (mutex=0x617068) at gthread-posix.c:230 230 { (gdb) 233 if G_UNLIKELY ((status = pthread_mutex_unlock (g_mutex_get_impl (mutex))) != 0) (gdb) 235 } (gdb) g_dbus_connection_send_message_with_reply_sync (connection=0x617050, message=0x60fcf0, flags=(unknown: 2147483648), timeout_msec=-1, out_serial=0x0, cancellable=0x0, error=0x7fffffffe278) at gdbusconnection.c:2151 2151 g_main_loop_run (data.loop); (gdb) 2152 reply = g_dbus_connection_send_message_with_reply_finish (connection, (gdb) 2156 g_main_context_pop_thread_default (data.context); (gdb) 2152 reply = g_dbus_connection_send_message_with_reply_finish (connection, (gdb) 2156 g_main_context_pop_thread_default (data.context); (gdb) 2158 g_main_context_unref (data.context); (gdb) 2159 g_main_loop_unref (data.loop); (gdb) 2160 if (data.res) (gdb) 2161 g_object_unref (data.res); (gdb) 2164 } (gdb) g_dbus_connection_call_sync_internal (connection=<optimized out>, bus_name=0x7ffff7b7f65b "org.freedesktop.DBus", object_path=0x7ffff7b7f670 "/org/freedesktop/DBus", interface_name=0x7ffff7b7f65b "org.freedesktop.DBus", method_name=0x7ffff7b860ab "Hello", parameters=0x0, reply_type=0x7ffff7b765cb, flags=(unknown: 2147483648), timeout_msec=-1, fd_list=0x0, out_fd_list=0x0, cancellable=0x0, error=0x6170b8) at gdbusconnection.c:5958 5958 if (G_UNLIKELY (_g_dbus_debug_call ())) (gdb) 5979 if (reply == NULL) (gdb) 5988 result = decode_method_reply (reply, method_name, reply_type, out_fd_list, error); (gdb) 5991 if (message != NULL) (gdb) 5992 g_object_unref (message); (gdb) 5993 if (reply != NULL) (gdb) 5994 g_object_unref (reply); (gdb) 5997 } (gdb) g_dbus_connection_call_sync (connection=<optimized out>, bus_name=<optimized out>, object_path=<optimized out>, interface_name=<optimized out>, method_name=<optimized out>, parameters=<optimized out>, reply_type=0x7ffff7b765cb, flags=(unknown: 2147483648), timeout_msec=-1, cancellable=0x0, error=0x6170b8) at gdbusconnection.c:6178 6178 } (gdb) initable_init (initable=<optimized out>, cancellable=<optimized out>, error=0x7fffffffe3a8) at gdbusconnection.c:2612 2612 if (hello_result == NULL) (gdb) 2601 hello_result = g_dbus_connection_call_sync (connection, (gdb) 2612 if (hello_result == NULL) (gdb) 2615 g_variant_get (hello_result, "(s)", &connection->bus_unique_name); (gdb) 2616 g_variant_unref (hello_result); (gdb) 2540 g_assert (connection->guid == NULL); (gdb) 2628 g_atomic_int_or (&connection->atomic_flags, FLAG_INITIALIZED); (gdb) 2629 g_mutex_unlock (&connection->init_lock); (gdb) 2632 } (gdb) g_bus_get_sync (bus_type=<optimized out>, cancellable=0x60c610, error=0x7fffffffe3a8) at gdbusconnection.c:7284 7284 } (gdb) 0x00000000004006c9 in main ()
So I went ahead and changed g_steal_pointer and added directly the code: --- a/gio/gdbusconnection.c +++ b/gio/gdbusconnection.c @@ -1897,6 +1897,8 @@ g_dbus_connection_send_message_with_reply_unlocked (GDBusConnection *connect SendMessageData *data; GError *error = NULL; volatile guint32 serial; + gpointer *ptr; + gpointer ref; if (out_serial == NULL) out_serial = &serial; @@ -1940,9 +1942,12 @@ g_dbus_connection_send_message_with_reply_unlocked (GDBusConnection *connect g_source_unref (data->timeout_source); } + ptr = (gpointer *)&task; + ref = *ptr; *ptr = NULL; + g_hash_table_insert (connection->map_method_serial_to_task, GUINT_TO_POINTER (*out_serial), - g_steal_pointer (&task)); + ref); } This breaks as using g_steal_pointer. Instead if we do: --- a/gio/gdbusconnection.c +++ b/gio/gdbusconnection.c @@ -1897,6 +1897,9 @@ g_dbus_connection_send_message_with_reply_unlocked (GDBusConnection *connect SendMessageData *data; GError *error = NULL; volatile guint32 serial; + gpointer *ptr; + gpointer ref; + gpointer t; if (out_serial == NULL) out_serial = &serial; @@ -1940,9 +1943,13 @@ g_dbus_connection_send_message_with_reply_unlocked (GDBusConnection *connect g_source_unref (data->timeout_source); } + t = (gpointer)task; + ptr = (gpointer *)&t; + ref = *ptr; *ptr = NULL; + g_hash_table_insert (connection->map_method_serial_to_task, GUINT_TO_POINTER (*out_serial), - g_steal_pointer (&task)); + ref); } It works as expected.
I wonder why g_steal_pointer take a gpointer instead of a gpointer *.... that seems to break the strict-aliasing rules which seems to be the problem here.
Only a brief drive by comment but if that's true the fix is to -fno-strict-aliasing, like is done in the Linux kernel, systemd, ostree, and tons of other projects.
(In reply to Colin Walters from comment #7) > Only a brief drive by comment but if that's true the fix is to > -fno-strict-aliasing, like is done in the Linux kernel, systemd, ostree, and > tons of other projects. indeed, that was going to be my next test today
Passing -fno-strict-aliasing seems to do the job. Sorry for the noise here
nacho, thanks for debugging this. I’m glad you worked out what the problem was! Colin, do you think GLib should be compiled with -fno-strict-aliasing -Wstrict-aliasing=2 by default? That’s what systemd does. OSTree doesn’t. At the least perhaps we should compile with -Wstrict-aliasing=2.
Hmm I swear I'd added -fno-strict-aliasing to ostree. Looking at that now. A quick glance shows that e.g. NetworkManager also uses it though. Let's do it for glib? As far as using -Wstrict-aliasing=2... well, my general rule of thumb is systemd is usually a good reference, so if they do it we probably should too absent a reason not to?
https://github.com/ostreedev/ostree/pull/1384
(In reply to Colin Walters from comment #11) > Let's do it for glib? Patch most definitely welcome. > As far as using -Wstrict-aliasing=2... well, my general rule of thumb is > systemd is usually a good reference, so if they do it we probably should too > absent a reason not to? Yup. When I compiled with CFLAGS=-Wstrict-aliasing=2, there were a lot of warnings from GLib (too many to fix in a drive-by). I don’t know if they get squashed if -fno-strict-aliasing is also in CFLAGS (I didn’t try); but if not, we might want to just add -fno-strict-aliasing for now, and open a follow-up bug about fixing the -Wstrict-aliasing=2 warnings.
Re: strict aliasing rules, we had a thread on gtk-devel-list: https://mail.gnome.org/archives/gtk-devel-list/2017-April/msg00031.html
Created attachment 365848 [details] [review] Fix various strict aliasing problems with sockaddr Fix various strict aliasing problems caused by casting between (struct sockaddr *) and (struct sockaddr_storage *): the correct code here is to keep the two in a union. Signed-off-by: Philip Withnall <withnall@endlessm.com>
Created attachment 365849 [details] [review] glib: Fix strict-aliasing warnings with g_clear_pointer() gpointer* cannot be aliased with arbitrary types. In order to fix -Wstrict-aliasing=2 warnings with the g_clear_pointer() macro, we need to cast through char*, which is allowed to alias with anything. Even if we don’t make GLib strict-aliasing safe, it’s important to ensure this macro is safe, since it could be used from projects which do compile with -fstrict-aliasing. Signed-off-by: Philip Withnall <withnall@endlessm.com>
Created attachment 365850 [details] [review] build: Enable -fno-strict-aliasing GLib makes various assumptions about aliasing throughout its codebase, and compiling with -fstrict-aliasing has been demonstrated to cause problems (for example, bug #791622). Explicitly disable strict aliasing as a result. Signed-off-by: Philip Withnall <withnall@endlessm.com>
Here’s some patches which should fix this. Technically, only attachment #365850 [details] (build system changes) is needed. Attachment #365849 [details] (g_steal_pointer()) is fairly important to keep GLib useful with projects which depend on it and compile with -fstrict-aliasing. Attachment #365848 [details] is unnecessary and just for fun, to eliminate some of the easier warnings I saw when compiling with -Wstrict-aliasing=2 to see what would happen.
Review of attachment 365848 [details] [review]: Minor coding style nitpicks, but this looks okay to me ::: gio/gsocket.c @@ +423,3 @@ { + union + { Weird coding style… ::: glib/gmessages.c @@ +2179,1 @@ "/run/systemd/journal/"); This can probably go on a single line, now
Review of attachment 365849 [details] [review]: This is dark magic and I'd have to consult the Necronomicon to double check that it's defined behaviour everywhere; from my best recollection of the Accursed Tomes, this looks good.
Review of attachment 365850 [details] [review]: Okay. I honestly would also drop a line — either in the "building GLib" documentation, or on the toolchain requirements wiki page — that we strongly recommend not to enable strict aliasing when building GLib, in case people try to inject their own compiler flags on top of ours.
Pushed those three with some coding style changes to the first one. I will do a follow up to change the documentation as per comment #21 now. Attachment 365848 [details] pushed as d8fe926 - Fix various strict aliasing problems with sockaddr Attachment 365849 [details] pushed as 97d24b9 - glib: Fix strict-aliasing warnings with g_clear_pointer() Attachment 365850 [details] pushed as ade324f - build: Enable -fno-strict-aliasing
I pushed commit e42ff0105d19be382f6c4e460c46ac9e18f92b25 (without review) to add to the documentation. Thanks for the reviews and for looking into the Dark Magic.
This triggered a dim memory - can we use G_GNUC_MAY_ALIAS here? Hmm...looks like clang doesn't implement it.
(In reply to Colin Walters from comment #24) > This triggered a dim memory - can we use G_GNUC_MAY_ALIAS here? Hmm...looks > like clang doesn't implement it. What do you mean by ‘here’? I’d rather we just fixed the aliasing problems, if we’re going to move in the direction of making GLib strict-aliasing-safe. (I’m not suggesting we _do_ make GLib strict-aliasing-safe though, since that seems a lot of work for no real reward.)