GNOME Bugzilla – Bug 777519
rpath problem broke gnome-shell on Debian unstable, Ubuntu zesty
Last modified: 2017-08-11 17:45:52 UTC
Something changed among gnome-shell's build dependencies (maybe in gcc) so that gnome-shell stopped building from source in Debian and Ubuntu 17.04 Alpha. (The same package builds fine in Ubuntu 16.10). I tried working around this with LD_LIBRARY_PATH. That got the package to build; it just doesn't work now. See smcv's commentary on the second bug: https://bugs.debian.org/844796 https://bugs.debian.org/851925
This appears to be a behaviour change in the linker that makes libtool -rpath (aka gcc -Wl,-rpath, aka ld -rpath) generate DT_RUNPATH instead of DT_RPATH. Documentation says that should be fine - the only difference is precedence, with DT_RPATH > LD_LIBRARY_PATH > DT_RUNPATH - but in fact there is another difference: DT_RUNPATH isn't used to resolve transitive dependencies. It isn't clear to me whether this is deliberate or not. See https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/1253638 Workaround: link Shell with ld --disable-new-dtags (gcc -Wl,--disable-new-dtags) to make the linker put DT_RPATH in /usr/bin/gnome-shell, like it used to. I suspect we can make it work again without that workaround by adding -rpath in more places, either in Shell or in Mutter or both - but I haven't got that working yet. (No, I don't know why Debian is using snapshots of a not-yet-released binutils version either.)
Created attachment 355369 [details] [review] Don't create new dynamic tags for ELF binaries and shared objects I've encountered this problem today on the development branch of Endless OS too while we were updating several tooling-related packages and spent quite some time investigating the issue, but unfortunately I couldn't find a better way to deal with it than passing --disable-new-tags, as Debian already do. However, we still need that regardless of what's done in the packaging related files (e.g. debian/rules) so that GNOME Shell builds fine on our CI workflow, so I've spent some time testing a patch to the autotools files to make sure that not only built, but also run, and this is what I came up with in case upstream wants to integrate it as well. I've tested this patch building and running from sources both on the development (unreleased) version of Endless OS as well as on my Fedora 26 VM. Not a real fix, I know, but considering that the breakage in glibc has been there for a while, I thought this might be interesting. Let me know what you think.
Created attachment 355585 [details] [review] 1. Make sure mutter's private lib dir is added to shell libraries' runtime path I got an internal review of my previous patch in Endless by Dan Nicholson and he pointed out a better way to fix this in the shell that does not involve reverting to using the (deprecated) RPATH instead of RUNPATH, which is clean enough and fixes the problem at hand regardless of whether the linker in the system is using the old or new dynamic tags to set the runtime path. The idea is simple: rely on libtool's -R option, which will do the right thing both for executables and libraries, and add it to libgnome_shell_ldflags too, so that the runtime path is automatically added whenever anything tries to link against internal libraries like libgnome-shell-menu.so (as it's the case when linking the ShellMenu typelib library). I've tested this and fixes the issue reliably, so I think I'm now more confidently proposing this as a fix instead of a workaround :) Please review, thanks!
One more thing: As Dan stated in https://github.com/endlessm/gnome-shell/pull/100/commits/ce5054d4eb4c483e2a92cf2af6e01c1951fbac7f, it might be good to consider having mutter set this -R argument right in its .pc file, so that clients linking against mutter don't need to know about its private directory where libmutter-clutter|cogl.so are being installed, outside standard locations. I still think the currently proposed patch makes sense for the shell at least for now, but I'm adding Jonas to CC here to see what he thinks, as his is pretty much caused by mutter after all, so it makes sense to consider fixing it there.
Review of attachment 355585 [details] [review]: Linking and rpaths and what not is slightly magical to me, but what you write in the commit message makes sense, so this is fine by me. Wouldn't hurt with a second opinion with someone with better libtool expertise.
I filed bug784954 against mutter and attached a patch so that it transmits the -R option from the private library pkg-config files.
@Jonas Thanks for your review and being open to this. Unfortunately, I experienced a weird fallout today in Endless's version of the shell that made me think that, while this certainly got the shell building fine, it might not be ok from a runtime point of view, because I'm seeing issues when trying to run it now that I did not see before, resulting in the shell not starting up and throwing errors like this: ** (gnome-shell:2508): WARNING **: Failed to load shared library 'libeos-shell-fx.so' referenced by the typelib: libeos-shell-fx.so: cannot open shared object file: No such file or directory Note that libeos-shell-fx.so is an private library we use in Endless, so it could certainly be an error affecting us only. However, I'd feel more comfortable if we could hold on applying this patch upstream until I had double/triple checked that this won't cause a problem with the shell on master, for what I'm now building it from scratch in my Fedora VM. Sorry I did not catch this before, and thanks again for considering it. Hopefully we will find some sort of reasonable solution at some point, so that downstream workarounds like Debian's can be avoided.
Created attachment 355891 [details] [review] 2. main: Add $(pkglibdir) to GIR's list of search path for scanned libraries I think we finally figured this out: the missing piece in the puzzle was a call to g_irepository_prepend_library_path() so that the gobject introspection machinery in charge of loading the "wrapped" shared objects can find them in $(pkglibdir)/ regardless of the shell's binaries being linked with DT_RPATH or DT_RUNPATH. I'd be happy to squash the newly attached patch together with the other one if requested, but this whole thing has been so tricky (at least for me) that I think there's value in keeping them split in two. Note: so far I've tested this with Endless's fork of the shell (based on 3.22) but I'll update the status here as soon as I manage to build the shell from master in my VM, which I hope to be able to do tomorrow.
Created attachment 355932 [details] [review] 2. main: Add private libdirs to GIR's list of search paths for scanned libraries A small iteration over the previous version of this patch to also add MUTTER_TYPELIB_DIR to the list of "library paths", to be on the safe side.
Created attachment 356602 [details] [review] 1. Make sure mutter's private lib dir is added to shell libraries' runtime path There was a typo in the patch's commit message, this fixes that
Created attachment 356603 [details] [review] 2. main: Add private libdirs to GIR's list of search paths for scanned libraries There was another error in the patch's commit message, this fixes that
I finally got to test this patches in my Fedora 26 VM where I built the shell from master + all the deps in jhbuild, and it seems to build and run fine, so I'm finally officially asking for a review on these 2 patches (which I just updated). I realized that in the time being the master branch has moved to build using meson but for now I haven't touched those files because I was unsure of what the current status is for that. Let me know if you want me to prepare similar patches for the meson files, though, and I'll gladly work on them and test them too.
Mario, I had some issues trying to build the latest gnome-shell development release on Ubuntu with meson. Maybe an update of your patch for meson would help. Thanks!
@Jeremy: Sure, I think I can work on those patches then if that's useful, will try to take a look to it soon.
I've checked the meson.build files and it seems to me that there's no need to fix anything in there as all the relevant binaries are being linked with the right rpath in place and, things seem to build and work just fine here, even when explicitly passing --enable-new-dtags to the linker (which would make the original error to happen at build time). To be completely sure, I checked with objdump -Tx and I could see that, effectively, DT_RUNPATH was being used to link the binaries when I did the experiment mentioned above, and even being this the case the shell built AND run perfectly fine, so perhaps meson (which doesn't user libtool) is doing with install_rpath some kind of similar black magic to what libtool does when passing -R, which basically adds the libdir to dependency_libs, as mentioned in [1]. So, it looks like the first patch is actually needed to fix this problem in meson, as the shell builds and run fine when using meson, regardless of the linker being configured to use DT_RPATH or DT_RUNPATH. If anything, it would be interesting to merge it in a stable branch in case there's another stable release of the 3.24.x, so that distributions can stop working this around in the packaging scripts. As for the second patch, it's indeed still needed in master since otherwise the shell will miserably fail at runtime with the following error when the binaries are linked using DT_RUNPATH: ``` Jul 31 16:46:29 localhost.localdomain gnome-shell[14298]: (/src/mutter/src/backends/x11/meta-monitor-manager-xrandr.c:1785):meta_monitor_manager_xrandr_is_transform_handled: runtime check failed: (crtc->all_transforms & transform) Jul 31 16:46:30 localhost.localdomain gnome-shell[14298]: JS WARNING: [resource:///org/gnome/shell/ui/main.js 317]: reference to undefined property "MetaStage" Jul 31 16:46:30 localhost.localdomain gnome-shell[14298]: JS WARNING: [resource:///org/gnome/shell/ui/layout.js 217]: reference to undefined property "MetaWindowGroup" Jul 31 16:46:30 localhost.localdomain gnome-shell[14298]: JS WARNING: [resource:///org/gnome/shell/ui/osdMonitorLabeler.js 59]: reference to undefined property "MetaDBusDisplayConfigSkeleton" Jul 31 16:46:31 localhost.localdomain gnome-shell[14298]: JS ERROR: Error: Unsupported type void, deriving from fundamental void Jul 31 16:46:31 localhost.localdomain gnome-shell[14298]: Execution of main.js threw exception: JS_EvaluateScript() failed ``` So, in summary: * First patch is only needed for autotools, to be considered for the next stable 3.24.x release (if there's any) * Second patch is still needed for meson, or the shell will break at runtime when linked with DT_RUNPATH [1] https://www.gnu.org/software/libtool/manual/html_node/Link-mode.html
On a side note, we should replace at some point the use of link_args with build_rpath + install_rpath once meson 0.42 is released, as that will be the recommended way of specifying the rpath instead of using `-Wl,-rpath` in the future [1]. But that can be done later. [1] https://www.gnu.org/software/libtool/manual/html_node/Link-mode.html
Kind ping on this one. @Florian @Jonas is there anything else you might want to consider here, or that I might have missed?
Review of attachment 356602 [details] [review]: lgtm
Review of attachment 356603 [details] [review]: lgtm
@Jonas Thanks for the review, I've just pushed the second patch to master: https://git.gnome.org/browse/gnome-shell/commit/?id=1518a778ed84ceca81f0ec5bccf690816b958c3b @Florian: I've seen the autotools file got removed already so I haven't done anything wrt the first patch. Let me know if you want me to push it on top of the 3-24 branch (or just feel free to do it yourself). In any case, this should be enough not to need to pass --disable-new-dtags in the debian/rules for Debian/Ubuntu, so marking this bug as RESOLVED.
Mario, thank you for working on this issue, but I still can't use meson to build gnome-shell on Ubuntu with an issue that seems similar to me to this one. See https://bugzilla.gnome.org/786169