GNOME Bugzilla – Bug 783950
flatpak builder silently fails
Last modified: 2017-08-30 19:35:11 UTC
Hi, I am having a problem with gnome-builder (2.24.2) + flatpak (0.9.5) on Arch Linux. Starting the build process for gnome-calendar with the org.gnome.Calendar.json profile and the org.gnome.Platform:master runtime fails at random points without any error message. Dependencies build fine; the failure only appears once the main project starts to build. If I repeat the process long enough, I am eventually able to build the application. Running the application through Builder (i.e. started with the flatpak runtime) then results in random crashes without any useful crash message. Running gnome-builder -vvv gave me only: "Ide[14523]: WARNING: Process quit unexpectedly". The "New Runtime Terminal" for the project crashes in the same way after a random amount of time. It feels like there is a timer killing flatpak processes when they are started through gnome-builder. I have the same problem on both my desktop and my notebook, both running Arch Linux. The gnome-photos project behaves the same way and crashes the same way. I did not have the problem less than a month ago (2017-05-24), so it could be caused by an updated dependency. I tried the following:
* switching from the master to the 3.24 org.gnome.Platform
* deleting both the .flatpak-builder and ~/.cache/gnome-builder folders
* deleting and reinstalling both the gnome-builder and flatpak packages
* rebuilding the gnome-builder package myself with the official Arch Linux PKGBUILD https://git.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/gnome-builder
Starting flatpak applications directly from the command line with flatpak works fine. The flatpak plugin also works fine in the stable Flatpak build of gnome-builder.
%s/(2.24.2)/(3.24.2)/
Our failure case is sort of non-ideal today. We could really use some sort of failure-state widget at least. If a process dies immediately, it's often before it has output any meaningful information, so there is very little we can pass on to the user. In this case, it's usually due to something incorrect in how we are launching the application, or a misconfiguration of the flatpak .json file.
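Even when a child dies before printing anything, its wait status still records how it ended, which is the kind of detail a failure-state widget could show instead of a bare "Process quit unexpectedly". Below is a minimal C sketch, assuming POSIX fork()/waitpid(); describe_exit_status and run_and_describe are hypothetical helpers, not Builder API. GLib's GSubprocess exposes the same information through g_subprocess_get_if_signaled() and g_subprocess_get_term_sig().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a waitpid() status into a short message that a
 * failure widget could show even when the child printed nothing. */
static void describe_exit_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status))
        snprintf(buf, len, "exited with status %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d (%s)",
                 WTERMSIG(status), strsignal(WTERMSIG(status)));
    else
        snprintf(buf, len, "stopped or in an unknown state");
}

/* Hypothetical demo: fork a child that is killed by SIGKILL without any
 * output, then describe how it ended. Returns the raw status or -1. */
static int run_and_describe(char *buf, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGKILL);   /* simulate the silent kill seen in this bug */
        _exit(0);         /* not reached */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    describe_exit_status(status, buf, len);
    return status;
}
```

With this, a silent SIGKILL would surface as "killed by signal 9 (...)" rather than no information at all; the text after the signal number comes from strsignal() and varies by C library.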
I tried passing -v (--verbose) to flatpak-builder/flatpak build, but the information it provided did not help with the problem. It feels like some kind of watchdog is killing the application. I tried strace, which shows that the process receives a SIGKILL. I then tried to use systemtap to find out where the signal is coming from, but I lack the knowledge to make sense of what it revealed. I'm attaching screenshots of pstree and systemtap taken while the application runs through gnome-builder and its flatpak plugin.
Created attachment 354403: pstree of the running application before the crash
Created attachment 354404: output of "stap sigkill.stp" (very verbose) after the application crash
Is there any chance you can try out Nightly and see if that fixes it? This could be related to another bug someone has seen, where our older bundled libflatpak was unable to read some information from newer flatpak installation files. http://builder.readthedocs.io/en/latest/installation.html#via-flatpak I've bumped our libflatpak to 0.9.6 for the stable branches, but those builds won't be available for a few hours.
Hi, sorry for the long delay in responding. I think I miscommunicated the problem somehow. I built the 2_24 branch of gnome-builder with the changes you mentioned, but it did not help with this problem. Running the flatpak'd version of gnome-builder works just fine; it is the distribution-packaged gnome-builder where the flatpak plugin acts up and becomes unusable. I have now been able to reproduce this with the simplest project on both Arch Linux and Fedora Workstation Rawhide. How to reproduce it:
* Install Fedora Workstation Rawhide with the GNOME desktop.
* Install gnome-builder through GNOME Software.
* Start gnome-builder and clone a project with flatpak support, e.g. gnome-calendar.
* Wait until it is cloned.
* Builder should now automatically install the GNOME master runtime and SDK.
* Wait until they are installed.
* Hamburger menu -> New Runtime Terminal.
* This results in a runtime terminal that crashes, sometimes within a second and sometimes only after several seconds.
* Build the project (as flatpak with the org.gnome.Platform master runtime, the default in the gnome-calendar JSON).
* All non-main modules build fine.
* The main module build fails at a random point in the build process.
%s/(2_24)/(3_24)/ Sorry, I don't know why I keep making that typo.
In Builder nightly, I can build things just fine if I change Calendar to use the 3.24 runtime (in Build Preferences). So I'm going to go out on a limb and say that I think the error is coming from GLib in the nightly branch of the Flatpak runtime. The error I get, in particular, is a failure in gdbus-codegen. I also have an issue running the app (it seems to auto-exit after about 30 seconds), but I think that is a bug in how we are launching apps (or in flatpak), because I see the same behavior in our terminals...
This seems to be the same issue as https://bugzilla.gnome.org/show_bug.cgi?id=785898 which it *seems* to me is now fixed, but I didn't try building via builder. Can someone else test this?
This doesn't seem to be related to bug 785898, which I just tracked down. Some things to note:
- this also affects our "in build runtime" terminals: the terminal exits/disappears immediately the first time, and after 20 seconds or so on later attempts.
- the above means it's not caused by a spurious cancellation in the build pipeline, as I previously suspected.
- it happens in jhbuild builds of Builder, so it's not due to running Builder itself from a Flatpak.
- it doesn't affect running a host terminal from a jhbuild Builder.
- it doesn't affect running a host terminal from a flatpak'd Builder.
- it doesn't affect running a build terminal from a flatpak'd Builder.
Just to double check, I ran Builder under GDB with breakpoints on all our termination APIs, and none of them was hit:
- g_subprocess_force_exit, ide_subprocess_force_exit
- g_subprocess_send_signal, ide_subprocess_send_signal
- kill()
- signal()
So I'm relatively certain we aren't responsible for killing the child processes.
After a tussle with systemtap, I got some information on the SIGKILL (S = sender, R = receiver):
SPID  SNAME  RPID  RNAME  SIGNUM  SIGNAME
6512  bwrap  6513  gdb    9       SIGKILL
So it is bwrap that is signaling our process. What isn't clear to me is whether this is caused by a death signal set with prctl() or not. We didn't use to have this problem, so I'm wondering if it is related to changes in flatpak over the past few months.
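On Linux, a process can check whether it has a parent-death signal armed with prctl(PR_GET_PDEATHSIG). A minimal C sketch, assuming Linux: a forked child optionally arms SIGKILL with PR_SET_PDEATHSIG and reports what PR_GET_PDEATHSIG returns through its exit code. current_pdeathsig and child_reported_pdeathsig are hypothetical helper names. This only inspects the calling process, so it cannot directly show which signal bwrap armed for itself.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the parent-death signal armed for the calling process
 * (0 if none), or -1 on error. */
static int current_pdeathsig(void)
{
    int sig = 0;
    if (prctl(PR_GET_PDEATHSIG, &sig) != 0)
        return -1;
    return sig;
}

/* Fork a child that optionally arms PR_SET_PDEATHSIG(SIGKILL) and then
 * reports what PR_GET_PDEATHSIG says via its exit status.
 * Returns the reported signal number, or -1 on error. */
static int child_reported_pdeathsig(int arm)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (arm && prctl(PR_SET_PDEATHSIG, (unsigned long)SIGKILL) != 0)
            _exit(255);
        _exit(current_pdeathsig() & 0xff);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The parent-death setting is cleared in the child of fork(), so the unarmed case reports 0 regardless of the caller's own setting.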
I figured out the reason for this. gnome-builder spawns flatpak-builder, which execs some things, eventually making the child of Builder a bwrap --die-with-parent instance. --die-with-parent uses PR_SET_PDEATHSIG to ensure the process dies when its parent (in this case gnome-builder) dies. Unfortunately, due to how PDEATHSIG works, the "parent" is actually the thread that spawned the process, so when that thread in gnome-builder exits, the bwrap instance is killed even though gnome-builder itself is still running. I'm fixing this by no longer enabling --die-with-parent by default; instead, flatpak-builder passes it explicitly to flatpak build.
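A minimal C sketch of this behavior, assuming Linux with glibc and pthreads: a worker thread forks a child that arms PR_SET_PDEATHSIG(SIGKILL) (roughly what bwrap --die-with-parent does for itself), and the child is killed as soon as that worker thread exits, even though the process that owns it keeps running. The names spawn_info, spawn_from_worker and pdeathsig_on_thread_exit are hypothetical; this is not code from Builder, flatpak or bwrap.

```c
#include <pthread.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Filled in by the worker thread, read after pthread_join(). */
struct spawn_info {
    pid_t child;   /* pid of the forked child, or -1 on failure */
};

/* Worker thread: fork a child that arms PR_SET_PDEATHSIG(SIGKILL),
 * wait until it is armed, then let this thread exit. */
static void *spawn_from_worker(void *arg)
{
    struct spawn_info *info = arg;
    int fds[2];
    if (pipe(fds) != 0) {
        info->child = -1;
        return NULL;
    }
    pid_t pid = fork();
    if (pid == 0) {
        close(fds[0]);
        prctl(PR_SET_PDEATHSIG, (unsigned long)SIGKILL);
        (void)write(fds[1], "x", 1);   /* tell the worker we are armed */
        close(fds[1]);
        sleep(5);                      /* exits normally if no signal arrives */
        _exit(0);
    }
    info->child = pid;                 /* -1 if fork() failed */
    close(fds[1]);
    if (pid > 0) {
        char c;
        (void)read(fds[0], &c, 1);     /* block until the child is armed */
    }
    close(fds[0]);
    return NULL;                       /* only this thread exits here */
}

/* Run the scenario from the calling thread, which stays alive throughout.
 * Returns the child's wait status, or -1 on error. */
static int pdeathsig_on_thread_exit(void)
{
    struct spawn_info info = { .child = -1 };
    pthread_t worker;
    if (pthread_create(&worker, NULL, spawn_from_worker, &info) != 0)
        return -1;
    pthread_join(worker, NULL);
    if (info.child <= 0)
        return -1;
    int status;
    if (waitpid(info.child, &status, 0) < 0)
        return -1;
    return status;
}
```

Calling pdeathsig_on_thread_exit() should yield a status for which WIFSIGNALED() is true and WTERMSIG() is SIGKILL, well before the child's 5 second sleep ends. The pipe ensures the worker does not exit before the child has armed the signal; without it the demo would be racy, because a child whose spawning thread has already exited is reparented and arms the signal against the new parent instead.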
Fixed in https://github.com/flatpak/flatpak/commit/75d7e762762116905496a33ba029f1bb09037506
Okay, I'm going to mark this as resolved, even though we are waiting for a new flatpak/flatpak-builder release.