Bug 783950 - flatpak builder silently fails
Status: RESOLVED FIXED
Product: gnome-builder
Classification: Other
Component: general
Version: 3.24.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: GNOME Builder Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-06-19 08:29 UTC by Gabriel Rauter
Modified: 2017-08-30 19:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
pstree of the running application pre crash (246.95 KB, image/png)
2017-06-24 18:11 UTC, Gabriel Rauter
"stap sigkill.stp gets very verbose" after application crash (18.37 KB, image/png)
2017-06-24 18:13 UTC, Gabriel Rauter

Description Gabriel Rauter 2017-06-19 08:29:37 UTC
Hi, I am having a problem with gnome-builder (2.24.2) + flatpak (0.9.5) on Arch Linux.

Building gnome-calendar with the org.gnome.Calendar.json profile and the org.gnome.Platform:master runtime fails at random points without any error message.

Dependencies build fine; the random failure only appears once the main project starts to build.

If I repeat the process often enough, I am eventually able to build the application.

Running the application through Builder (meaning it is started with the flatpak runtime) then results in random crashes without any useful crash message.
Running gnome-builder -vvv gave me just a:
"Ide[14523]:  WARNING: Process quit unexpectedly".

The "New Runtime Terminal" for the project crashes in the same way after a random amount of time. It feels like there is a timer killing flatpak processes when they are started through gnome-builder.

I have the same problem on both my desktop and my notebook, both running Arch Linux. The gnome-photos project behaves and crashes the same way.

I did not have the problem less than a month ago (2017-05-24), so it could be caused by an updated dependency.

I tried the following:
 * switch from master to 3.24 org.gnome.Platform
 * delete both the .flatpak-builder and ~/.cache/gnome-builder folders
 * delete and reinstall both gnome-builder and flatpak packages
 * rebuild the gnome-builder package myself with Arch Linux official PKGBUILD
https://git.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/gnome-builder

Starting flatpak applications directly from the command line with flatpak works fine. The flatpak plugin also works fine when used through the stable flatpak version of gnome-builder.
Comment 1 Gabriel Rauter 2017-06-19 13:31:56 UTC
%s/(2.24.2)/(3.24.2)/
Comment 2 Christian Hergert 2017-06-19 21:13:36 UTC
Our failure case is sort of non-ideal today. We could really use some sort of failure state widget at least.

If a process dies immediately, it's often before it has output any meaningful information. So there is very little we can pass on to the user.

In this case, it's usually due to something incorrect in how we are launching the application or a misconfiguration of the flatpak .json file.
Comment 3 Gabriel Rauter 2017-06-24 18:10:08 UTC
I tried passing -v (--verbose) to flatpak-builder / flatpak build, but the information it provided did not help with the problem.

It feels like some type of watchdog or similar is killing the application.
I tried strace, which shows that the process gets a SIGKILL.
I then tried to use systemtap to check where the signal is coming from but I am lacking the proper knowledge to do anything with the information revealed.

Adding screenshots of pstree and systemtap while the application is running through gnome-builder and its flatpak plugin.
Comment 4 Gabriel Rauter 2017-06-24 18:11:29 UTC
Created attachment 354403 [details]
pstree of the running application pre crash
Comment 5 Gabriel Rauter 2017-06-24 18:13:35 UTC
Created attachment 354404 [details]
"stap sigkill.stp gets very verbose" after application crash
Comment 6 Christian Hergert 2017-06-25 06:35:09 UTC
Is there any chance you can try out Nightly and see if that fixes it? Another bug someone has seen could be related to our older bundled libflatpak not being able to read some info from the new flatpak installation files.

http://builder.readthedocs.io/en/latest/installation.html#via-flatpak

I've bumped our libflatpak to 0.9.6 for the stable branches, but those builds won't be available for a few hours.
Comment 7 Gabriel Rauter 2017-08-01 12:17:13 UTC
Hi, sorry for the long delay in responding.

I think I miscommunicated the problem somehow. I built the 2_24 branch of gnome-builder with the changes you mentioned, but it did not help with this problem.

Running the flatpak'd version of gnome-builder works just fine. It is the distribution-packaged gnome-builder where the flatpak plugin acts up and makes it unusable.

I was now able to reproduce this with the simplest project, both on Arch Linux and on Fedora Workstation Rawhide.

So how to reproduce it:

 * Install Fedora Workstation Rawhide with Gnome Desktop.
 * Install gnome-builder through gnome-software-center.
 * Start gnome-builder and clone a project with flatpak support e.g. gnome-calendar.
 * Wait till it is cloned.
 * Builder should now automatically install the GNOME master runtime and SDK.
 * Wait till they are installed.
 * Hamburger Menu -> New Runtime Terminal
   * This will result in a runtime terminal that will crash sometimes within a second and sometimes only after several seconds.
 * Build project (as flatpak with runtime org.gnome.Platform master as is default in the gnome-calendar json)
   * All non main modules will build fine.
   * The main module build will fail at a random moment in the build process.
Comment 8 Gabriel Rauter 2017-08-01 12:17:59 UTC
%s/(2_24)/(3_24)/

Sorry I don't know why I keep making that typo.
Comment 9 Christian Hergert 2017-08-05 23:26:03 UTC
In Builder nightly, I can build things just fine if I change Calendar to use the 3.24 runtime (in Build Preferences). So, going out on a limb here, I think the error is coming from GLib in the Flatpak runtime for the nightly branch. The error I get, in particular, is a failure in gdbus-codegen.

I also have an issue running (it seems to auto-exit after about 30 seconds), but I think that is a bug in how we are launching apps (or in flatpak) because I can find the same behavior in our terminals...
Comment 10 Alexander Larsson 2017-08-29 14:59:37 UTC
This seems to be the same issue as https://bugzilla.gnome.org/show_bug.cgi?id=785898 which it *seems* to me is now fixed, but I didn't try building via builder. Can someone else test this?
Comment 11 Christian Hergert 2017-08-29 21:08:33 UTC
This doesn't seem to be related to bug 785898 which I just tracked down.

Some things to note:

 - this also affects our "in build runtime" terminals, so the terminal
   exits/disappears immediately the first time, and after 20 seconds or so
   thereafter.

 - the above means it's not from a spurious cancellation in the build
   pipeline like I suspected previously.

 - it happens in jhbuild builds of Builder, so it's not due to running
   Builder itself from a Flatpak.

 - it doesn't affect running a host terminal from a jhbuild Builder.

 - it doesn't affect running a host terminal from a flatpak'd Builder.

 - it doesn't affect running a build terminal from a flatpak'd Builder.
Comment 12 Christian Hergert 2017-08-29 21:38:38 UTC
Just to double check, I ran Builder under GDB with breakpoints on all our termination API and nothing hit.

 - g_subprocess_force_exit, ide_subprocess_force_exit
 - g_subprocess_send_signal, ide_subprocess_send_signal
 - kill()
 - signal()

So I'm relatively certain we aren't responsible for killing the children processes.
Comment 13 Christian Hergert 2017-08-30 00:49:42 UTC
After a tussle with systemtap, I got some information on the sigkill.

SPID     SNAME            RPID  RNAME            SIGNUM SIGNAME         
6512     bwrap            6513  gdb              9      SIGKILL         

So it is bwrap that is signaling our process. What isn't clear to me is if
this is caused from prctl() supplied death signals or not. We didn't use to
have this problem, so I'm wondering if it is related to changes in flatpak over
the past few months.
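One way to answer the prctl() question directly would be to query the death signal from inside the affected process. The following is a minimal, Linux-only C sketch, assuming it is compiled into a small test program that Builder launches the same way as the app. `current_pdeathsig()` and `report_pdeathsig()` are hypothetical helpers; `prctl(PR_GET_PDEATHSIG, ...)` is the call documented in prctl(2). It only reports the setting of the calling process, so running it from Builder itself would say nothing about the processes inside the sandbox.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/prctl.h>

/* Hypothetical helper: returns the parent-death signal armed for the
 * calling process (0 if none), or -1 if prctl() fails. */
int current_pdeathsig(void)
{
    int sig = 0;

    /* PR_GET_PDEATHSIG stores the current value through an (int *) arg2. */
    if (prctl(PR_GET_PDEATHSIG, &sig) == -1)
        return -1;
    return sig;
}

/* Hypothetical helper: prints the result so it is easy to spot in Builder's log. */
void report_pdeathsig(void)
{
    int sig = current_pdeathsig();

    if (sig < 0)
        perror("prctl(PR_GET_PDEATHSIG)");
    else if (sig == 0)
        printf("no parent-death signal armed\n");
    else
        printf("parent-death signal armed: %d\n", sig);
}
```

A non-zero value (for example 9 for SIGKILL) would indicate that a death signal is armed for that process.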
Comment 14 Alexander Larsson 2017-08-30 09:25:34 UTC
I figured out the reason for this.
gnome-builder spawns flatpak-builder, which execs some things, eventually making the child of Builder a bwrap --die-with-parent instance. die-with-parent means using PR_SET_PDEATHSIG to ensure the process dies when its parent (in this case gnome-builder) dies.

Unfortunately due to how PDEATHSIG works this means when *any* thread in the gnome-builder process dies the bwrap instance will be killed.

I'm fixing this by no longer enabling --die-with-parent by default; instead, flatpak-builder passes it to flatpak build explicitly.
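The PR_SET_PDEATHSIG behaviour described in this comment can be reproduced outside Builder with a small, Linux-only C sketch. It assumes nothing about Builder or bwrap internals: a hypothetical worker thread forks a child that arms PR_SET_PDEATHSIG(SIGKILL), waits until the child confirms, and then exits. Per prctl(2), the "parent" is the thread that created the child, so the child is killed as soon as that thread exits, even though the process calling the hypothetical `pdeathsig_after_thread_exit()` keeps running.

```c
#include <pthread.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Filled in by the worker thread for the caller. */
struct spawn_result {
    pid_t child; /* pid of the forked child, or -1 on failure */
};

/* Hypothetical worker: forks a child that arms PR_SET_PDEATHSIG(SIGKILL),
 * waits until the child confirms it is armed, then lets the thread exit. */
static void *spawn_from_worker(void *arg)
{
    struct spawn_result *res = arg;
    int ready[2];
    char byte;

    res->child = -1;
    if (pipe(ready) == -1)
        return NULL;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child of a threaded process: stick to async-signal-safe calls. */
        close(ready[0]);
        if (prctl(PR_SET_PDEATHSIG, SIGKILL) == -1)
            _exit(2);
        if (write(ready[1], "x", 1) != 1) /* tell the worker we are armed */
            _exit(3);
        pause(); /* only a signal ends this wait */
        _exit(0);
    }

    close(ready[1]);
    if (pid > 0) {
        /* Block until the child has armed its death signal (read returns 0
         * instead if the child exited early). */
        ssize_t n = read(ready[0], &byte, 1);
        (void)n;
        res->child = pid;
    }
    close(ready[0]);
    return NULL; /* the worker thread exits here; the process keeps running */
}

/* Hypothetical helper: returns the signal that terminated the child,
 * 0 if it exited normally, or -1 on error. */
int pdeathsig_after_thread_exit(void)
{
    struct spawn_result res;
    pthread_t worker;
    int status;

    if (pthread_create(&worker, NULL, spawn_from_worker, &res) != 0)
        return -1;
    pthread_join(worker, NULL); /* the spawning thread is gone, we are not */

    if (res.child <= 0)
        return -1;
    /* The child is reparented to a surviving thread of this process,
     * so it can still be reaped here. */
    if (waitpid(res.child, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Build with `cc -pthread`. Calling `pdeathsig_after_thread_exit()` from `main` should return `SIGKILL`, which is the same mechanism that killed the bwrap instance when the Builder thread that spawned it went away. The pipe handshake keeps the sketch deterministic: if the worker exited before the child called prctl(), the child would already have been reparented and its death signal would be tied to a different thread.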
Comment 16 Christian Hergert 2017-08-30 19:35:11 UTC
Okay, I'm going to mark this as resolved, even though we are waiting for a new flatpak/flatpak-builder release.