GNOME Bugzilla – Bug 764823
gnome-builder sometimes pegs a core, even when closing the window, with "Bad file descriptor" warnings in the logs
Last modified: 2016-05-06 10:50:46 UTC
Sometimes when editing python code and then just leaving builder open for a while and not touching it, it pegs a cpu core, 100% cpu useage. on the system log, I can see one line that looks like this Apr 09 21:08:17 weatherwax org.gnome.Builder[1585]: sys:1: Warning: g_object_weak_unref: couldn't find weak ref 0x7f4df50a59d0(0x55f7c5512000) And then an endless amount of lines that look like this: Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0077 GLib-GIO[4617]: WARNING: fail: Error accepting connection: Bad file descriptor Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078 GLib-GIO[4617]: WARNING: fail: Error accepting connection: Bad file descriptor Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078 GLib-GIO[4617]: WARNING: fail: Error accepting connection: Bad file descriptor Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078 GLib-GIO[4617]: WARNING: fail: Error accepting connection: Bad file descriptor Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078 GLib-GIO[4617]: WARNING: fail: Error accepting connection: Bad file descriptor Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078 GLib-GIO[4617]: WARNING: fail: Error accepting connection: Bad file descriptor which flood the log in a very high rate (repeating many times per second), and also trigger lines like this one: Apr 09 21:13:18 weatherwax systemd-journald[545]: Suppressed 2017046 messages from /user.slice/user-1000.slice I'm not sure if the high CPU usage starts before or after I close the window, but I'm sure it's also noticable after closing the window (I expected builder to quit when the window is closed) Here's a backtrace:
+ Trace 236171
Thread 1 (Thread 0x7f4dfa59da40 (LWP 4617))
Both gnome-builder and gnome-builder-worker --type=jedi_plugin are still running, even though the window is closed.
Eek, is this really still on 3.16.x? The codebase has changed so much since then, we hadn't moved Jedi out of process yet either, no?
No, this is on 3.20, I forgot to set the version field when reporting the bug.
I fixed an issue we had previously that was causing unnecessary spin re-calculating CSS information (due to a stupid error on my part). Any chance you can build 3.20.2 from source? (Or maybe it will land in repos soon?)
The problem still exists in 3.20.2. The high CPU usage only seems to start *after* closing the window. strace on the gnome-builder process shows recvmsg(4, 0x7ffe4d1136a0, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 2 ([{fd=0, revents=POLLNVAL}, {fd=3, revents=POLLIN}]) read(3, "\2\0\0\0\0\0\0\0", 16) = 8 write(3, "\1\0\0\0\0\0\0\0", 8) = 8 accept(0, NULL, NULL) = -1 EBADF (Bad file descriptor) stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2265, ...}) = 0 gettid() = 22625 write(1, "11:48:03.0750 "..., 118) = 118 write(3, "\1\0\0\0\0\0\0\0", 8) = 8 recvmsg(4, 0x7ffe4d1136a0, 0) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 2 ([{fd=0, revents=POLLNVAL}, {fd=3, revents=POLLIN}]) read(3, "\2\0\0\0\0\0\0\0", 16) = 8 write(3, "\1\0\0\0\0\0\0\0", 8) = 8 accept(0, NULL, NULL) = -1 EBADF (Bad file descriptor) repeating ad infinitum.
I'm having a hell of a time replicating this. I'm just opening an a simple project I created with a few text files, exercising the python completion a bit, and closing. Any suggestions to help me replicate?
Okay, I think I found out how to trigger this issue: 1. Edit a python file (that is in a git repository) in Builder. Make a change, no matter how small. 2. Switch to the terminal, and commit that change to git. 3. Switch back to Builder, and close the window. The process immediately jumps to 100% CPU usage once I close the window.
Hrmm, that seems to work fine here on git master. The project is just a "directory project" right? (Not autotools)?
correct, no autotools. I can't reliably reproduce this right now either, I tried with a bunch of different projects, even in the same one in which I reproduced this issue earlier today. I'm not sure what's going on here. Maybe it'll help if I'll run a profiler next time this issue occurs? I assume it would at least give a clue on what exactly Builder is doing (apart from writing error messages to the log in an extremely fast rate) when the issue occurs.
Looking at the accept(0,...), and the error messages you provide, it looks to me that we are doing a g_socket_accept() (possibly in GDBus) with an invalid socket id. I can't imagine a case where we'd do a valid accept() on STDIN_FILENO (0). The only place we do that I believe is in our IdeWorker (multi-process) stuff. I'll see if I can shake that down today.
Created attachment 327383 [details] [review] workers: try to more gracefully shutdown worker processes This still isn't "graceful" in that we don't notify the children processes to cleanly exit, but it does more gracefully cleanup the GDBusServer. Sometimes, it looks like the GDBusServer would get in a spinloop with accepting connections on a closed file-descriptor. Doing this forced cleanup should fix that.
I finally managed to reproduce this once I tested force closing the child processes. The loop was caused by GDBusServer accepting connections in an async spinloop. Now we manually stop the dbus server and then cleanup our children processes (by force killing them). Attachment 327383 [details] pushed as 671d29e - workers: try to more gracefully shutdown worker processes