After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 764823 - gnome-builder sometimes pegs a core, even when closing the window, with "Bad file descriptor" warnings in the logs
gnome-builder sometimes pegs a core, even when closing the window, with "Bad ...
Status: RESOLVED FIXED
Product: gnome-builder
Classification: Other
Component: general
3.20.x
Other Linux
: Normal normal
: ---
Assigned To: GNOME Builder Maintainers
GNOME Builder Maintainers
Depends on:
Blocks:
 
 
Reported: 2016-04-09 18:25 UTC by Elad Alfassa
Modified: 2016-05-06 10:50 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
workers: try to more gracefully shutdown worker processes (5.01 KB, patch)
2016-05-06 10:48 UTC, Christian Hergert
committed Details | Review

Description Elad Alfassa 2016-04-09 18:25:43 UTC
Sometimes when editing python code and then just leaving builder open for a while and not touching it, it pegs a cpu core, 100% cpu useage. on the system log, I can see one line that looks like this

Apr 09 21:08:17 weatherwax org.gnome.Builder[1585]: sys:1: Warning: g_object_weak_unref: couldn't find weak ref 0x7f4df50a59d0(0x55f7c5512000)


And then an endless amount of lines that look like this:

Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0077                        GLib-GIO[4617]:  WARNING: fail: Error accepting connection: Bad file descriptor
Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078                        GLib-GIO[4617]:  WARNING: fail: Error accepting connection: Bad file descriptor
Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078                        GLib-GIO[4617]:  WARNING: fail: Error accepting connection: Bad file descriptor
Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078                        GLib-GIO[4617]:  WARNING: fail: Error accepting connection: Bad file descriptor
Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078                        GLib-GIO[4617]:  WARNING: fail: Error accepting connection: Bad file descriptor
Apr 09 21:08:18 weatherwax org.gnome.Builder[1585]: 21:08:18.0078                        GLib-GIO[4617]:  WARNING: fail: Error accepting connection: Bad file descriptor

which flood the log in a very high rate (repeating many times per second), and also trigger lines like this one:
Apr 09 21:13:18 weatherwax systemd-journald[545]: Suppressed 2017046 messages from /user.slice/user-1000.slice

I'm not sure if the high CPU usage starts before or after I close the window, but I'm sure it's also noticable after closing the window (I expected builder to quit when the window is closed)

Here's a backtrace:

Thread 1 (Thread 0x7f4dfa59da40 (LWP 4617))

  • #0 _int_free
  • #1 free
  • #2 g_free
  • #3 g_closure_unref
  • #4 g_source_destroy_internal
  • #5 g_child_source_remove_internal
  • #6 g_source_destroy_internal
  • #7 free_sources
  • #8 g_task_finalize
  • #9 g_object_unref
  • #10 accept_ready
  • #11 socket_source_dispatch
  • #12 g_main_context_dispatch
  • #13 g_main_context_iterate.isra
  • #14 g_main_context_iteration
  • #15 g_application_run
  • #16 main
  • #0 g_pointer_bit_lock
  • #1 g_datalist_id_dup_data
  • #2 g_socket_listener_accept_socket_finish
  • #3 g_socket_listener_accept_finish
  • #4 g_socket_service_ready
  • #5 g_task_return_now
  • #6 g_task_return
  • #7 accept_ready
  • #8 socket_source_dispatch
  • #9 g_main_context_dispatch
  • #10 g_main_context_iterate.isra
  • #11 g_main_context_iteration
  • #12 g_application_run
  • #13 main


Both gnome-builder and gnome-builder-worker --type=jedi_plugin are still running, even though the window is closed.
Comment 1 Christian Hergert 2016-04-29 03:06:23 UTC
Eek, is this really still on 3.16.x?

The codebase has changed so much since then, we hadn't moved Jedi out of process yet either, no?
Comment 2 Elad Alfassa 2016-04-29 07:28:27 UTC
No, this is on 3.20, I forgot to set the version field when reporting the bug.
Comment 3 Christian Hergert 2016-04-29 08:17:14 UTC
I fixed an issue we had previously that was causing unnecessary spin re-calculating CSS information (due to a stupid error on my part).

Any chance you can build 3.20.2 from source? (Or maybe it will land in repos soon?)
Comment 4 Elad Alfassa 2016-04-29 08:53:04 UTC
The problem still exists in 3.20.2. The high CPU usage only seems to start *after* closing the window.

strace on the gnome-builder process shows

recvmsg(4, 0x7ffe4d1136a0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 2 ([{fd=0, revents=POLLNVAL}, {fd=3, revents=POLLIN}])
read(3, "\2\0\0\0\0\0\0\0", 16)         = 8
write(3, "\1\0\0\0\0\0\0\0", 8)         = 8
accept(0, NULL, NULL)                   = -1 EBADF (Bad file descriptor)
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2265, ...}) = 0
gettid()                                = 22625
write(1, "11:48:03.0750                   "..., 118) = 118
write(3, "\1\0\0\0\0\0\0\0", 8)         = 8
recvmsg(4, 0x7ffe4d1136a0, 0)           = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 0) = 2 ([{fd=0, revents=POLLNVAL}, {fd=3, revents=POLLIN}])
read(3, "\2\0\0\0\0\0\0\0", 16)         = 8
write(3, "\1\0\0\0\0\0\0\0", 8)         = 8
accept(0, NULL, NULL)                   = -1 EBADF (Bad file descriptor)

repeating ad infinitum.
Comment 5 Christian Hergert 2016-05-03 16:02:47 UTC
I'm having a hell of a time replicating this. I'm just opening an a simple project I created with a few text files, exercising the python completion a bit, and closing.

Any suggestions to help me replicate?
Comment 6 Elad Alfassa 2016-05-03 16:19:32 UTC
Okay, I think I found out how to trigger this issue:

1. Edit a python file (that is in a git repository) in Builder. Make a change, no matter how small.
2. Switch to the terminal, and commit that change to git.
3. Switch back to Builder, and close the window.

The process immediately jumps to 100% CPU usage once I close the window.
Comment 7 Christian Hergert 2016-05-03 16:27:30 UTC
Hrmm, that seems to work fine here on git master.

The project is just a "directory project" right? (Not autotools)?
Comment 8 Elad Alfassa 2016-05-03 17:08:35 UTC
correct, no autotools. I can't reliably reproduce this right now either, I tried with a bunch of different projects, even in the same one in which I reproduced this issue earlier today. I'm not sure what's going on here.

Maybe it'll help if I'll run a profiler next time this issue occurs? I assume it would at least give a clue on what exactly Builder is doing (apart from writing error messages to the log in an extremely fast rate) when the issue occurs.
Comment 9 Christian Hergert 2016-05-04 07:22:56 UTC
Looking at the accept(0,...), and the error messages you provide, it looks to me that we are doing a g_socket_accept() (possibly in GDBus) with an invalid socket id.

I can't imagine a case where we'd do a valid accept() on STDIN_FILENO (0).

The only place we do that I believe is in our IdeWorker (multi-process) stuff. I'll see if I can shake that down today.
Comment 10 Christian Hergert 2016-05-06 10:48:09 UTC
Created attachment 327383 [details] [review]
workers: try to more gracefully shutdown worker processes

This still isn't "graceful" in that we don't notify the children processes
to cleanly exit, but it does more gracefully cleanup the GDBusServer.

Sometimes, it looks like the GDBusServer would get in a spinloop with
accepting connections on a closed file-descriptor.

Doing this forced cleanup should fix that.
Comment 11 Christian Hergert 2016-05-06 10:50:41 UTC
I finally managed to reproduce this once I tested force
closing the child processes. The loop was caused by GDBusServer
accepting connections in an async spinloop.

Now we manually stop the dbus server and then cleanup
our children processes (by force killing them).

Attachment 327383 [details] pushed as 671d29e - workers: try to more gracefully shutdown worker processes