GNOME Bugzilla – Bug 554362
gedit uses 100% CPU when opening a file in /var/lib/dpkg/info/
Last modified: 2009-08-03 09:10:11 UTC
Please describe the problem:
I found this bug on Ubuntu Intrepid. You can see the original bug report here: https://bugs.launchpad.net/ubuntu/+source/gedit/+bug/276094

Steps to reproduce:
1. gedit /var/lib/dpkg/info/adduser.list
2. force-close gedit
3. run gedit without any arguments, and it happens again

Actual results:
gedit maxes out the CPU and, as a result, runs very slowly.

Expected results:

Does this happen every time? yes

Other information:
This has been confirmed downstream. Another quirk of this bug is that even after you close the instance of gedit that caused the problem, the bug lingers into subsequent instances: see steps 2-3 above; just try to run gedit again.
I forgot one important detail: gedit must be closed for this bug to happen. If gedit is already open and you then open a file in /var/lib/dpkg/info/, it works fine.
The bug is due to the filebrowser plugin; if you disable it, the cpu usage goes back to normal. It happens when you launch gedit because in that case the filebrowser plugin goes to the file's dir (and the next time you simply launch "gedit", it remembers that dir). We need to figure out what's wrong with that dir...
 500 files == 4 second cpu burn
1000 files == 14 second cpu burn
2000 files == 25 second cpu burn
4000 files == 120 second cpu burn
/var/lib/dpkg/info (6,200 files) == 200 second cpu burn

So the problem with the directory is the sheer number of files, and it also looks like the algorithm being used is non-linear. It might be that there is no bug in the algorithm per se, and that it simply needs all that cpu to organize the files. I looked at what the filebrowser does, and off the cuff I can think of 2 possible solutions:

a) defer the execution of the algorithm until the user actually tries to browse the files. That way normal operations can continue and there isn't much of a problem unless the user uses that feature directly.

b) run the algorithm in a separate, low-priority thread, so that gedit can still be used without being super slow.
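The timings above grow roughly quadratically, which is exactly what per-item insertion into a sorted linked list produces. A minimal Python sketch (purely illustrative, not the gedit code) that counts comparisons for the g_slist_insert_sorted-style approach:

```python
import random

def insert_sorted_comparisons(n):
    """Comparisons performed when each of n items is inserted into an
    already-sorted list by linear scan, the way g_slist_insert_sorted
    inserts into a GSList."""
    items = list(range(n))
    random.Random(0).shuffle(items)
    result = []
    comparisons = 0
    for x in items:
        i = 0
        # walk the sorted list until the insertion point is found
        while i < len(result) and result[i] < x:
            comparisons += 1
            i += 1
        result.insert(i, x)
    return comparisons

# Doubling n roughly quadruples the work: O(n^2), matching the
# super-linear cpu burn measured above.
print(insert_sorted_comparisons(1000))
print(insert_sorted_comparisons(2000))
```

On average each insertion scans about half the list, so the total is on the order of n^2/4 comparisons, which is why 6,200 files takes minutes rather than seconds.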
Created attachment 119677 [details] [review] Change filebrowser store internals to use gsequence instead of gslist

Well, I've analyzed this a bit, and the problem is the sorting of the files. Each file is currently inserted into the container with g_slist_insert_sorted, which is of course terribly slow. The patch changes the container to a GSequence, which is internally a balanced binary tree, so it is much faster. This fixes the long waiting times, but not necessarily the responsiveness of gedit (loading 2000 files still makes gedit unresponsive here, which I think is unacceptable). I'm not sure how we could fix that, but we should at least be able to avoid loading the store until it is actually shown.
in model_resort_node the patch does:

-	dir->children = g_slist_sort (dir->children,
-				      (GCompareFunc) (model->priv->
-						      sort_func));
+	item = g_sequence_search (dir->children, node,
+				  sequence_compare_nodes, model);

but as far as I can tell, this does not actually resort the item
Also, pressing the "Up" button just after gedit is started causes a crash, with the following assertions triggered:

sys:1: Warning: g_sequence_get_begin_iter: assertion `seq != NULL' failed
sys:1: Warning: g_sequence_iter_is_end: assertion `iter != NULL' failed
sys:1: Warning: g_sequence_get: assertion `iter != NULL' failed
sys:1: Warning: g_sequence_iter_next: assertion `iter != NULL' failed
(In reply to comment #5)
> in model_resort_node the patch does:
>
> -	dir->children = g_slist_sort (dir->children,
> -				      (GCompareFunc) (model->priv->
> -						      sort_func));
> +	item = g_sequence_search (dir->children, node,
> +				  sequence_compare_nodes, model);
>
> but as far as I can tell, this does not actually resort the item

Well, with the GSList we re-sorted the whole child list whenever a node changed, but with the sequence we search for the node to get its iter and then call g_sequence_sort_changed (the next line in the diff, which you omitted here). That resorts the item.
Oops, you were talking about the second case. Yes, it seems a g_sequence_sort_changed call is missing there.
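For context: when a single node's sort key changes, the whole container does not need to be re-sorted; it is enough to pull that one item out and re-insert it in order, which is the job g_sequence_sort_changed does for a GSequence iter. A hedged Python sketch of the idea (hypothetical names, not the gedit code):

```python
from bisect import insort

def resort_changed(sorted_items, index, new_key):
    """Restore sorted order after the item at `index` gets a new key,
    by removing it and re-inserting it at the correct position --
    the single-item resort the patch was missing in its second code path.
    Items are (key, value) pairs kept sorted by key."""
    _old_key, value = sorted_items.pop(index)   # take the changed item out
    insort(sorted_items, (new_key, value))      # binary-search re-insert
    return sorted_items
```

Example: `resort_changed([(1, 'a'), (2, 'b'), (3, 'c')], 0, 5)` moves 'a' to the end, giving `[(2, 'b'), (3, 'c'), (5, 'a')]`, without touching the relative order of the other items.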
Created attachment 119978 [details] [review] Sequence sort fix attempt 2 Ok, I think I fixed that and another bug in finding nodes in the sequence. The attached patch should fix the problems.
With the latest patch, "gedit some_file.txt" causes a crash due to some form of memory corruption (at a quick look, it seems that "node" is either not properly initialized or has been double-freed).
*** Bug 556901 has been marked as a duplicate of this bug. ***
Created attachment 121362 [details] [review] batch insertion

I finally had some time to work out this alternative patch: instead of changing the data structure, it inserts items in batches, so that sorting is done just once per batch instead of once per item. The patch is not as efficient as switching to GSequence, but it is also less invasive. I was not able to figure out how to fix the gsequence patch :(

Review and testing of this patch would be very much appreciated: on one hand it's a fix that I'd like to see go in asap, but on the other hand it's the kind of change that makes me a bit nervous in a stable series.
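The batch idea can be sketched language-neutrally: sort each incoming batch once, then merge it into the already-sorted children in a single linear pass, so a batch of k items costs one O(k log k) sort plus one O(n + k) merge instead of k sorted insertions of O(n) each. A hypothetical Python illustration (not the actual patch):

```python
def batch_insert(children, batch):
    """Merge an unsorted batch of new entries into an already-sorted
    children list: one sort per batch plus one linear merge, instead of
    one sorted insertion per item."""
    batch = sorted(batch)
    merged = []
    i = j = 0
    # standard two-way merge of two sorted lists
    while i < len(children) and j < len(batch):
        if children[i] <= batch[j]:
            merged.append(children[i])
            i += 1
        else:
            merged.append(batch[j])
            j += 1
    merged.extend(children[i:])   # leftover existing children
    merged.extend(batch[j:])      # leftover new entries
    return merged
```

For example, `batch_insert([1, 4, 9], [3, 2, 10])` yields `[1, 2, 3, 4, 9, 10]` with a single pass over both lists.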
Created attachment 121455 [details] [review] batch v2

Here is an improved version of the batch insertion patch. I have been using it without problems and I am going to commit it to svn. With respect to the previous version, I fixed a problem I introduced in svn last week when I tried to remove the check for the presence of a node in the parent->children list when populating a dir: the check cannot be removed, since I was getting duplicate nodes in some cases, but it can be made much faster by checking only against the *initial* list of children when reading a dir, instead of also iterating over the items we just added.
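The duplicate-check speedup can be illustrated with a small sketch (hypothetical names, not the gedit code): snapshot the children that existed before the directory read begins and test new entries only against that snapshot, so each check stays bounded by the initial list size instead of growing with every item just added.

```python
def populate_dir(children, entries_read):
    """Append entries from a directory read, skipping any that were
    already present *before* the read started; entries added during
    this same read are never rescanned for the duplicate check."""
    initial = set(children)          # snapshot of pre-existing children
    for entry in entries_read:
        if entry not in initial:     # check against the snapshot only
            children.append(entry)
    return children
```

This assumes a single directory read does not itself yield the same entry twice, so only collisions with pre-existing children need to be caught.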
Patch looks good; some comments:

1) I'd still like to see the exists check gone. I don't think it is really needed, but it obviously needs some more thinking as to the cause of the duplicate problem.

2) I would also still like to fix the gsequence approach and see how much speedup it gives in contrast with the batch insert; if the performance increase is significant, I'd vote for that solution for our next development cycle.

3) Other than that, I'm glad you could find the time to work this out :) Great work!
This patch has been forwarded downstream to Ubuntu. I reported my results there, but I also wanted to report here, since I was the original reporter: opening a file in /var/lib/dpkg/info (now 5,800 files) takes about 177 seconds without the patch. With the patch, it takes about 10 seconds. This is on par with how long nautilus takes, so I think it's great! Thanks.