GNOME Bugzilla – Bug 554362
gedit uses 100% CPU when opening a file in /var/lib/dpkg/info/
Last modified: 2009-08-03 09:10:11 UTC
Please describe the problem:
I found this bug on Ubuntu Intrepid. You can see the original bug report here: https://bugs.launchpad.net/ubuntu/+source/gedit/+bug/276094

Steps to reproduce:
1. gedit /var/lib/dpkg/info/adduser.list
2. force-close gedit
3. run gedit without any arguments, and it happens again

Actual results:
gedit maxes out the CPU and, as a result, runs very slowly.

Expected results:

Does this happen every time? yes

Other information:
This has been confirmed downstream. Another quirk of this bug is that even after you close the instance of gedit that caused the problem, the bug lingers into subsequent instances: see steps 2-3 above; just try to run gedit again.
I forgot one important detail: gedit must be closed for this bug to happen. If gedit is already open and you then open a file in /var/lib/dpkg/info/, it works fine.
The bug is due to the filebrowser plugin; if you disable it, the cpu usage goes back to normal. It happens when you launch gedit because in that case the filebrowser plugin goes to the file's dir (and the next time you simply launch "gedit", it remembers that dir). We need to figure out what's wrong with that dir...
 500 files == 4 second cpu burn
1000 files == 14 second cpu burn
2000 files == 25 second cpu burn
4000 files == 120 second cpu burn
/var/lib/dpkg/info (6,200 files) == 200 second cpu burn

So the problem with the directory is the sheer number of files, and it also looks like the algorithm being used is non-linear. It might be that there is no bug in the algorithm per se, and that it simply needs all that cpu to organize the files. I looked at what the filebrowser does, and off the cuff I can think of 2 possible solutions:

a) defer the execution of the algorithm until the user actually tries to browse the files. That way normal operations can continue and there isn't much of a problem unless the user uses that feature directly.

b) run the algorithm in a separate, low-priority thread, so that gedit can still be used without being super slow.
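The timings above grow roughly quadratically, which is exactly what per-item insertion into a sorted linked list produces. A minimal Python sketch (purely illustrative, not the gedit code) that counts comparisons for the g_slist_insert_sorted-style approach:

```python
import random

def insert_sorted_comparisons(n):
    """Comparisons performed when each of n items is inserted into an
    already-sorted list by linear scan, the way g_slist_insert_sorted
    inserts into a GSList."""
    items = list(range(n))
    random.Random(0).shuffle(items)
    result = []
    comparisons = 0
    for x in items:
        i = 0
        # walk the sorted list until the insertion point is found
        while i < len(result) and result[i] < x:
            comparisons += 1
            i += 1
        result.insert(i, x)
    return comparisons

# Doubling n roughly quadruples the work: O(n^2), matching the
# super-linear cpu burn measured above.
print(insert_sorted_comparisons(1000))
print(insert_sorted_comparisons(2000))
```

On average each insertion scans about half the list, so the total is on the order of n^2/4 comparisons, which is why 6,200 files takes minutes rather than seconds.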
Created attachment 119677 [details] [review] Change filebrowser store internals to use gsequence instead of gslist

Well, I've analyzed this a bit, and the problem is the sorting of the files. Each file is currently inserted into the container with g_slist_insert_sorted, which is of course terribly slow. The patch changes the container to a GSequence, which is internally a balanced binary tree, so it is much faster. This fixes the long waiting times, but not necessarily the responsiveness of gedit (loading 2000 files still makes gedit unresponsive here, which I think is unacceptable). I'm not sure how we could fix that, but we should at least be able to avoid loading the store until it is actually shown.
in model_resort_node the patch does:

-	dir->children = g_slist_sort (dir->children,
-				      (GCompareFunc) (model->priv->
-						      sort_func));
+	item = g_sequence_search (dir->children, node,
+				  sequence_compare_nodes, model);

but as far as I can tell, this does not actually resort the item
Also, pressing the "Up" button just after gedit is started causes a crash, with the following assertions triggered:

sys:1: Warning: g_sequence_get_begin_iter: assertion `seq != NULL' failed
sys:1: Warning: g_sequence_iter_is_end: assertion `iter != NULL' failed
sys:1: Warning: g_sequence_get: assertion `iter != NULL' failed
sys:1: Warning: g_sequence_iter_next: assertion `iter != NULL' failed
(In reply to comment #5)
> in model_resort_node the patch does:
>
> -	dir->children = g_slist_sort (dir->children,
> -				      (GCompareFunc) (model->priv->
> -						      sort_func));
> +	item = g_sequence_search (dir->children, node,
> +				  sequence_compare_nodes, model);
>
> but as far as I can tell, this does not actually resort the item

Well, with the GSList we re-sorted the whole child list whenever a node changed, but with the sequence we search for the node to get its iter and then call g_sequence_sort_changed (the next line in the diff, which you omitted here). That resorts the item.
Oops, you were talking about the second case. Yes, it seems a g_sequence_sort_changed call is missing there.
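For context: when a single node's sort key changes, the whole container does not need to be re-sorted; it is enough to pull that one item out and re-insert it in order, which is the job g_sequence_sort_changed does for a GSequence iter. A hedged Python sketch of the idea (hypothetical names, not the gedit code):

```python
from bisect import insort

def resort_changed(sorted_items, index, new_key):
    """Restore sorted order after the item at `index` gets a new key,
    by removing it and re-inserting it at the correct position --
    the single-item resort the patch was missing in its second code path.
    Items are (key, value) pairs kept sorted by key."""
    _old_key, value = sorted_items.pop(index)   # take the changed item out
    insort(sorted_items, (new_key, value))      # binary-search re-insert
    return sorted_items
```

Example: `resort_changed([(1, 'a'), (2, 'b'), (3, 'c')], 0, 5)` moves 'a' to the end, giving `[(2, 'b'), (3, 'c'), (5, 'a')]`, without touching the relative order of the other items.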
Created attachment 119978 [details] [review] Sequence sort fix attempt 2 Ok, I think I fixed that and another bug in finding nodes in the sequence. The attached patch should fix the problems.
With the latest patch, "gedit some_file.txt" causes a crash due to some form of memory corruption (at a quick look, it seems that "node" is either not properly initialized or has been double-freed).
*** Bug 556901 has been marked as a duplicate of this bug. ***
Created attachment 121362 [details] [review] batch insertion

I finally had some time to work out this alternative patch: instead of changing the data structure, it inserts items in batches, so that sorting is done just once per batch instead of once per item. The patch is not as efficient as switching to GSequence, but it is also less invasive. I was not able to figure out how to fix the gsequence patch :(

Review and testing of this patch would be very much appreciated: on one hand it's a fix that I'd like to see go in asap, but on the other hand it's the kind of change that makes me a bit nervous in a stable series.
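The batch idea can be sketched language-neutrally: sort each incoming batch once, then merge it into the already-sorted children in a single linear pass, so a batch of k items costs one O(k log k) sort plus one O(n + k) merge instead of k sorted insertions of O(n) each. A hypothetical Python illustration (not the actual patch):

```python
def batch_insert(children, batch):
    """Merge an unsorted batch of new entries into an already-sorted
    children list: one sort per batch plus one linear merge, instead of
    one sorted insertion per item."""
    batch = sorted(batch)
    merged = []
    i = j = 0
    # standard two-way merge of two sorted lists
    while i < len(children) and j < len(batch):
        if children[i] <= batch[j]:
            merged.append(children[i])
            i += 1
        else:
            merged.append(batch[j])
            j += 1
    merged.extend(children[i:])   # leftover existing children
    merged.extend(batch[j:])      # leftover new entries
    return merged
```

For example, `batch_insert([1, 4, 9], [3, 2, 10])` yields `[1, 2, 3, 4, 9, 10]` with a single pass over both lists.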
Created attachment 121455 [details] [review] batch v2

Here is an improved version of the batch insertion patch. I have been using it without problems and I am going to commit it to svn. With respect to the previous version, I fixed a problem I introduced in svn last week when I tried to remove the check for the presence of a node in the parent->children list when populating a dir: the check cannot be removed, since I was getting duplicate nodes in some cases, but it can be made much faster by checking only against the *initial* list of children when reading a dir, instead of also iterating over the items we just added.
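The duplicate-check speedup can be illustrated with a small sketch (hypothetical names, not the gedit code): snapshot the children that existed before the directory read begins and test new entries only against that snapshot, so each check stays bounded by the initial list size instead of growing with every item just added.

```python
def populate_dir(children, entries_read):
    """Append entries from a directory read, skipping any that were
    already present *before* the read started; entries added during
    this same read are never rescanned for the duplicate check."""
    initial = set(children)          # snapshot of pre-existing children
    for entry in entries_read:
        if entry not in initial:     # check against the snapshot only
            children.append(entry)
    return children
```

This assumes a single directory read does not itself yield the same entry twice, so only collisions with pre-existing children need to be caught.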
Patch looks good; some comments:

1) I'd still like to see the exists check gone. I don't think it is really needed, but it obviously needs some more thinking as to the cause of the duplicate problem.

2) I would also still like to fix the gsequence approach and see how much speedup it gives in contrast with the batch insert; if the performance increase is significant, I'd vote for that solution for our next development cycle.

3) Other than that, I'm glad you could find the time to work this out :) Great work!
This patch has been forwarded downstream to Ubuntu. I reported my results there, but I also wanted to report here, since I was the original reporter: opening a file in /var/lib/dpkg/info (now 5,800 files) takes about 177 seconds without the patch. With the patch, it takes about 10 seconds. This is on par with how long nautilus takes, so I think it's great! Thanks.