GNOME Bugzilla – Bug 160917
Gedit freezes while performing lenghtly search and replace operations
Last modified: 2005-12-17 10:33:03 UTC
1. Get a big text file (such as a 1.5 mb sql file) 2. Search and replace a text string. 3. Gedit freezes while performing the task, but when finished, unfreezes.
You are right, gedit has quite big performance problems while performing "Replace All" operations on big files. This is an area where gedit would need some profiling.
Created attachment 37220 [details] Profiling data collected during a big "replace all" operation... gedit_view_update_cursor_position_statusbar and gedit_document_find seem to have some performance problems...
Thanks! this is really valuable information... that's a gprof profile, right? I don't trust gprof much, is the data reproducible? Can you also describe in detail how you did the profile? I never had much luck with gprof and GUI programs, but maybe that's my fault :) Looking at the data, gedit_view_update_cursor_position_statusbar can definately be one of the bottlenecks since each time is called has to count how many lines there are from the beginning of the buffer and how many chars there are from the beginning of the current line. I don't see an easy way to improve that (keeping the counter up to date instead of counting from the start every time seems an abvious optimization, but may become tricky if you consider that the user may move the cursor at a random place with the mouse). Much more interesting is to find out why it is called a shitload of times... here the first thing that comes to my mind is to disable counting while performing the search&replace. However just by braking in update_cursor_position_statusbar() from gdb it turns out that the function is called a lot of times even just when scrolling the text and that sounds strange to me, so I've come up with the following really simple patch: it does not remove all the unneeded calls, but it should improve things and seems pretty simple and logical to me. Any chance you could test again with the ptach applied? (PS: in future can you attach the profile data as plain text instead of gzipping it? It makes things more confortable and bugzilla is not falling short on storage :)
Created attachment 37229 [details] [review] proposed patch
Luis, it looks like you forgot to add yoourself to CC :) Can you read the above comment, in caseyou have missed it?
ok here are some numbers obtained with the patch above by s/gedit/pippo in the profile file you attached above. The numbers are obtained by installing a timer in gedit-document.c and measuring the elapsed time from the start to the end of the replace_all function. without the patch: paolo@murdock:~/cvs/gnome2/gedit$ gedit/gedit ** Message: start : 0,000 ** Message: end : 2,231 ** Message: start : 0,000 ** Message: end : 2,156 ** Message: start : 0,000 ** Message: end : 2,202 ** Message: start : 0,000 ** Message: end : 2,202 with the patch: paolo@murdock:~/cvs/gnome2/gedit$ gedit/gedit ** Message: start : 0,000 ** Message: end : 1,985 ** Message: start : 0,000 ** Message: end : 2,035 ** Message: start : 0,000 ** Message: end : 1,750 ** Message: start : 0,000 ** Message: end : 1,743 A speedup is measurable, even if not huge... this may also be due to the test file: in the test file the string "gedit" is present on almost every line, so the percentage of the the time spent in the actual substitution may be larger then in an average file.
Thanks for adding me to CC paolo, I'm still learning this all :-) I tried to do my best, I'm very new to linux programming, and even with the gnome-love help, the learning curve is still high... For the profiling data, yes I used gprof ( I used the information I found at http://www.network-theory.co.uk/docs/gccintro/gccintro_66.html ), and used a test file with lots of duplicate lines (~ 10000 lines). On my machine, replacing a word is about 8-10 seconds... The same action with vim is intantaneous ! Gedit had previously been successfully compiled by jhbuild. I did the profiling this way : 1. I performed a "make clean" 2. Compiled with "make CFLAGS=-pg" (do you know if this is the way I should modify compilation settings in a autotools-driven project ?) 3. I ran Gedit on my test file, then performed a big "replace all" 4. Profiling data was extracted with "gprof /opt/gnome2/bin/gedit > gedit_replace_all_profiling.txt" If I compressed the data its because I thought 228KB of text data was a little too big. I will not do it anymore, I promise :-) . I try your patch ASAP...
> 2. Compiled with "make CFLAGS=-pg" (do you know if this is the way I should > modify compilation settings in a autotools-driven project ?) Sounds right, but the fact is that theoretically you would need to recompile also all the underling libraries (glib, atk, gtk etc) with the same flags... anyway for now it would be great if you could just measure if the above patch improves things whithout changing other things.
Created attachment 37272 [details] Comparison of the results without and with the patch I put it as an attachment since columns are too wide to fit well in a normal comment... My test file has 9879 lines... you can see that now gedit_view_update_cursor_position_statusbar is roughly called once per line. But I'm not sure if it really has to be called at all anyway...
What I don't understand is how gedit_view_update_cursor_position_statusbar can be called thrice less and still replacement have so poor performance (still around 8 seconds for me). It still needs improvement, because this is awful compared to vim. Otherwise, I saw nothing broken, but only gave it a quick try.
Comment on attachment 37229 [details] [review] proposed patch The patch looks good to me, please commit.
Comment on attachment 37229 [details] [review] proposed patch committed the above patch, but further optmization would still be good.
Created attachment 53929 [details] Sysprof profiling data for gedit 2.12.1 Adding this sysprof data in case someone can understand better the problem from it (the file is compressed because it was refused by bugzilla: too big). Check out http://primates.ximian.com/~federico/news-2005-07.html#26 for more info about sysprof. The sample file was genererated with: for i in $(seq 5000); do echo "foo bar foobar" >> gedit-replace-all-sample.txt; done I replaced "foo" by "fun" to have on each line "fun bar funbar". What I understand from this is that too many events are sent while performing the replacement, while we only need to redraw everything when the global replacement is done, and not after replacing each word. But I'm unsure of that. The other guilty one may be gedit_mdi_child_find_state_changed_handler. May one of the Paolos tell me if the comportment of this function is affected by the ongoing new_mdi modifications, and maybe try to check if the results are better in new_mdi ?
The sysprof log from comment #13 is great; it clearly indicates the problems. I just blogged about this; please see http://primates.ximian.com/~federico/news-2005-10.html#gedit-replace-all-sysprof for the details. The summary is this: 1. gedit_document_replace_all() calls gedit_document_find() many times. 2. When gedit_document_find() does its work, it emits a signal that the GUI code uses to update the sensitivity of menu items. 3. The GUI code changes the sensitivity of a couple menu items. It does this by asking the BonoboUIComponent to freeze itself, then changing the sensitivity, then thawing the component. Bonobo then updates the widgets that correspond to menu items with pending changes. The problem is that (3) happens for every match in the "replace all" process! The solution is to update the GUI only once, when the whole process has finished.
IIRC, we still have to write "Replace All" in new_mdi... help is welcome. Your data demonstrate how some time a developer could be wrong when he tries to guess where the performance problems of a given piece of code are (without trying to gather real profiling data). I was quite sure the most important performance problem was the undo_manager with its bad list management. Thanks for your help!
Luis, Federico: awesome! As Paolo said, the problem doesn't affect new_mdi (yet), but knowing about it will surely halp us avoid introducing it. Paolo: replace_all is there already, but we currently don't implement the 'find again' functionality (in the sense that you can find again, but sensitivity is not set and last searched term is not persisted) so we never emit that signal. By the way, I think that we should get rid of the signal and turn it into a property, but that obviusly doesn't affect the fact that the property should just be set once at the end of replace_all.
Thanks federico for your help on this ! Still, I don't know if doing a patch is necessary, if the new_mdi branch is not affected... Unless the new_mdi release of gedit is not ready for GNOME 2.14... Is it worth the trouble ? Paolo said: "I was quite sure the most important performance problem was the undo_manager with its bad list management." You may still be wright on this point, Paolo. I noticed yesterday that after doing the replacements, Gedit was reaaaly slow to quit when modifications are not saved. So maybe your intuition is correct here about the undo_manager. Does this comportment also happen in the new_mdi branch ?
Luis, here are some clarifications: - new_mdi is currently not affected by the problem, but this is because we have not a system in place to specify the sensitivity of 'find again' (that is to say that the can_find_again signal is never emitted). This is because the 'search' code has been changed in many ways, for instance to support find-as-you-type, the code is still in flux and we probably need some discussion on how to finalize it. Anyway at some point we need code to set the sensitivity on 'find again'. And yes, new_mdi is targeted at 2.14. - As for the patch for 2.12, I'd say that it's up to you... it's a dying codebase, but I guess that after the effort you put into this a patch may be fun to write and should be pretty easy[1]. It would also help us to quantify the improvement. If it's small and self contained I'd be ok to include it for 2.12.2, however I haven't discussed this with Paolo, so he may not agree. - for the undo manager, see bug #312653. It's a gtksourceview issue, thus it's still valid for new_mdi [1]: should be a matter of factoring out a gedit_document_find_real() that doesn't emit the signal from gedit_document_find and use it in replace_all.
As stated in patch submission reference: http://live.gnome.org/GnomeLove_2fSubmittingPatches , I join the changelog in the comments of the patch... Improvement can be noticed: without patch: 15 seconds with patch: 10 seconds I also attach a gzipped sysprof profile with the patch, showing that now gedit_document_replace_selected_text seems to be the way to go... But I still don't understand why using vim, the operations is instantaneous on the same file...
Created attachment 54151 [details] [review] Patch to avoid sending GUI update events while in "replace all" 2005-10-31 Luis Menina <liberforce@fr.st> * gedit/gedit-document.c: Improvement for bug #160917 gedit_document_replace_all now calls gedit_document_find_real instead of gedit_document_find to avoid sending useless GUI update events
Created attachment 54153 [details] Sysprof profile with the patch above applied
Ok, this is finally fixed for good on HEAD (2.13.0, based on the new_mdi branch). Luis, feel free to give it a run just to confirm :) Paolo, do we close the bug or do you think that it's worth at least apply the last patch to the 2.12 branch? (patch is easy enough, but I am not sure we are even going to have another 2.12.X release)
Ok, tried with HEAD... 1.5 second :-) that's much better ! Undoing the action however takes 7.5s, but this must be related with bug #312653 This one is good for me, ok for closing. It's a pity my first patch wasn't applied, even if trivial :'-( Maybe next one :-p
Yes, please keep 2.12 alive. You know, we need a fallback in the case HEAD will not prove itself stable enough.
Ok, I committed the last patch to the 2-12 branch. Btw, Luis your profiling work has been wonderful and would have been very appreciated even if the patch didn't went in :) About undoing the replace all being slower: this is due to the fact that at the moment GtkSourceView's UndoManager sees the replace all as a collection of deletes and inserts, so when we are performing the real replace all we can prevent the signal from being emitted, but during the undo there is no way we can prevent emitting the signal for every change in the buffer. This is a known issue and not easily solvable with the current undo manager, unless we move replace all into SourceView itself. I am closing this bug.