GNOME Bugzilla – Bug 68720
gedit performance problems with big files
Last modified: 2004-12-22 21:47:04 UTC
Package: gedit
Severity: normal
Version: 0.9.7
Synopsis: Crash doing search on large text file
Bugzilla-Product: gedit
Bugzilla-Component: general

Description: I'm working with a couple of large text files (that I can't send for reasons of privacy). One is 211k, another is 240k. I opened the find window using F5, and then searched for Girral (it's in there). gedit then crashed.

Debugging Information:
(no debugging symbols found)...(no debugging symbols found)... (no debugging symbols found)...(no debugging symbols found)... (no debugging symbols found)...(no debugging symbols found)...[New Thread 1024 (LWP 23761)]
0x40742989 in __wait4 () from /lib/i686/libc.so.6
+ Trace 16216
------- Bug moved to this database by unknown@bugzilla.gnome.org 2002-01-14 19:09 ------- Reassigning to the default owner of the component, chema@celorio.com.
Oops, that was F6, not F5. I opened both files again and couldn't reproduce the crash. Bummer. Hope the debug info helps, chema!
Any chance this information will be usable, guys?
Cleaning up and reassigning. Paolo, you are down to 17 bugs, many of which are crashers. If you could read through all the bugs, I'd appreciate it. You should do two things for each bug, especially crashers: 1) If the code path is not relevant for GNOME2, please remove the GNOME2 keyword. 2) If the code is otherwise irrelevant (not in gedit, already fixed, whatever), please close it out. If you do these things, hopefully I can help keep the gedit corner of the bugzilla much cleaner in the future. Search on 'luis doing GNOME2 work' to filter out this spam.
I tested with 1.118.0 on Linux and Solaris with a text file of approximately 3k lines; this bug could not be reproduced.
I'm using gedit2-1.119.0.0.200205141824-0.snap.ximian.1 and after loading a 3.2M 1200 line text file, it crashed doing a find and replace. As a comparison, this file took about a minute to open fully in gedit and performance wasn't good. In vi, it opens in less than 5 seconds. While the search and replace failed in gedit (I left the process chewing 100% of CPU for a full 10 minutes before shutting the window), vi completed the same search and replace (':% s/and/AND/g') in less than five seconds. Out of interest, there were 16429 substitutions on 506 lines. Food for thought.
About the gedit 1.119.0 problem: please try to reproduce this bug in testtext (in the gtk+ test directory) and let me know. I can try to accelerate gedit loading, but we will probably never get the vi performance. gedit is only a simple editor; vi is more sophisticated and I think it is a "disk" editor. Did gedit core dump? Or did you kill it? In the first case, could you please attach a backtrace?
About gedit 1.119.0: I have made some experiments with a big text file, ~4.5 MB (containing the same line repeated 120000 times), on an AMD K6-III 450 MHz with 128 MB running RH Linux 7.2. Yep, "Replace All" on big files (120000 substitutions) is actually very, very slow (about 4 minutes). This is partially due to the way gedit manages undo info for replace all operations (I should fix this). But it does not crash for me. The file loading is quite fast for me (less than 5 secs), even if I see gedit chewing 100% of CPU for about 1 minute. Note that, during this time, you can use it. If you go to the end of the file you will also see strange things happening to the scroll bar. BTW, I can reproduce the loading problems with testtext too, so I think it is a gtk+ problem. Find operations on big files are quite slow in testtext too (but not as slow as in gedit). Copying the whole file to the clipboard is also quite slow (both in gedit and in testtext), and redoing a Replace All operation (120000 substitutions) is slow as well. Lowering Severity and Priority since it is not a crasher. Rodd: is it a crasher for you?
Haven't got the time to test right now, I'll do it tomorrow. However, gedit didn't core dump, but when I closed the window, a dialog informed me that it had stopped responding and asked whether I wanted to kill it. No backtrace to attach.
Maggi: I'd class this as a crasher for me.
One interesting fact! There is a significant difference in the time taken for Find/Replace between the "case-sensitive" and "case-insensitive" cases. I have a file with 85000 lines. When I did a find/replace for a word (with 3800 instances) on my Linux box, the results were as follows: case-sensitive 40-42 secs, case-insensitive 70-72 secs. When I went through the code, I could find only one difference in the code executed for the two cases. For the case-sensitive case, each line of the file is searched for the word with strstr (gedit_document_find -> gedit_text_iter_forward_search -> gtk_text_iter_forward_search -> lines_match -> strstr). For case-insensitive find/replace, each line of the file is searched for the word with a g_utf8_strcasestr call (gedit_document_find -> gedit_text_iter_forward_search -> lines_match -> g_utf8_strcasestr). g_utf8_strcasestr is slow compared with strstr(), and it is used to make an internationalised caseless search. I am just wondering why g_utf8_XXXXXXXX calls are used for the case-insensitive search and NOT for the case-sensitive search.
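To illustrate the cost difference described above, here is a minimal sketch in plain C. It is not gedit's or GLib's code: ascii_strcasestr is a hypothetical, ASCII-only helper written for this comment, and the real g_utf8_strcasestr additionally has to cope with multibyte UTF-8 text. The point is only that a caseless search folds every candidate character before comparing it, whereas strstr compares raw bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical ASCII-only caseless substring search (an assumption for
 * illustration, NOT g_utf8_strcasestr). Every comparison goes through
 * tolower(), which is extra work that a plain strstr() never does.
 */
static const char *ascii_strcasestr(const char *haystack, const char *needle)
{
    size_t nlen;
    const char *p;

    assert(haystack != NULL && needle != NULL);

    nlen = strlen(needle);
    if (nlen == 0)
        return haystack;            /* same convention as strstr() */

    for (p = haystack; *p != '\0'; p++) {
        size_t i = 0;

        /* Fold both sides to lower case before comparing each byte. */
        while (i < nlen && p[i] != '\0' &&
               tolower((unsigned char)p[i]) ==
               tolower((unsigned char)needle[i]))
            i++;

        if (i == nlen)
            return p;               /* whole needle matched at p */
    }
    return NULL;
}
```

With this helper, strstr("find Girral here", "girral") finds nothing, while ascii_strcasestr on the same arguments returns a pointer to "Girral here". A UTF-8 aware version would presumably need even more work per line, which may explain part of the 40 vs 70 second gap, but that is a guess until someone profiles it.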
I have profiled find/replace (case-sensitive) for 18030+ instances. I will attach the profile output shortly. It is observed that the gedit_undo_manager_get_number_of_groups() function is hogging a lot of time. (18030+ find/replacements took 360-odd seconds to finish. The profiler showed a total time of 236 seconds. Out of this, 234.31 seconds is hogged by this function.)
Created attachment 10435 [details] Profiler output for 18000+ case-sensitive replacements
Proposed patch that does away with linked-list scanning in gedit_undo_manager_get_number_of_groups function by caching the group count.
Created attachment 10600 [details] [review] caching group count
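For readers without access to the attachment, the idea of caching the group count can be sketched as follows. This is not the attached patch or the committed code: UndoLog, UndoAction and the undo_log_* functions are hypothetical stand-ins, and the sketch assumes a group is a run of consecutive actions sharing the same id. The scanning function walks the whole list on every call; the cached version maintains a counter when actions are pushed, so asking for the count is constant time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real gedit undo manager types. */
typedef struct UndoAction {
    int group_id;
    struct UndoAction *next;
} UndoAction;

typedef struct {
    UndoAction *head;      /* most recent action first */
    size_t num_groups;     /* cached count, updated on every push */
} UndoLog;

/* Slow path: O(n) walk of the list, counting runs of equal group ids. */
static size_t count_groups_by_scan(const UndoLog *log)
{
    size_t groups = 0;
    const UndoAction *prev = NULL;
    const UndoAction *a;

    for (a = log->head; a != NULL; a = a->next) {
        if (prev == NULL || a->group_id != prev->group_id)
            groups++;
        prev = a;
    }
    return groups;
}

/* Push one action; bump the cached count only when a new group starts. */
static int undo_log_push(UndoLog *log, int group_id)
{
    UndoAction *a = malloc(sizeof *a);

    if (a == NULL)
        return -1;
    if (log->head == NULL || log->head->group_id != group_id)
        log->num_groups++;
    a->group_id = group_id;
    a->next = log->head;
    log->head = a;
    return 0;
}

/* Fast path: O(1), no list traversal at all. */
static size_t undo_log_num_groups(const UndoLog *log)
{
    return log->num_groups;
}

/* Free every action and reset the cached count with the list. */
static void undo_log_clear(UndoLog *log)
{
    while (log->head != NULL) {
        UndoAction *next = log->head->next;
        free(log->head);
        log->head = next;
    }
    log->num_groups = 0;
}
```

If the count is queried once per replacement, the scanning version does work proportional to the square of the number of replacements, which would be consistent with the profile above. The cost of caching is that every code path that adds or removes actions must keep the counter in sync.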
Created attachment 10601 [details] [review] Does away with varargs processing inside gedit_debug() when debug flags not set
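The second patch can be illustrated the same way. Again this is only a sketch under assumptions, not the attachment itself: debug_log, debug_flags, the DEBUG_* sections and the format_calls counter are all hypothetical names. The idea is to test the flags first and return before va_start and the string formatting, so that a disabled debug call costs little more than a function call and a comparison.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug sections and state (assumptions, not gedit's names). */
enum {
    DEBUG_UNDO   = 1u << 0,
    DEBUG_SEARCH = 1u << 1
};

static unsigned int debug_flags = 0;    /* nothing enabled by default */
static unsigned long format_calls = 0;  /* counts how often we formatted */

static void debug_log(unsigned int section, const char *fmt, ...)
{
    char buf[256];
    va_list args;

    assert(fmt != NULL);

    /* Fast path: bail out before touching the variable arguments. */
    if ((debug_flags & section) == 0)
        return;

    va_start(args, fmt);
    vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);

    format_calls++;
    fprintf(stderr, "%s\n", buf);
}
```

Note that the caller still evaluates the arguments before the call is made; only the formatting inside the function is skipped. Avoiding argument evaluation as well would need a macro wrapper around the call.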
I have committed both the patches. The second one was ok. The first one had some problems so I modified it a bit.
Did you find other performance problems? Is "replace all" fast enough now?
Is this a problem for the 0.9.7 release that it was originally reported against or is it limited to GNOME2?
The crash was in gedit 0.9.7, but the performance problems are in the GNOME2 version.
I'm not sure if it is right to note this here or not, but for me the biggest performance hits come with a single long line. For example, if I have a single 600-character line, operations get brutally slow, even if the whole doc is on that line.
About the slowness with very long lines, see bug #114337.