GNOME Bugzilla – Bug 324246
Performance degredation with large numbers of highlighted addresses/URLs
Last modified: 2007-02-07 10:20:24 UTC
Distribution/Version: Debian Sid g-t's performance wrt auto-underlining of URLs/email addresses/etc gets pretty miesrable when there are a lot of them visible in the terminal at any given moment. The first few displayed can be selected or moused over quickly. Given an entire terminal full of them (the contents of an /etc/aliases file, say), the later ones are very slow to do this. Repro procedure: open a terminal, scale it fairly tall. 80x75 works just fine. Fill it with email addresses. Here are some, for easy cutting and pasting and also the fond edification of the spammers whose address harvesters infest bugzilla like an advanced bout of amoebic dysentery. % perl -e 'print ((int rand 0xffffffff)."\@".(int rand 0xffffffff).".com\n") for (0..75)' 2598771383@3173614636.com 4276080976@3345431981.com 2913665852@3040408401.com 1634532303@2098512552.com 4080501969@3309528979.com 3713301835@4161872514.com 271997235@4059478435.com 453340240@1928502679.com 927768547@3395571373.com 2710852044@3661163957.com 3811274047@2792815582.com 2399510606@655015751.com 2571286225@48569895.com 543943281@1139742901.com 3888525537@2764433982.com 974032018@1269519369.com 4088000850@4225579244.com 2716122568@970231509.com 793111467@2183222150.com 3808181268@3559765075.com 3316423282@1026720948.com 1176635352@2312734158.com 179464641@3014947192.com 378987142@3287958849.com 3379083025@2297790067.com 2684388504@281994814.com 2745727580@4115711328.com 1080278727@955771151.com 980806756@2910543589.com 1292040823@2626438177.com 2043133516@3065382026.com 3602894192@3175677856.com 3889280623@2649247497.com 4183135442@313288543.com 429639281@2352682184.com 3954068298@3329750778.com 2578181776@2300292258.com 2381450882@3257033790.com 1662681664@2817136877.com 1996818390@712338376.com 684564983@4080033464.com 1043246156@2351514603.com 2668756548@811699540.com 3655370692@1024055767.com 3801364518@1778596373.com 3376547694@3746577654.com 1948218011@3809244249.com 1854535548@983358617.com 2042292598@1232172489.com 2340792344@1436908500.com 2232000885@3481506526.com 1933110680@2675179429.com 3513550226@675360679.com 3934351344@625048411.com 597198234@573160649.com 1096482152@214638389.com 1700235770@4075159749.com 1040831630@2025123144.com 205803745@3830788480.com 2387286381@1734891686.com 4005464193@1838391808.com 2549803518@857420477.com 2421537464@1352140916.com 1705166302@3985476347.com 733928912@3545276533.com 3009035585@2580767931.com 3981549217@3657018770.com 2827950886@476669787.com 1531426785@2348491274.com 3806953163@3693210796.com 3519670963@1295216083.com 550738020@1600786831.com 360410245@2138159587.com 242482439@2000465116.com 955112983@2202145710.com 2707736041@2920501381.com Pass the mouse cursor over the addresses near the top. Quick underlining, no great surprises on the CPU meter. Now pass over the ones near the end. Better yet, wave the cursor vertically down the list while the CPU pegs out. Try to select the list (by dragging, not by shift-clicking). Etc. Bleh. This affects highlighted things only within the visible area, and does not seem affected by what's in the scrollback buffer.
Perfectly reproducible here, marking as NEW. Thanks.
Created attachment 57219 [details] [review] Alteration to highlighting regexps to improve performance. I came across this when I was trying to paste 1000-char URLs into an application's input for testing. It takes many many seconds for gnome-terminal to become usable - each time there's a mouse movement to change the highlighting. http://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@www.lyranthe.org/''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+ Trace 65154
It seems obvious that this is due to an exponential explosion in the regex possibilities. The problem seemed to like in the "user@domain" highlighting. The section "user@" was highlighted, but the "domain" bit wasn't. Here is a patch which seems to resolve both the OP's issue and mine.
Would you be so kind to provide your patch in diff -up format?
Actually, it was not that hard to read, and fixed the problem gracefully. Your patch has been committed to CVS HEAD. Thanks a lot!
For what it's worth, I can still reproduce this (as described in the original report) with CVS HEAD.
Created attachment 81226 [details] [review] Be a little smarter about what we match Currently we always start mathcing from the start of the screen when looking for something to highlight. This patch starts looking from the beginning of the line in which the pointer is in. I'm not sure this is correct wrt how the beginning of the line is being matched though...
/* Repaint the newly-hilited area. */ _vte_invalidate_cells(terminal, - 0, - terminal->column_count, + terminal->pvt->match_start.column, + terminal->pvt->match_end.column - + terminal->pvt->match_start.column + 1, terminal->pvt->match_start.row, terminal->pvt->match_end.row - terminal->pvt->match_start.row + 1); This doesn't look right. The matched area is not necessarily a rectangle, is it?
Urgh, the last two hunks were not meant to be there. Indeed, repainting goes wrong with them. BTW, it may make sense to remove the match highlighting when we get an LeaveNotify event and simmetrically reshow it on EnterNotify.
Created attachment 81768 [details] [review] Update Mariano's patch to head Works well. Though I'm not clear on why it works - the regex matching is an area I've avoided so far ;-)
Created attachment 81898 [details] [review] Update Mariano's patch to head (regex matching)
[Digression r1639: 2007-02-06 Chris Wilson <chris@chris-wilson.co.uk> And finally as noted on bug 324246#c8 hide the hilite when the mouse leaves the terminal (show again when it enters and let motion-notify correct it later). * src/vte.c: (vte_terminal_enter), (vte_terminal_leave), (vte_terminal_realize), (vte_terminal_class_init): ]
For reference, this is one half of the easy part of bug 345344.
Ok, I'm starting to understand what is trying to be achieved here and what the code is doing. Mariano, your patch is broken in the sense that it does not reset coffset for each regex - as coffset is advanced until a match is found. I think we should consider multiline regexes to be unsupported and limit the string to the contents of the current row since during simple usage regexec() consumes ~30% of the runtime. Unfortunately that looks like it'll incur a copy as the basic posix regexec does not take a string length. Grrr.
Created attachment 82057 [details] [review] Perform regex on the single line. I'm tired but this seems to work.
Created attachment 82058 [details] [review] Perform regexec() on the single line Bah apart from the use after free in the g_strndup() of the result. :-(
Can't you temporarily add a '\0' where you want to end the string and restore after the regexec()?
After cleaning up... r1645: 2007-02-07 Chris Wilson <chris@chris-wilson.co.uk> Bug 345344 – Pattern matching is inefficient Bug 324246 – Performance degredation with large numbers of highlighted addresses/URLs * src/vte.c: (vte_terminal_match_check_internal): Trim the searched string down to the row containing the pointer. During a mutt session this drops the time consumed by regexec from ~30% to ~2%. Note: multi-line regexes are now unsupported!
> Note: multi-line regexes are now unsupported! Does that mean that long urls that are wrapped are not detected properly now?
Only accidentally ;-) They should be fine since soft-wrapping does not insert a newline. (However as I forgot to scan forwards to find the next newline...)
Ah, great :) Thanks!
Just to keep everybody happy with those long URLs.. r1646: 2007-02-07 Chris Wilson <chris@chris-wilson.co.uk> * src/vte.c: (vte_terminal_match_check_internal): Scan for newlines before and after the current row to find the entire line in case of soft-wrapping.