After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 324246 - Performance degredation with large numbers of highlighted addresses/URLs
Performance degredation with large numbers of highlighted addresses/URLs
Status: RESOLVED FIXED
Product: vte
Classification: Core
Component: general
0.15.x
Other Linux
: Normal normal
: ---
Assigned To: VTE Maintainers
VTE Maintainers
Depends on:
Blocks:
 
 
Reported: 2005-12-16 08:16 UTC by Devin Carraway
Modified: 2007-02-07 10:20 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
Alteration to highlighting regexps to improve performance. (1016 bytes, patch)
2006-01-12 12:03 UTC, Gary Coady
committed Details | Review
Be a little smarter about what we match (1.91 KB, patch)
2007-01-25 21:43 UTC, Mariano Suárez-Alvarez
none Details | Review
Update Mariano's patch to head (947 bytes, patch)
2007-02-02 15:43 UTC, Chris Wilson
none Details | Review
Update Mariano's patch to head (regex matching) (947 bytes, patch)
2007-02-04 21:50 UTC, Chris Wilson
needs-work Details | Review
Perform regex on the single line. (5.54 KB, patch)
2007-02-07 02:28 UTC, Chris Wilson
none Details | Review
Perform regexec() on the single line (5.58 KB, patch)
2007-02-07 02:31 UTC, Chris Wilson
none Details | Review

Description Devin Carraway 2005-12-16 08:16:18 UTC
Distribution/Version: Debian Sid

g-t's performance wrt auto-underlining of URLs/email addresses/etc gets pretty
miesrable when there are a lot of them visible in the terminal at any given
moment.  The first few displayed can be selected or moused over quickly.  Given
an entire terminal full of them (the contents of an /etc/aliases file, say), the
later ones are very slow to do this.

Repro procedure: open a terminal, scale it fairly tall.  80x75 works just fine.
 Fill it with email addresses.  Here are some, for easy cutting and pasting and
also the fond edification of the spammers whose address harvesters infest
bugzilla like an advanced bout of amoebic dysentery.

% perl -e 'print ((int rand 0xffffffff)."\@".(int rand 0xffffffff).".com\n") for
(0..75)'
2598771383@3173614636.com
4276080976@3345431981.com
2913665852@3040408401.com
1634532303@2098512552.com
4080501969@3309528979.com
3713301835@4161872514.com
271997235@4059478435.com
453340240@1928502679.com
927768547@3395571373.com
2710852044@3661163957.com
3811274047@2792815582.com
2399510606@655015751.com
2571286225@48569895.com
543943281@1139742901.com
3888525537@2764433982.com
974032018@1269519369.com
4088000850@4225579244.com
2716122568@970231509.com
793111467@2183222150.com
3808181268@3559765075.com
3316423282@1026720948.com
1176635352@2312734158.com
179464641@3014947192.com
378987142@3287958849.com
3379083025@2297790067.com
2684388504@281994814.com
2745727580@4115711328.com
1080278727@955771151.com
980806756@2910543589.com
1292040823@2626438177.com
2043133516@3065382026.com
3602894192@3175677856.com
3889280623@2649247497.com
4183135442@313288543.com
429639281@2352682184.com
3954068298@3329750778.com
2578181776@2300292258.com
2381450882@3257033790.com
1662681664@2817136877.com
1996818390@712338376.com
684564983@4080033464.com
1043246156@2351514603.com
2668756548@811699540.com
3655370692@1024055767.com
3801364518@1778596373.com
3376547694@3746577654.com
1948218011@3809244249.com
1854535548@983358617.com
2042292598@1232172489.com
2340792344@1436908500.com
2232000885@3481506526.com
1933110680@2675179429.com
3513550226@675360679.com
3934351344@625048411.com
597198234@573160649.com
1096482152@214638389.com
1700235770@4075159749.com
1040831630@2025123144.com
205803745@3830788480.com
2387286381@1734891686.com
4005464193@1838391808.com
2549803518@857420477.com
2421537464@1352140916.com
1705166302@3985476347.com
733928912@3545276533.com
3009035585@2580767931.com
3981549217@3657018770.com
2827950886@476669787.com
1531426785@2348491274.com
3806953163@3693210796.com
3519670963@1295216083.com
550738020@1600786831.com
360410245@2138159587.com
242482439@2000465116.com
955112983@2202145710.com
2707736041@2920501381.com

Pass the mouse cursor over the addresses near the top.  Quick underlining, no
great surprises on the CPU meter.  Now pass over the ones near the end.  Better
yet, wave the cursor vertically down the list while the CPU pegs out.  Try to
select the list (by dragging, not by shift-clicking).  Etc.  Bleh.

This affects highlighted things only within the visible area, and does not seem
affected by what's in the scrollback buffer.
Comment 1 Guilherme de Siqueira Pastore 2006-01-10 03:14:47 UTC
Perfectly reproducible here, marking as NEW. Thanks.
Comment 2 Gary Coady 2006-01-12 12:03:15 UTC
Created attachment 57219 [details] [review]
Alteration to highlighting regexps to improve performance.

I came across this when I was trying to paste 1000-char URLs into an application's input for testing. It takes many many seconds for gnome-terminal to become usable - each time there's a mouse movement to change the highlighting.

http://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@www.lyranthe.org/''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

  • #0 sift_states_backward
    from /lib/tls/i686/cmov/libc.so.6
  • #1 re_search_internal
    from /lib/tls/i686/cmov/libc.so.6
  • #2 regexec
    from /lib/tls/i686/cmov/libc.so.6
  • #3 _vte_regex_exec
    from /usr/lib/libvte.so.4
  • #4 vte_terminal_get_text
    from /usr/lib/libvte.so.4
  • #5 vte_terminal_get_text
    from /usr/lib/libvte.so.4
  • #6 vte_terminal_get_text
    from /usr/lib/libvte.so.4
  • #7 _gtk_marshal_BOOLEAN__BOXED
    from /usr/lib/libgtk-x11-2.0.so.0
  • #8 g_cclosure_new_swap
    from /usr/lib/libgobject-2.0.so.0
  • #9 g_closure_invoke
    from /usr/lib/libgobject-2.0.so.0
  • #10 g_signal_stop_emission
    from /usr/lib/libgobject-2.0.so.0
  • #11 g_signal_emit_valist
    from /usr/lib/libgobject-2.0.so.0
  • #12 g_signal_emit
    from /usr/lib/libgobject-2.0.so.0
  • #13 gtk_widget_activate
    from /usr/lib/libgtk-x11-2.0.so.0
  • #14 gtk_propagate_event
    from /usr/lib/libgtk-x11-2.0.so.0
  • #15 gtk_main_do_event
    from /usr/lib/libgtk-x11-2.0.so.0
  • #16 _gdk_events_queue
    from /usr/lib/libgdk-x11-2.0.so.0
  • #17 g_main_context_dispatch
    from /usr/lib/libglib-2.0.so.0
  • #18 g_main_context_check
    from /usr/lib/libglib-2.0.so.0
  • #19 g_main_loop_run
    from /usr/lib/libglib-2.0.so.0
  • #20 gtk_main
    from /usr/lib/libgtk-x11-2.0.so.0
  • #21 ??

It seems obvious that this is due to an exponential explosion in the regex possibilities. The problem seemed to like in the "user@domain" highlighting. The section "user@" was highlighted, but the "domain" bit wasn't.

Here is a patch which seems to resolve both the OP's issue and mine.
Comment 3 Guilherme de Siqueira Pastore 2006-01-12 13:13:29 UTC
Would you be so kind to provide your patch in diff -up format?
Comment 4 Guilherme de Siqueira Pastore 2006-01-12 14:02:32 UTC
Actually, it was not that hard to read, and fixed the problem gracefully. Your patch has been committed to CVS HEAD. Thanks a lot!
Comment 5 Behdad Esfahbod 2006-03-30 22:31:54 UTC
For what it's worth, I can still reproduce this (as described in the original report) with CVS HEAD.
Comment 6 Mariano Suárez-Alvarez 2007-01-25 21:43:47 UTC
Created attachment 81226 [details] [review]
Be a little smarter about what we match

Currently we always start mathcing from the start of the screen when looking for something to highlight. This patch starts looking from the beginning of the line in which the pointer is in. I'm not sure this is correct wrt how the beginning of the line is being matched though...
Comment 7 Behdad Esfahbod 2007-01-25 22:05:20 UTC
 		/* Repaint the newly-hilited area. */
 		_vte_invalidate_cells(terminal,
-				     0,
-				     terminal->column_count,
+				     terminal->pvt->match_start.column,
+				     terminal->pvt->match_end.column -
+				     terminal->pvt->match_start.column + 1,
 				     terminal->pvt->match_start.row,
 				     terminal->pvt->match_end.row -
 				     terminal->pvt->match_start.row + 1);

This doesn't look right.  The matched area is not necessarily a rectangle, is it?
Comment 8 Mariano Suárez-Alvarez 2007-01-25 22:31:17 UTC
Urgh, the last two hunks were not meant to be there. Indeed, repainting goes wrong with them.

BTW, it may make sense to remove the match highlighting when we get an LeaveNotify event and simmetrically reshow it on EnterNotify.
Comment 9 Chris Wilson 2007-02-02 15:43:14 UTC
Created attachment 81768 [details] [review]
Update Mariano's patch to head

Works well. Though I'm not clear on why it works - the regex matching is an area I've avoided so far ;-)
Comment 10 Chris Wilson 2007-02-04 21:50:35 UTC
Created attachment 81898 [details] [review]
Update Mariano's patch to head (regex matching)
Comment 11 Chris Wilson 2007-02-06 19:04:53 UTC
[Digression r1639: 2007-02-06  Chris Wilson  <chris@chris-wilson.co.uk>

	And finally as noted on bug 324246#c8 hide the hilite when the mouse
	leaves the terminal (show again when it enters and let motion-notify
	correct it later).

	* src/vte.c: (vte_terminal_enter), (vte_terminal_leave),
	(vte_terminal_realize), (vte_terminal_class_init):
]
Comment 12 Chris Wilson 2007-02-07 01:02:35 UTC
For reference, this is one half of the easy part of bug 345344.
Comment 13 Chris Wilson 2007-02-07 02:05:35 UTC
Ok, I'm starting to understand what is trying to be achieved here and what the code is doing.

Mariano, your patch is broken in the sense that it does not reset coffset for each regex - as coffset is advanced until a match is found.

I think we should consider multiline regexes to be unsupported and limit the string to the contents of the current row since during simple usage regexec() consumes ~30% of the runtime. Unfortunately that looks like it'll incur a copy as the basic posix regexec does not take a string length. Grrr.
Comment 14 Chris Wilson 2007-02-07 02:28:43 UTC
Created attachment 82057 [details] [review]
Perform regex on the single line.

I'm tired but this seems to work.
Comment 15 Chris Wilson 2007-02-07 02:31:20 UTC
Created attachment 82058 [details] [review]
Perform regexec() on the single line

Bah apart from the use after free in the g_strndup() of the result. :-(
Comment 16 Behdad Esfahbod 2007-02-07 04:09:56 UTC
Can't you temporarily add a '\0' where you want to end the string and restore after the regexec()?
Comment 17 Chris Wilson 2007-02-07 09:51:50 UTC
After cleaning up...

r1645: 2007-02-07  Chris Wilson  <chris@chris-wilson.co.uk>

	Bug 345344 – Pattern matching is inefficient
	Bug 324246 – Performance degredation with large numbers of highlighted addresses/URLs

	* src/vte.c: (vte_terminal_match_check_internal):
		Trim the searched string down to the row containing the
		pointer. During a mutt session this drops the time consumed by
		regexec from ~30% to ~2%.
		Note: multi-line regexes are now unsupported!

Comment 18 Richard Hult 2007-02-07 09:54:58 UTC
> Note: multi-line regexes are now unsupported!

Does that mean that long urls that are wrapped are not detected properly now?
Comment 19 Chris Wilson 2007-02-07 10:08:28 UTC
Only accidentally ;-)
They should be fine since soft-wrapping does not insert a newline. (However as I forgot to scan forwards to find the next newline...)
Comment 20 Richard Hult 2007-02-07 10:09:33 UTC
Ah, great :) Thanks!
Comment 21 Chris Wilson 2007-02-07 10:20:24 UTC
Just to keep everybody happy with those long URLs..

r1646: 2007-02-07  Chris Wilson  <chris@chris-wilson.co.uk>

	* src/vte.c: (vte_terminal_match_check_internal):
		Scan for newlines before and after the current row
		to find the entire line in case of soft-wrapping.