Bug 650334 – Lag when inserting a character in an highlighted line

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 650334 - Lag when inserting a character in an highlighted line


Summary:	Lag when inserting a character in an highlighted line


Status:	RESOLVED NOTABUG

Product:	gtksourceview
Classification:	Platform
Component:	General
Version:	2.10.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	GTK Sourceview maintainers
QA Contact:	GTK Sourceview maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-05-16 18:14 UTC by Jean-Philippe Fleury
Modified:	2011-06-20 13:58 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Sample definition language file for testing purpose (1.21 KB, application/octet-stream) 2011-05-16 18:14 UTC, Jean-Philippe Fleury	Details
Profiles created with sysprof (32.44 KB, application/x-bzip) 2011-05-16 18:22 UTC, Jean-Philippe Fleury	Details
Little screencast showing the bug (91.65 KB, video/ogg) 2011-05-16 18:52 UTC, Jean-Philippe Fleury	Details

Description Jean-Philippe Fleury 2011-05-16 18:14:24 UTC

Created attachment 187923 [details]
Sample definition language file for testing purpose

I can always reproduce the following bug:

1) Copy the language definition file ("markdown-hr.lang") attached to this report in the folder "~/.local/share/gtksourceview-2.0/language-specs/".

2) Open gedit.

3) Choose "Markdown HR" in "View > Highlight Mode > Markup".

4) In the text area, add one Markdown horizontal rule, for example:

************************************************************************

Note that the rule is highlighted in green.

5) At the end of the rule, after the last asterisk, add a character, for example "a".

Actual result: it takes 1-2 seconds before the letter "a" appears and the horizontal rule looses highlighting.

More information: in Markdown, an horizontal rule can only be followed by spaces or tabs, so adding a character as "a" breaks horizontal rule highlighting. The problem here is the time needed by GtkSourceView to un-highlight the rule and to display the letter "a".

Comment 1 Jean-Philippe Fleury 2011-05-16 18:22:07 UTC

Created attachment 187924 [details]
Profiles created with sysprof

I attach 2 profiles generated by sysprof. Here are my steps:

- I installed the language definition file attached to this report.
- I opened gedit.
- I chose "Markdown HR" highlight mode.
- I added an horizontal rule of 80 asterisks.
- I started sysprof.
- I added a letter "a" at the end of the rule.
- I get the profile created by sysprof, and named it "profile-80.xml".
- In gedit, I cleared the text area and created a new rule of 200000 asterisks.
- I started sysprof.
- I added a letter "a" at the end of the rule.
- I get the profile created by sysprof, and named it "profile-200000.xml".

Comment 2 Jean-Philippe Fleury 2011-05-16 18:52:23 UTC

Created attachment 187925 [details]
Little screencast showing the bug

This screencast illustrates the bug. We note that asterisks are added normally. There's no lag. Then, if we add other characters than asterisks, spaces or tabs (letter "a" in the screencast), there's a lag of about 1 second for each character added.

Comment 3 José Aliste 2011-06-16 23:47:34 UTC

Well, for some reason (that was explained to me as backtracing the failure in the pcre library) the regex using here is very slow to match the 
"**************************************************a"

So you need to modify the regex. In the particular case of the testcase, 
I think you need to do something like 
"(^\*{3,})\s*$|^ {0,3}(\*+ {1,2}){2,}\*+\s*$"

Comment 4 Jean-Philippe Fleury 2011-06-20 13:58:02 UTC

Thanks to José Aliste on irc for discussion about this report.

So, backtracking can make some regex very slow, with an exponential complexity. For now, there's no benchmarking to automatically detect these cases, so we must deal with them while using language files.

A solution to these slow regex is to make use of atomic grouping or possessive quantifiers (linear complexity). See the following articles:

"Runaway Regular Expressions: Catastrophic Backtracking"
http://www.regular-expressions.info/catastrophic.html

"Atomic Grouping"
http://www.regular-expressions.info/atomic.html

"Possessive Quantifiers"
http://www.regular-expressions.info/possessive.html

*****

For this particular report, the following 2 regex are a lot faster:

^ {0,3}(\*+ {0,2}){3,}+[ \t]*$
^ {0,3}(\* {0,2}){3,}[ \t]*$