GNOME Bugzilla – Bug 650334
Lag when inserting a character in an highlighted line
Last modified: 2011-06-20 13:58:02 UTC
Created attachment 187923 [details] Sample definition language file for testing purpose I can always reproduce the following bug: 1) Copy the language definition file ("markdown-hr.lang") attached to this report in the folder "~/.local/share/gtksourceview-2.0/language-specs/". 2) Open gedit. 3) Choose "Markdown HR" in "View > Highlight Mode > Markup". 4) In the text area, add one Markdown horizontal rule, for example: ************************************************************************ Note that the rule is highlighted in green. 5) At the end of the rule, after the last asterisk, add a character, for example "a". Actual result: it takes 1-2 seconds before the letter "a" appears and the horizontal rule looses highlighting. More information: in Markdown, an horizontal rule can only be followed by spaces or tabs, so adding a character as "a" breaks horizontal rule highlighting. The problem here is the time needed by GtkSourceView to un-highlight the rule and to display the letter "a".
Created attachment 187924 [details] Profiles created with sysprof I attach 2 profiles generated by sysprof. Here are my steps: - I installed the language definition file attached to this report. - I opened gedit. - I chose "Markdown HR" highlight mode. - I added an horizontal rule of 80 asterisks. - I started sysprof. - I added a letter "a" at the end of the rule. - I get the profile created by sysprof, and named it "profile-80.xml". - In gedit, I cleared the text area and created a new rule of 200000 asterisks. - I started sysprof. - I added a letter "a" at the end of the rule. - I get the profile created by sysprof, and named it "profile-200000.xml".
Created attachment 187925 [details] Little screencast showing the bug This screencast illustrates the bug. We note that asterisks are added normally. There's no lag. Then, if we add other characters than asterisks, spaces or tabs (letter "a" in the screencast), there's a lag of about 1 second for each character added.
Well, for some reason (that was explained to me as backtracing the failure in the pcre library) the regex using here is very slow to match the "**************************************************a" So you need to modify the regex. In the particular case of the testcase, I think you need to do something like "(^\*{3,})\s*$|^ {0,3}(\*+ {1,2}){2,}\*+\s*$"
Thanks to José Aliste on irc for discussion about this report. So, backtracking can make some regex very slow, with an exponential complexity. For now, there's no benchmarking to automatically detect these cases, so we must deal with them while using language files. A solution to these slow regex is to make use of atomic grouping or possessive quantifiers (linear complexity). See the following articles: "Runaway Regular Expressions: Catastrophic Backtracking" http://www.regular-expressions.info/catastrophic.html "Atomic Grouping" http://www.regular-expressions.info/atomic.html "Possessive Quantifiers" http://www.regular-expressions.info/possessive.html ***** For this particular report, the following 2 regex are a lot faster: ^ {0,3}(\*+ {0,2}){3,}+[ \t]*$ ^ {0,3}(\* {0,2}){3,}[ \t]*$