GNOME Bugzilla – Bug 316835
Syntax highlighting not correct with LF instead of CR
Last modified: 2014-02-15 12:53:14 UTC
Please describe the problem: After a single-line comment (//), all following lines are highlighted as comments if the file uses 0x0D (CR) as the end-of-line character instead of 0x0A (LF). Only the syntax highlighting is wrong; the text itself is displayed correctly.
Steps to reproduce: 1. Load a Java file that uses CR instead of LF as the end-of-line character.
Actual results: All lines after a single-line comment are marked as comments.
Expected results: The lines after a single-line comment are highlighted as Java code.
Does this happen every time? Yes.
Other information: All of my GNOME components are from the Fedora development repository and are version 2.12.
Syntax highlighting is provided by gtksourceview, so I'm moving the bug there. Does the problem also happen with C and other languages? I can't see anything unusual in the java.lang syntax file...
This also happens with C source files (and I suppose with other languages too).
The gtksourceview version is gtksourceview-1.4.1-1
This bug is probably not specific to the Java language. In gtksourcetag.c we explicitly use \n as the line terminator.
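To illustrate why a \n-only terminator produces exactly the reported symptom, here is a minimal C sketch. The helper names are hypothetical and not taken from gtksourcetag.c. It counts line breaks in a buffer two ways: if only '\n' counts, a CR-terminated file contains no line break at all, so a "from // to end of line" rule runs to the end of the file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts line breaks when only '\n' is treated as a
 * line terminator (the behavior described for gtksourcetag.c). */
static size_t count_eol_lf_only(const char *text)
{
    assert(text != NULL);
    size_t eols = 0;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n')
            eols++;
    }
    return eols;
}

/* Hypothetical helper: counts line breaks treating "\r\n", a lone '\r',
 * and a lone '\n' each as one terminator. */
static size_t count_eol_any(const char *text)
{
    assert(text != NULL);
    size_t eols = 0;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\r') {
            if (p[1] == '\n')
                p++;            /* CRLF is a single terminator */
            eols++;
        } else if (*p == '\n') {
            eols++;
        }
    }
    return eols;
}
```

For the CR-only buffer "// c\rint x;\r", count_eol_lf_only finds no line break, while count_eol_any finds two.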
*** Bug 340354 has been marked as a duplicate of this bug. ***
*** Bug 343910 has been marked as a duplicate of this bug. ***
I have tried to fix this bug without success. As I said in comment #4, gtksourcetag.c explicitly uses \n as the line terminator, but replacing it with "$" or with [\n\r] didn't solve the problem. One of the problems is that the regex engine does not treat "\r" as a line end when matching "$", and we use "$" in a lot of .lang files. Furthermore, some .lang files also use "\n" directly. Fixing all the "\n" occurrences I found (at least those related to c.lang and the core code) did not solve the problem either, so I gave up. We will investigate this problem again when we release the new engine. Maybe PCRE will help us.
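The "$" behavior described above can be reproduced with POSIX regex.h. It is used here only as a stand-in and is not necessarily the regex code gtksourceview 1.x used. With REG_NEWLINE, "$" matches just before a '\n', but '\r' gets no special treatment, so a pattern anchored with "$" fails on a CR-terminated line. The helper name is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the POSIX extended regex `pattern`
 * matches anywhere in `text`. REG_NEWLINE makes '$' match just before a
 * '\n' as well as at the end of the string; '\r' is an ordinary char. */
static bool matches_multiline(const char *pattern, const char *text)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE | REG_NOSUB);
    assert(rc == 0);
    (void)rc;   /* avoid an unused-variable warning when NDEBUG is set */
    bool found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

With this helper, "comment$" matches "// comment\nint x;" but not "// comment\rint x;".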
From new engine code:
/* Line terminator characters (\n, \r, \r\n, or unicode paragraph separator)
 * are removed from the line text. The problem is that pcre does not understand
 * arbitrary line terminators, so $ in pcre means (?=\n) (not quite, it's also
 * end of matched string), while we really need "((?=\r\n)|(?=[\r\n])|(?=\xE2\x80\xA9)|$)".
 * It could be worked around by replacing line terminator in matched text with
 * \n, but it's a good source of errors, since offsets (not all, unfortunately) returned
 * from pcre need to be compared to line length, and adjusted when necessary.
 * Not using line terminator only means that \n can't be in patterns, it's not a
 * big deal: line end can't be highlighted anyway; if a rule needs to match it, it
 * can use "$" as start and "^" as end. */
An example is the trailing backslash rule:
<context id="line-continue">
  <start>\\$</start>
  <end>^</end>
</context>
Using '\n' in lang files is broken (and simply won't work).
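The splitting described in that comment (strip \n, \r, \r\n, or U+2029 from each line before matching) could look roughly like the following C sketch. This is a hypothetical illustration, not gtksourceview's code; the function names are made up. U+2029 PARAGRAPH SEPARATOR is the byte sequence E2 80 A9 in UTF-8, the same bytes that appear in the regex in the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: returns the length in bytes of the line terminator
 * starting at text[i], or 0 if there is none there. Recognized: "\r\n",
 * "\r", "\n", and U+2029 (E2 80 A9 in UTF-8). */
static size_t terminator_len(const char *text, size_t len, size_t i)
{
    assert(text != NULL && i < len);
    if (text[i] == '\r')
        return (i + 1 < len && text[i + 1] == '\n') ? 2 : 1;
    if (text[i] == '\n')
        return 1;
    if (i + 2 < len &&
        (unsigned char)text[i] == 0xE2 &&
        (unsigned char)text[i + 1] == 0x80 &&
        (unsigned char)text[i + 2] == 0xA9)
        return 3;
    return 0;
}

/* Finds the line beginning at `start`. Stores the length of the line text
 * with its terminator removed in *content_len (this is what the patterns
 * would be matched against) and returns the offset of the next line
 * (== len at the end of the buffer). */
static size_t next_line(const char *text, size_t len, size_t start,
                        size_t *content_len)
{
    for (size_t i = start; i < len; i++) {
        size_t t = terminator_len(text, len, i);
        if (t > 0) {
            *content_len = i - start;
            return i + t;
        }
    }
    *content_len = len - start;
    return len;
}
```

Because the terminator is never part of the matched text, "$" at the end of each stripped line behaves the same no matter which terminator the file uses.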
(In reply to comment #8)
> From new engine code:
>
> /* Line terminator characters (\n, \r, \r\n, or unicode paragraph separator)
> * are removed from the line text.
Need to make sure that a \r followed by a \n is handled correctly in pathological cases (a line ending with \r, the next line ending with \n, and the second line's body deleted).
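The pathological case above can be sketched in C. The hypothetical buffer "foo\rbar\n" has two line terminators, but deleting the second line's body ("bar") leaves "foo\r\n", where the lone \r and the \n now form a single CRLF terminator. The helper names are illustrative only. The point is that an engine that records each line's terminator separately would need to re-examine the neighboring line after such an edit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts line terminators, treating "\r\n" as one
 * terminator and a lone '\r' or '\n' as one each. */
static size_t count_terminators(const char *text)
{
    assert(text != NULL);
    size_t n = 0;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\r') {
            if (p[1] == '\n')
                p++;            /* CRLF counts once */
            n++;
        } else if (*p == '\n') {
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: deletes bytes [from, to) from a NUL-terminated
 * buffer in place. */
static void delete_range(char *buf, size_t from, size_t to)
{
    size_t n = strlen(buf);
    assert(from <= to && to <= n);
    memmove(buf + from, buf + to, n - to + 1);  /* +1 also moves the NUL */
}
```

Deleting bytes 4 to 7 ("bar") from "foo\rbar\n" drops the terminator count from two to one.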
If a lang file uses "\n" explicitly, that is a bug in the lang file and should be fixed (this one, java.lang, is fine).