GNOME Bugzilla – Bug 744993
Syntax highlighting: once-only does not have the expected behavior
Last modified: 2021-07-05 11:01:48 UTC
The following component of a syntax file fails to work:

  <context id="onechar" once-only="true" extend-parent="false"> <!-- bug! -->
    <include>
      <context style-ref="char-escape"> <!-- once-only here is fine -->
        <match>\%{escape-sequence}</match>
      </context>
      <context style-ref="character"> <!-- but this allows both to match once each -->
        <match>.</match>
      </context>
    </include>
  </context>

According to the language definition specification, onechar should be a valid container context (<start> omitted, id given, <include> given). Nothing in the specification indicates that once-only cannot be used on such a context.

A valid workaround is:

  <context once-only="true" extend-parent="false">
    <match>(?&lt;esc&gt;\%{escape-sequence})|(?&lt;reg&gt;.)</match>
    <include>
      <context sub-pattern="esc" style-ref="char-escape"/>
      <context sub-pattern="reg" style-ref="character"/>
    </include>
  </context>

but this is not extensible when a combined regex is not writable.
Is extend-parent relevant for this bug?

The documentation should perhaps explain once-only better; it is indeed not clear. But if you look at this code:
https://git.gnome.org/browse/gtksourceview/tree/data/language-specs/docbook.lang#n41

you see a keyword context with the once-only attribute. It is quite clear that each keyword can appear once. So I guess the same holds for a container context: each included context can appear once. Some other lang files in GtkSourceView already have a container context with once-only, so if we change the behavior of once-only, it will break some lang files.

> but this is not extensible when a combined regex is not writable.

You mean readable? All regexes are writable, but some of them are unfortunately write-only ;-) Nothing prevents you from writing a multi-line regex with comments, indentation, etc. to make it clearer. Or is there another limitation?
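As a sketch of the multi-line style suggested above, the combined regex from the workaround could be moved into a define-regex with extended="true", where whitespace is ignored and # starts a comment. The id "onechar-regex" is hypothetical, and the sketch assumes that an "escape-sequence" regex and the "char-escape" and "character" styles are already defined elsewhere in the lang file, as in the report.

```xml
<!-- Goes inside <definitions>. "onechar-regex" is a hypothetical id. -->
<!-- Assumes \%{escape-sequence} is defined elsewhere in the same lang file. -->
<define-regex id="onechar-regex" extended="true">
  (?&lt;esc&gt;\%{escape-sequence})   # an escape sequence, styled as char-escape
  |                                   # otherwise
  (?&lt;reg&gt;.)                     # any other single character
</define-regex>

<!-- Same context as the workaround, now referencing the commented regex. -->
<context once-only="true" extend-parent="false">
  <match>\%{onechar-regex}</match>
  <include>
    <context sub-pattern="esc" style-ref="char-escape"/>
    <context sub-pattern="reg" style-ref="character"/>
  </include>
</context>
```

This only changes readability. The match is still a single regex, so it does not address the extensibility concern for structures that a regex cannot express.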
(In reply to Sébastien Wilmet from comment #1)
> Is extend-parent relevant for this bug?

Not necessarily. I was just writing a C-style character match with a catch-all error match for everything else. The problem was that using once-only on the outer container causes GtkSourceView to fail completely (no highlighting at all).

> The doc should maybe better explain the once-only, it is indeed not clear.
>
> But if you look at this code:
> https://git.gnome.org/browse/gtksourceview/tree/data/language-specs/docbook.lang#n41
>
> You see a keyword context with the once-only attribute. It is quite clear
> that each keyword can appear once. So I guess it's the same for a container
> context. Some other lang files in GtkSourceView have already a container
> context with once-only, so if we change the behavior of once-only it'll
> break some lang files.

Noted. So yes, I think the documentation for once-only should be clarified.

> > but this is not extensible when a combined regex is not writable.
>
> You mean readable? All regexes are writable, but some of them are
> unfortunately write-only ;-) Nothing prevents you from writing a multi-line
> regex with comments, indentation, etc to make it clearer. Or is there
> another limitation?

There is indeed another limitation. Because regexes can only describe regular languages, while container contexts describe recursive (context-free) structures, the following is impossible with a single regex. Inside " ", match EXACTLY ONE OF:

1. properly nested { } braces (arbitrary text inside)
2. properly nested ( ) parentheses (arbitrary text inside)

This use case comes up for strings with custom delimiters (think sed). The only current way to do this is to split the cases and copy-paste the surrounding context (the opening/closing " pair) for each one, which is not maintainable or extensible at all.
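The structure described above can be sketched with recursive container contexts. All ids here are hypothetical, and the sketch assumes a "string" style is declared in the lang file. It shows the intended design, not a working solution: as discussed in this report, once-only on the container is applied to each included context separately, so one brace block and one parenthesis block could both match.

```xml
<!-- Hypothetical ids: brace-block, paren-block, delimited-string, one-block. -->
<context id="brace-block">
  <start>\{</start>
  <end>\}</end>
  <include>
    <!-- Self-reference allows properly nested { } with arbitrary text inside. -->
    <context ref="brace-block"/>
  </include>
</context>

<context id="paren-block">
  <start>\(</start>
  <end>\)</end>
  <include>
    <!-- Self-reference allows properly nested ( ) with arbitrary text inside. -->
    <context ref="paren-block"/>
  </include>
</context>

<context id="delimited-string" style-ref="string">
  <start>"</start>
  <end>"</end>
  <include>
    <!-- Intended: exactly one of the two blocks inside the quotes. -->
    <context id="one-block" once-only="true">
      <include>
        <context ref="brace-block"/>
        <context ref="paren-block"/>
      </include>
    </context>
  </include>
</context>
```

Because nesting needs the self-references, neither alternative can be folded into a combined match regex the way the single-character workaround was.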
OK, I see. With a little more thought, the expected behavior for once-only is indeed to match the context only one time, regardless of which sub-context is found. For a keyword context it would be less convenient to add the once-only attribute to each keyword, but it would be more explicit. For a container context, the current behavior could easily be achieved by adding once-only to each included context, like this:

  <context id="onechar" extend-parent="false">
    <include>
      <context style-ref="char-escape" once-only="true">
        <match>\%{escape-sequence}</match>
      </context>
      <context style-ref="character" once-only="true">
        <match>.</match>
      </context>
    </include>
  </context>

This is currently equivalent (I suppose) to having once-only only on 'onechar'. So for GtkSourceView 4 it would be possible to break compatibility and fix once-only. For consistency, a once-only attribute would then need to be added to the <keyword> tag. In the meantime, without breaking compatibility, an 'or' boolean attribute could be added ('or', or any better name). But the ContextEngine class is quite complex, so this involves some non-negligible work…
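Purely to illustrate the proposal above, a container using the suggested 'or' attribute might look like the following. The attribute does NOT exist in GtkSourceView; its name and exact semantics (here guessed as "combined with once-only, match exactly one of the included contexts, once") are assumptions.

```xml
<!-- HYPOTHETICAL: "or" is only a proposal in this thread and is not part of
     the GtkSourceView language definition format. Assumed meaning: together
     with once-only, at most one included context matches, one time. -->
<context id="onechar" once-only="true" or="true" extend-parent="false">
  <include>
    <context style-ref="char-escape">
      <match>\%{escape-sequence}</match>
    </context>
    <context style-ref="character">
      <match>.</match>
    </context>
  </include>
</context>
```

Keeping the opt-in attribute separate would leave existing lang files that rely on the current per-context once-only behavior unchanged.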
Once-only is now better documented: https://git.gnome.org/browse/gtksourceview/commit/?id=61b6cef807ec5402f32a16c4265fcefe973a326c
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/gtksourceview/-/issues/ Thank you for your understanding and your help.