GNOME Bugzilla – Bug 128605
Request support for "sub languages"
Last modified: 2007-07-06 20:35:45 UTC
A nifty idea I had was support for sub languages. This could be implemented by letting one language define a start-regex and an end-regex, and specify the name of the language to apply to the text contained within. This would be a major plus for all those composite languages out there: HTML/XML->PHP, HTML/XML->JSP, HTML/XML->VBScript, HTML/XML-><script> tag->JavaScript, etc. This might also be accomplished cleanly through additive/overriding languages: a JSP language definition could "override" the HTML language, adding definitions to trap <% %> and then applying the Java language to everything within. Mostly brainstorming!
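To make the start-regex/end-regex idea concrete, here is a minimal Python sketch of how a host language might hand regions over to a sub language. The names `SubLanguage` and `split_regions` are hypothetical and exist only for this illustration; this is not how GtkSourceView is implemented, and the delimiters are simply treated as host-language text.

```python
import re
from dataclasses import dataclass


@dataclass
class SubLanguage:
    """Hypothetical record: a sub language plus the regexes that delimit it."""
    name: str
    start: re.Pattern
    end: re.Pattern


def split_regions(text, host, subs):
    """Split text into (language, chunk) pairs.

    Text between a sub language's start and end match is tagged with that
    sub language; everything else, including the delimiters themselves, is
    tagged with the host language.
    """
    regions = []
    pos = 0
    while pos < len(text):
        # Pick the sub language whose start delimiter appears first.
        best = None
        for sub in subs:
            m = sub.start.search(text, pos)
            if m and (best is None or m.start() < best[1].start()):
                best = (sub, m)
        if best is None:
            regions.append((host, text[pos:]))
            break
        sub, start = best
        end = sub.end.search(text, start.end())
        stop = end.start() if end else len(text)
        # Host text up to and including the start delimiter.
        regions.append((host, text[pos:start.end()]))
        if stop > start.end():
            regions.append((sub.name, text[start.end():stop]))
        if end:
            # The end delimiter belongs to the host language again.
            regions.append((host, end.group()))
            pos = end.end()
        else:
            # Unterminated region: the sub language runs to end of text.
            pos = len(text)
    return regions


jsp = [SubLanguage("java", re.compile(r"<%"), re.compile(r"%>"))]
page = "<p>Hi</p><% out.print(1); %><b>x</b>"
for language, chunk in split_regions(page, "html", jsp):
    print(language, repr(chunk))
```

A real highlighter would then run each chunk through the rules of the named language; the sketch stops at the splitting step because that is the part the proposal describes.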
SharpDevelop (a .NET IDE) has a great system for declaring syntax highlighting for this purpose. Its parser and files are GPL, so they could be used. Basically, it allows you to declare:

    <Span name="ScriptTag" rule="JavaScriptSet" bold="false" italic="false" color="SpringGreen" stopateol="false">
      <Begin><script></Begin>
      <End></script></End>
    </Span>
    <RuleSet name="JavaScriptSet" reference="JavaScript" />

So, what this means is: "From <script> to </script>, use the ruleset `JavaScriptSet'; the ruleset `JavaScriptSet' is defined as the rules in JavaScript.lang." The ruleset concept allows for a lot of flexibility.
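As a rough illustration of how a Span/RuleSet declaration like this could be resolved, the following Python sketch parses a small definition and maps each span to the language its ruleset references. It is not SharpDevelop's parser: the `SyntaxDefinition` wrapper element and the `load_spans` helper are assumptions made for the example, and the delimiters are written as `&lt;script&gt;` so the snippet is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper element and escaped delimiters; illustration only.
DEFINITION = """
<SyntaxDefinition>
  <Span name="ScriptTag" rule="JavaScriptSet">
    <Begin>&lt;script&gt;</Begin>
    <End>&lt;/script&gt;</End>
  </Span>
  <RuleSet name="JavaScriptSet" reference="JavaScript" />
</SyntaxDefinition>
"""


def load_spans(xml_text):
    """Return one dict per Span, with its ruleset resolved to a language name."""
    root = ET.fromstring(xml_text)
    # Map each RuleSet name to the language it references.
    references = {
        rule_set.get("name"): rule_set.get("reference")
        for rule_set in root.iter("RuleSet")
    }
    spans = []
    for span in root.iter("Span"):
        spans.append({
            "name": span.get("name"),
            "begin": span.findtext("Begin"),
            "end": span.findtext("End"),
            # None if the span names a ruleset that is not declared.
            "language": references.get(span.get("rule")),
        })
    return spans


for span in load_spans(DEFINITION):
    print(span)
```

The indirection through a named ruleset is what gives the flexibility: several spans can share one ruleset, and a ruleset can point at a whole other language file.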
This feature would also be good for 'literate' scripts, where you embed code in comments rather than the other way around, e.g. code inside LaTeX \begin{code} \end{code} brackets. Haskell supports this and another style, where code lines have to start with "> " and all other lines are comments.

I could imagine doing this either by having a named environment inside a start and end regex, or by putting rules inside an ordinary <syntax-item> or <pattern-item> tag, e.g.:

    <syntax-item _name="foo" style="bar">
      <start-regex> ... </start-regex>
      <end-regex> ... </end-regex>
      <pattern-item> ... </pattern-item>
      <keyword-list> ... </keyword-list>
      etc...
    </syntax-item>

You would want to impose a limit on nesting depth because the nesting has a performance impact. I believe NEdit limits you to 3 or 4, but more than two is rarely necessary.

Nesting in general could provide a way to indicate priority and scope of matches. For example, a <keyword-list> inside a <pattern-item> could mean: highlight as this pattern item, unless the text matches one of these keywords, in which case colour it as a keyword, e.g.:

    <pattern-item _name="Operator" style="Operator">
      <regex>[:!#$%&*+./>=<?@\\^|~-]+</regex>
      <keyword-list _name="Reserved Operator" style="Keyword">
        <keyword>::</keyword>
        <keyword>=</keyword>
        ... etc
      </keyword-list>
    </pattern-item>
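The priority rule for a <keyword-list> nested inside a <pattern-item> can be sketched in a few lines of Python. This is only an illustration of the intended semantics, reusing the operator regex and the two reserved operators from the example; `classify_operators` is a hypothetical helper, not part of any highlighter.

```python
import re

# Operator pattern taken from the <regex> element in the example.
OPERATOR = re.compile(r"[:!#$%&*+./>=<?@\\^|~-]+")

# Entries of the nested "Reserved Operator" keyword list.
RESERVED = {"::", "="}


def classify_operators(source):
    """Return (text, style) for every operator match in source.

    A match is styled as "Keyword" only if the whole match is one of the
    reserved operators; otherwise it keeps the enclosing "Operator" style.
    """
    return [
        (m.group(), "Keyword" if m.group() in RESERVED else "Operator")
        for m in OPERATOR.finditer(source)
    ]


print(classify_operators("x :: Int; y = a >>= b"))
```

Note that `>>=` stays an ordinary operator even though it contains `=`, because the keyword check applies to the whole match rather than to a substring.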
Finally fixed in gtksourceview 2! It took less than 4 years :)