Bug 613378 – Improvement of the octave lang file



Summary:	Improvement of the octave lang file


Status:	RESOLVED DUPLICATE of bug 648494

Product:	gtksourceview
Classification:	Platform
Component:	Syntax files
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	GTK Sourceview maintainers
QA Contact:	GTK Sourceview maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2010-03-20 00:51 UTC by Carnë Draug
Modified:	2011-04-25 18:14 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Enhanced octave lang file (23.01 KB, application/octet-stream) 2010-03-20 00:51 UTC, Carnë Draug
Yet another improvement on the octave.lang file (25.74 KB, application/xml) 2010-04-20 21:11 UTC, Carnë Draug
Improved octave.lang with classes (25.92 KB, application/xml) 2010-04-21 20:50 UTC, Carnë Draug
Improved octave.lang with even more classes (26.32 KB, text/plain) 2010-10-15 22:42 UTC, Carnë Draug
Diff for the improved octave.lang with even more classes (28.27 KB, patch) 2010-10-15 22:44 UTC, Carnë Draug

Description Carnë Draug 2010-03-20 00:51:26 UTC

Created attachment 156599
Enhanced octave lang file

Hi

I decided to pick up the octave lang file and extend it. I asked for input on the octave mailing list and also on the ubuntu forums (though I only posted to the latter today and haven't got any answers yet).

The thread on the octave mailing list can be found at http://tinyurl.com/ygr8yj3
The thread in the ubuntu forums can be found at http://tinyurl.com/yzdks6l

Here's the list of changes I made:

    * comments and continuation line
          o ... is now identified as a continuation line character
          o comments after the continuation line character do not disrupt it
          o continuation line characters are ignored if they are between single quotes

    * shebang line
          o now handled as a shebang line by the default definitions instead of being treated as just another comment

    * block comments
          o added #{ and #} to the list of possibilities for block comments
          o highlights correctly when block comments are nested

    * operators
          o are now highlighted (not only simple arithmetic operators, but also element-by-element, transpose, autoincrement, assignment and logical operators)

    * functions
          o added a bunch of functions (I got input from one of the octave developers to avoid picking functions that may be removed from octave in the future)
          o just to make it look pretty, I merged some very similar functions into one regex, such as "(a)?sin(d|h)?"
          o removed 'ans' from the list of functions and highlighted it as a variable

    * metadata
          o added the block comment start and end to the list (this is the part where I REALLY don't know if it works ok; I don't know any text editor that uses this, so I had to guess)
          o changed the default line comment from % to # (as requested by one of the octave developers)

    * pkg as preprocessor
          o if pkg is not called 'like a function' (i.e. pkg ("load",..)), it's highlighted as a preprocessor directive (similar to the 'use' statement in Perl)

    * constants/functions
          o functions such as Inf, pi and NaN, which can be called with no arguments to return a constant value, are highlighted as constants in that case, but still as functions if followed by an opening parenthesis

    * data types
          o the function handle character '@' is now highlighted

    * true/false as functions
          o are highlighted as booleans unless followed by an opening parenthesis, in which case they are highlighted as functions

    * keywords
          o added the keywords persistent, replot, static, varargin and varargout
          o removed assert, nargin and nargout from the keywords; they are now highlighted as functions

    * strings
          o added a printf regexp that should identify format specifiers (copied from the C.lang file)
          o added an escape regexp that should match only recognized escape sequences, not whatever follows a \ (also copied from the C.lang file)
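Several items above (constants/functions, true/false as functions, pkg as preprocessor) come down to one idea: a name is only claimed by the special context when it is not followed by an opening parenthesis. A minimal sketch of how that could be written in a gtksourceview 2 lang file, using a negative lookahead in the keyword suffix. The context ids and the style ids `constant`, `boolean` and `preprocessor` are assumptions (they would need matching `<style>` entries) and are not taken from the attached file:

```xml
<!-- Hypothetical sketch, not copied from the attached octave.lang.
     Style ids are assumed to be declared in the <styles> section. -->

<!-- Inf, NaN, pi: constants unless an opening parenthesis follows -->
<context id="constant-functions" style-ref="constant">
  <suffix>\%](?!\s*\()</suffix>
  <keyword>Inf</keyword>
  <keyword>NaN</keyword>
  <keyword>pi</keyword>
</context>

<!-- true/false: booleans unless called like a function -->
<context id="boolean" style-ref="boolean">
  <suffix>\%](?!\s*\()</suffix>
  <keyword>false</keyword>
  <keyword>true</keyword>
</context>

<!-- command syntax such as: pkg load foo
     function syntax such as pkg ("load", "foo") is left to the functions context -->
<context id="pkg-command" style-ref="preprocessor">
  <match>^\s*pkg\%](?!\s*\()</match>
</context>
```

For the lookahead to matter, these contexts would presumably have to be listed before the general functions context in the main `<include>`, so that a bare `pi` is caught here while `pi (2)` or `true (3)` falls through to the functions context.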

Comment 1 Ignacio Casal Quinteiro (nacho) 2010-03-20 01:02:40 UTC

The file looks pretty good to me, though could you please change the indentation to spaces and use 2 spaces?

This looks weird to me, you sure this is ok?
		<property name="block-comment-start">\n#{\n</property>
		<property name="block-comment-end">\n#}\n</property>

We are currently in the hard code freeze until the 30th of this month, so if you could change the indentation, attach the file again and poke us after that date, I would be glad to add your changes to the repo.

Thanks a lot for the patch.

Comment 2 Carnë Draug 2010-03-20 01:12:30 UTC

(In reply to comment #1)
> The file looks pretty good to me, though could you please change the
> indentation to spaces and use 2 spaces?
> 
> This looks weird to me, you sure this is ok?
>         <property name="block-comment-start">\n#{\n</property>
>         <property name="block-comment-end">\n#}\n</property>

This is the only part I was not sure about. As I wrote in the description, I don't know a text editor that uses this, so I couldn't try it. The thing is that in octave these markers have to be on their own lines. The regexp for them is ^(\s)*#{(\s)*$, but I thought the metadata would only paste its contents into the line, so a regexp would make little sense. I couldn't find any information about this in the gtksourceview documentation either.
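On the highlighting side (which is separate from the metadata strings being discussed here), the own-line rule can be expressed with anchored regexes in a container context. A minimal sketch, assuming a `comment` style id and the `def:in-comment` context from def.lang; the context referencing itself in `<include>` is one way to get the nested block comments mentioned in the description, and `[#%]` covers both comment characters. This does not answer the metadata question:

```xml
<!-- Hypothetical sketch: #{ / %{ and #} / %} must be alone on their line -->
<context id="block-comment" style-ref="comment" class="comment" class-disabled="no-spell-check">
  <start>^\s*[#%]\{\s*$</start>
  <end>^\s*[#%]\}\s*$</end>
  <include>
    <!-- referencing this context from inside itself allows nesting -->
    <context ref="block-comment"/>
    <!-- TODO/FIXME highlighting from def.lang -->
    <context ref="def:in-comment"/>
  </include>
</context>
```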

Comment 3 Carnë Draug 2010-04-01 22:01:02 UTC

I've been making more improvements to this lang file. There are, however, 2 things that I don't know how to do.

1 - how do I write in the metadata block that the block comment start marker must be alone (apart from whitespace) on its own line? I thought about \n#{\n and ^#{$. I was told the first is wrong, and the second doesn't make much sense to me.

2 - how do I extend a simple context with a keyword context? What I want is to have '@' highlighted on its own. However, if it's followed by any of the function names, those function names should be highlighted like '@'. I guess it would be possible to define a regex with all the functions and use its id both in the functions keyword context, flanked by \b, and in the same context as @, marked as optional. But wouldn't a regex with all the function names be a pain to maintain? (There are more than 500, and some of them are regexes on their own.)
I tried the following, but it didn't work:

<context id="function-handle" style-ref="data-type">
  <match>\b@([A-Za-z][A-Za-z0-9_]*)+</match>
  <include>
    <context subpattern="1" style-ref="data-type">
      <context ref="functions"/>
    </context>
  </include>
</context>
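One possible approach, sketched under assumptions: a simple `<match>` context can style its capture groups separately through empty `sub-pattern` child contexts (with a hyphen; as far as I know, nesting a `ref` inside a sub-pattern context, as in the attempt above, is not supported). Also, `\b` before `@` only matches when `@` directly follows a word character, so it fails on `f = @sin`. The `<define-regex>` id `octave-function-names`, the shortened function list and the `function` style id are hypothetical; the point is that the list is written once and reused by both contexts:

```xml
<!-- Hypothetical sketch; <define-regex> goes inside <definitions>, before the contexts -->
<define-regex id="octave-function-names">(a)?sin(d|h)?|(a)?cos(d|h)?|disp|printf</define-regex>

<!-- "@" alone, or "@" followed by a known function name, both styled as data-type.
     Group 2 is the outer group, so the inner groups of the list do not shift it. -->
<context id="function-handle">
  <match>(@)(\%{octave-function-names}\%])?</match>
  <include>
    <context sub-pattern="1" style-ref="data-type"/>
    <context sub-pattern="2" style-ref="data-type"/>
  </include>
</context>

<!-- the same regex reused for ordinary function names -->
<context id="functions" style-ref="function">
  <match>\%[(\%{octave-function-names})\%]</match>
</context>
```

The trade-off is the one raised above: the names live in one long regex instead of `<keyword>` elements, but they only have to be maintained in one place.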

Comment 4 Carnë Draug 2010-04-20 21:11:01 UTC

Created attachment 159199
Yet another improvement on the octave.lang file

Comment 5 Ignacio Casal Quinteiro (nacho) 2010-04-20 21:30:09 UTC

Some comments:

You moved this to the beginning; we prefer it at the end, as it was. You also removed the class:
+    <context id="octave">
+      <include>
+        <context ref="def:shebang"/>
....


This still looks suspicious to me, but if it is ok fine by me.
-    <property name="line-comment-start">%</property>
+    <property name="line-comment-start">#</property>
+    <property name="block-comment-start">#{</property>
+    <property name="block-comment-end">#}</property>


You removed the class stuff why?
-    <context id="line-comment" style-ref="comment" end-at-line-end="true" class="comment" class-disabled="no-spell-check">
+    <context id="line-comment" style-ref="comment" end-at-line-end="true">

Same in the strings.


Once you get this fixed, I'll continue the review, as it is too late right now :)

Comment 6 Carnë Draug 2010-04-20 22:04:53 UTC

(In reply to comment #5)
> This still looks suspicious to me, but if it is ok fine by me.
> -    <property name="line-comment-start">%</property>
> +    <property name="line-comment-start">#</property>
> +    <property name="block-comment-start">#{</property>
> +    <property name="block-comment-end">#}</property>

I changed from % to # because that's the default for octave (although octave accepts both % and #, to make it easier for users with matlab code), while matlab accepts only %. But I guess this is syntax highlighting for octave, not matlab.

The block comment starts with #{ and finishes with #}. The only catch is that these markers must be the only thing on the line apart from whitespace. Should I write it some other way?

> You removed the class stuff why?
> -    <context id="line-comment" style-ref="comment" end-at-line-end="true"
> class="comment" class-disabled="no-spell-check">
> +    <context id="line-comment" style-ref="comment" end-at-line-end="true">
> 
> Same in the strings.

I didn't mean to remove anything. I was editing the file I had installed, which predates the addition of these classes (they were only added this past Christmas). I've checked now, and apart from that change everything is up to date.

Comment 7 jessevdk@gmail.com 2010-04-21 07:27:39 UTC

Please don't change the line comment to something which is not compatible with matlab. I for one, would be very annoyed by that change :) Since % will work in both, I see no reason to break that functionality for matlab files.

Comment 8 Carnë Draug 2010-04-21 08:07:42 UTC

(In reply to comment #7)
> Please don't change the line comment to something which is not compatible with
> matlab. I for one, would be very annoyed by that change :) Since % will work in
> both, I see no reason to break that functionality for matlab files.

It would still highlight comments made with either # or %; that would not be broken.

It would, however, use # to comment and uncomment lines automatically. If the metadata block were used only to comment lines automatically, you'd be right that there would be no reason to change it (other than that it looks bad to have some lines with % and others with #). However, if you work in a project where everyone uses octave and # for comments, using % for the automatic comments means you can no longer automatically uncomment code commented with #. This is, after all, the lang file for octave, not for matlab (which isn't even free software), so I guess it makes sense to use #.

And I do work in the mentioned circumstances, a project where everyone uses octave.

Comment 9 jessevdk@gmail.com 2010-04-21 08:13:17 UTC

That is all fine, but if you use matlab, you cannot use the functionality _at all_. Therefore, I think you have to go with the compromise. I don't think you can drop the 'free software' bomb and just ignore people that are using matlab.

Another solution would be to somehow make the separation between matlab and octave, the only problem is that they cannot be distinguished from each other by mime.

I'd like to hear what the other developers think about this. If I'm the only one that feels this way I guess we should make the change, and I'll just make local changes.

Comment 10 Paolo Borelli 2010-04-21 08:19:34 UTC

I agree with Jesse: one solution works for everyone, with some drawbacks for some users; the other completely breaks the functionality for half of the users (where, incidentally, "half" is probably a numerical underestimate).

Comment 11 Carnë Draug 2010-04-21 08:31:36 UTC

(In reply to comment #9)
> That is all fine, but if you use matlab, you cannot use the functionality _at
> all_. Therefore, I think you have to go with the compromise. I don't think you
> can drop the 'free software' bomb and just ignore people that are using matlab.

The way I see it, the octave lang file is a tool to highlight octave code (plus the fancy things from the metadata block). By using % instead of #, you're making it a worse tool for what it is meant to do, in order to make it a better tool for something else.

> Another solution would be to somehow make the separation between matlab and
> octave, the only problem is that they cannot be distinguished from each other
> by mime.

I think you can have two different lang files, one for matlab and another for octave. They would have the same extension, but that already happens with Objective-C.

> I'd like to hear what the other developers think about this. If I'm the only
> one that feels this way I guess we should make the change, and I'll just make
> local changes.

This is not really a change; it's more a change back to how it was previously (it used to be #, not %, until last year).

Comment 12 jessevdk@gmail.com 2010-04-21 08:53:42 UTC

I think the file was always meant to highlight both languages because, apart from some minor details, they are basically the same. So the argument for making it a worse tool for 'what it was meant to do' is not so strong.

Wrt my comment for mime, yes you can have two lang files, and yes, the files do share extensions, but we use mime types in these cases to be able to know which file is which. This works reasonably, even for Obj-C vs Octave, but for matlab vs Octave I don't think it will work at all. Which means that I will have to change the highlighting to matlab each and every time I open a .m file.

For your comment on change or not a change, let's not make the discussion about semantics.

Comment 13 Carnë Draug 2010-04-21 10:16:12 UTC

(In reply to comment #12)
> I think the file was always meant to highlight both languages because, apart
> from some minor details, they are basically the same. So the argument for making
> it a worse tool for 'what it was meant to do' is not so strong.

Well, if it was meant for that, it was highlighting matlab code incorrectly: matlab does not allow strings with double quotes, for example (and there are other differences). It's like saying that the file can be used to highlight perl because only the functions and keywords are different (there are obviously more differences in the syntax, but not many more in the syntax highlighting). The changes I made to the file make it even more different from matlab, highlighting functions that do not exist in matlab, and escape characters and line continuation for double-quoted strings only (octave neither escapes nor continues the line inside single quotes).

> Wrt my comment for mime, yes you can have two lang files, and yes, the files do
> share extensions, but we use mime types in these cases to be able to know which
> file is which. This works reasonably, even for Obj-C vs Octave,

Pretty much every time I open a new .m file I have to change the highlighting from Objective-C to Octave. I thought it had to do with the shebang: since function files in octave don't have one, they are not highlighted correctly.

> but for matlab
> vs Octave I don't think it will work at all. Which means that I will have to
> change the highlighting to matlab each and every time I open a .m file.

3 things:
1 - you would only have to change the highlighting the first time you open the .m file. After that, I believe it remembers the last setting.
2 - how can you talk about compromise when, to save you from changing the highlighting (one mouse click and drag per file), you want people in my situation to uncomment lines manually (many key presses per commented *line*)?
3 - you still wouldn't have to move the mouse even once, because when in doubt it goes by alphabetical order, and Matlab would come before Octave (unless they rename it to GNU Octave, which wouldn't be that incorrect).

> For your comment on change or not a change, let's not make the discussion about
> semantics.

I just didn't want to sound like the guy who comes out of nowhere changing things that have always been that way. This particular thing I'm changing back to the way it had been for quite some time. I am, though, adding many more things to the file, but they don't break anything other than highlighting things that don't exist or are not allowed in matlab.


I'll also point out that this doesn't only affect people who rely on gtksourceview to comment and uncomment lines automatically. If some people start commenting lines with % in a project, other people, whose text editors highlight and comment octave code with #, won't be able to automatically uncomment those lines either.

Comment 14 Carnë Draug 2010-04-21 20:50:18 UTC

Created attachment 159282
Improved octave.lang with classes

Comment 15 Carnë Draug 2010-04-28 21:44:00 UTC

I was wondering about the status of this...

Comment 16 søren hauberg 2010-05-07 00:45:16 UTC

I am one of the original developers of the Octave syntax highlighting, and I must say I am surprised to read that this highlighter has changed status to a Matlab+Octave syntax highlighter. Yes, the languages are similar, but that does not mean they are the same. Octave supports syntax that Matlab doesn't support and vice versa.

Given this fact, I simply do not see why Octave and Matlab should be treated as being the same. If I was a Matlab user, I would find it very annoying that 1) I had to figure out to use "Octave" for highlighting (how would I figure that out?) and 2) words like "endfor" would get highlighted for no apparent reason.

Seeing how providing an Octave+Matlab syntax highlighter creates an inferior experience for Octave developers (as noted by Carnë) and an inferior experience for Matlab developers, I would split the two.

Basically, I would suggest having a common syntax file for the syntax shared by both Matlab and Octave. An Octave and a Matlab (and Scilab?) highlighter could then inherit from this shared syntax file. Would this be possible or would the shared syntax also appear as a language option in Gedit? The downside is that it would be hard to differentiate Octave and Matlab files based on mime info, but as was pointed out, this already is a problem because of a clash with Objective-C.

Comment 17 jessevdk@gmail.com 2010-05-07 08:31:03 UTC

I agree that splitting the two languages seems like the best way to go (I can't figure out any other satisfactory solution). As far as I understand, we can have hidden languages, so it should be possible to write common stuff in a separate language file, and then include the relevant contexts on both octave and matlab files.
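A rough sketch of the split suggested in comments 16 and 17: a hidden language file holds the shared contexts, and octave.lang pulls them in with the same `lang-id:context-id` reference syntax already used for `def:shebang`. The file name `matlab-common.lang`, its ids and the single shared context are placeholders; a matlab.lang would reference `matlab-common:percent-comment` in the same way:

```xml
<!-- matlab-common.lang (hypothetical): hidden="true" keeps it out of the language menu -->
<language id="matlab-common" _name="Matlab/Octave common" version="2.0" hidden="true">
  <styles>
    <style id="comment" _name="Comment" map-to="def:comment"/>
  </styles>
  <definitions>
    <!-- "%" comments are valid in both languages -->
    <context id="percent-comment" style-ref="comment" end-at-line-end="true"
             class="comment" class-disabled="no-spell-check">
      <start>%</start>
      <include>
        <context ref="def:in-comment"/>
      </include>
    </context>
    <!-- main context, so the file is a complete definition on its own -->
    <context id="matlab-common">
      <include>
        <context ref="percent-comment"/>
      </include>
    </context>
  </definitions>
</language>

<!-- octave.lang (sketch): reuses the shared context and adds Octave-only syntax -->
<language id="octave" _name="Octave" version="2.0">
  <metadata>
    <property name="globs">*.m</property>
    <property name="line-comment-start">#</property>
  </metadata>
  <definitions>
    <context id="octave" class="no-spell-check">
      <include>
        <context ref="def:shebang"/>
        <context ref="matlab-common:percent-comment"/>
        <!-- Octave-only contexts: "#" comments, double-quoted strings, endfor, ... -->
      </include>
    </context>
  </definitions>
</language>
```

This would also settle the % vs # dispute: each file can declare its own `line-comment-start` while the shared syntax lives in one place.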

On that note, it might be worth filing a bug against the freedesktop shared mime database, because they define the octave mime-type as an alias for the matlab mime-type. They also define the magic for that mime type as the file starting with either '%' or 'function'; it might be worth having '#' for octave instead, so that we can do at least a little bit of mime detection.
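For reference, a shared-mime-info entry with its own magic rule could look roughly like the following. The type name `text/x-octave`, the priority and the rule itself are assumptions for illustration, not the actual freedesktop.org entry; such a definition would only make sense once `text/x-octave` stops being an alias of `text/x-matlab`. Note also that a leading `#` is not unique to Octave (shell scripts start with `#!`), so this could only ever be a weak hint:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical override, e.g. placed in a local mime/packages/ directory -->
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
  <mime-type type="text/x-octave">
    <comment>Octave script</comment>
    <sub-class-of type="text/plain"/>
    <glob pattern="*.m"/>
    <!-- weak hint: the file starts with an Octave-style "#" comment -->
    <magic priority="10">
      <match type="string" offset="0" value="#"/>
    </magic>
  </mime-type>
</mime-info>
```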

Comment 18 Carnë Draug 2010-05-18 16:12:56 UTC

I noticed that nothing has been done about this yet. Is the comment thing between % and # the only problem?

Comment 19 Carnë Draug 2010-07-08 11:23:28 UTC

I'd like to know if there's any update on this. Even without the metadata about comment start, there are a lot of upgrades that could be very useful.

Comment 20 Carnë Draug 2010-09-23 13:24:01 UTC

Is anyone even considering this improvement on the syntax file?

Comment 21 Carnë Draug 2010-10-15 22:42:48 UTC

Created attachment 172463
Improved octave.lang with even more classes

Comment 22 Carnë Draug 2010-10-15 22:44:08 UTC

Created attachment 172464
Diff for the improved octave.lang with even more classes

I got the new octave.lang from git, and applied the changes again. I'm uploading both the diff file and a new octave.lang to be used. I've used it a lot, and I've also distributed it to other users who seem to be quite satisfied.

Hopefully someone will patch it.

Comment 23 Carnë Draug 2011-03-22 01:33:26 UTC

Is anyone doing anything about this? It was reported more than a year ago.

Comment 24 Paolo Borelli 2011-04-25 17:47:43 UTC

I am assuming this is obsoleted by the newer lang file

*** This bug has been marked as a duplicate of bug 648494 ***

Comment 25 Carnë Draug 2011-04-25 18:14:50 UTC

(In reply to comment #24)
> I am assuming this is obsoleted by the newer lang file
> 
> *** This bug has been marked as a duplicate of bug 648494 ***

You assume correctly. Thank you for resolving this.