Bug 565524 - HTML and PHP syntax coloring fails
Status: RESOLVED FIXED
Product: gtksourceview
Classification: Platform
Component: Syntax files
Version: git master
Hardware: Other All
Importance: Normal minor
Target Milestone: ---
Assigned To: Gedit maintainers
QA Contact: Gedit maintainers
Duplicates: 579683 608027 614022 617950 674212 723169
Depends on:
Blocks:
Reported: 2008-12-24 00:17 UTC by Mateus
Modified: 2014-02-09 13:01 UTC
See Also:
GNOME target: ---
GNOME version: 2.23/2.24


Attachments
- Alternate lang file for html highlighting (6.14 KB, text/plain), 2008-12-26 15:17 UTC, Mateus
- Patch (unified context) for the alternate language file (550 bytes, patch), 2011-08-27 18:20 UTC, Jean-Philippe Fleury (reviewed)
- Same patch, but with more context (2.43 KB, patch), 2011-08-27 18:21 UTC, Jean-Philippe Fleury (reviewed)
- Screenshot of highlighting (18.61 KB, image/png), 2011-08-28 19:15 UTC, Jean-Philippe Fleury

Description Mateus 2008-12-24 00:17:25 UTC
Please describe the problem:
When you type HTML code combined with PHP, syntax coloring doesn't work properly. As far as I have tested, it fails at the closing bracket of a PHP expression inside an HTML attribute. It doesn't matter whether you choose PHP or HTML highlighting.

Steps to reproduce:
1. Turn on syntax highlighting for HTML or PHP
2. Type <img src="<?php echo $vector["n"]; ?>" />


Actual results:
The rightmost characters, "]; ?>" />, are shown in the wrong colour.

Expected results:
They should show up in the right colours: at least, either the whole thing in the standard orange used for text between double quotes, or in the PHP colours.

Does this happen every time?
Yes.

Other information:
Colours also go awry when you add some JavaScript functions to the mix. But this is the simplest example I could come up with, and the bug is always related to HTML attributes.

Also, when I use single quotes instead of double quotes, the whole attribute value is shown in orange. I guess this is the expected behaviour.
Comment 1 Mateus 2008-12-26 15:14:19 UTC
With a bit more research, I've found the origin of the bug in the gtksourceview syntax files, and have updated the bug accordingly.

It turns out that the root of the problem is that HTML does not allow nested double quotes inside a double-quoted attribute value. This is specifically forbidden by the W3C: http://www.w3.org/html/wg/markup-spec/#syntax-attr-double-quoted

However, it is perfectly legal to use them inside <?php ?>, because the PHP block is replaced by its output before the page is served, so the inner quotes disappear.

The solution I've found is simply to change the end of quoted attributes to <end>\"(?=\s|/?&gt;)</end>, so that a closing quote only counts when it is followed by whitespace, "/>" or ">". I've done this in the attached file.

Ideally, this change would be conditional on the use of the PHP (or JavaScript, AFAIK) lang files, but I don't know whether or how this is possible.
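To see what the end pattern from comment 1 changes, here is a minimal Python sketch that applies a naive "first closing quote" rule and the proposed lookahead rule to the test line from the description. Python's re module stands in for GtkSourceView's own regex engine, the &gt; entity is unescaped to ">", and the helper names are hypothetical; this only illustrates the matching behaviour, not how html.lang is actually evaluated.

```python
import re

# Test line from the bug description.
LINE = '<img src="<?php echo $vector["n"]; ?>" />'

# Naive rule: a double-quoted attribute value ends at the first '"'.
NAIVE = re.compile(r'\w+="([^"]*)"')

# Rule proposed in comment 1, with &gt; unescaped: a '"' only closes the
# value when it is followed by whitespace, "/>" or ">".
PROPOSED = re.compile(r'\w+="(.*?)"(?=\s|/?>)')


def attribute_value(pattern, text):
    """Return what `pattern` treats as the first attribute value, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(attribute_value(NAIVE, LINE))     # <?php echo $vector[
print(attribute_value(PROPOSED, LINE))  # <?php echo $vector["n"]; ?>

# The lookahead is still a heuristic: a quote followed by a space inside
# the PHP code ends the value too early.
TRICKY = '<a title="<?php echo "x" . $y; ?>">'
print(attribute_value(PROPOSED, TRICKY))  # <?php echo "x
```

The last case shows that the rule still guesses from the characters after the quote rather than knowing it is inside PHP code.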
Comment 2 Mateus 2008-12-26 15:17:37 UTC
Created attachment 125348 [details]
Alternate lang file for html highlighting

As I don't trust XML diffs, I'm attaching the whole file.
Comment 3 Michael Martin-Smucker 2011-05-12 20:20:37 UTC
*** Bug 614022 has been marked as a duplicate of this bug. ***
Comment 4 Ignacio Casal Quinteiro (nacho) 2011-05-13 17:42:59 UTC
*** Bug 608027 has been marked as a duplicate of this bug. ***
Comment 5 Carnë Draug 2011-05-16 17:07:43 UTC
*** Bug 579683 has been marked as a duplicate of this bug. ***
Comment 6 Jean-Philippe Fleury 2011-08-27 18:20:33 UTC
Created attachment 194936 [details] [review]
Patch (unified context) for the alternate language file

I created a patch (unified context) to show the modifications proposed in the alternate language file attached to this report. As the source file, I used html.lang from GtkSourceView 2.4.1 (01-Nov-2008), the latest version at the time this report was created.
Comment 7 Jean-Philippe Fleury 2011-08-27 18:21:58 UTC
Created attachment 194937 [details] [review]
Same patch, but with more context

Same patch, but with a context of 10 lines.
Comment 8 Jean-Philippe Fleury 2011-08-27 18:23:31 UTC
I created patches from the alternate language file.

So, is this old proposition acceptable?
Comment 9 Paolo Borelli 2011-08-27 20:31:21 UTC
That's not the correct fix, since it hardcodes PHP's "?>" syntax in html.lang.

The quoted attribute context already contains the "embedded-lang-hook", but the problem is probably that the closing " of the attribute takes precedence, so the hook is only matched inside the string itself.

Can you see if using the special gtksourceview attribute extend-parent="true" helps?
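The precedence question raised here can be illustrated with a toy scanner, under stated assumptions: it models only one attribute value with one embedded PHP block, and the flag php_has_priority is a hypothetical stand-in for "the child context has priority over its parent's end pattern", which is roughly what the GtkSourceView language reference says extend-parent controls. It is not GtkSourceView's engine, and comment 10 reports that changing the real attribute did not fix the bug.

```python
def scan_attribute(text, php_has_priority):
    """Return the value of the first double-quoted attribute in `text`.

    Toy model: inside an embedded <?php ... ?> block, the attribute's end
    pattern ('"') still applies unless the PHP block has priority over it.
    """
    start = text.index('"') + 1  # first character after the opening quote
    in_php = False
    pos = start
    while pos < len(text):
        if not in_php:
            if text.startswith("<?php", pos):
                in_php = True
                pos += len("<?php")
                continue
            if text[pos] == '"':
                return text[start:pos]
        else:
            if text.startswith("?>", pos):
                in_php = False
                pos += len("?>")
                continue
            if text[pos] == '"' and not php_has_priority:
                # The parent's end pattern wins and closes the attribute
                # (and, implicitly, the PHP block) in the middle of the code.
                return text[start:pos]
        pos += 1
    return text[start:]  # unterminated value


LINE = '<img src="<?php echo $vector["n"]; ?>" />'
print(scan_attribute(LINE, php_has_priority=False))  # <?php echo $vector[
print(scan_attribute(LINE, php_has_priority=True))   # <?php echo $vector["n"]; ?>
```

In this model the first call reproduces the reported symptom, and the second shows the result the embedded-lang-hook is meant to achieve.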
Comment 10 Jean-Philippe Fleury 2011-08-28 19:15:55 UTC
Created attachment 194986 [details]
Screenshot of highlighting

(In reply to comment #9)
> Can you see if using the special gtksourceview attribute extend-parent="true"
> helps?

To be sure, I tried it on a few contexts, but there was no visible change, except when changing

<context id="string" extend-parent="false"

to

<context id="string" extend-parent="true"

but even that doesn't solve the problem. See the attached screenshot.
Comment 11 h7e 2013-01-20 03:13:15 UTC
*** Bug 674212 has been marked as a duplicate of this bug. ***
Comment 12 teo (Account deactivated) 2013-01-20 11:11:37 UTC
I don't know much (pardon, anything) about syntax parsing, so I'm not sure what I'm going to say really makes sense, but if it does, it is meant to be a suggestion.

Here there are two different languages being parsed, HTML and PHP, right? So I guess there is a definition for HTML and a definition for PHP, and when you are between "<?php" and "?>", PHP parsing comes into play, right? Is it more or less like that?

If that is the case, it should be the other way around. HTML should actually be considered to be "inside" PHP, not vice versa, because THAT is how the PHP interpreter parses it. "<?php" and "?>" should have the highest priority over everything else ("?>" acting as an opening and "<?php" as a closing, with an implicit opening at the beginning of the file), and only when you are "inside" HTML should HTML syntax be taken into account. The HTML part is like a string inside PHP.

(Of course, when I say "<?php" and "?>" I mean them and whatever aliases they have.)

Otherwise, you will fix this particular case and there will be others.

This also means that a syntax error in the HTML should never break the syntax highlighting of the PHP part, just as a syntax error in HTML will NEVER cause a parse error in the PHP interpreter. If you have valid PHP with broken HTML, you should see broken syntax highlighting only in the part outside <?php and ?>, and the part inside should be highlighted correctly.

But of course, as I said, I may be misunderstanding the whole mechanism behind all this. The point anyway is that the parsing done by the editor should always match perfectly the parsing done by (a correct implementation of) the interpreter. And in the case of PHP+HTML, PHP is parsed first.
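The PHP-first idea can be sketched as a two-pass approach in Python: first split the file the way the PHP interpreter does, starting in HTML mode, then hand the HTML highlighter a copy in which every PHP block is masked out. The function names and the "X" filler are hypothetical, and the splitter deliberately ignores details a real implementation would need, such as short tags like <?= (the "aliases" above) and a "?>" that appears inside a PHP string literal.

```python
def split_php_first(source):
    """Split `source` into ("html", text) and ("php", text) segments.

    Mirrors the interpreter's view: the file starts in HTML mode, "<?php"
    switches to PHP and "?>" switches back. Simplified: no short tags, and
    a "?>" inside a PHP string literal is (wrongly) treated as a close tag.
    """
    segments = []
    pos, mode = 0, "html"  # implicit HTML "opening" at the start of the file
    while pos < len(source):
        if mode == "html":
            i = source.find("<?php", pos)
            end = len(source) if i == -1 else i
        else:
            j = source.find("?>", pos)
            end = len(source) if j == -1 else j + len("?>")
        if end > pos:
            segments.append((mode, source[pos:end]))
        pos = end
        mode = "php" if mode == "html" else "html"
    return segments


def html_view(segments, filler="X"):
    """Text for the HTML highlighter: PHP blocks replaced by same-length
    filler, so the HTML grammar never sees the PHP code but offsets still
    line up with the original buffer."""
    return "".join(
        text if kind == "html" else filler * len(text) for kind, text in segments
    )


LINE = '<img src="<?php echo $vector["n"]; ?>" />'
segments = split_php_first(LINE)
print(segments)
print(html_view(segments))
```

With the PHP block masked, the HTML grammar sees an ordinary attribute value, so quotes inside the PHP code can no longer close it; the PHP segments would then be highlighted separately with the PHP grammar.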
Comment 13 Sébastien Wilmet 2013-01-20 13:19:13 UTC
*** Bug 617950 has been marked as a duplicate of this bug. ***
Comment 14 Sébastien Wilmet 2014-02-03 10:27:29 UTC
*** Bug 723169 has been marked as a duplicate of this bug. ***
Comment 15 Sébastien Wilmet 2014-02-05 15:39:13 UTC
matteo, your comment makes sense. Maybe it should be fixed in GtkSourceView.
Comment 16 Sébastien Wilmet 2014-02-08 19:15:03 UTC
The particular problem is fixed:
https://git.gnome.org/browse/gtksourceview/commit/?id=7433cc6663d9d55682709d4fca1c85d2692b9156

(In reply to comment #12)
If we give PHP a higher priority, the PHP sections must be ignored when highlighting the HTML. This is currently not easily feasible I think.
Comment 17 teo (Account deactivated) 2014-02-09 03:48:11 UTC
If that's true, it highlights a weakness of whatever system is currently used for syntax highlighting.
Syntax highlighting should follow the same logic as the parser of the language being highlighted: that's the only correct, sensible syntax highlighting.
If the current way of defining grammars for syntax highlighting doesn't allow that, then the current way of defining those grammars has a design flaw.

One tolerable fallback (while we wait for a syntax highlighting algorithm that parses the text the same way as the parser that is going to interpret it), certainly better than the current behavior (before the fix, which I haven't tried), would be:
- PHP code inside HTML attributes => all the same color as the attribute
- PHP code inside a JavaScript string => all the same color as the string

But there remains the case of PHP code inside JavaScript but outside any JavaScript string, as in:

 <script>
   var myJSIntVar=<?php echo $myPHPIntVar; ?>;
 </script>

The only right syntax highlighting for that is to highlight PHP as PHP, which is probably "not easily feasible" (at least not in ALL cases, I guess), but I can't think of an acceptable alternative for such a situation.

Currently, that usually results in the PHP code being all black, with PHP strings highlighted as JS strings, which in most real-life cases is almost fine. I think it would not be difficult to come up with an example where the highlighting gets quite screwed up, though...
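One way to decide which of these fallback rules applies is to look at the HTML-side context in which each PHP block starts. The Python sketch below is a toy scanner under loose assumptions: it tracks only tags, quotes and <script> elements, ignores HTML comments and escapes, skips over the PHP code itself, and the context names and the FALLBACK table (including the "html" entry) are hypothetical. The script example above comes out as "script-code", the case for which no acceptable substitute for real PHP highlighting was found.

```python
import re

PHP_BLOCK = re.compile(r"<\?php.*?\?>", re.S)

# Fallback rules from the comment above, keyed by hypothetical context names.
FALLBACK = {
    "attribute": "same colour as the attribute value",
    "script-string": "same colour as the JavaScript string",
    "script-code": "no good fallback: highlight as PHP",
    "html": "highlight as PHP (assumed)",
}


def classify_php_blocks(source):
    """Return (php_block, context) pairs for every <?php ... ?> block."""
    results = []
    in_tag = in_script = False
    quote = None  # currently open quote character, if any
    pos = 0
    while pos < len(source):
        m = PHP_BLOCK.match(source, pos)
        if m:
            if in_tag and quote:
                ctx = "attribute"
            elif in_script and quote:
                ctx = "script-string"
            elif in_script:
                ctx = "script-code"
            else:
                ctx = "html"
            results.append((m.group(0), ctx))
            pos = m.end()  # the PHP code itself is never scanned as HTML
            continue
        ch = source[pos]
        if quote:
            if ch == quote:
                quote = None
        elif in_tag:
            if ch in "\"'":
                quote = ch
            elif ch == ">":
                in_tag = False
        elif in_script:
            if source.startswith("</script", pos):
                in_script, in_tag = False, True
            elif ch in "\"'":
                quote = ch
        elif ch == "<":
            in_tag = True
            in_script = source.startswith("<script", pos)
        pos += 1
    return results


SOURCE = """<img src="<?php echo $vector["n"]; ?>" />
<script>
  var myJSIntVar=<?php echo $myPHPIntVar; ?>;
  var s = "<?php echo $name; ?>";
</script>
"""

for block, ctx in classify_php_blocks(SOURCE):
    print(f"{ctx:14} {FALLBACK[ctx]:40} {block}")
```

A real implementation would need a proper HTML and JavaScript tokenizer; the point here is only that the fallback depends on the enclosing context, not on the PHP code.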
Comment 18 Sébastien Wilmet 2014-02-09 13:01:51 UTC
I've filed bug #723956 so we will not forget the problem.

Note that for PHP, HTML, etc., the Bluefish application is probably a better choice, but it doesn't use GtkSourceView (it uses GtkTextView, the underlying GTK+ widget).
http://bluefish.openoffice.nl/index.html