GNOME Bugzilla – Bug 546628
incorrect URL highlighting
Last modified: 2016-02-21 15:19:30 UTC
I cloned the bug so I could track the crash part (in gio) indepedently of fixing the URL regex in g-t. +++ This bug was initially created as a clone of Bug #546625 +++ I've caught this one yesterday when packaging the latest 2.23.6 packages. In rpm files we have lines such as this one: Source0: http://ftp.gnome.org/pub/GNOME/sources/libgnomeui/2.23/%{name}-%{version}.tar.bz2 In previous releases g-t would only highlight the part before first '%' sign, for example: http://ftp.gnome.org/pub/GNOME/sources/libgnomeui/2.23/ Now in 2.23.6 (with vte 0.17.1) it highlights and tries to open: http://ftp.gnome.org/pub/GNOME/sources/libgnomeui/2.23/%{ Which results in an immediate terminal crash (it is also an incorrect URL as % is an escape character and is supposed to be followed by a hexadecimal number). I tend to forget about this bug only to lose some data on an hourly basis :) Anyway it's hard for me to provide a stack trace as I don't have any other terminal apps installed but it's rather easy to reproduce: 1) echo 'http://ftp.gnome.org/pub/GNOME/sources/libgnomeui/2.23/%{' 2) ctrl+click the output of the previous command
Created attachment 139428 [details] [review] Patch to expand the URL matching characters via PATHCHARS_CLASS
I've expanded what PATHCHARS_CLASS contains to better support URLs pasted from modern browsers that tend not to do percent encoding in the URL bar. It's also common to see URLs like the following, with the pipe char, that the URL matcher won't work with: http://cgi.ebay.co.uk/Logitech-V550-Nano-4-Button-Wireless-Laser-Scroll-Mouse_W0QQitemZ270418908629QQcmdZViewItemQQptZUK_Computing_ComputerComponents_KeyboardsMice?hash=item3ef63915d5&_trksid=p3286.c0.m14&_trkparms=65%3A15|66%3A2|39%3A1|293%3A1|294%3A50 There's a patched attached that should apply against quite a long line of terminal-screen.c revisions, but it's against current git HEAD.
I'm working on this as part of bug 756038. Matching '{' but not the rest is clearly a bug, already fixed there. It should either not match it, or match it and all subsequent characters. (Right now it doesn't match it, % is the last underlined character.) I have a couple of questions though... Should we allow a % that's not followed by 2 hex digits, or not? Firefox doesn't seem to mangle % signs I place in the URL (they become unchanged in the http traffic) and the remote site can do whatever it wants. I can easily imagine services that don't quite care about the standard and use % as a field separator or something like that. Should we allow a { or not? It's an unsafe character that must be percent-encoded, and e.g. firefox does indeed encode it - but it happens behind the scenes, without changing the URL on the UI. In the location bar, it still contains {, so with this in mind, it's reasonable to expect that vte allows this form too. In case of expected stop at any of these characters, should we just stop the URL there, or should we invalidate the whole URL (not highlight at all)?
If we do allow pure % and { then we need to process the matched string to turn it into a real URI... so I´d say at least for now, it´s ok not to allow these. And to stop the URL highlighting at these is better than not highlighting anzthing, IMHO.
(In reply to Christian Persch from comment #4) > If we do allow pure % and { then we need to process the matched string to > turn it into a real URI... I don't think we need to, I guess whoever's responsible for opening a URL handles this. E.g. $ firefox 'http://example.com/foo%bar{baz' The URL bar shows http://example.com/foo%bar%7Bbaz; the request goes out as GET /foo%bar%7Bbaz HTTP/1.1 URL escaping sucks by design by the way. The whole point of escaping would be to allow to have whatever value in the user-facing places (e.g. browser URL bar) and do the escaping under the hood to preserve them under all circumstances. That's not the case with the way % is handled. It's allowed (and sometimes required) to be user-visible in the URL bar as the escaping char, and hence if the intent is a literal % then the user should see %25. It's really fu^H^Hmessed up. And, even worse, you can have a % not followed by two hex digits, browsers are okay with this, the interpretation is up to the remote server. I always hated it... > And to stop the URL highlighting at these is better than not highlighting anzthing, IMHO. This is another generic dilemma. Suppose the URL http://example.com:1234567. What to do? (1) Highlight http://example.com:12345 as the longest prefix that makes sense; (2) Highlight http://example.com since the port is invalid (3) Don't highlight at all. Probably it's a matter of taste with no single correct answer. I'm not in favor of (1), but couldn't decide between (2) or (3). I suspect (based on your preference on the issue above) that you wouldn't like (3). So probably (2) is the one best matching our preferences.
Yes, I'd say (2) is better than either 1 or 3.
Let's call this NOTABUG as per comment 3, 5.