After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 546628 - incorrect URL highlighting
incorrect URL highlighting
Status: RESOLVED NOTABUG
Product: gnome-terminal
Classification: Core
Component: general
git master
Other All
: Normal normal
: ---
Assigned To: GNOME Terminal Maintainers
GNOME Terminal Maintainers
Depends on: 756038
Blocks:
 
 
Reported: 2008-08-06 17:03 UTC by Christian Persch
Modified: 2016-02-21 15:19 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to expand the URL matching characters via PATHCHARS_CLASS (1.05 KB, patch)
2009-07-28 21:45 UTC, Rys Sommefeldt
none Details | Review

Description Christian Persch 2008-08-06 17:03:50 UTC
I cloned the bug so I could track the crash part (in gio) indepedently of fixing the URL regex in g-t.

+++ This bug was initially created as a clone of Bug #546625 +++

I've caught this one yesterday when packaging the latest 2.23.6 packages. In rpm files we have lines such as this one:

Source0: http://ftp.gnome.org/pub/GNOME/sources/libgnomeui/2.23/%{name}-%{version}.tar.bz2

In previous releases g-t would only highlight the part before first '%' sign, for example:

http://ftp.gnome.org/pub/GNOME/sources/libgnomeui/2.23/

Now in 2.23.6 (with vte 0.17.1) it highlights and tries to open:

http://ftp.gnome.org/pub/GNOME/sources/libgnomeui/2.23/%{

Which results in an immediate terminal crash (it is also an incorrect URL as % is an escape character and is supposed to be followed by a hexadecimal number). I tend to forget about this bug only to lose some data on an hourly basis :)

Anyway it's hard for me to provide a stack trace as I don't have any other terminal apps installed but it's rather easy to reproduce:

1) echo 'http://ftp.gnome.org/pub/GNOME/sources/libgnomeui/2.23/%{'
2) ctrl+click the output of the previous command
Comment 1 Rys Sommefeldt 2009-07-28 21:45:39 UTC
Created attachment 139428 [details] [review]
Patch to expand the URL matching characters via PATHCHARS_CLASS
Comment 2 Rys Sommefeldt 2009-07-28 21:46:33 UTC
I've expanded what PATHCHARS_CLASS contains to better support URLs pasted from
modern browsers that tend not to do percent encoding in the URL bar.

It's also common to see URLs like the following, with the pipe char, that the
URL matcher won't work with:

http://cgi.ebay.co.uk/Logitech-V550-Nano-4-Button-Wireless-Laser-Scroll-Mouse_W0QQitemZ270418908629QQcmdZViewItemQQptZUK_Computing_ComputerComponents_KeyboardsMice?hash=item3ef63915d5&_trksid=p3286.c0.m14&_trkparms=65%3A15|66%3A2|39%3A1|293%3A1|294%3A50

There's a patched attached that should apply against quite a long line of
terminal-screen.c revisions, but it's against current git HEAD.
Comment 3 Egmont Koblinger 2015-10-25 22:08:29 UTC
I'm working on this as part of bug 756038.

Matching '{' but not the rest is clearly a bug, already fixed there. It should either not match it, or match it and all subsequent characters. (Right now it doesn't match it, % is the last underlined character.)

I have a couple of questions though...

Should we allow a % that's not followed by 2 hex digits, or not? Firefox doesn't seem to mangle % signs I place in the URL (they become unchanged in the http traffic) and the remote site can do whatever it wants. I can easily imagine services that don't quite care about the standard and use % as a field separator or something like that.

Should we allow a { or not? It's an unsafe character that must be percent-encoded, and e.g. firefox does indeed encode it - but it happens behind the scenes, without changing the URL on the UI. In the location bar, it still contains {, so with this in mind, it's reasonable to expect that vte allows this form too.

In case of expected stop at any of these characters, should we just stop the URL there, or should we invalidate the whole URL (not highlight at all)?
Comment 4 Christian Persch 2015-10-27 19:57:53 UTC
If we do allow pure % and { then we need to process the matched string to turn it into a real URI... so I´d say at least for now, it´s ok not to allow these. And to stop the URL highlighting at these is better than not highlighting anzthing, IMHO.
Comment 5 Egmont Koblinger 2015-10-27 20:17:23 UTC
(In reply to Christian Persch from comment #4)
> If we do allow pure % and { then we need to process the matched string to
> turn it into a real URI...

I don't think we need to, I guess whoever's responsible for opening a URL handles this.

E.g.
$ firefox 'http://example.com/foo%bar{baz'

The URL bar shows http://example.com/foo%bar%7Bbaz; the request goes out as GET /foo%bar%7Bbaz HTTP/1.1

URL escaping sucks by design by the way. The whole point of escaping would be to allow to have whatever value in the user-facing places (e.g. browser URL bar) and do the escaping under the hood to preserve them under all circumstances. That's not the case with the way % is handled. It's allowed (and sometimes required) to be user-visible in the URL bar as the escaping char, and hence if the intent is a literal % then the user should see %25. It's really fu^H^Hmessed up. And, even worse, you can have a % not followed by two hex digits, browsers are okay with this, the interpretation is up to the remote server. I always hated it...

> And to stop the URL highlighting at these is better than not highlighting anzthing, IMHO.

This is another generic dilemma. Suppose the URL http://example.com:1234567. What to do?

(1) Highlight http://example.com:12345 as the longest prefix that makes sense;
(2) Highlight http://example.com since the port is invalid
(3) Don't highlight at all.

Probably it's a matter of taste with no single correct answer. I'm not in favor of (1), but couldn't decide between (2) or (3). I suspect (based on your preference on the issue above) that you wouldn't like (3). So probably (2) is the one best matching our preferences.
Comment 6 Christian Persch 2015-12-13 19:39:03 UTC
Yes, I'd say (2) is better than either 1 or 3.
Comment 7 Christian Persch 2016-02-21 15:19:30 UTC
Let's call this NOTABUG as per comment 3, 5.