GNOME Bugzilla – Bug 349445
Slightly better URL parsing in text msgs
Last modified: 2006-11-21 17:27:59 UTC
+++ This bug was initially created as a clone of Bug #327874 +++

Consider the portion of mail:
<http://mysystem/MSDOS/'SHARED.PZ00.PROD.SUPERBATCH.ZIPLIB(Z00055R)'>

The URL is actually:
http://mysystem/MSDOS/'SHARED.PZ00.PROD.SUPERBATCH.ZIPLIB(Z00055R)'

But Evolution only highlights:
http://mysystem/MSDOS/'SHARED.PZ00.PROD.SUPERBATCH.ZIPLIB(Z00055R

From RFC 1738:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
This part is for the highlighting problems when *receiving* mail.
As does bugzilla. :-)
In camel_url_web_end() in e-d-s/camel/camel-url-scanner.c:

	/* urls are extremely unlikely to end with any
	 * punctuation, so strip any trailing
	 * punctuation off. Also strip off any closing
	 * braces or quotes. */
	while (inptr > pos && strchr (",.:;?!-|)}]'\"", inptr[-1]))
		inptr--;

That's obviously a wrong assumption in our case.
We should go back in the URL, and check whether there is a matching bracket (useful), or a matching single-quote (hmm). Could someone comment on this?
Checking for matching chars and including the closing ones if the opening equivalent is part of the URL sounds pretty safe to me. Probably a good idea for all bracket and quote chars.
Created attachment 72697 [details] [review]
camel-url-scanner-testing.patch

For testing: Apply this patch to your camel version, and compile the following program, changing the location of camel-url-scanner.c to match your setup.

---8<---
// gcc -g -o test `pkg-config --libs --cflags glib-2.0` test.c

#include <glib.h>
#include <sys/types.h>
#include <string.h>

typedef struct {
	const char *pattern;
	const char *prefix;
	off_t um_so;
	off_t um_eo;
} urlmatch_t;

#include "/home/bnocera/Projects/Cvs/evolution-data-server/camel/camel-url-scanner.c"

int main (int argc, char **argv)
{
	char *in, *pos, *inend;
	urlmatch_t match;
	char *str = " (http://mysystem/MSDOS/'SHARED.PZ00.PROD.SUPERBATCH.ZIPLIB(Z00055R)'>) ";

	memset (&match, 0, sizeof(urlmatch_t));
	match.pattern = "http://";
	match.prefix = "";

	in = str;
	pos = str + 2;
	inend = str + strlen (str);

	if (camel_url_web_start (in, pos, inend, &match) != FALSE) {
#if 0
		g_message ("!FALSE");
		g_message ("in: %s", in);
		g_message ("pos: %s", pos);
		g_message ("inend: %s", inend);
#endif
	} else {
		g_message ("FALSE");
		return 1;
	}

	if (camel_url_web_end (in, pos, inend, &match) != FALSE) {
#if 0
		g_message ("!FALSE");
		g_message ("in: %s", in);
		g_message ("pos: %s", pos);
		g_message ("inend: %s", inend);
		g_message ("from %llu to %llu", match.um_so, match.um_eo);
#endif
		char *url = g_strndup (str + match.um_so, match.um_eo - match.um_so);
		g_print ("orig: %s\n", str);
		g_print ("url %s\n", url);
	} else {
		g_message ("FALSE");
		return 1;
	}

	return 0;
}
---8<---
Created attachment 72700 [details] [review]
camel-url-scanner-better-ending.patch

- Record the opening brace, if there's one
- Finish off the string when we encounter the opening brace's match, not the first character that looks like it
- Don't remove "'" (single-quote), or ")" from the end of the URL as they are allowed, and will be removed by the matching above if extraneous (the other braces, and quotes aren't allowed unencoded in the URL by the RFC).
Patch looks good to me - however, let me go through it once more before taking it in. Thanks, Hadess.
Committed for 2.8.x and soon to be committed for HEAD. Thanks.