Bug 349445 - Slightly better URL parsing in text msgs
Status: RESOLVED FIXED
Product: evolution-data-server
Classification: Platform
Component: Mailer
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Veerapuram Varadhan
QA Contact: Evolution QA team
Depends on:
Blocks:
Reported: 2006-07-31 15:29 UTC by Bastien Nocera
Modified: 2006-11-21 17:27 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
camel-url-scanner-testing.patch (661 bytes, patch)
2006-09-13 13:17 UTC, Bastien Nocera
Patch status: needs-work
camel-url-scanner-better-ending.patch (2.50 KB, patch)
2006-09-13 13:55 UTC, Bastien Nocera
Patch status: none

Description Bastien Nocera 2006-07-31 15:29:29 UTC
+++ This bug was initially created as a clone of Bug #327874 +++

Consider this portion of a mail:
 <http://mysystem/MSDOS/'SHARED.PZ00.PROD.SUPERBATCH.ZIPLIB(Z00055R)'>

The URL is actually:
http://mysystem/MSDOS/'SHARED.PZ00.PROD.SUPERBATCH.ZIPLIB(Z00055R)'
But Evolution only highlights:
http://mysystem/MSDOS/'SHARED.PZ00.PROD.SUPERBATCH.ZIPLIB(Z00055R

From RFC 1738:
  Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
  reserved characters used for their reserved purposes may be used
  unencoded within a URL.
Comment 1 Bastien Nocera 2006-07-31 15:30:20 UTC
This bug covers the highlighting problem when *receiving* mail.
Comment 2 Karsten Bräckelmann 2006-07-31 16:51:32 UTC
As does bugzilla. :-)
Comment 3 Bastien Nocera 2006-08-03 15:56:02 UTC
In camel_url_web_end() in e-d-s/camel/camel-url-scanner.c:

        /* urls are extremely unlikely to end with any
         * punctuation, so strip any trailing
         * punctuation off. Also strip off any closing
         * braces or quotes. */
        while (inptr > pos && strchr (",.:;?!-|)}]'\"", inptr[-1]))
                inptr--;

That's obviously a wrong assumption in our case.
Comment 4 Bastien Nocera 2006-08-03 16:01:54 UTC
We should scan back through the URL and check whether there is a matching opening bracket (useful) or a matching single quote (hmm).

Could someone comment on this?
Comment 5 Karsten Bräckelmann 2006-08-03 16:40:16 UTC
Checking for matching chars, and including the closing ones if the opening equivalent is part of the URL, sounds pretty safe to me.

Probably a good idea for all bracket and quote chars.
Comment 6 Bastien Nocera 2006-09-13 13:17:51 UTC
Created attachment 72697 [details] [review]
camel-url-scanner-testing.patch

For testing:
Apply this patch to your copy of Camel, then compile the following program, changing the path to camel-url-scanner.c to match your setup.

---8<---
// gcc -g -o test test.c `pkg-config --cflags --libs glib-2.0`

#include <glib.h>
#include <sys/types.h>
#include <string.h>

typedef struct {
        const char *pattern;
        const char *prefix;
        off_t um_so;
        off_t um_eo;
} urlmatch_t;

#include "/home/bnocera/Projects/Cvs/evolution-data-server/camel/camel-url-scanner.c"

int main (int argc, char **argv)
{
        char *in, *pos, *inend;
        urlmatch_t match;
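        /* Test input: the URL from the description, right after an
         * opening parenthesis and followed by the '>' from the mail. */
        char *str = " (http://mysystem/MSDOS/'SHARED.PZ00.PROD.SUPERBATCH.ZIPLIB(Z00055R)'>) ";

        memset (&match, 0, sizeof(urlmatch_t));
        match.pattern = "http://";
        match.prefix = "";
        in = str;
        pos = str + 2;                  /* skip " (" to the start of "http://" */
        inend = str + strlen (str);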

        if (camel_url_web_start (in, pos, inend, &match) != FALSE) {
#if 0
                g_message ("!FALSE");
                g_message ("in: %s", in);
                g_message ("pos: %s", pos);
                g_message ("inend: %s", inend);
#endif
        } else {
                g_message ("FALSE");
                return 1;
        }

        if (camel_url_web_end (in, pos, inend, &match) != FALSE) {
#if 0
                g_message ("!FALSE");
                g_message ("in: %s", in);
                g_message ("pos: %s", pos);
                g_message ("inend: %s", inend);

                g_message ("from %llu to %llu",
                           (unsigned long long) match.um_so,
                           (unsigned long long) match.um_eo);
#endif
                char *url = g_strndup (str + match.um_so, match.um_eo - match.um_so);
                g_print ("orig: %s\n", str);
                g_print ("url:  %s\n", url);
                g_free (url);
        } else {
                g_message ("FALSE");
                return 1;
        }

        return 0;
}
---8<---
Comment 7 Bastien Nocera 2006-09-13 13:55:40 UTC
Created attachment 72700 [details] [review]
camel-url-scanner-better-ending.patch

- Record the opening brace, if there's one
- Finish off the string when we encounter the opening brace's match, not the first character that looks like it
- Don't remove "'" (single quote) or ")" from the end of the URL: both are allowed unencoded, and if extraneous they will be removed by the matching above (the other braces and quotes aren't allowed unencoded in a URL, per the RFC).
Comment 8 Veerapuram Varadhan 2006-11-13 15:17:06 UTC
Patch looks good to me; however, let me go through it once more before taking it in.  Thanks, Hadess.
Comment 9 Veerapuram Varadhan 2006-11-21 17:27:59 UTC
Committed for 2.8.x and soon to be committed for HEAD.  Thanks.