GNOME Bugzilla – Bug 545333
implement workaround for badly quoted addresses
Last modified: 2010-11-18 21:48:51 UTC
Please describe the problem:
When dealing with a header field such as:
> To: 'wrong@gmail.cam' <right@gmail.com>
the value.addr field of the InternetAddress gets the wrong value:
> 'wrong@gmail.cam'

Steps to reproduce:
1. Parse a mail message containing an internet address whose display name looks like an internet address (i.e. it contains an @ character).
2. Print the parsed field (e.g. using internet_address_list_to_string).
3. You'll get the display name instead of the internet address.

Actual results:
I get: 'wrong@gmail.cam'

Expected results:
I would expect: right@gmail.com (or "'wrong@gmail.cam' <right@gmail.com>" using internet_address_list_to_string)

Does this happen every time?
Yes
the name component of an email address is supposed to be either an atom or a quoted-string. the parsing of your example is likely as follows (I haven't tested to be 100% sure, but assuming the code works the way I remember it...):

    token: 'wrong
    token: @
    token: gmail
    token: .
    token: cam'

since a list of word tokens is followed by an @, it is interpreted as a simple mailbox addr-spec (as it should be according to the spec). the single-quote character is not one of the specials. Also note that '@' is illegal as part of a word token unless it is inside a quoted-string (which it is not in your example).

here are the relevant BNF grammar token definitions:

    address        =  mailbox                       ; one addressee
                   /  group                         ; named list
    addr-spec      =  local-part "@" domain         ; global address
    atom           =  1*<any CHAR except specials, SPACE and CTLs>
    domain         =  sub-domain *("." sub-domain)
    domain-literal =  "[" *(dtext / quoted-pair) "]"
    domain-ref     =  atom                          ; symbolic reference
    dtext          =  <any CHAR excluding "[",      ; => may be folded
                       "]", "\" & CR, & including
                       linear-white-space>
    local-part     =  word *("." word)              ; uninterpreted
                                                    ; case-preserved
    mailbox        =  addr-spec                     ; simple address
                   /  phrase route-addr             ; name & addr-spec
    phrase         =  1*word                        ; Sequence of words
    route-addr     =  "<" [route] addr-spec ">"
    qtext          =  <any CHAR excepting <">,      ; => may be folded
                       "\" & CR, and including
                       linear-white-space>
    quoted-pair    =  "\" CHAR                      ; may quote any char
    quoted-string  =  <"> *(qtext/quoted-pair) <">  ; Regular qtext or
                                                    ;   quoted chars.
    specials       =  "(" / ")" / "<" / ">" / "@"   ; Must be in quoted-
                   /  "," / ";" / ":" / "\" / <">   ;  string, to use
                   /  "." / "[" / "]"               ;  within a word.
    sub-domain     =  domain-ref / domain-literal
    word           =  atom / quoted-string

That said... I'll see whether it'd be too much trouble to work around this kind of brokenness when I get a chance, so I'm leaving this report open for now.
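The tokenization walked through in the previous comment can be sketched in C. This is a minimal illustration, not GMime's actual tokenizer: is_special and tokenize are hypothetical helpers, and comments, domain-literals and header folding are deliberately left out. It uses the specials set from the grammar to split a header value into atoms, quoted-strings and single-character specials.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TOKENS 32
#define MAX_TOKEN_LEN 64

/* Hypothetical helper. RFC 822 specials: these may appear in a word only
 * inside a quoted-string. */
static int is_special(int c)
{
    return c != '\0' && strchr("()<>@,;:\\\".[]", c) != NULL;
}

/* Hypothetical sketch, not GMime's code. Splits src into atoms,
 * quoted-strings and single-character specials. Whitespace and control
 * characters between tokens are skipped; comments and domain-literals get
 * no special treatment. Tokens longer than MAX_TOKEN_LEN - 1 are
 * truncated. Returns the number of tokens stored in out. */
static int tokenize(const char *src, char out[][MAX_TOKEN_LEN], int max)
{
    const char *p = src;
    int n = 0;

    assert(max > 0);
    while (*p != '\0' && n < max) {
        const char *start = p;
        size_t len;

        if (isspace((unsigned char)*p) || iscntrl((unsigned char)*p)) {
            p++;
            continue;
        }
        if (*p == '"') {
            /* quoted-string: runs up to the next unescaped '"' */
            p++;
            while (*p != '\0' && *p != '"') {
                if (*p == '\\' && p[1] != '\0')
                    p++;            /* quoted-pair: skip the escaped char */
                p++;
            }
            if (*p == '"')
                p++;
        } else if (is_special((unsigned char)*p)) {
            p++;                    /* each special is a token of its own */
        } else {
            /* atom: any CHAR except specials, SPACE and CTLs, so a single
             * quote is just an ordinary atom character */
            while (*p != '\0' && !is_special((unsigned char)*p)
                   && !isspace((unsigned char)*p)
                   && !iscntrl((unsigned char)*p))
                p++;
        }
        len = (size_t)(p - start);
        if (len >= MAX_TOKEN_LEN)
            len = MAX_TOKEN_LEN - 1;
        memcpy(out[n], start, len);
        out[n][len] = '\0';
        n++;
    }
    return n;
}
```

Fed the reporter's header, it yields 'wrong, @, gmail, ., cam' before it ever reaches the '<'. Because the single quote is not a special, 'wrong is a plain atom and a strict parser sees an addr-spec first. Writing the display name in double quotes instead turns it into a single quoted-string token.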
I also received a mail with such a broken address. The To: field was:
> To: <xxxxx@gmail.com>, escalade.orsay <escalade-orsay@googlegroups.com>
It is a mail produced by MS Exchange 6.5. GMime returns an InternetAddressMailbox with addr = "escalade.orsay".

Would it be possible (or how difficult would it be) to change GMime to parse addresses produced by broken email clients? I use the notmuch mail client, and this parsing is necessary to find emails by searching for email addresses.

I guess broken email clients still use < > to delimit the email address part, but forget to quote display names that contain special characters such as '.'.
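The heuristic guessed at in the previous comment (trust the < > delimiters, and split the list only on commas outside double quotes and angle brackets) can be sketched as a fallback for when strict parsing fails. This is a hypothetical illustration written for this thread, not the logic GMime actually uses; lenient_addrs and copy_trimmed are made-up names, and the sketch throws the display names away and keeps only each entry's address.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_ADDRS 16
#define MAX_ADDR_LEN 128

/* Hypothetical helper: copy [s, e) into dst with surrounding whitespace
 * removed, truncating to MAX_ADDR_LEN - 1 characters. */
static void copy_trimmed(const char *s, const char *e, char *dst)
{
    size_t len;

    while (s < e && isspace((unsigned char)*s))
        s++;
    while (e > s && isspace((unsigned char)e[-1]))
        e--;
    len = (size_t)(e - s);
    if (len >= MAX_ADDR_LEN)
        len = MAX_ADDR_LEN - 1;
    memcpy(dst, s, len);
    dst[len] = '\0';
}

/* Hypothetical lenient fallback, not GMime's code. Splits a header value
 * on commas outside double quotes and angle brackets; for each entry it
 * keeps the text inside the last <...> pair, or the whole trimmed entry if
 * there is no complete pair. Empty entries are skipped. Returns the count. */
static int lenient_addrs(const char *field, char out[][MAX_ADDR_LEN], int max)
{
    const char *seg = field, *p;
    int in_quote = 0, depth = 0, n = 0;

    assert(max > 0);
    for (p = field;; p++) {
        if (*p == '\0' || (*p == ',' && !in_quote && depth == 0)) {
            const char *lt = NULL, *gt = NULL, *q;

            for (q = seg; q < p; q++)               /* last '<' in entry */
                if (*q == '<')
                    lt = q;
            for (q = lt ? lt + 1 : p; q < p; q++)   /* first '>' after it */
                if (*q == '>') {
                    gt = q;
                    break;
                }
            if (n < max) {
                if (lt != NULL && gt != NULL)
                    copy_trimmed(lt + 1, gt, out[n]);
                else
                    copy_trimmed(seg, p, out[n]);
                if (out[n][0] != '\0')
                    n++;
            }
            if (*p == '\0')
                break;
            seg = p + 1;
        } else if (in_quote && *p == '\\' && p[1] != '\0') {
            p++;                    /* skip the escaped character */
        } else if (*p == '"') {
            in_quote = !in_quote;
        } else if (!in_quote && *p == '<') {
            depth++;
        } else if (!in_quote && *p == '>' && depth > 0) {
            depth--;
        }
    }
    return n;
}
```

On the Exchange header above it returns xxxxx@gmail.com and escalade-orsay@googlegroups.com, and on the header from the original report it returns right@gmail.com. Entries without angle brackets come back trimmed but otherwise unchecked, which is why this only makes sense as a fallback after the strict grammar has failed.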
which version of GMime are you using? I implemented a fair bit of logic to try and handle these sorts of cases in version 2.4.18 (latest is 2.4.20).
I used 2.4.11. I confirm that this bug is fixed in 2.4.20. Unfortunately, the latest version packaged by Debian is 2.4.14...
cool, glad 2.4.20 fixes this. thanks for confirming the fixes!