Bug 545333 - implement workaround for badly quoted addresses
Status: RESOLVED FIXED
Product: gmime
Classification: Other
Component: general
Version: 2.2.x
Hardware: Other All
Importance: Normal minor
Target Milestone: ---
Assigned To: Jeffrey Stedfast
QA Contact: Jeffrey Stedfast
Depends on:
Blocks:
 
 
Reported: 2008-07-29 14:45 UTC by Danilo Sartori
Modified: 2010-11-18 21:48 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Danilo Sartori 2008-07-29 14:45:55 UTC
Please describe the problem:
When dealing with a header field such as:
> To: 'wrong@gmail.cam' <right@gmail.com>
the value.addr field of the InternetAddress gets the wrong value:
> 'wrong@gmail.cam'


Steps to reproduce:
1. Parse a mail message containing an internet address whose display name looks like an internet address (i.e. it contains an @ character)
2. Print the parsed field (e.g. using internet_address_list_to_string)
3. You'll get the display name instead of the internet address


Actual results:
I get: 'wrong@gmail.cam'

Expected results:
I would expect: right@gmail.com (or "'wrong@gmail.cam' <right@gmail.com>" when using internet_address_list_to_string)

Does this happen every time?
Yes

Other information:
Comment 1 Jeffrey Stedfast 2008-07-29 15:53:42 UTC
the name component of an email address is supposed to be either an atom or a quoted-string.

the parsing of your example is likely as follows (I haven't tested to be 100% sure, but assuming the code works the way I remember it...):

token: 'wrong
token: @
token: gmail
token: .
token: cam'

since a list of word tokens is followed by an @, it is interpreted as a simple mailbox addr-spec token (as it should be according to the spec).

the single-quote token is not a special character. Also note that '@' is illegal as part of a word token unless part of a quoted-string (which it is not in your example).
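
To make the token split above concrete, here is a minimal sketch of an RFC 822-style tokenizer in Python. It is not GMime's code: the name `tokenize` is made up for illustration, and the sketch ignores parenthesized comments and control characters. It only shows that a single quote does not end an atom while "@" and "." do.

```python
# RFC 822 "specials": each one ends an atom and is a token by itself.
SPECIALS = set('()<>@,;:\\".[]')

def tokenize(text):
    """Split text into atoms, quoted-strings and single special characters.

    Hypothetical helper for illustration; comments in parentheses and
    control characters are not handled.
    """
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace only separates tokens
        elif ch == '"':
            # quoted-string: runs to the next unescaped double quote
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == '\\' else 1
            tokens.append(text[i:j + 1])
            i = j + 1
        elif ch in SPECIALS:
            tokens.append(ch)
            i += 1
        else:
            # atom: everything up to the next special or whitespace
            j = i
            while j < len(text) and text[j] not in SPECIALS and not text[j].isspace():
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens

print(tokenize("'wrong@gmail.cam' <right@gmail.com>"))
```

With the reporter's header, the first five tokens are `'wrong`, `@`, `gmail`, `.` and `cam'`, so a strict parser sees an addr-spec before it ever reaches the "<". Had the name been wrapped in double quotes, it would come out as one quoted-string token.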


here are the relevant BNF grammar token definitions (from RFC 822):

     address     =  mailbox                      ; one addressee
                 /  group                        ; named list
     addr-spec   =  local-part "@" domain        ; global address
     atom        =  1*<any CHAR except specials, SPACE and CTLs>
     domain      =  sub-domain *("." sub-domain)
     domain-literal =  "[" *(dtext / quoted-pair) "]"
     domain-ref  =  atom                         ; symbolic reference
     dtext       =  <any CHAR excluding "[",     ; => may be folded
                     "]", "\" & CR, & including
                     linear-white-space>
     local-part  =  word *("." word)             ; uninterpreted
                                                 ; case-preserved
     mailbox     =  addr-spec                    ; simple address
                 /  phrase route-addr            ; name & addr-spec
     phrase      =  1*word                       ; Sequence of words
     route-addr  =  "<" [route] addr-spec ">"
     qtext       =  <any CHAR excepting <">,     ; => may be folded
                     "\" & CR, and including
                     linear-white-space>
     quoted-pair =  "\" CHAR                     ; may quote any char
     quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                 ;   quoted chars.

     specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                 /  "," / ";" / ":" / "\" / <">  ;  string, to use
                 /  "." / "[" / "]"              ;  within a word.
     sub-domain  =  domain-ref / domain-literal
     word        =  atom / quoted-string
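
As a rough illustration of the `atom`, `word` and `phrase` rules, the following Python sketch checks whether a display name is already a valid phrase and wraps it in a quoted-string otherwise. The helper names `is_atom` and `quote_if_needed` are hypothetical, and the sketch does not deal with CR or header folding.

```python
# RFC 822 "specials": not allowed in an atom.
SPECIALS = set('()<>@,;:\\".[]')

def is_atom(word):
    """True if word is a non-empty run of ASCII chars with no specials, SPACE or CTLs."""
    return bool(word) and all(
        32 < ord(c) < 127 and c not in SPECIALS for c in word
    )

def quote_if_needed(display_name):
    """Return display_name as a valid phrase, quoting it when any word is not an atom."""
    words = display_name.split()
    if words and all(is_atom(w) for w in words):
        return display_name  # already 1*word made of atoms
    # Escape backslashes and double quotes as quoted-pairs, then wrap.
    escaped = display_name.replace('\\', '\\\\').replace('"', '\\"')
    return '"' + escaped + '"'

for name in ["Danilo Sartori", "escalade.orsay", "'wrong@gmail.cam'"]:
    print(quote_if_needed(name))
```

A sending client that applied a check like this would leave `Danilo Sartori` alone and emit `"escalade.orsay"` and `"'wrong@gmail.cam'"` with the double quotes, both of which parse as a single word.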


That said... I'll look at seeing if it'd be too much trouble to work around this kind of brokenness when I get a chance, so leaving this report open for now.
Comment 2 racin 2010-11-17 21:50:12 UTC
I also received a mail with such a broken address. The To: field was
To: <xxxxx@gmail.com>, escalade.orsay <escalade-orsay@googlegroups.com>
It is a mail produced by MS Exchange 6.5. GMime returns an InternetAddressMailbox with addr = "escalade.orsay"

Would it be possible/difficult to change GMime so that it can parse addresses produced by broken email clients? I use the notmuch mail client, and this parsing is necessary to find emails by searching for email addresses. I guess broken email clients still use < > as the delimiters for the email address part, but forget to use quotes for some special characters like .
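
The "< > as delimiter" idea can be sketched as a lenient fallback in Python. This is only a heuristic under that assumption, not what GMime does; `lenient_parse` is a made-up name, and it handles a single mailbox, not a comma-separated list.

```python
def lenient_parse(mailbox):
    """Heuristically split one mailbox string into (name, addr).

    Assumes the sender delimited the real address with the last
    '<' ... '>' pair, even if the display name in front of it is not
    a valid phrase. Hypothetical helper for illustration.
    """
    mailbox = mailbox.strip()
    start = mailbox.rfind('<')
    end = mailbox.rfind('>')
    if start != -1 and end > start:
        name = mailbox[:start].strip()
        addr = mailbox[start + 1:end].strip()
        return name, addr
    # No angle brackets: treat the whole string as a bare addr-spec.
    return '', mailbox

print(lenient_parse("escalade.orsay <escalade-orsay@googlegroups.com>"))
print(lenient_parse("'wrong@gmail.cam' <right@gmail.com>"))
```

For both broken headers in this report, the second element of the result is the address inside the angle brackets. Splitting a whole To: field into mailboxes is a separate problem, since an unquoted display name may itself contain a comma.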
Comment 3 Jeffrey Stedfast 2010-11-17 22:01:26 UTC
which version of GMime are you using? I implemented a fair bit of logic to try and handle these sorts of cases in version 2.4.18 (latest is 2.4.20).
Comment 4 racin 2010-11-18 18:22:35 UTC
I used 2.4.11. I confirm that this bug is solved in 2.4.20. Unfortunately, the latest version packaged by Debian is 2.4.14...
Comment 5 Jeffrey Stedfast 2010-11-18 21:48:51 UTC
cool, glad 2.4.20 fixes this. thanks for confirming the fixes!