GNOME Bugzilla – Bug 709031
gmime 2.6.16-2.6.18 split references headers in the wrong place, 2.6.15 works fine
Last modified: 2013-09-30 11:20:42 UTC
Created attachment 256015 [details]
Goofed up references header, space between dot and net

I'm a long-time pan user, active on the mailing list tho not a dev. Back in mid-August we got a report that pan was splitting references headers incorrectly, and after some investigation, the problem was traced to gmime -- the OP reverted to the gmime 2.4 series and the problem disappeared.

Back then I hadn't seen the problem myself, but shortly thereafter someone complained about my posts breaking threads as well. That was my excuse to trace down the problem further, and I found that 2.6.15 was the last "good" release, just as I suspected it'd be after reading the gmime changelog and coming across commit d39fd2a07f71fc9b5b5dd8d9edabf5f5c7234532 . However, while I'm on gentoo it's not particularly convenient for me to build from git and bisect, so I'm not SURE it's that commit, only that 2.6.15 works and 2.6.16 and 2.6.17 are broken.

You can see the thread along with an example of the breakage on pan's user list, via gmane. Here's the web link, altho it doesn't list the references header unfortunately. You'll have to fire up a news client and visit the group/list via news.gmane.org in order to get those.

http://comments.gmane.org/gmane.comp.gnome.apps.pan.user/14350

I note that the references headers don't look as broken when I check them via pan with a good gmime as they do with a bad one, and in particular, that a kmail post, Message-ID: <201308241128.15795.pan@nospamdgmm.net> , that I complained about having a bad references header looks just fine with 2.6.15.

In my cache and with gmime 2.6.15, I see:

References: <201308200021.15865.pan@dgmm.net> <201308201421.25554.pan@dgmm.net> <5217C93F.6080807@gmail.com>

With 2.6.16+ I got (reconstructed from gmane's munging of what it thought was an email address, as I posted it in a message body in that thread):

References: <201308200021.15865.pan@dgmm.net> <201308201421.25554.pan@dgmm.
 net> <5217C93F.6080807@gmail.com>

Wrapping at the dot, CLEARLY illegal based on the RFC.

Attached is a sample post (from my cache) with an errant space between the dot and net in .net. (The name being message-id based and thus a bit strange, I hope bugzie doesn't choke on it... or maybe only firefox sees the local name?)
It's actually not illegal (CFWS is allowed between lexical tokens, and each msg-id token is made up of multiple lexical tokens). Anyway... that said, I agree it's not ideal, as it seems to break a number of mail and news clients.

This should be fixed by commits a248cc044c6ad55505939363aa858c930867a014 and 1373f11f9b3ecdbfb58a70a742506f3f6d5c57d8
You're correct about lexical tokens, but take a look at the RFC: the tokens in question are defined as including a dot (dot-atom and dot-atom-text, the latter of which allows dots but NOT CFWS, comment/folding-white-space, in the language of the RFC), with the split to the left and to the right of the @, not including the <> delimiters which are their own tokens. (However, while splitting inside the <> delimiters at all is allowed by the MUST, it's denied by the SHOULD.)

Here's a link to RFC 5322:

http://tools.ietf.org/html/rfc5322

In particular, see section 3.2.3 where dot-atom and dot-atom-text are defined, and section 3.6.4, which builds on those definitions to define a message-ID field as [CFWS] "<" id-left "@" id-right ">" [CFWS], with id-left and id-right further defined as (basically, see the RFC for specifics) dot-atom-text, so no "inside" CFWS by dots, tho CFWS is allowed directly to either side of the @ and of course directly inside the <> delimiters (tho the SHOULD says split outside the <> delimiters). The references header in turn contains a series of message-ID fields.

I spent a couple hours some days ago reading this RFC and others to try to ensure that what I was seeing, dot-folding, was indeed illegal before reporting the bug, and another half hour or so reviewing it for this comment. So unless I'm missing something big, which is possible (especially with the obsolete values, but they're discouraged anyway, and I /did/ check them to some extent), yes, it's illegal to fold at the dot... which is why I could quite confidently make that statement in the first place.

Regardless of whether it's specifically allowed or not, however, as you said it does seem to break things, and you said that's fixed, so all is good. =:^)

I'm going to try grabbing that commit as a patch to apply against 2.6.17 and test it here. If that works, I'll update this bug once more and will be filing a gentoo bug to include it as a patch against current versions.

Thanks. =:^)
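For anyone who wants to check a cached post for this kind of breakage, here is a rough Python sketch of the grammar argument above. It assumes a simplified msg-id (dot-atom-text on both sides of the @, with no-fold-literal and the obsolete forms ignored), and the helper names split_angle_tokens and broken_msg_ids are made up for illustration; they are not part of pan or gmime.

```python
import re

# atext per RFC 5322 section 3.2.3: printable ASCII minus the specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text: runs of atext separated by single dots, no CFWS anywhere.
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# Simplifying assumption: id-left and id-right are both dot-atom-text.
MSG_ID = re.compile(rf"<{DOT_ATOM_TEXT}@{DOT_ATOM_TEXT}>")


def split_angle_tokens(value):
    """Return every <...> span in a header value, whitespace included."""
    return re.findall(r"<[^<>]*>", value)


def broken_msg_ids(value):
    """Return the <...> spans that do not match the simplified msg-id grammar."""
    return [tok for tok in split_angle_tokens(value) if not MSG_ID.fullmatch(tok)]


good = ("<201308200021.15865.pan@dgmm.net> <201308201421.25554.pan@dgmm.net> "
        "<5217C93F.6080807@gmail.com>")
bad = ("<201308200021.15865.pan@dgmm.net> <201308201421.25554.pan@dgmm.\r\n"
       " net> <5217C93F.6080807@gmail.com>")

print(broken_msg_ids(good))
print(broken_msg_ids(bad))
```

With the header from 2.6.15 nothing is flagged, while the 2.6.16+ header has its second msg-id flagged because of the fold after the dot.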
I was referring to the obsolete syntax from rfc822 which is what GMime is based on (and 2822 to some extent).

In any event, looking more closely at rfc2822, I found this, so you seem to be correct:

Since the msg-id has a similar syntax to angle-addr (identical except that comments and folding white space are not allowed)

rfc822 does not make this distinction between normal angle-addr syntax and msg-id syntax.
I should note that it wasn't the intention of GMime to fold msg-id tokens, it's just that because the References header is technically a structured header, I used my structured header tokenizer on it to find lexically correct locations to fold. An alternative to the solution I implemented would probably have been to have my unstructured header tokenizer tokenize it instead.
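To illustrate the general idea of folding only at safe locations, here is a minimal Python sketch that treats each msg-id as one atomic token and breaks lines only between them. This is not GMime's code and not what the commits above do; fold_references and the 78-column limit are assumptions chosen for the example.

```python
def fold_references(msg_ids, max_len=78):
    """Build a References header, folding only between whole msg-id tokens.

    Hypothetical helper for illustration; max_len is an assumed line limit.
    """
    prefix = "References:"
    lines = []
    current = prefix
    for mid in msg_ids:
        candidate = current + " " + mid
        if len(candidate) > max_len and current != prefix:
            # Line is full: emit it and start a continuation line, which
            # must begin with whitespace to count as a fold.
            lines.append(current)
            current = " " + mid
        else:
            # Either it fits, or it is the first token after the field
            # name; an overlong msg-id is kept whole rather than split.
            current = candidate
    lines.append(current)
    return "\r\n".join(lines)


ids = [
    "<201308200021.15865.pan@dgmm.net>",
    "<201308201421.25554.pan@dgmm.net>",
    "<5217C93F.6080807@gmail.com>",
]
folded = fold_references(ids)
print(folded)
```

Removing the CRLFs from the result gives back the single-line header, and no msg-id ever has whitespace inside its <> delimiters.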
(Don't you hate when RFC syntax specifications are contradictory?)