GNOME Bugzilla – Bug 84056
strange (illegal?) chars in author names causes endless loop in group download
Last modified: 2006-06-18 05:15:16 UTC
If an article header contains strange characters in the author name, as in "Micha³" (found in comp.lang.perl.misc), Pan loops endlessly in the function g_mime_iconv_strndup. This seems to happen because iconv erroneously sets errno to E2BIG, so the do...while loop never terminates. A quick and dirty fix is to add a statement like if (inleft==0) break; at line 98. At least it seems to work.
Cheers, Roman
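For illustration, here is a minimal sketch of a conversion loop that cannot spin once the input is consumed. It is not GMime's actual code: the helper name convert_n and the buffer-growth policy are assumptions, and it only uses the standard iconv() call. The loop condition tests inleft directly, which is the same idea as the "if (inleft==0) break;" workaround, and E2BIG is only ever answered by growing the output buffer.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of GMime): convert the first n bytes of
 * `in` with the conversion descriptor `cd`. Returns a malloc'd,
 * NUL-terminated string, or NULL on allocation or conversion failure. */
static char *convert_n(iconv_t cd, const char *in, size_t n)
{
    size_t outlen = n + 1;              /* deliberately small; grown on E2BIG */
    char *out = malloc(outlen + 1);     /* +1 for the trailing NUL */
    if (out == NULL)
        return NULL;

    char *inbuf = (char *)in;           /* iconv() does not modify the input */
    size_t inleft = n;
    char *outbuf = out;
    size_t outleft = outlen;

    iconv(cd, NULL, NULL, NULL, NULL);  /* reset the conversion state */

    /* Loop on the remaining input, not on errno alone. */
    while (inleft > 0) {
        size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
        if (rc != (size_t)-1)
            break;                      /* everything was converted */

        if (errno != E2BIG) {           /* EILSEQ or EINVAL: give up */
            free(out);
            return NULL;
        }

        /* E2BIG: the output buffer is full, so double it and continue. */
        size_t used = (size_t)(outbuf - out);
        outlen *= 2;
        char *tmp = realloc(out, outlen + 1);
        if (tmp == NULL) {
            free(out);
            return NULL;
        }
        out = tmp;
        outbuf = out + used;
        outleft = outlen - used;
    }

    *outbuf = '\0';
    return out;
}
```

The sketch targets stateless output encodings such as UTF-8; a stateful target encoding would additionally need a final iconv() call with a NULL input to flush the shift state.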
Although this specific problem could/should be fixed in GMime, it's actually part of a bigger problem: Pan and GMime (correctly) expect only 7bit text in the news headers. Clearly some news agents do not follow the standards (I'm looking at you, OE :-)). What we could do is encode the headers to 7bit ourselves (e.g. when we get them via NNTP). That way, we can be sure the headers are standards-compliant everywhere else in the code. This would also fix related bug 81181. Thoughts?
fejj: that's gmime-iconv-utils.c:98 that Roman's referring to...

chris: you mean something like this in nntp.c?

    author_str = get_author_from_xover ();
    if (g_mime_utils_text_is_8bit (author_str, strlen (author_str))) {
        char * encoded = g_mime_utils_8bit_header_encode (author_str);
        replace_gstr (&author_str, encoded);
    }
    /* repeat for subject header */
I know it's a hack, but yes, something like that. Note that for the author, we may be better off using GMime's InternetAddress class, because g_mime_utils_8bit_header_encode() can mess up an author header when the first word needs to be encoded. See my changes from May 20th for more info.
Created attachment 8985
This specific looping problem seems to occur for encoded headers too. Here's a sample message. It's looping on g_mime_iconv_strndup (cd=0x8bdb478,string=0xbfffdeb0 "problème\027", n=8)
Looks like I had fixed this in the gmime-1 branch but forgot to pull it up to the HEAD branch. Just fixed that now and updated Pan as well.
Yep, problem fixed. Thanks. fejj: what's your view on the extra encode for subjects and authors?
Well, I think Charles's example code is wrong, unless I'm missing the big picture here. I'm assuming that get_author_from_xover () returns the raw header? If so, then checking whether it is 8bit is OK, but passing it on to g_mime_utils_8bit_header_encode() would be wrong, because GMime needs UTF-8 text and will fail otherwise. This means that you have to convert to UTF-8 first, and then have GMime re-encode it as your example above shows (and yeah, I agree with Chris: you should probably use GMime's InternetAddress class to encode addresses).

I don't know the code too well, so this may not be doable, but I'd suggest converting all the headers that could hit this problem to UTF-8 right there, so that higher levels don't have to worry about it. That way, if the raw header contains 8bit text, you convert it to UTF-8 but don't have to re-encode it just to decode it a few seconds later :-) Any header that is not 8bit, you'd just decode into UTF-8. I assume these are header values that are stored on the Article object?

Anyway, hopefully my thoughts aren't completely useless :-)
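A rough sketch of the "convert raw 8bit headers to UTF-8 right there" idea, using only standard iconv(). The helper names header_is_8bit and raw_header_to_utf8 and the fallback_charset parameter are hypothetical and do not come from Pan or GMime. Raw 8bit headers carry no charset label, so the source charset here is a guess the caller has to supply. The 7bit branch only copies the text; decoding RFC 2047 encoded-words for that case is left out.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if any byte has its high bit set, i.e. the text is not 7bit. */
static int header_is_8bit(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s & 0x80)
            return 1;
    return 0;
}

/* Hypothetical helper: return a malloc'd UTF-8 copy of a raw header value.
 * 8bit input is assumed to be in `fallback_charset` (a caller-supplied
 * guess). Returns NULL on failure. */
static char *raw_header_to_utf8(const char *raw, const char *fallback_charset)
{
    size_t len = strlen(raw);

    if (!header_is_8bit(raw)) {
        /* Pure 7bit text is already valid UTF-8; just copy it. */
        char *copy = malloc(len + 1);
        if (copy != NULL)
            memcpy(copy, raw, len + 1);
        return copy;
    }

    iconv_t cd = iconv_open("UTF-8", fallback_charset);
    if (cd == (iconv_t)-1)
        return NULL;

    /* Four output bytes per input byte is assumed to be enough for UTF-8. */
    size_t outlen = len * 4;
    char *out = malloc(outlen + 1);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inbuf = (char *)raw;          /* iconv() does not modify the input */
    size_t inleft = len;
    char *outbuf = out;
    size_t outleft = outlen;

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1) {             /* invalid input for this charset */
        free(out);
        return NULL;
    }

    *outbuf = '\0';
    return out;
}
```

With something like this applied when the headers arrive, the values stored on the article would always be UTF-8, and encoding back to 7bit would only be needed when a header is written out again.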