Bug 84056 - strange (illegal?) chars in author names causes endless loop in group download
Status: RESOLVED FIXED
Product: Pan
Classification: Other
Component: general
Version: pre-0.12.0 betas
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 0.12.0
Assigned To: Charles Kerr
Reported: 2002-06-04 07:59 UTC by Roman Möller
Modified: 2006-06-18 05:15 UTC


Attachments
This specific looping problem seems to occur for encoded headers too. Here's a sample message. It's looping on g_mime_iconv_strndup (cd=0x8bdb478,string=0xbfffdeb0 "problème\027", n=8) (1.51 KB, text/plain)
2002-06-04 20:08 UTC, Christophe Lambin

Description Roman Möller 2002-06-04 07:59:09 UTC
If an article header contains strange characters like in "Micha³" (found in
comp.lang.perl.misc) in the author name, pan loops endlessly in function
g_mime_iconv_strndup. 
This seems to be because iconv() erroneously sets errno to
E2BIG, so the do...while loop never terminates.

A quick and dirty fix is to add a statement like

if (inleft==0) break;

at line 98. At least it seems to work.

Cheers
Roman
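Jeffrey's actual GMime fix (comment 5) is not shown in this report. As a rough sketch only, assuming plain POSIX iconv() rather than GMime's wrapper, a conversion loop that cannot spin forever can look like this: it reads errno only when iconv() returns (size_t) -1, and it stops once all input is consumed, the same condition as Roman's `inleft == 0` check. The name `convert_strndup` is hypothetical. One common way such loops hang is by testing a stale errno after a call that actually succeeded, though the thread does not say whether that was the cause here.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical strndup-style iconv wrapper (not GMime's code).
 * Converts n bytes of `in` with the already-opened descriptor `cd` and
 * returns a newly allocated NUL-terminated string, or NULL on error. */
char *convert_strndup (iconv_t cd, const char *in, size_t n)
{
    size_t outsize = n + 16;             /* first guess; grown on E2BIG */
    char *out = malloc (outsize);
    if (out == NULL)
        return NULL;

    char *inbuf = (char *) in;           /* iconv() wants a non-const char ** */
    size_t inleft = n;
    char *outbuf = out;
    size_t outleft = outsize - 1;        /* keep one byte for the NUL */

    iconv (cd, NULL, NULL, NULL, NULL);  /* reset any shift state */

    /* Terminates once all input is consumed (inleft == 0). */
    while (inleft > 0) {
        if (iconv (cd, &inbuf, &inleft, &outbuf, &outleft) != (size_t) -1)
            break;                       /* success: errno is meaningless here */

        if (errno != E2BIG) {            /* EILSEQ or EINVAL: bad input */
            free (out);
            return NULL;
        }

        /* Output buffer full: double it and resume where iconv stopped. */
        size_t used = (size_t) (outbuf - out);
        outsize *= 2;
        char *grown = realloc (out, outsize);
        if (grown == NULL) {
            free (out);
            return NULL;
        }
        out = grown;
        outbuf = out + used;
        outleft = outsize - used - 1;
    }

    *outbuf = '\0';
    return out;
}
```

For example, with a descriptor from iconv_open ("UTF-8", "ISO-8859-1"), the 8 Latin-1 bytes of "problème" come back as 9 bytes of UTF-8. Stateful encodings such as ISO-2022-JP would also need a final flush call, iconv (cd, NULL, NULL, &outbuf, &outleft), which this sketch omits.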
Comment 1 Christophe Lambin 2002-06-04 11:46:35 UTC
Although this specific problem could/should be fixed in GMime, it's
actually part of a bigger problem: Pan and GMime (correctly) expect
only 7bit text in the news headers. Clearly some news agents do not
follow the standards (I'm looking at you, OE :-)).

What we could do is to encode the headers to 7bit ourselves (e.g. when
we get them via nntp). That way, we're sure to have
standards-compliant headers elsewhere.

This would also fix related bug 81181.

Thoughts?
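Encoding the headers to 7bit, as suggested above, comes down to two steps: detect values containing bytes above 0x7F, then rewrite them as RFC 2047 encoded words. The C below is a minimal sketch with hypothetical function names, not GMime's g_mime_utils_8bit_header_encode(). It assumes the input is already UTF-8, always uses the base64 ("B") form, and emits a single encoded word without the 75-character splitting that RFC 2047 requires for long values.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* True if any byte has the high bit set, i.e. the value is not 7-bit clean. */
bool header_is_8bit (const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char) *s & 0x80)
            return true;
    return false;
}

/* Hypothetical helper: wrap an already-UTF-8 string in one RFC 2047
 * "B" encoded word, =?UTF-8?B?<base64>?=. Splitting long values into
 * several words of at most 75 characters is omitted.
 * Returns a newly allocated string, or NULL on allocation failure. */
char *encode_word_utf8_b (const char *utf8)
{
    static const char prefix[] = "=?UTF-8?B?";
    static const char suffix[] = "?=";
    const unsigned char *in = (const unsigned char *) utf8;
    size_t len = strlen (utf8);
    size_t b64len = 4 * ((len + 2) / 3);

    char *out = malloc (sizeof prefix - 1 + b64len + sizeof suffix);
    if (out == NULL)
        return NULL;

    char *p = out;
    memcpy (p, prefix, sizeof prefix - 1);
    p += sizeof prefix - 1;

    size_t i = 0;
    for (; i + 2 < len; i += 3) {        /* full 3-byte groups */
        unsigned long v = ((unsigned long) in[i] << 16)
                        | ((unsigned long) in[i + 1] << 8)
                        | (unsigned long) in[i + 2];
        *p++ = b64_alphabet[(v >> 18) & 63];
        *p++ = b64_alphabet[(v >> 12) & 63];
        *p++ = b64_alphabet[(v >> 6) & 63];
        *p++ = b64_alphabet[v & 63];
    }
    if (i < len) {                       /* 1 or 2 trailing bytes, '=' padded */
        unsigned long v = (unsigned long) in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long) in[i + 1] << 8;
        *p++ = b64_alphabet[(v >> 18) & 63];
        *p++ = b64_alphabet[(v >> 12) & 63];
        *p++ = (i + 1 < len) ? b64_alphabet[(v >> 6) & 63] : '=';
        *p++ = '=';
    }
    memcpy (p, suffix, sizeof suffix);   /* copies the terminating NUL too */
    return out;
}
```

Given the UTF-8 bytes for "Michał", encode_word_utf8_b() returns =?UTF-8?B?TWljaGHFgg==?=, which is plain ASCII.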
Comment 2 Charles Kerr 2002-06-04 15:28:42 UTC
fejj: that's gmime-iconv-utils.c:98 that Roman's referring to...

chris: you mean something like
nntp.c:

   author_str = get_author_from_xover ();
   if (g_mime_utils_text_is_8bit (author_str, strlen (author_str))) {
      char * encoded = g_mime_utils_8bit_header_encode (author_str);
      replace_gstr (&author_str, encoded);
   }

   /* repeat for subject header */

?
Comment 3 Christophe Lambin 2002-06-04 18:56:51 UTC
I know it's a hack, but yes, something like that.  

Note that for the author, we may be better off using GMime's
InternetAddress class, because g_mime_utils_8bit_header_encode() can
mess up an author header when the first word needs to be encoded. See
my changes from May 20th for more info.
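The InternetAddress suggestion amounts to treating the display name and the address separately, presumably so that an encoded word never swallows the `<user@host>` part. A simplified sketch of that split follows; `split_author` is a hypothetical helper, not GMime's InternetAddress API, and it ignores quoted strings, comments and group syntax, which a real address parser has to handle.

```c
#include <stdlib.h>
#include <string.h>

/* Copy n bytes of s into a new NUL-terminated string (a portable strndup). */
static char *dup_range (const char *s, size_t n)
{
    char *copy = malloc (n + 1);
    if (copy != NULL) {
        memcpy (copy, s, n);
        copy[n] = '\0';
    }
    return copy;
}

/* Hypothetical helper, NOT GMime's InternetAddress API: split an author
 * header of the form "Display Name <user@host>" so that only the display
 * name would be passed to an RFC 2047 encoder; the address stays as-is.
 * Returns 0 on success (caller frees *name and *addr), -1 otherwise.
 * Simplification: quoted strings, comments and groups are not handled. */
int split_author (const char *from, char **name, char **addr)
{
    const char *lt = strrchr (from, '<');
    const char *gt = (lt != NULL) ? strchr (lt, '>') : NULL;

    *name = *addr = NULL;
    if (lt == NULL || gt == NULL)
        return -1;

    /* Trim blanks around the display name. */
    const char *start = from;
    const char *end = lt;
    while (start < end && (*start == ' ' || *start == '\t'))
        start++;
    while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
        end--;

    *name = dup_range (start, (size_t) (end - start));
    *addr = dup_range (lt + 1, (size_t) (gt - lt - 1));
    if (*name == NULL || *addr == NULL) {
        free (*name);
        free (*addr);
        *name = *addr = NULL;
        return -1;
    }
    return 0;
}
```

A caller would then encode only `name` (when it contains 8-bit bytes) and rebuild the header as the encoded name followed by `<addr>`.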
Comment 4 Christophe Lambin 2002-06-04 20:08:36 UTC
Created attachment 8985
This specific looping problem seems to occur for encoded headers too. Here's a sample message. It's looping on g_mime_iconv_strndup (cd=0x8bdb478,string=0xbfffdeb0 "problème\027", n=8)
Comment 5 Jeffrey Stedfast 2002-06-05 03:28:55 UTC
looks like I had fixed this in the gmime-1 branch but forgot to pull
it up to the HEAD branch. Just fixed that now and updated Pan as well.
Comment 6 Christophe Lambin 2002-06-05 06:49:29 UTC
Yep, problem fixed. Thanks.

fejj: what's your view on the extra encoding step for subjects and authors?
Comment 7 Jeffrey Stedfast 2002-06-05 18:10:02 UTC
Well, Charles' example code seems wrong to me, unless I'm missing
the big picture here.

I'm assuming that get_author_from_xover () returns the raw header? If
so, then checking whether it is 8bit is OK, but passing it on to
g_mime_utils_8bit_header_encode() would be wrong, because GMime needs
UTF-8 text; otherwise it'll fail.

This means that you have to convert to UTF-8 first, and then have
GMime re-encode it (as your example above shows - and yeah, I agree
with Chris - probably should use GMime's InternetAddress class to
encode addresses).

I don't know the code too well, so this may not be doable - but I
might suggest converting all the headers that this problem could
happen on to UTF-8 right there, so that higher levels don't have to
worry about it. This means that if the raw header contains 8bit text,
you convert it to UTF-8 but don't have to re-encode just to decode it
a few seconds later :-)
Any header that is not 8bit, you'd just decode into UTF-8.

I assume these are header values that are stored on the Article
object?

anyways, hopefully my thoughts aren't completely useless :-)
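Jeffrey's alternative, converting 8-bit headers to UTF-8 once at the NNTP layer, could look roughly like the sketch below. It rests on assumptions: `header_to_utf8` is hypothetical, bytes that already form valid UTF-8 are passed through unchanged (a heuristic, since some legacy byte sequences are also valid UTF-8), and anything else is assumed to be in a caller-supplied fallback charset (for "Micha³" that would presumably be ISO-8859-2). Decoding RFC 2047 encoded words in 7bit headers is not covered.

```c
#include <iconv.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if the n bytes at s are well-formed UTF-8 (plain ASCII included). */
static bool is_valid_utf8 (const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp;
        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            len = 2; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            len = 4; cp = c & 0x07;
        } else {
            return false;            /* stray continuation or invalid lead byte */
        }
        if (i + len > n)
            return false;            /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

/* Hypothetical helper: return a newly allocated UTF-8 copy of a raw header
 * value, or NULL on error. Valid UTF-8 is copied unchanged; anything else is
 * assumed to be in fallback_charset (e.g. a per-group default). */
char *header_to_utf8 (const char *raw, const char *fallback_charset)
{
    size_t n = strlen (raw);

    if (is_valid_utf8 ((const unsigned char *) raw, n)) {
        char *copy = malloc (n + 1);
        if (copy != NULL)
            memcpy (copy, raw, n + 1);
        return copy;
    }

    iconv_t cd = iconv_open ("UTF-8", fallback_charset);
    if (cd == (iconv_t) -1)
        return NULL;                 /* charset unknown to iconv */

    /* Assumed bound: 4 UTF-8 bytes per input byte covers the single-byte
     * charsets usually seen in news headers, so one iconv() call suffices. */
    size_t outsize = 4 * n + 1;
    char *out = malloc (outsize);
    char *inbuf = (char *) raw;
    char *outbuf = out;
    size_t inleft = n;
    size_t outleft = outsize - 1;

    if (out == NULL ||
        iconv (cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t) -1) {
        free (out);                  /* EILSEQ, EINVAL or E2BIG */
        out = NULL;
    } else {
        *outbuf = '\0';
    }
    iconv_close (cd);
    return out;
}
```

Once every such header is stored as UTF-8, higher layers can hand it to GMime directly, which is the point Jeffrey makes about not re-encoding only to decode again later.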