Bug 84056 – strange (illegal?) chars in author names causes endless loop in group download




Summary:	strange (illegal?) chars in author names causes endless loop in group download


Status:	RESOLVED FIXED

Product:	Pan
Classification:	Other
Component:	general
Version:	pre-0.12.0 betas
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	0.12.0
Assigned To:	Charles Kerr
QA Contact:	Charles Kerr

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2002-06-04 07:59 UTC by Roman Möller
Modified:	2006-06-18 05:15 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
This specific looping problem seems to occur for encoded headers too. Here's a sample message. It's looping on g_mime_iconv_strndup (cd=0x8bdb478,string=0xbfffdeb0 "problème\027", n=8) (1.51 KB, text/plain) 2002-06-04 20:08 UTC, Christophe Lambin

Description Roman Möller 2002-06-04 07:59:09 UTC

If an article header contains strange characters in the author name, as in
"Micha³" (found in comp.lang.perl.misc), Pan loops endlessly in the function
g_mime_iconv_strndup.
This seems to be because iconv erroneously sets errno to
E2BIG, so the do...while loop never terminates.

A quick and dirty fix is to add a statement like

if (inleft==0) break;

at line 98. At least it seems to work.

Cheers
Roman
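For illustration only, here is a minimal sketch of a conversion loop that cannot spin forever, assuming plain POSIX iconv(3) and a hypothetical helper named `iconv_strndup_sketch` (this is not GMime's actual code or its eventual fix). Unlike the loop described above, it branches on iconv()'s return value before looking at errno, and every branch either consumes input, grows the output buffer, or exits. Invalid bytes (EILSEQ) are replaced with '?', a truncated sequence at the end (EINVAL) is dropped, and flushing shift state for stateful encodings is omitted.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Double the output buffer, keeping outbuf/outleft consistent.
 * On failure the old buffer is left untouched and 0 is returned. */
static int
grow (char **out, size_t *outlen, char **outbuf, size_t *outleft)
{
    size_t used = (size_t) (*outbuf - *out);
    size_t newlen = *outlen * 2 + 16;
    char *tmp = realloc (*out, newlen + 1);   /* +1 keeps room for the NUL */
    if (tmp == NULL)
        return 0;
    *out = tmp;
    *outlen = newlen;
    *outbuf = tmp + used;
    *outleft = newlen - used;
    return 1;
}

/* Hypothetical helper: convert n bytes of `in` with `cd` and return a
 * malloc'd NUL-terminated string, or NULL on allocation failure. */
static char *
iconv_strndup_sketch (iconv_t cd, const char *in, size_t n)
{
    size_t outlen = n + 16;              /* initial guess; grown on E2BIG */
    char *out = malloc (outlen + 1);     /* +1 keeps room for the NUL */
    char *outbuf = out;
    size_t outleft = outlen;
    char *inbuf = (char *) in;           /* iconv() takes a non-const char ** */
    size_t inleft = n;

    if (out == NULL)
        return NULL;

    while (inleft > 0) {
        /* Decide from iconv()'s return value, never from a stale errno. */
        if (iconv (cd, &inbuf, &inleft, &outbuf, &outleft) != (size_t) -1)
            break;                       /* all input converted */

        if (errno == E2BIG) {
            /* Output buffer full: enlarge it and retry. */
            if (!grow (&out, &outlen, &outbuf, &outleft))
                goto fail;
        } else if (errno == EILSEQ) {
            /* Invalid byte: emit '?' and skip it, so input always advances. */
            if (outleft == 0 && !grow (&out, &outlen, &outbuf, &outleft))
                goto fail;
            *outbuf++ = '?';
            outleft--;
            inbuf++;
            inleft--;
        } else {
            break;                       /* EINVAL: truncated sequence at end */
        }
    }

    *outbuf = '\0';
    return out;

fail:
    free (out);
    return NULL;
}
```

With an ASCII source charset, the Latin-1 bytes of "problème" come out as "probl?me" instead of hanging the caller.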

Comment 1 Christophe Lambin 2002-06-04 11:46:35 UTC

Although this specific problem could/should be fixed in GMime, it's
actually part of a bigger problem: Pan and GMime (correctly) expect
only 7bit text in the news headers. Clearly some news agents do not
follow the standards (I'm looking at you, OE :-)).

What we could do is encode the headers to 7bit ourselves (e.g. when
we get them via NNTP). That way, we're sure to have
standards-compliant headers everywhere else.

This would also fix related bug 81181.

Thoughts?
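A minimal sketch of the proposed 8-bit check and re-encoding to 7-bit, assuming the header text is already UTF-8 (see Jeffrey's caveat further down). `is_8bit` and `q_encode_utf8` are hypothetical names, not GMime functions. The encoder wraps the whole value in a single RFC 2047 "Q" encoded-word, passes only ASCII letters and digits through, maps spaces to '_', and escapes every other byte as =XX. It ignores the 75-character limit on encoded-words, does no line folding, and is meant for a plain text value such as a subject, not a full author header containing an address.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if any byte has the high bit set, i.e. the text is not 7-bit. */
static int
is_8bit (const char *text)
{
    for (const unsigned char *p = (const unsigned char *) text; *p != '\0'; p++)
        if (*p & 0x80)
            return 1;
    return 0;
}

/* Hypothetical helper: wrap a UTF-8 string in one RFC 2047 "Q"
 * encoded-word. Letters and digits pass through, spaces become '_',
 * every other byte becomes =XX. No 75-char limit, no folding.
 * Returns a malloc'd string, or NULL on allocation failure. */
static char *
q_encode_utf8 (const char *utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char prefix[] = "=?UTF-8?Q?";
    size_t len = strlen (utf8);
    /* Worst case: 3 bytes per input byte, plus prefix (10), "?=" (2), NUL (1). */
    char *out = malloc (len * 3 + 13);
    char *o;

    if (out == NULL)
        return NULL;
    memcpy (out, prefix, sizeof prefix - 1);
    o = out + sizeof prefix - 1;

    for (const unsigned char *p = (const unsigned char *) utf8; *p != '\0'; p++) {
        unsigned char c = *p;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
            *o++ = (char) c;
        } else if (c == ' ') {
            *o++ = '_';
        } else {
            *o++ = '=';
            *o++ = hex[c >> 4];
            *o++ = hex[c & 0x0F];
        }
    }
    memcpy (o, "?=", 3);   /* copies the terminating NUL too */
    return out;
}
```

For example, "Michał" in UTF-8 becomes =?UTF-8?Q?Micha=C5=82?=, which is pure 7-bit ASCII.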

Comment 2 Charles Kerr 2002-06-04 15:28:42 UTC

fejj: that's gmime-iconv-utils.c:98 that Roman's referring to...

chris: you mean something like
nntp.c:

   author_str = get_author_from_xover ();
   if (g_mime_utils_text_is_8bit (author_str, strlen (author_str))) {
      char * encoded = g_mime_utils_8bit_header_encode (author_str);
      replace_gstr (&author_str, encoded);
   }

   /* repeat for subject header */

?

Comment 3 Christophe Lambin 2002-06-04 18:56:51 UTC

I know it's a hack, but yes, something like that.  

Note that for the author, we may be better off using GMime's
InternetAddress class, because g_mime_utils_8bit_header_encode() can
mess up an author header when the first word needs to be encoded. See
my changes from May 20th for more info.
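The address-aware approach suggested above can be illustrated with a deliberately simplified split of a "Display Name <user@host>" author into its display-name phrase and its address, so that only the phrase would be handed to an RFC 2047 encoder and the address stays untouched. `split_author` is a hypothetical helper, not GMime's InternetAddress API; it handles only that one form and ignores quoted strings, comments, groups, and the older "user@host (Name)" style that a real parser must cover.

```c
#include <stdlib.h>
#include <string.h>

/* Copy the byte range [begin, end) into a new NUL-terminated string. */
static char *
dup_range (const char *begin, const char *end)
{
    size_t n = (size_t) (end - begin);
    char *s = malloc (n + 1);
    if (s != NULL) {
        memcpy (s, begin, n);
        s[n] = '\0';
    }
    return s;
}

/* Hypothetical helper: split "Display Name <user@host>" into the display
 * name and the address. Only that simple form is handled.
 * Returns 1 on success; the caller frees *name and *addr. */
static int
split_author (const char *author, char **name, char **addr)
{
    const char *lt = strrchr (author, '<');
    const char *gt = (lt != NULL) ? strchr (lt, '>') : NULL;
    const char *end;

    if (lt == NULL || gt == NULL)
        return 0;

    /* Trim the spaces between the display name and '<'. */
    end = lt;
    while (end > author && end[-1] == ' ')
        end--;

    *name = dup_range (author, end);
    *addr = dup_range (lt + 1, gt);
    if (*name == NULL || *addr == NULL) {
        free (*name);
        free (*addr);
        return 0;
    }
    return 1;
}
```

Encoding only the returned name, then reattaching " <" addr ">" unencoded, avoids touching the address at all, which is the part of the header that must stay machine-readable.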

Comment 4 Christophe Lambin 2002-06-04 20:08:36 UTC

Created attachment 8985 [details]
This specific looping problem seems to occur for encoded headers too. Here's a sample message. It's looping on g_mime_iconv_strndup (cd=0x8bdb478,string=0xbfffdeb0 "problème\027", n=8)

Comment 5 Jeffrey Stedfast 2002-06-05 03:28:55 UTC

Looks like I had fixed this in the gmime-1 branch but forgot to pull
it up to the HEAD branch. I've just fixed that and updated Pan as well.

Comment 6 Christophe Lambin 2002-06-05 06:49:29 UTC

Yep, problem fixed. Thanks.

fejj: what's your view on the extra encoding step for subjects and authors?

Comment 7 Jeffrey Stedfast 2002-06-05 18:10:02 UTC

Well, Charles's example code seems wrong, I think, unless I'm missing
the big picture here.

I'm assuming that get_author_from_xover () returns the raw header? If
so, then checking whether it is 8bit is OK, but passing it on to
g_mime_utils_8bit_header_encode() would be wrong, because GMime needs
UTF-8 text and will fail otherwise.

This means that you have to convert to UTF-8 first, and then have
GMime re-encode it (as your example above shows - and yeah, I agree
with Chris - you should probably use GMime's InternetAddress class to
encode addresses).

I don't know the code too well, so this may not be doable - but I
might suggest converting all the headers that this problem could
happen on to UTF-8 right there, so that higher levels don't have to
worry about it. This means that if the raw header contains 8bit text,
you convert it to UTF-8 but don't have to re-encode just to decode it
a few seconds later :-)
Any header that is not 8bit (i.e. 7bit, possibly with RFC 2047 encoded-words) you'd simply decode into UTF-8.
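A minimal sketch of the "convert 8-bit headers to UTF-8 right there" idea, assuming a caller-supplied fallback charset for raw 8-bit text (the thread does not say which charset to pick) and a hypothetical helper named `raw_header_to_utf8`. Text that already passes a structural UTF-8 check is copied unchanged; anything else is converted from the fallback charset with iconv(3). Decoding RFC 2047 encoded-words in 7-bit headers is not shown. As an example, the "Micha³" from the original report is plausibly the byte 0xB3, which is "ł" in ISO-8859-2 but "³" in ISO-8859-1.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Structural UTF-8 check only: overlong 3/4-byte forms and surrogates
 * are not rejected. Returns 1 if `s` looks like valid UTF-8. */
static int
is_valid_utf8 (const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        if (c < 0x80) len = 1;
        else if (c >= 0xC2 && c <= 0xDF) len = 2;
        else if (c >= 0xE0 && c <= 0xEF) len = 3;
        else if (c >= 0xF0 && c <= 0xF4) len = 4;
        else return 0;
        if (i + len > n)
            return 0;
        for (size_t k = 1; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        i += len;
    }
    return 1;
}

/* Hypothetical helper: return a malloc'd UTF-8 copy of a raw header value.
 * Valid UTF-8 (including plain 7-bit ASCII) is copied unchanged; anything
 * else is assumed to be in `fallback_charset` and converted with iconv.
 * Returns NULL if the conversion fails. */
static char *
raw_header_to_utf8 (const char *raw, const char *fallback_charset)
{
    size_t n = strlen (raw);
    char *out;

    if (is_valid_utf8 ((const unsigned char *) raw, n)) {
        out = malloc (n + 1);
        if (out != NULL)
            memcpy (out, raw, n + 1);
        return out;
    }

    iconv_t cd = iconv_open ("UTF-8", fallback_charset);
    if (cd == (iconv_t) -1)
        return NULL;

    /* Each input character yields at most 4 bytes of UTF-8. */
    size_t outleft = n * 4;
    out = malloc (outleft + 1);
    char *inbuf = (char *) raw, *outbuf = out;
    size_t inleft = n;

    if (out == NULL || iconv (cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t) -1) {
        free (out);
        out = NULL;
    } else {
        *outbuf = '\0';
    }
    iconv_close (cd);
    return out;
}
```

Converting once at this boundary means higher layers only ever see UTF-8, so nothing has to be re-encoded just to be decoded again later.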

I assume these are header values that are stored on the Article
object?

Anyway, hopefully my thoughts aren't completely useless :-)