GNOME Bugzilla – Bug 101510
multipart/mixed / begin-end Bug
Last modified: 2006-06-18 05:25:20 UTC
Pan is subject to the annoying begin-end bug invented by Outlook Express: a text/plain message containing the keywords "begin" and "end" is interpreted as multipart/mixed (or uuencoded). Pan even seems to add a multipart/mixed header that is not present in the original message. The problem can be seen e.g. in <slrnavtto8.9u1.news@news.jors.net>.
Created attachment 13096 [details] message showing this behaviour. notice that this message intentionally tries to fool the reader into thinking it's a uuencoded message. how cute ...
*sigh* I thought this was fixed already? :)
The problem with this attachment is that it's a syntactically correct uuencoded message, so clipping out the contents is a valid response. The poster who's been putting these in the de.* hierarchy seems to have realized that too, as he's now making the "begin" line incorrect by removing the file permission number, so that compliant newsreaders will bomb out of uu mode and show his messages as text. Pan does show the contents of his newer messages.
Created attachment 14070 [details] a newer post that tries to break OE
*** Bug 111852 has been marked as a duplicate of this bug. ***
Date: Fri, 31 Oct 2003 18:14:06 +0100
From: "Juergen P. Meier" <bugzilla-pan@jors.net>
To: charles@rebelbase.com
Cc: pan-qa-maint@bugzilla.gnome.org
Subject: Pan - 0.13.2.93 Bug 101510
User-Agent: Mutt/1.4.1i

Hello,

some Pan users informed me that the Pan newsreader shows the same stupid bug as Microsoft Outlook Express and older versions of IBM Lotus Notes.

I read your comment about me producing uuencoded lookalikes. Well, I strongly reject this and have to inform you that I use the English word "begin" (see http://www.m-w.com/cgi-bin/dictionary?book=Dictionary&va=begin) on a quite regular basis. And neither the Microsoft Corporation nor free software authors can force me to abandon this word in favor of alternatives like "start", "commence" or similar words.

I do not know how much experience you have with the Usenet medium, but in more than a decade of experience, I have only found two software vendors who actually believe they can ignore common sense and invent new definitions for "text/plain" content type postings by making the pretty naive assumption that uu-code can be recognized solely by the occurrence of a common English word followed by an optional number, and another even more common English word occurring somewhere later in the text.

In my opinion this is a pretty silly idea. (Google finds more than 35 *million* websites containing "begin", and more than *one hundred million* hits on "end".)

Even the /usr/bin/uudecode Unix tool from the 1980s is more intelligent than both Microsoft Outlook Express and your software, by correctly identifying my postings as not being uuencoded content.

Now I do not ask you to change your product, nor do I speak for the users of your product. I just feel inclined to explain why your comment about my posting style is quite silly.

Side note: In postings that comply with the MIME standard, uuencoded content is MIME encoded and the content is declared as content-transfer-encoding: uuencode. Now, my postings are pretty obviously declared as text/plain, with the transfer encoding being 8bit (meaning unencoded).

regards,
Juergen Meier
''content-transfer-encoding: uuencode'' is not a valid MIME encoding; there's only "binary", "8bit", "7bit", "base64" and "quoted-printable". It's quite stupid to use UUENCODE when a superior solution (MIME with base64) is available, but it might happen if the user decides to be stupid (or to cater for old software of the intended recipient).

Detecting UUENCODE in plain text (and yes, it might be labelled as "text/plain" with "8bit" or even "quoted-printable" encoding -- the whole point of UUENCODE is embedding binary data in plain text) is always based on heuristics. These heuristics have to be more sophisticated, however, than just looking for "begin"/"end", because plain text can include anything. There are a lot of ways to make them more reliable:

. Don't use /^begin [0-9]* / but /^begin [0-7]{3,} / (at least three octal digits for the file permissions).
. Check the byte count of each line. Display lines with a wrong byte count after the "attachment" (and don't assume an attachment if too many lines have a wrong byte count).
. Do a frequency analysis on the encoding alphabet used. UUENCODEers usually don't mix different alphabets (actually, there are only two known alphabets -- one with " " as 0 and one with "`"), so more than 65 different characters (one more "just in case") indicate non-UUENCODEd data.
. Check the line lengths. If there are more than 3 different line lengths ("standard" length, one shorter line at the end, one just in case), don't assume UUENCODE.
. Check for the last zero-length line (usually just a `) at the end.
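The body-line checks from the list above could be sketched roughly as follows. This is only an illustration in Python, not Pan's code (Pan is written in C); the function names `uu_line_ok` and `looks_uuencoded` and the 10% threshold for "too many" bad lines are assumptions made for this example. It also assumes the lines between "begin" and "end" have already been extracted and that trailing whitespace has not been stripped.

```python
from typing import Sequence


def uu_line_ok(line: str) -> bool:
    """Check that the declared byte count matches the encoded length."""
    if not line:
        return False
    # Both known alphabets stay within ASCII 32 (space) .. 96 (backtick).
    if any(not (32 <= ord(c) <= 96) for c in line):
        return False
    declared = (ord(line[0]) - 32) & 0x3F   # decoded bytes on this line
    expected = 4 * ((declared + 2) // 3)    # encoded characters needed
    return len(line) - 1 == expected


def looks_uuencoded(body: Sequence[str], max_bad_fraction: float = 0.1) -> bool:
    """Heuristic: do the lines between 'begin' and 'end' look like uu data?

    max_bad_fraction is a hypothetical threshold, not a value from the thread.
    """
    if not body:
        return False
    # Too many lines with a wrong byte count: treat the text as prose.
    bad = sum(1 for line in body if not uu_line_ok(line))
    if bad > max_bad_fraction * len(body):
        return False
    # Full-length lines, one short final line, one spare "just in case".
    if len({len(line) for line in body}) > 3:
        return False
    # More than 65 distinct characters cannot come from a single alphabet.
    if len({c for line in body for c in line}) > 65:
        return False
    # The data should finish with a zero-length line (a lone ` or space).
    return body[-1] in ("`", " ")
```

A caller that gets `False` would simply display the whole message as text, which is the behaviour requested in this bug.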
The following message breaks Pan too:

begin 1 followup
Ignatios Souvatzis <ignatios@newton.cs.uni-bonn.de>:
> Andreas Bogk wrote:
>
>> As long as you don't play dirty tricks, and I mean *really*
>> dirty tricks, like XORing both pointers of a doubly linked list
>> and storing them in just one word,
>
> How is that supposed to work?

Doubly linked list. You always know one of the two pointers (as long as you remembered where you came from). That is disgusting.

Juergen
--
Juergen P. Meier - "This World is about to be Destroyed!"
end
If you think technology can solve your problems you don't understand technology and you don't understand your problems. (Bruce Schneier)

So if you would look at begin a little bit more carefully, it would work.
The test cases here pass if Pan is more stringent about checking for three octal digits for the mode as suggested by claus. http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/base&command=DIFF_FRAMESET&file=util-mime.c&rev1=1.58&rev2=1.59&root=/cvs/gnome
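For illustration, the difference between the loose and the stricter "begin" check can be sketched in Python. The two patterns are the ones quoted earlier in this thread; this is not the actual change to util-mime.c, and the helper name `is_uu_begin` and the file name `photo.jpg` are made up for the example.

```python
import re

# Loose pattern criticised in the thread: any digits (even none) pass as a mode.
LOOSE_BEGIN = re.compile(r"^begin [0-9]* ")
# Stricter pattern suggested in the thread: at least three octal digits.
STRICT_BEGIN = re.compile(r"^begin [0-7]{3,} ")


def is_uu_begin(line: str, strict: bool = True) -> bool:
    """Return True if the line is accepted as the start of a uuencoded block."""
    pattern = STRICT_BEGIN if strict else LOOSE_BEGIN
    return pattern.match(line) is not None


samples = (
    "begin 644 photo.jpg",                # ordinary uuencode header (hypothetical)
    "begin 1 followup",                   # first line of the post quoted above
    "begin a little bit more carefully",  # plain prose
)
for line in samples:
    loose = is_uu_begin(line, strict=False)
    strict = is_uu_begin(line)
    print(f"{line!r}: loose={loose} strict={strict}")
```

With the strict pattern, only the first sample is treated as a uuencode header; the loose pattern also accepts "begin 1 followup".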
*** Bug 130522 has been marked as a duplicate of this bug. ***