GNOME Bugzilla – Bug 122078
Pan should support Perl-Compatible Regular Expressions
Last modified: 2004-12-22 21:47:04 UTC
Normally, regexes are case-sensitive unless certain flags or options are used to change this. Thus, the line "Subject: ^.*\<FREE\>.*$" should match "FREE SEX NOW", but not "GIMP and free fonts?". Currently, it will match both, which decreases its value in catching spam posts. If I read the documentation for grep and regex(7) correctly, there is no way to toggle case-sensitivity within those regexes, but a PCRE may be changed with the option (?i). (Thus, if Pan supported PCREs -- and I'm not asking that it do so! -- normally "FREE" would only match "FREE", but "(?i)free" would match "FREE" or "free" or even "fREe". I do not see any option in a PCRE to turn case-matching ON, merely one to turn it off.) Since regexes should be case-sensitive unless an effort is made to do otherwise, I submit that Pan's current behavior is a bug in its regex implementation.
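The point about POSIX regexes can be illustrated with a minimal sketch. It assumes the standard <regex.h> interface, where case folding is selected by the REG_ICASE compile flag and not by anything written inside the pattern. The helper name posix_matches is hypothetical, and the word-boundary anchors \< and \> are left out to keep the sketch portable.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `text` matches the POSIX extended regex `pattern`.
 * Case folding is decided by the caller through REG_ICASE at compile
 * time; the pattern string itself carries no case switch.
 * (posix_matches is a hypothetical helper, not a Pan function.) */
bool posix_matches(const char *pattern, const char *text, bool ignore_case)
{
    regex_t re;
    int flags = REG_EXTENDED | REG_NOSUB;

    if (ignore_case)
        flags |= REG_ICASE;

    if (regcomp(&re, pattern, flags) != 0)
        return false;               /* treat an invalid pattern as "no match" */

    bool matched = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

With ignore_case set to false, the pattern "FREE" matches "FREE SEX NOW" but not "GIMP and free fonts?"; with it set to true, both subjects match, which is the behavior described in the report.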
I just had a look at that, and here are the results I got: it seems that Pan does in fact support case-sensitive regexps in the score file. It just does not have a way to toggle case sensitivity in the GUI. As explained in <pan/filters/score.c>, Pan "follow[s] XNews' idiom for specifying case sensitivity: '=' as the delimiter instead of ':'". In your case, you should have a line in your score file that looks like:

    Subject: ^.*\<FREE\>.*$

Just turn the ':' into a '=' and the regexp will be case-sensitive.

The *real* problem is that make_scorefile_entry, the function in <pan/filters/score.c> that creates the entry which is then written to the scorefile, uses the ':' separator unconditionally. There should probably be a checkbox somewhere in the `Add Score' dialog where you could set case sensitivity for the regexp you add, or another way to set this, and then:

    g_string_append_printf (str, "%c\t%s%s: %s" EOL, (item->on?' ':'%'), (item->negate?"~":""), item->key, value);

could be changed to something like:

    g_string_append_printf (str, "%c\t%s%s%c %s" EOL, (item->on ? ' ' : '%'), (item->negate ? "~" : ""), item->key, item->case_sensitive ? '=' : ':', value);

(This code is in the make_scorefile_entry function, of course.) Hope that helps.
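For readers without the Pan tree at hand, the proposed change can be sketched as a standalone function. This is only an illustration under stated assumptions: ScoreItemSketch and format_score_entry are hypothetical names, plain snprintf stands in for g_string_append_printf, and "\n" stands in for Pan's EOL macro.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for Pan's score item. The fields mirror the
 * ones used in the comment above; case_sensitive is the proposed one. */
typedef struct {
    bool on;
    bool negate;
    bool case_sensitive;
    const char *key;
} ScoreItemSketch;

/* Formats one scorefile line into buf and returns snprintf's result.
 * The delimiter is '=' for a case-sensitive entry (XNews idiom) and
 * ':' otherwise. "\n" is an assumption standing in for EOL. */
int format_score_entry(char *buf, size_t len,
                       const ScoreItemSketch *item, const char *value)
{
    return snprintf(buf, len, "%c\t%s%s%c %s\n",
                    item->on ? ' ' : '%',
                    item->negate ? "~" : "",
                    item->key,
                    item->case_sensitive ? '=' : ':',
                    value);
}
```

An enabled, case-sensitive Subject entry comes out with "Subject=" before the pattern, while a disabled, negated, case-insensitive one starts with '%' and uses "~Subject:".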
Yes, that helps a lot! I tried it and it works. I guess I'm changing my request, then, from "fix the regexes" to "fix the GUI". :) (I suppose I should read the source code from time to time.)
Partial fix is now in CVS: case_sensitive is now supported in the add-score backend, but it's not in the GUI yet. I'm not sure what the most space-conscious way is to add this to the "create score" dialog and would welcome suggestions.

http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/filters&command=DIFF_FRAMESET&file=score.c&rev1=1.22&rev2=1.23&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/filters&command=DIFF_FRAMESET&file=score.h&rev1=1.4&rev2=1.5&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan&command=DIFF_FRAMESET&file=article-actions.c&rev1=1.62&rev2=1.63&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan&command=DIFF_FRAMESET&file=score-add-ui.c&rev1=1.7&rev2=1.8&root=/cvs/gnome
Looking at this dialog, my initial thought is to have a "column" of checkboxes between "Criteria" and the column of pick-lists. Such a check would only make sense to apply to Subject and Author; although you technically *could* apply it to References, that might be silly, and of course it doesn't apply to Lines. On second thought, I think that "Match Case" or "Case Sensitive" could be even longer in other languages than English, so what about an icon-button? It could be labeled something like "A=a" for "Don't care about case" (default, not pressed), and "A=A" for "Match case" (pressed), with a tooltip to explain what it really means. That way, you would only need a square, of the same height as the pick-list (to be aesthetically pleasing), and not a column with a (possibly very long) label. The dialog would not need to become any bigger than it already is. (I see a lot of "slack" space in the "Criteria" section, apparently caused by the columnar layout of this box, and the "-9999" from "Score" moving everything over quite a lot. Some of this could be used.) Of course, I haven't ever looked at the GNOME HIG... would something like that fly in its face? ;)
Hmm, now that I think about this some more, I think this is a mistake and I'm going to back out the code. Parsing XNews' case-sensitive tokens is GOOD because it helps Pan to parse both slrn and XNews scorefiles. Generating XNews' case-sensitive tokens is BAD because it lets Pan generate scorefiles that slrn can't parse. IOW, Pan should follow the old saying about being generous in what it parses but conservative in what it generates. I will do some more research on this. IIRC slrn _doesn't_ accept PCRE but instead has its own slang regular expressions that have their own idiom for case-sensitivity, which would make it harder to strive for compatibility.
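The "generous in what it parses" half can be sketched as a reader that accepts either delimiter. This is a simplified illustration and not Pan's actual parser: parse_score_delimiter is a hypothetical name, and real scorefile lines carry more syntax (leading '%', '~', tabs) than this handles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Looks for the first ':' or '=' in a "Key: pattern" or "Key= pattern"
 * line. On success, *case_sensitive is true for '=' (XNews idiom) and
 * false for ':', and *pattern points at the text after the delimiter
 * with leading blanks skipped. Returns false if no delimiter is found.
 * (Hypothetical helper; simplified compared with a real scorefile parser.) */
bool parse_score_delimiter(const char *line,
                           bool *case_sensitive, const char **pattern)
{
    const char *delim = strpbrk(line, ":=");
    if (delim == NULL)
        return false;

    *case_sensitive = (*delim == '=');

    const char *p = delim + 1;
    while (*p == ' ' || *p == '\t')
        ++p;
    *pattern = p;
    return true;
}
```

A writer that stays conservative would keep emitting only ':' while this reader continues to understand both forms.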
s-lang version 1.4.8 adds a module to use PCRE, and s-lang's maintainer has stated his intention to integrate PCRE directly into version 2 of s-lang:

Message-ID: <slrnb69tfc.hu8.davis@aluche.mit.edu>

> I'm quite new to this and I wonder is pcre *better* than
> slang's ordinary regular expressions?

Generally speaking, yes. I wrote the slang regular expression engine sometime in the early 90s based upon the "ed" man page on an old Ultrix system from the late 80s. S-Lang v2, will use the pcre regular expressions; primarily because pcre supports the UTF-8 encoding of unicode. I will say more about the progress of v2 in a future posting.

In sum: PCRE is better than Pan's posix regular expressions; PCRE is better than s-lang's current regular expressions; PCRE is used by XNews; PCRE is being phased into slrn and can be used now; PCRE allows for the case-sensitive control Carl needs. IMO Pan 0.14.3 should use libpcre both for its features and to increase the portability of its scorefiles.

I'm hijacking this issue to read "Pan should support Perl-Compatible Regular Expressions". :)

The glib people have talked about supporting PCRE in glib someday; until then, libpcre is ubiquitous, is written in portable ANSI C, and has no prerequisites, so it would be suitable even for the Windows version of Pan.
http://www.pcre.org/
Fixed in CVS:

http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/filters&command=DIFF_FRAMESET&file=filter-phrase.c&rev1=1.27&rev2=1.28&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/filters&command=DIFF_FRAMESET&file=filter-phrase.h&rev1=1.15&rev2=1.16&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/filters&command=DIFF_FRAMESET&file=score.c&rev1=1.24&rev2=1.25&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan&command=DIFF_FRAMESET&file=ANNOUNCE.html&rev1=1.220&rev2=1.221&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan&command=DIFF_FRAMESET&file=configure.in&rev1=1.292&rev2=1.293&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan&command=DIFF_FRAMESET&file=pan.spec.in&rev1=1.52&rev2=1.53&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan&command=DIFF_FRAMESET&file=Makefile.am&rev1=1.214&rev2=1.215&root=/cvs/gnome