Bug 122078 - Pan should support Perl-Compatible Regular Expressions
Status: RESOLVED FIXED
Product: Pan
Classification: Other
Component: general
Version: 0.14.2
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 0.14.3
Assigned To: Charles Kerr
QA Contact: Pan QA Team
Depends on:
Blocks: 123246
 
 
Reported: 2003-09-12 04:06 UTC by Carl Hudkins
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Carl Hudkins 2003-09-12 04:06:14 UTC
Normally, regexes are case-sensitive unless certain flags or options are 
used to change this.  Thus, the line "Subject: ^.*\<FREE\>.*$" should 
match "FREE SEX NOW", but not "GIMP and free fonts?".  Currently, it will 
match both, which decreases its value in catching spam posts.  If I read 
the documentation for grep and regex(7) correctly, there is no way to 
toggle case-sensitivity within those regexes, but a PCRE may be changed 
with the option (?i).  (Thus, if Pan supported PCREs -- and I'm not 
asking that it do so! -- normally "FREE" would only match "FREE", but 
"(?i)free" would match "FREE" or "free" or even "fREe".  I do not see any 
option in a PCRE to turn case-matching ON, merely one to turn it off.) 
Since regexes should be case-sensitive unless an effort is made to do 
otherwise, I submit that Pan's current behavior is a bug in regex 
implementation.
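
For comparison, a minimal sketch of how case sensitivity is chosen with POSIX regexes in C, where the only switch is the REG_ICASE flag passed to regcomp() rather than an inline option such as PCRE's (?i). The helper name posix_match is made up for this illustration and is not taken from Pan's source; the GNU-specific \< and \> word anchors are left out to keep the pattern portable.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `pattern` matches anywhere in `text`.
 * `ignore_case` selects REG_ICASE at compile time; POSIX has no inline
 * switch comparable to PCRE's (?i). */
static bool posix_match(const char *pattern, const char *text, bool ignore_case)
{
    regex_t re;
    int flags = REG_EXTENDED | REG_NOSUB;

    if (ignore_case)
        flags |= REG_ICASE;

    if (regcomp(&re, pattern, flags) != 0)
        return false;   /* treat an invalid pattern as "no match" */

    bool matched = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Without REG_ICASE, the pattern "FREE" matches "FREE SEX NOW" but not "GIMP and free fonts?"; with the flag set it matches both, which is the behavior described above.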
Comment 1 Manuel Menal 2003-09-12 10:10:58 UTC
I just had a look at that, and here are the results I got:

It seems that Pan does in fact support case-sensitive regexps in the
score file. It just has no way to toggle case sensitivity in
the GUI. As explained in <pan/filters/score.c>, Pan "follow[s] XNews'
idiom for specifying case sensitivity: '=' as the delimiter instead of
':'"; in your case, you should have a line that looks like:

        Subject: ^.*\<FREE\>.*$

.. in your score file; just turn the ':' into a '=' and the regexp
will be case-sensitive. 

The *real* problem is that make_scorefile_entry, the function in
<pan/filters/score.c> that creates the entry which is then written to
the scorefile, uses the ':' separator unconditionally. There should
probably be a checkbox somewhere in the `Add Score' dialog where you
could set case sensitivity for the regexp you add, or another way to
set this, and then:

  g_string_append_printf (str, "%c\t%s%s: %s" EOL,
                          (item->on?' ':'%'),
                          (item->negate?"~":""),
                          item->key, value);

could be changed to something like:

  g_string_append_printf (str, "%c\t%s%s%c %s" EOL,
                          (item->on ? ' ' : '%'),
                          (item->negate ? "~" : ""),
                          item->key, 
                          item->case_sensitive ? '=' : ':',
                          value);


(this code is in the make_scorefile_entry function, of course.)
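
To make the effect of that change concrete, here is a standalone sketch that does not depend on glib or on Pan's real data structures. The struct score_item and the function format_score_entry are hypothetical stand-ins, case_sensitive is the proposed new field rather than an existing one, and "\n" is assumed in place of the EOL macro.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for Pan's score item; only the fields used in the
 * snippet above are modelled, and case_sensitive is the proposed addition. */
struct score_item {
    bool on;
    bool negate;
    bool case_sensitive;
    const char *key;
};

/* Writes one scorefile line into buf and returns snprintf's result.
 * '=' marks a case-sensitive regexp (the XNews idiom), ':' the default. */
static int format_score_entry(char *buf, size_t size,
                              const struct score_item *item, const char *value)
{
    return snprintf(buf, size, "%c\t%s%s%c %s\n",
                    item->on ? ' ' : '%',
                    item->negate ? "~" : "",
                    item->key,
                    item->case_sensitive ? '=' : ':',
                    value);
}
```

With key "Subject" and value "FREE", an enabled, non-negated, case-sensitive item yields a line ending in "Subject= FREE", and the same item with case_sensitive cleared yields "Subject: FREE".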

Hope that helps.
Comment 2 Carl Hudkins 2003-09-12 12:20:30 UTC
Yes, that helps a lot!  I tried it and it works.  I guess I'm 
changing my request, then, from "fix the regexes" to "fix the GUI".  
:) 
 
(I suppose I should read the source code from time to time.) 
Comment 4 Carl Hudkins 2003-11-27 04:23:03 UTC
Looking at this dialog, my initial thought is to have a "column" of 
checkboxes between "Criteria" and the column of pick-lists.  Such a 
check would only make sense to apply to Subject and Author; although 
you technically *could* apply it to References, that might be silly, 
and of course it doesn't apply to Lines.   
On second thought, I think that "Match Case" or "Case Sensitive" 
could be even longer in other languages than English, so what about 
an icon-button?  It could be labeled something like "A=a" for "Don't 
care about case" (default, not pressed), and "A=A" for "Match case" 
(pressed), with a tooltip to explain what it really means.  That 
way, you would only need a square, of the same height as the 
pick-list (to be aesthetically pleasing), and not a column with a 
(possibly very long) label.  The dialog would not need to become any 
bigger than it already is.  (I see a lot of "slack" space in the 
"Criteria" section, apparently caused by the columnar layout of this 
box, and the "-9999" from "Score" moving everything over quite a 
lot.  Some of this could be used.) 
Of course, I haven't ever looked at the GNOME HIG... would something 
like that fly in its face?  ;) 
Comment 5 Charles Kerr 2003-11-27 04:24:56 UTC
Hmm, now that I think about this some more, I think this is a mistake
and I'm going to back out the code.

Parsing XNews' case-sensitive tokens is GOOD because it helps Pan to
parse both slrn and XNews scorefiles.

Generating XNews' case-sensitive tokens is BAD because it lets Pan
generate scorefiles that slrn can't parse.

IOW, Pan should follow the old saying about being generous in what it
parses but conservative in what it generates.
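
As a rough illustration of the "generous in what it parses" half, the sketch below accepts either delimiter when reading an entry and records which one it saw. The names parsed_entry and parse_score_entry are invented for this example and are not Pan's actual parser; it also ignores the leading '%' and '~' markers and any other scorefile syntax.

```c
#include <stdbool.h>
#include <string.h>

/* Result of splitting one "Key: pattern" or "Key= pattern" line. */
struct parsed_entry {
    char key[64];
    bool case_sensitive;    /* true when the delimiter was '=' */
    const char *pattern;    /* points into the caller's line */
};

/* Hypothetical parser: accepts both the slrn-style ':' and the XNews-style
 * '=' delimiter. Returns false if no delimiter is found, the key is empty,
 * or the key does not fit in the buffer. */
static bool parse_score_entry(const char *line, struct parsed_entry *out)
{
    size_t key_len = strcspn(line, ":=");

    if (key_len == 0 || line[key_len] == '\0' || key_len >= sizeof out->key)
        return false;

    memcpy(out->key, line, key_len);
    out->key[key_len] = '\0';
    out->case_sensitive = (line[key_len] == '=');

    /* Skip the delimiter and any blanks before the pattern. */
    const char *p = line + key_len + 1;
    while (*p == ' ' || *p == '\t')
        p++;
    out->pattern = p;
    return true;
}
```

A writer that stays conservative would then always emit ':' regardless of what the parser accepted.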

I will do some more research on this.  IIRC slrn _doesn't_ accept PCRE
but instead has its own slang regular expressions that have their own
idiom for case-sensitivity, which would make it harder to strive for
compatibility.
Comment 6 Charles Kerr 2003-11-27 05:57:11 UTC
s-lang version 1.4.8 adds a module to use PCRE, and s-lang's
maintainer has stated his intention to integrate PCRE directly into
version 2 of s-lang:

   Message-ID: <slrnb69tfc.hu8.davis@aluche.mit.edu>

   > I'm quite new to this and I wonder is pcre *better* than
   > slang's ordinary regular expressions?
   
   Generally speaking, yes.  I wrote the slang regular
   expression engine sometime in the early 90s based
   upon the "ed" man page on an old Ultrix system from
   the late 80s.

   S-Lang v2, will use the pcre regular expressions;
   primarily because pcre supports the UTF-8 encoding of
   unicode.  I will say more about the progress of v2 in
   a future posting.

In sum:
PCRE is better than Pan's POSIX regular expressions;
PCRE is better than s-lang's current regular expressions;
PCRE is used by XNews;
PCRE is being phased into slrn and can be used now;
PCRE allows for the case-sensitive control Carl needs.

IMO Pan 0.14.3 should use libpcre both for its features and
to increase the portability of its scorefiles.

I'm hijacking this issue and retitling it
"Pan should support Perl-Compatible Regular Expressions". :)

The glib people have talked about supporting PCRE in glib someday;
until then, libpcre is ubiquitous, is written in portable ANSI C, and
has no prerequisites, so it would be suitable even for the Windows
version of Pan.
Comment 7 Charles Kerr 2003-11-27 06:01:34 UTC
http://www.pcre.org/