GNOME Bugzilla – Bug 391472
Add ability to match headers by words
Last modified: 2012-02-07 17:44:05 UTC
Forwarding this from a downstream bug report:
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=135918

I was able to reproduce the filtering behavior using Evolution 2.9.4.

Description of problem:
I have a filter that checks whether the subject contains "pto" and then moves the message to a folder. I expected the filter to look for the word "pto". Instead, the filter matches on the substring "pto", so while subject lines such as "PTO today", "Taking pto" and "PTO" trigger the filter, so does any email with "laptop" in the subject, which is not what I intended. The other option would be to use a "subject is" filter, which would catch "pto" but not "pto today". The help documentation does not make clear what "contains" actually means for the rule.

Version-Release number of selected component (if applicable):
evolution 2.0.2

How reproducible:
Always

Steps to Reproduce:
1. Set up a subject filter with a "contains" line (pto).
2. Send mail containing the word "laptop" in the subject.

Actual results:
laptop is filtered.

Expected results:
laptop should not be filtered.

Additional info:
There are a number of ways to approach this. One possible way would be to split "contains" into 2 selections: "contains word" and "contains substring". Or you could make "contains" search on words and add a substring selection for the filter. It seems as if the code lets any substring match trigger the filter. It should probably split the field on spaces and then exact-match the filter text against the pieces of the subject line, rather than matching on substrings.

The workaround is to add leading and trailing whitespace around " pto " in the filter text (which one could argue is not all that good of a behavior either).

Anyhow, filter rules in general should be better documented.
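To make the reported behavior concrete, here is a minimal sketch of a case-insensitive substring test, which is what the report suggests "contains" amounts to. The function name `header_contains` is hypothetical and this is not Evolution's actual code; it only handles ASCII case folding.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, NOT Evolution's implementation: returns true if
 * `needle` occurs anywhere inside `header`, ignoring ASCII case. */
bool header_contains(const char *header, const char *needle)
{
    assert(header != NULL && needle != NULL);

    /* An empty needle trivially matches. */
    if (*needle == '\0')
        return true;

    /* Try to match the needle at every starting position. */
    for (const char *start = header; *start != '\0'; start++) {
        const char *h = start;
        const char *n = needle;

        while (*h != '\0' && *n != '\0' &&
               tolower((unsigned char)*h) == tolower((unsigned char)*n)) {
            h++;
            n++;
        }
        if (*n == '\0')
            return true; /* consumed the whole needle */
    }
    return false;
}
```

Under these assumed semantics, `header_contains("New laptop", "pto")` is true, which is the unwanted match. The " pto " workaround rejects "New laptop", but in this sketch it would also miss "PTO today" and "Taking pto", because the word is not surrounded by spaces on both sides there.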
(In reply to comment #0)
> There are a number of ways to approach this. One possible way would be
> to split contains into 2 selections: contains word, and contains
> substring. Or you could make contains search on word and add a
> substring selection for filter. It seems as if the code is making any
> substring match trigger the filter. It should probably parse the field
> with a space token and then try to exact match the filter text to the
> subject line pieces and not match on substring.

Splitting on whitespace and then trying to do an exact match won't work in cases where there are punctuation characters adjacent to the word (e.g. "laptop,"). But maybe replacing all punctuation characters with spaces (using ispunct()) and THEN splitting on whitespace would get us closer to a reasonable behavior. Would that work for all locales though?

I think I'd prefer to avoid presenting the user with the somewhat technical term "substring". My suggestion would be

    has word
    does not have word
    contains
    does not contain

with "has word" listed first since that seems like the most common case. Would that sufficiently disambiguate the word "contains"?

> Anyhow filter rules in general should be better documented.

Definitely agree.
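The approach suggested above (treat punctuation as spaces, split on whitespace, then exact-match each piece) could be sketched roughly as follows. The names `header_has_word`, `is_separator` and `equal_nocase` are hypothetical, and this is an illustration under stated assumptions, not the code from the eventual patch. Instead of rewriting the string, it treats every `ispunct()` or `isspace()` byte as a separator, which has the same effect.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Punctuation is treated exactly like whitespace. */
static bool is_separator(unsigned char c)
{
    return isspace(c) || ispunct(c);
}

/* Compare `len` bytes, ignoring ASCII case. */
static bool equal_nocase(const char *a, const char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    }
    return true;
}

/* Hypothetical helper: true if `word` appears in `header` as a whole
 * token, where tokens are runs of non-separator bytes. */
bool header_has_word(const char *header, const char *word)
{
    assert(header != NULL && word != NULL);

    size_t word_len = strlen(word);
    if (word_len == 0)
        return false;

    const char *p = header;
    while (*p != '\0') {
        /* Skip separators before the next token. */
        while (*p != '\0' && is_separator((unsigned char)*p))
            p++;

        /* Scan to the end of the token. */
        const char *start = p;
        while (*p != '\0' && !is_separator((unsigned char)*p))
            p++;

        size_t len = (size_t)(p - start);
        if (len == word_len && equal_nocase(start, word, len))
            return true;
    }
    return false;
}
```

On the locale question: `ispunct()` and `isspace()` work on single bytes and depend on the current C locale. In the default "C" locale only ASCII punctuation is recognized, so non-ASCII punctuation in a UTF-8 header would stay attached to its word in this sketch. That is one reason proper word-boundary logic may be preferable.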
how would you implement this for IMAP? or any other backend where the messages are stored remotely? you're talking about adding complex string matching which would probably have to work like pango's word boundary logic.
ignore me as far as IMAP goes, I was thinking this was body word matching (which, for local mail would be "easy" since that's how ibex indexes, but a pita for IMAP where it is only able to do substring matching). subject word matching should be doable as those strings are all locally cached, could just add a new method for "word match" which could be implemented as suggested above quite easily (bonus points for using pango word breaking logic, but probably not required)
Bumping version to a stable release.
Created attachment 207004
eds patch

for evolution-data-server;

The eds part, defining "header-has-words". I'm not sure whether using ispunct() goes too far: for example, "desktop-devel-list" becomes 3 words, which might not always be expected, and "3:30 pm" likewise becomes 3 words. But who knows, let's see what users think.
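The word counts mentioned here can be reproduced with a small sketch that counts tokens when both whitespace and `ispunct()` characters act as separators. `count_words` is a hypothetical helper written for illustration, not a function from the attached patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Punctuation is treated exactly like whitespace. */
static bool is_separator(unsigned char c)
{
    return isspace(c) || ispunct(c);
}

/* Hypothetical helper: number of tokens in `text` when splitting on
 * whitespace and punctuation. */
size_t count_words(const char *text)
{
    assert(text != NULL);

    size_t count = 0;
    bool in_word = false;

    for (const char *p = text; *p != '\0'; p++) {
        if (is_separator((unsigned char)*p)) {
            in_word = false;
        } else if (!in_word) {
            /* First byte of a new token. */
            in_word = true;
            count++;
        }
    }
    return count;
}
```

With this splitting rule, "desktop-devel-list" yields the tokens "desktop", "devel" and "list", and "3:30 pm" yields "3", "30" and "pm", so a "has word" filter for "list" would match the mailing list name.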
Created attachment 207005
evo patch

for evolution;

Using the new "header-has-words" for filtering.
Created commit e3da65f in eds master (3.3.90+)
Created commit 569cdde in evo master (3.3.90+)