Bug 758478 – Filtered text highlight is incorrect when multiple filters match

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 758478 - Filtered text highlight is incorrect when multiple filters match


Summary:	Filtered text highlight is incorrect when multiple filters match


Status:	RESOLVED FIXED

Product:	meld
Classification:	Other
Component:	filediff
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	meld-maint
QA Contact:	meld-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-11-21 23:56 UTC by Kai Willadsen
Modified:	2016-01-25 23:24 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Helper for merging intervals (2.39 KB, patch) 2015-11-22 00:11 UTC, Kai Willadsen	committed	Details \| Review
new test intervals (1.16 KB, patch) 2015-11-23 19:46 UTC, David Rabel	committed	Details \| Review
Patch proposal (3.56 KB, patch) 2015-11-24 08:22 UTC, David Rabel	none	Details \| Review
Patch proposal 2 (3.48 KB, patch) 2015-11-29 13:25 UTC, David Rabel	committed	Details \| Review
new test cases for text filtering test (1.29 KB, patch) 2015-12-24 13:05 UTC, David Rabel	none	Details \| Review

Description Kai Willadsen 2015-11-21 23:56:13 UTC

(Carried over from bug 633515)
> The new way to filter text would be to:
> - For each enabled filter, apply it as we currently do to the buffer text, but *do not* actually replace anything.
> - Add the range(s) ignored by each filter to a list of ranges.
> - Merge the lists of ranges to create the overall list of ranges that have been ignored; I think we can do this fairly naively, given how many items we'll be dealing with.
> - Use the overall list of ranges to do the highlighting, and then use it to remove the associated text as we do currently.

Comment 1 Kai Willadsen 2015-11-22 00:11:48 UTC

Created attachment 316037 [details] [review]
Helper for merging intervals

(In reply to David Rabel from comment #20)
> Created attachment 315729 [details]
> FilterRanges Class
> 
> Hi again,
> 
> is there actually a class that provides the functionality to handle ranges
> in the way we want to? I did not find one, so I started to write a little
> class on my own. I attached the mechanism. 
> I am not sure if it is self-explainable. I use a helper function and list
> comprehensions to delete all overlapping ranges and join them to the newly
> added range at the same time. I just iterated over the list before, but it
> does not work, because I removed items from the list while I was iterating
> over it.
> I also added a print_ranges() method and some test cases.

While I can see what you were going for, I think we can do this faster and more simply if we always only merge the final list, rather than re-merging overlapping ranges whenever we add a new range.

In other words, we do one pass to identify the matched ranges, store each range in a list, then do one pass to merge the lists and use the resulting list to highlight.

I've attached a simple patch containing a one-pass merger and tests. If you think this will meet your needs, I'm happy to just commit this to current master for ease.

> If you like the way I am implementing this, I would start working on a patch
> proposal for the new filtering mechanism.

I'd be very happy if you were keen to look at the new filtering mechanism.

Comment 2 David Rabel 2015-11-23 19:46:54 UTC

Created attachment 316116 [details] [review]
new test intervals

Ok, your helper class looks better than mine. I will gladly use it. But for the test cases, I would like to modify them slightly:

1) To have an overlap that is bigger than one character.
2) To have two identical ranges.
3) To have three ranges that overlap consecutively.


But of course I'm fine if you don't add this patch, because the class seems to be working fine anyways. :-)

I'll let you know when I have found the time to have a look at the filtering.

Regards,
  David

Comment 3 David Rabel 2015-11-24 08:22:02 UTC

Created attachment 316142 [details] [review]
Patch proposal

I already found the time. :-)

I hope again it is self-explaining. Please let me know, what you think about it.

Regards,
  David

Comment 4 Kai Willadsen 2015-11-28 22:46:59 UTC

Review of attachment 316142 [details] [review]:

Excellent. This is just *so* much nicer and more obvious than our current filtering mechanism... I'll be very happy to get this applied. A couple of small comments inline.

Also, I think we should probably add some tests for this, at least one for a non-grouped and one for a grouped ignore pattern, operating on a GtkTextBuffer.

Test intervals patch above looks good to me.

::: meld/filediff.py
@@ +775,1 @@
 

Now that we're not doing any more complicated lookup of offsets in the match, I think it might be cleaner to get rid of this function entirely and inline the lookups. Unless I'm mistaken, this would just be:

    start_iter = txt_start_iter.copy().forward_chars(start)

and the same for end, which is a lot shorter and still pretty clear.

@@ -776,3 @@
-                return start, end
-
-            assert m.group().count("\n") == 0

We can't get rid of this assertion. It's what we use to tell whether the regex has changed the number of lines in the file, because this breaks way-too-many assumptions elsewhere.

Possible just before you do the txt[:start] + txt[end:] slice below, you could slice out the ignored text there and check to see that there are no newline characters in that?

Comment 5 Kai Willadsen 2015-11-28 22:49:49 UTC

Patch review thing made my first suggestion there not entirely clear. I was suggesting that we could probably get rid of get_match_iters and make the resulting code both simpler and shorter.

Comment 6 David Rabel 2015-11-29 13:25:10 UTC

Created attachment 316471 [details] [review]
Patch proposal 2

Ok, so I added the assertion again and deleted the get_match_iters() function.

Unfortunately we cannot do

> start_iter = txt_start_iter.copy().forward_chars(start)

because this way, start_iter gets the return value of the forward_chars() call, which is a bool and not the TextIter object. So I split it into two lines for eacht TextIter, which is probably still cleaner than using get_match_iters()

Comment 7 David Rabel 2015-11-29 13:26:11 UTC

Ah, I forgot: I am not quite sure who and where to create tests for this.

Comment 8 Kai Willadsen 2015-12-06 00:33:31 UTC

I had some time this morning, so I spent a while writing some tests and fixing an issue I found with regex group matches.

I've pushed your patch to current master along with the new tests. I'm pretty happy about how this looks, but I'd appreciate it if you could take a look over the code and new tests to see that I haven't missed any edge cases.

I've also added at least one filter regex that isn't actually currently being tested, so more tests would always be nice.

Comment 9 David Rabel 2015-12-11 10:22:36 UTC

Unfortunately I am very busy at the moment. But I will look at it as soon as possible. :-)

Comment 10 David Rabel 2015-12-24 13:05:54 UTC

Created attachment 317861 [details] [review]
new test cases for text filtering test

Today I finally found the time to have a look at it and everything looks good to me. 
I added one test for the last filter regex in which I also did overlapping filters, one with and one without groups. 
I also added two non overlapping filters in different lines, which I am not sure if it adds anything to the test coverage. But it is something different to the others, so maybe it is good to have it. 
I also added a CVS keyword filter, which is the most complicated I could think of. ;-) I am not so much into regular expressions, so maybe there is some fancier stuff that we could test.
The tests cover everything I can think of, like no-group, single group, multiple groups, overlapping, non-overlapping.


By the way: I think the filters, that are included by default in meld, contain some mistakes:

1) The CVS keyword filter did not work, because the '$' were not escaped. And when I did escape them, the colon was filtered out, which makes not too much sense and now with the visual indication looks a bit strange. Also the whole filter makes no sense Without the group matching, so the ? behind it can be deleted. I suggest the following:
\$\w+:([^\n$]+)\$ instead of $\w+(:[^\n$]+)?$
This is what I used for the test anyway.

2) The C comment contains a ? right behind a * which is probably just a typo (and for example done right in the test you wrote) and has to be removed.

Unfortunately I don't have a clue where to change this.

Happy holidays,
David

Comment 11 Kai Willadsen 2016-01-22 23:24:41 UTC

(In reply to David Rabel from comment #10)
> Created attachment 317861 [details] [review] [review]
> new test cases for text filtering test
> 
> Today I finally found the time to have a look at it and everything looks
> good to me. 
> I added one test for the last filter regex in which I also did overlapping
> filters, one with and one without groups. 
> I also added two non overlapping filters in different lines, which I am not
> sure if it adds anything to the test coverage. But it is something different
> to the others, so maybe it is good to have it. 
> I also added a CVS keyword filter, which is the most complicated I could
> think of. ;-) I am not so much into regular expressions, so maybe there is
> some fancier stuff that we could test.
> The tests cover everything I can think of, like no-group, single group,
> multiple groups, overlapping, non-overlapping.

Sorry it took me so long, but I've pushed that patch now.

> By the way: I think the filters, that are included by default in meld,
> contain some mistakes:
> 
> 1) The CVS keyword filter did not work, because the '$' were not escaped.

So I'm pretty sure that the "$" escaping thing was fixed in commit 557d0d1c. Unfortunately because of how schema defaults work, if you've changed your filters you won't have the bugfix.

> And when I did escape them, the colon was filtered out, which makes not too
> much sense and now with the visual indication looks a bit strange. Also the
> whole filter makes no sense Without the group matching, so the ? behind it
> can be deleted. I suggest the following:
> \$\w+:([^\n$]+)\$ instead of $\w+(:[^\n$]+)?$
> This is what I used for the test anyway.

As for the optional clause and the colon grouping change, I think this depends on whether you expect the filter to remove non-expanded keywords. I think that whoever wrote the original filter *did* want this, and since that's what we've done for ages I'm inclined to leave it like that. Otherwise, e.g., filtering out a blank "$Rev$" would break if we moved the colon outside of the clause.

However, I use neither CVS or SVN so I may be wrong here...

> 2) The C comment contains a ? right behind a * which is probably just a typo
> (and for example done right in the test you wrote) and has to be removed.

Having that glob be non-greedy seems correct to me? or at least without doing real lexer/parser things I'm not sure how we could do anything better. 

> Unfortunately I don't have a clue where to change this.

These are schema defaults, so they live in data/org.gnome.meld.gschema.xml, and get compiled to a gschemas.compiled file.

Anyway, this seems to me to be working well, so I'm going to called this fixed. I'm happy to keep discussing the default filters, but maybe we should open another bug (or take it to the mailing list, in case other users have some suggestions).

Thanks very much for the patches and your testing.

Comment 12 Alan Suran 2016-01-23 05:32:02 UTC

Great job!

However, shouldn't you update dirdiff with the new text filter logic to be consistent? (DirDiff still uses "nested" text filters.)

Comment 13 Kai Willadsen 2016-01-23 21:04:55 UTC

(In reply to Alan Suran from comment #12)
> However, shouldn't you update dirdiff with the new text filter logic to be
> consistent? (DirDiff still uses "nested" text filters.)

Yes, good call. I've just created bug 761028 for this.

Comment 14 David Rabel 2016-01-24 18:49:54 UTC

(In reply to Kai Willadsen from comment #11)
> (In reply to David Rabel from comment #10)
> ...
> So I'm pretty sure that the "$" escaping thing was fixed in commit 557d0d1c.
> Unfortunately because of how schema defaults work, if you've changed your
> filters you won't have the bugfix.

Can I manually apply the bugfix?

> 
> > ...
> 
> As for the optional clause and the colon grouping change, I think this
> depends on whether you expect the filter to remove non-expanded keywords. I
> think that whoever wrote the original filter *did* want this, and since
> that's what we've done for ages I'm inclined to leave it like that.
> Otherwise, e.g., filtering out a blank "$Rev$" would break if we moved the
> colon outside of the clause.
> 
> However, I use neither CVS or SVN so I may be wrong here...
> 

Okay, I think I understand it now. Maybe it should be changed in the filter tests to be consistent. Or maybe it does not matter.

> > 2) The C comment contains a ? right behind a * which is probably just a typo
> > (and for example done right in the test you wrote) and has to be removed.
> 
> Having that glob be non-greedy seems correct to me? or at least without
> doing real lexer/parser things I'm not sure how we could do anything better. 
> 

I'm not sure if I understand it right. But I thought is that a ? right behind a * just makes no sense.

> > Unfortunately I don't have a clue where to change this.
> 
> These are schema defaults, so they live in data/org.gnome.meld.gschema.xml,
> and get compiled to a gschemas.compiled file.
> 

Could you tell me how to compile it? It would be interesting to know. And is this also the way how I can reset my text filters?

> 
> Thanks very much for the patches and your testing.

Your welcome. :-)

Comment 15 Kai Willadsen 2016-01-25 23:24:33 UTC

(In reply to David Rabel from comment #14)
> (In reply to Kai Willadsen from comment #11)
> > (In reply to David Rabel from comment #10)
> > ...
> > So I'm pretty sure that the "$" escaping thing was fixed in commit 557d0d1c.
> > Unfortunately because of how schema defaults work, if you've changed your
> > filters you won't have the bugfix.
> 
> Can I manually apply the bugfix?

You mean in order to test? The only way to do that is to delete the key (e.g., dconf reset /org/gnome/meld/text-filters) so that it reverts to the default, but that will get rid of any custom text filters you've defined.

> Okay, I think I understand it now. Maybe it should be changed in the filter
> tests to be consistent. Or maybe it does not matter.

Filter tests were really just a few cases; they don't need to be consistent with what we have in the defaults.

> I'm not sure if I understand it right. But I thought is that a ? right
> behind a * just makes no sense.

The "?" makes the preceeding glob non-greedy. e.g., for text
  axxxbxxxb
"a(.*)b" will match "xxxbxxx" as the group, whereas "a(.*?)b" will match "xxx" as the group.

> > > Unfortunately I don't have a clue where to change this.
> > 
> > These are schema defaults, so they live in data/org.gnome.meld.gschema.xml,
> > and get compiled to a gschemas.compiled file.
> > 
> 
> Could you tell me how to compile it? It would be interesting to know. And is
> this also the way how I can reset my text filters?

If you're running from an installed package, you'll need to run glib-compile-schemas with the correct path; I can't tell you what that will be for your distribution.

If you're running from a checkout, then we do this automatically in bin/meld in setup_settings() so that we can run uninstalled, but usually this is done as part of the build/install process. Basically you just run glib-compile-schemas on the directory containing the schema (data/ in this case).

However, this won't reset your text filters. To do that you need to use that dconf command from above, but again that will get rid of any custom filters you've added.