Bug 633515 - Text ignored by filters should have some visual indication
Status: RESOLVED FIXED
Product: meld
Classification: Other
Component: filediff
Version: unspecified
Platform/OS: Other / Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assigned To: meld-maint
Reported: 2010-10-29 23:05 UTC by Daniel Wong
Modified: 2015-11-21 23:59 UTC


Attachments
- repro repo (13.61 KB, application/x-gzip), 2010-10-31 05:06 UTC, Daniel Wong
- Patch proposal as discussed on meld-list@gnome.org (3.12 KB, patch), 2015-10-11 20:17 UTC, David Rabel, flagged needs-work
- Second patch proposal (5.24 KB, patch), 2015-10-14 20:11 UTC, David Rabel
- Code cleanup with helper function (2.36 KB, patch), 2015-11-08 08:34 UTC, David Rabel
- FilterRanges Class (1.49 KB, text/x-python), 2015-11-17 07:27 UTC, David Rabel

Description Daniel Wong 2010-10-29 23:05:04 UTC
Repro Steps
-----------

 1. untar + unzip the attached file
 2. cd repro
 3. meld .
 4. view the modifications to f

Result: Expected vs. Observed
-----------------------------

The first green region on the right covers the first couple lines, which are actually unchanged, instead of the next couple, where there are some additions.

I'd add a screenshot, but I can't figure out how to attach more than one file. It should become clear once you stare at the UI for a few seconds.

Clues (possibly red herrings)
-----------------------------

I believe this has something to do with the contents at the top of the file:

# -*- python -*-

because I can't seem to get the same wrong behavior when it's not present.
</possible-red-herring>

Versions
--------

$ meld --version
meld 1.3.0

From "System > About Ubuntu":
You are using Ubuntu 10.04 LTS - the Lucid Lynx - released in April 2010 and supported until April 2013.

From "System > About GNOME":
Version: 2.30.2
Comment 1 Kai Willadsen 2010-10-30 07:46:00 UTC
I think you forgot to attach the file?

Also, Meld 1.3.0 is really quite old at this point, could you check whether you can reproduce this problem with something more recent (1.4.0 for example).
Comment 2 Daniel Wong 2010-10-31 05:06:43 UTC
Created attachment 173584 [details]
repro repo

I know I had this filled out in my original report, but apparently, submitting the bug did not submit my attachment along with??
Comment 3 Daniel Wong 2010-10-31 05:09:06 UTC
Hey Kai,

Strange. Not sure why my attachment wasn't included.

Anyway, I installed meld recently using apt-get or synaptic, so it should be pretty recent, unless my download source is old.
Comment 4 Kai Willadsen 2010-10-31 20:47:24 UTC
Thanks for the test case. However, I can only reproduce the behaviour you describe when the "Script comment" (#.*) filter is enabled. In that case, what you're seeing is correct, as Meld essentially sees that there are four blank lines on the right side, and two on the left side, and offers to add two of those lines to the left side.

If you *don't* have that filter enabled, then that would seem to be a bug, but I can't reproduce it here (with 1.3.0 or anything later).

Also, Meld 1.3.0 is pretty old (one and a half years) at this point. Debian has apparently upgraded to 1.3.2 recently.
Comment 5 Daniel Wong 2010-11-05 19:07:53 UTC
Ah yes. I probably had that setting enabled. Even so, this behavior could be less user-hostile.

One possible solution is to dim ignored text. This would make what's going on more transparent (haha) to the user, which would be helpful in many situations.

Another solution (not mutually exclusive) is to match up lines before applying filters. I believe this would give better line matching when lines containing nothing but filtered text are added or removed.
Comment 6 Kai Willadsen 2010-11-05 19:12:55 UTC
Dimming or otherwise indicating ignored text seems like a reasonable request. Changing the title accordingly.

Matching up lines before applying filters doesn't really make much sense, as there's every chance that the filter will change the resulting comparison. I can't off the top of my head think of a way to do this that would actually help much.
Comment 7 Daniel Wong 2010-11-08 01:46:47 UTC
I agree that filtering would change what's different between the two files, but I don't see any problem with this.

Another idea: the comment lines that I added should not show up as a change at all (since that's the intent), but they did, because of the newlines that were added. Perhaps the filter pattern should be

\s*#.*\r?\n|#.*

The C++ style comment filter could be modified in a similar way as well.
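As a standalone illustration (not Meld's shipped filter definitions), the difference between the plain comment pattern and the proposed one (written here with the presumably intended `#.*`) shows up on a line containing only a comment:

```python
import re

text = "# note\nx = 1\n"

# The plain "Script comment" pattern leaves the newline behind, so a
# pure-comment line survives as a blank line and still diffs as a change:
stock = re.sub(r"#.*", "", text)

# The proposed pattern also consumes the line's newline, so the
# comment line disappears entirely and line counts stay aligned:
proposed = re.sub(r"\s*#.*\r?\n|#.*", "", text)

print(repr(stock))     # '\nx = 1\n'
print(repr(proposed))  # 'x = 1\n'
```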
Comment 8 Kai Willadsen 2010-11-22 21:02:11 UTC
If the filtering changes the differences between the two files, but we've already lined the files up before filtering, then we're effectively reducing the effect of the filtering itself. It changes what filtering does.

There's already a bug somewhere about your second suggestion. Unfortunately, it's not currently possible for a filter to change the number of lines in a comparison; doing so will break other parts of the code, specifically the code that does incremental updating of the diffs. This would be great to fix, but is likely to be tricky.
Comment 9 Kai Willadsen 2015-08-29 22:38:59 UTC
Implementing this dimming should be manageable with a GtkTextTag for ignored text, though this *might* interact in a weird way with GtkSourceView themes. If we can rely on GtkSourceView themes giving us a 'deemphasise this text' tag, then we should use that. If not then this should come down to adding the tag to text ignored by the regex, and then clearing/reapplying as required.
Comment 10 David Rabel 2015-10-08 21:24:00 UTC
I am looking at this at the moment.
Comment 11 David Rabel 2015-10-11 20:17:37 UTC
Created attachment 313076 [details] [review]
Patch proposal as discussed on meld-list@gnome.org

Ready for fine-tuning. :-)
Comment 12 Kai Willadsen 2015-10-12 21:28:36 UTC
Review of attachment 313076 [details] [review]:

As mentioned on list, this is looking pretty good. I've commented on things I think should probably be fixed before it gets in. However, I think this is overall a definite improvement, so if at any point you'd like to bail on this patch just let me know and I'll pick up any remaining fine-tuning.

Detailed comments are inline below, but I have two more general comments. I'm being picky because you indicated that you were interested in getting involved, so I may as well make all the little suggestions up-front.

Firstly, all code should be formatted to match PEP8 (https://www.python.org/dev/peps/pep-0008/). There will probably be a plugin for whatever your chosen editor is to help with this, but the main non-PEP8 things in this patch are: there should be whitespace around operators, we shouldn't be using whitespace to align e.g., assignment statements, and lines should be limited to 79 characters. Because Meld is a fairly old code base, not everything in Meld is PEP8, but all new code should be; if PEP8 and the surrounding code style differ, PEP8 wins.

Secondly, it's best if you can submit patches in `git format-patch` style, as that retains authorship details, commit messages, etc. See https://wiki.gnome.org/Newcomers/CodeContributionWorkflow for a detailed suggested workflow; for Meld you can ignore the jhbuild instructions, and using git-bz is optional.

Finally, thanks for the patch. This has been a long standing request in one form or another, and it'll be nice to get this in.

::: meld/filediff.py
@@ +763,3 @@
             self.queue_draw()
 
+    def _filter_text(self, txt, buf, start_offset, end_offset):

I haven't made certain that we can do this, but I think we should be passing in GtkTextMarks (or possibly GtkTextIters, but I'd be nervous about iterator validity issues) here instead of offsets. We can then just copy the iters and use forward_chars() with an offset instead of doing offset calculations relative to the whole buffer.

The main reason for this is that we're doing a *lot* of get_iter_at_offset() calls, which isn't that cheap and also just looks somewhat redundant.

@@ +778,3 @@
+                            end_iter   = buf.get_iter_at_offset(m.start()+s2.find(g, i)+len(g)+start_offset)
+                            buf.apply_tag(dimmed_tag, start_iter, end_iter)
+                            i = s2.find(g, i)+len(g)

So I mentioned this on list, but I think what we're doing here is just weird and kind of wrong. This isn't anything wrong with your patch though! You've just reproduced existing behaviour.

I would like to at least try just doing the obvious thing and only replacing the actually matched characters in the group, rather than re-running a find(). I think this should make this bit a *lot* simpler, as we should be able to just use `m.start(g)` and `m.end(g)` instead of these find() calls.
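As a rough standalone sketch (match_group_spans is invented here for illustration, not Meld code), relying on the match object's own group spans removes the find() bookkeeping entirely:

```python
import re

def match_group_spans(pattern, text):
    """Collect (start, end) offsets to dim, taken straight from the
    match object instead of re-running find() on the matched text."""
    spans = []
    for m in re.finditer(pattern, text):
        if m.groups():
            # m.start(g)/m.end(g) already give each group's offsets.
            for g in range(1, len(m.groups()) + 1):
                if m.start(g) != -1:  # group participated in the match
                    spans.append((m.start(g), m.end(g)))
        else:
            spans.append((m.start(), m.end()))
    return spans

# Each span would then become a pair of GtkTextIters (via forward_chars()
# from a base iter) for buf.apply_tag(dimmed_tag, start_iter, end_iter).
```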

::: meld/sourceview.py
@@ +128,3 @@
+        dimmed_tag = buf.get_tag_table().lookup("dimmed")
+        dimmed_tag.set_property("foreground", "#999999")
+        dimmed_tag.set_property("foreground-set", True)

Creating the tag like this is fine, but we should probably make the foreground colour part of the style. The styles live in data/styles/meld-{base,dark}.xml, and the colours get associated with a tag in MeldSourceView.on_setting_changed(); see what we do for the "inline" tag there.
Comment 13 David Rabel 2015-10-14 20:11:09 UTC
Created attachment 313332 [details] [review]
Second patch proposal

I reworked the patch according to your comments. You are right that I want to get involved, and therefore we should do it the correct way from the beginning. I'm still motivated to put some time into this patch. ;)

I tried to use git-bz, but it was not working. Maybe because I am using Iceweasel as a web browser and it expects Firefox? But because you said it's optional, I don't want to concentrate on this.

Now GtkTextIters are passed to _filter_text() and forward_chars() is used instead of offset calculations. I also replaced the replace() call, so now only the actual match of the subgroup is filtered.

I have a general question about commenting: I haven't seen many comments in the source code, and so far my patch doesn't include any either. I'm not sure how much detail would be good.
Comment 14 Kai Willadsen 2015-10-18 20:20:11 UTC
Sorry I haven't had the time to test this out. I was actually planning to test it and just commit to master, and we can clean it up more there if needs be.

My one remaining issue is again the iterator math. I see that you're reversing back over the matches, presumably to avoid invalidating the iterator. However, I think our lives would be much easier if we just created and passed in a GtkTextMark at the start/end points, and cloned GtkTextIters off that. That way we don't need to worry about doing the forward_chars()/backward_chars() thing and can just reclone an iterator each time, and we don't need to worry about doing the reversal (because GtkTextMarks don't invalidate when the document changes).

As I said though, I'll happily just apply this patch as-is, as soon as I've had a chance to test it properly with groups, etc.

Thanks again.
Comment 15 David Rabel 2015-10-19 21:50:38 UTC
Hi Kai,

I am reversing back over the matches because I did not want to clone a new iterator in each loop cycle. Working with text marks instead of iterators would mean a get_iter_at_mark() call in every place where was a get_iter_at_offset() call before, followed by the forward_chars() call. 

If you mean making TextMarks out of the m.start(i) and m.end(i) points, I don't think that is a good idea. I think to do this we still need to pass the offset to do the calculation of where to put the TextMarks and I think we would have to do it with numbers, so it would still be kind of the same as before, just more complicated. (And including get_iter_at_mark() calls.)

In addition we would have to take care about deleting the TextMarks.

With the reverse iterating, TextMarks won't help us. There are no changes made to the TextBuffer in any case, I think. The reverse iterating is because of the cutting of the return string.


The forward_chars()/backward_chars() stuff is not too beautiful, but probably the cleanest way to reuse the TextIters and therefore to avoid all those get_iter_at_...() calls.


Thank you for your patience. ;-)

By the way: I still would like to know what you think on the topic of source code comments.
Comment 16 Kai Willadsen 2015-10-25 21:46:29 UTC
(In reply to David Rabel from comment #15)
> Hi Kai,
> 
> I am reversing back over the matches because I did not want to clone a new
> iterator in each loop cycle. Working with text marks instead of iterators
> would mean a get_iter_at_mark() call in every place where was a
> get_iter_at_offset() call before, followed by the forward_chars() call. 

Ah okay. That's fair enough, but cloning an iterator from an existing iterator is basically free; I'm pretty sure it's just a struct copy. 

> If you mean making TextMarks out of the m.start(i) and m.end(i) points, I
> don't think that is a good idea. I think to do this we still need to pass
> the offset to do the calculation of where to put the TextMarks and I think
> we would have to do it with numbers, so it would still be kind of the same
> as before, just more complicated. (And including get_iter_at_mark() calls.)
> 
> In addition we would have to take care about deleting the TextMarks.

This isn't quite what I meant. I meant that we'd create a single mark at the start of the killit() call, and then clone iterators from that as needed. Creating an iterator from an existing mark isn't quite as cheap as cloning an iterator, but I believe it will be a lot cheaper than index-based options. It's also (to me at least) clearer in that the character counting stuff will look more obvious in what it's doing.

> With the reverse iterating, TextMarks won't help us. There are no changes
> made to the TextBuffer in any case, I think. The reverse iterating is
> because of the cutting of the return string.

Sure, that's all good. I don't have any problem with the reverse iterating.

> The forward_chars()/backward_chars() stuff is not too beautiful, but
> probably the cleanest way to reuse the TextIters and therefore to avoid all
> those get_iter_at_...() calls.

So I was thinking of a helper that takes the offsets, clones the iters from a mark and then discards everything. Something like:

    start_mark = buf.create_mark("dim-ignored", start_iter)

    def get_match_iters(match, match_index=0):
        start = buf.get_iter_at_mark(start_mark)
        end = start.copy()
        start.forward_chars(match.start(match_index))
        end.forward_chars(match.end(match_index))
        return start, end

And then this would also help avoid the code duplication between the cases where we have regex groups and those where we don't. Maybe have a look and see what you think about something like this?

> Thank you for your patience. ;-)

No problem! The patches are good and I'm very happy with the feature.

Since it works fine I've just pushed your current patch to master, so see how you go with the marks/iters stuff I mentioned, and we can just fix it there.

> By the way: I still would like to know what you think on the topic of source
> code comments.

I tend not to put in code comments unless I think the code is doing something non-obvious and/or deliberately weird. I'm more in favour of docstrings for functions that need some clarification. The current Meld code base has very little in the way of both comments and docstrings (as you will have noticed), but generally has fairly good commit logs and (usually) well-named functions.

Now of course, I'm not going to say no to better code documentation! So it's just about deciding what documentation is useful.
Comment 17 David Rabel 2015-10-27 18:29:54 UTC
(In reply to Kai Willadsen from comment #16)
> So I was thinking of a helper that takes the offsets, clones the iters from
> a mark and then discards everything. Something like:
> 
>     start_mark = buf.create_mark("dim-ignored", start_iter)
> 
>     def get_match_iters(match, match_index=0):
>         start = buf.get_iter_at_mark(start_mark)
>         end = start.copy()
>         start.forward_chars(match.start(match_index))
>         end.forward_chars(match.end(match_index))
>         return start, end
> 
> And then this would also help avoid the code duplication between the cases
> where we have regex groups and those where we don't. Maybe have a look and
> see what you think about something like this?

This seems nice. Only I would omit the TextMark and use the TextIter directly instead:

     def get_match_iters(match, match_index=0):
         start = start_iter.copy()
         end = start_iter.copy()
         start.forward_chars(match.start(match_index))
         end.forward_chars(match.end(match_index))
         return start, end


In my opinion the TextMark is not necessary. What do you think?


> Since it works fine I've just pushed your current patch to master, so see
> how you go with the marks/iters stuff I mentioned, and we can just fix it
> there.

Unfortunately I already found a bug in the implementation. When there are multiple filters handled by one _filter_text() call, it does not work properly. The reason is that txt is changed for every filter:

  txt = filt.filter.sub(killit, txt) 

and the offsets in m.start() and m.end() refer to txt and not to the buffer. I am not quite sure about a clean solution. What came first to my mind is to do the dimming first and afterwards do the actual filtering. 
But since that would be kind of calling sub() twice (with different killit-functions) for every filter, it probably would mean a lot of redundant computing.
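The drift can be reproduced outside Meld; in this standalone sketch, killit only records match offsets (a stand-in for the real tag-applying callback):

```python
import re

# Buffer text from the repro below: a commented-out first line, digits after.
text = "#12\n34"
filters = [re.compile(r"#.*"), re.compile(r"(\d)*(\d)")]

recorded = []  # offsets seen by each filter, as _filter_text() would see them

for filt in filters:
    def killit(m):
        recorded.append((m.start(), m.end()))
        return ""
    text = filt.sub(killit, text)  # txt shrinks after each filter runs

# recorded == [(0, 3), (1, 3)]: the digit filter saw "34" at offset 1
# of the already-filtered text, but "34" sits at offset 4 of the
# original buffer, so dimming at the recorded offset misses it.
```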


How I reproduce this bug: I activate the "Script comment" filter and, below it, my custom filter (\d)*(\d).
Then I compare two files. One is empty, the other contains
12
34
Then I put a # in front of the first line. The second line is no longer dimmed, although it should be. 

When I save, close and reopen it, the second line is dimmed again, because now _filter_text() is called twice, once for each line (maybe you know better why). 

Maybe you have some idea?


Kind regards,
  David  

PS: If my explanation of the bug is confusing, please let me know and I'll try to explain it in more detail.
Comment 18 Kai Willadsen 2015-11-06 22:21:02 UTC
(In reply to David Rabel from comment #17)
> This seems nice. Only I would omit the TextMark and use the TextIter
> directly instead:
> 
>      def get_match_iters(match, match_index=0):
>          start = start_iter.copy()
>          end = start_iter.copy()
>          start.forward_chars(match.start(match_index))
>          end.forward_chars(match.end(match_index))
>          return start, end
> 
> 
> In my opinion the TextMark is not necessary. What do you think?

Yeah that seems better. I was using the TextMark reflexively because we were changing the buffer, but since we're only changing tags and not the content my paranoia is unnecessary.

<snip>
> I am not quite sure about a clean solution. What came first to my mind is to
> do the dimming first and afterwards do the actual filtering. 
> But since that would be kind of calling sub() twice (with different
> killit-functions) for every filter, it probably would mean a lot of
> redundant computing.

So the problem you've found is real, and it's because of the way filters work: they're applied in the specified order to the text. This has the (often unexpected) effect that later filters can be affected by earlier ones.

Doing two passes won't actually fix this, because we'll be doing the highlighting as though we *don't* have this cumulative filter effect, while still filtering text as though we do.

<snip>
> Maybe you have some idea?

The only idea I currently have is to change how filters apply. I'm okay with this because I doubt that many people are relying on the current behaviour.

The new way to filter text would be to:
 - For each enabled filter, apply it as we currently do to the buffer text, but *do not* actually replace anything.
 - Add the range(s) ignored by each filter to a list of ranges.
 - Merge the lists of ranges to create the overall list of ranges that have been ignored; I think we can do this fairly naively, given how many items we'll be dealing with.
 - Use the overall list of ranges to do the highlighting, and then use it to remove the associated text as we do currently.
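A minimal standalone sketch of that scheme (function names are invented here, not Meld's API):

```python
import re

def collect_ignored_ranges(text, patterns):
    """Pass 1: run every filter over the *unmodified* text and record
    the spans it would remove, without actually replacing anything."""
    ranges = []
    for pattern in patterns:
        for m in re.finditer(pattern, text):
            if m.start() != m.end():  # skip zero-length matches
                ranges.append((m.start(), m.end()))
    return merge_ranges(ranges)

def merge_ranges(ranges):
    """Naively merge overlapping (start, end) ranges into one list."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def apply_ranges(text, ranges):
    """Pass 2: the merged ranges drive both the dimming (not shown)
    and the actual removal of the ignored text."""
    out, prev = [], 0
    for start, end in ranges:
        out.append(text[prev:start])
        prev = end
    out.append(text[prev:])
    return "".join(out)
```

With a comment filter plus a digit filter over "#12\n34", both spans are now measured against the original text, so the second line would be dimmed correctly regardless of filter order.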

Let me know whether that makes sense. I might look at picking this up some time if you're not interested in keeping going.

This *would* change how filters apply, but I'm okay with just making sure the release notes mention this.

While it's unfortunate that the implementation currently in master is slightly buggy, I'm also pretty confident that most users won't ever hit this problem (and even if they do, it's only slightly misleading).
Comment 19 David Rabel 2015-11-08 08:34:06 UTC
Created attachment 315073 [details] [review]
Code cleanup with helper function

Ok, so I added a patch proposal for the helper function.

I would really like to take a closer look at the filtering. What you describe sounds like a good way to do it properly.
Comment 20 David Rabel 2015-11-17 07:27:53 UTC
Created attachment 315729 [details]
FilterRanges Class

Hi again,

is there actually a class that provides the functionality to handle ranges the way we want? I did not find one, so I started to write a little class of my own. I attached the mechanism. 
I am not sure if it is self-explanatory. I use a helper function and list comprehensions to delete all overlapping ranges and join them into the newly added range at the same time. At first I simply iterated over the list, but that did not work, because I was removing items from the list while iterating over it.
I also added a print_ranges() method and some test cases.
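Since the attachment itself isn't reproduced in this thread, here is a guess at the approach described (a hypothetical FilterRanges, not the actual attached code): a helper plus list comprehensions that absorb overlapping ranges on add, sidestepping the remove-while-iterating problem.

```python
class FilterRanges:
    """Keeps a sorted list of non-overlapping (start, end) ranges;
    adding a range absorbs any existing ranges it touches."""

    def __init__(self):
        self.ranges = []

    @staticmethod
    def _overlaps(a, b):
        # Inclusive endpoints: ranges that merely touch also count.
        return a[0] <= b[1] and b[0] <= a[1]

    def add(self, start, end):
        new = (start, end)
        # Grow the new range to cover everything it overlaps...
        hits = [r for r in self.ranges if self._overlaps(r, new)]
        if hits:
            new = (min(start, min(r[0] for r in hits)),
                   max(end, max(r[1] for r in hits)))
        # ...then rebuild the list in one comprehension instead of
        # removing items from it while iterating over it.
        self.ranges = sorted(
            [r for r in self.ranges if not self._overlaps(r, new)]
            + [new])
```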

Do you want to handle this issue here or shall we file a new bug for it?

If you like the way I am implementing this, I would start working on a patch proposal for the new filtering mechanism.

Regards
  David
Comment 21 Kai Willadsen 2015-11-21 23:59:53 UTC
(In reply to David Rabel from comment #19)
> Created attachment 315073 [details] [review]
> Code cleanup with helper function
> 
> Ok, so I added a patch proposal for the helper function.

Sorry for the delay. I've pushed that cleanup patch now.

(In reply to David Rabel from comment #20)
> Do you want to handle this issue here or shall we file a new bug for it?

Yeah I think a new bug is a good idea. I've just created bug 758478, and I'll add you to the CC list.