GNOME Bugzilla – Bug 709083
Consistent whitespace changes (indentation level) is not shown consistently
Last modified: 2017-12-13 19:06:01 UTC
When diffing files that have a consistently applied change in indentation level (e.g. a certain number of spaces added or removed), the changes are displayed in a way that is hard to understand at a glance.

Meld sees such a situation as a number of spaces added to each line in one of the files. If N spaces are added, Meld (arbitrarily) shows this as N spaces added at the end of the indentation. While this is a technically correct interpretation, it would be more helpful if Meld marked the *first* N spaces as added, at the very least when the same change is made on the line before or after the current one.

Consider the attached example. The effect is that the sequence of added spaces jumps about, following the outline of the code, so it is hard to see that a common indentation has been added. If, on the other hand, the leftmost spaces were marked, a visible column would appear, clearly showing that a common indentation has been applied (and any deviation from that change would stand out!).
Created attachment 256073 [details] demo file 1
Created attachment 256074 [details] demo file 2
I agree this is a little annoying. It looks like it's due to the diff algorithm used (which appears to be greedy).

To reproduce it directly in Python:

>>> from matchers import MyersSequenceMatcher
>>> from matchers import MyersSequenceMatcher as Seq
>>> a=('    test1\n        test2\n', u'test1\n    test2')
>>> b=('    test1\n        test2', u'test1\n    test2')
>>> Seq(None, *a)
<matchers.MyersSequenceMatcher object at 0x7fbabf975050>
>>> Seq(None, *a).get_opcodes()
[DiffChunk(tag='delete', start_a=0, end_a=4, start_b=0, end_b=0),
 DiffChunk(tag='equal', start_a=4, end_a=14, start_b=0, end_b=10),
 DiffChunk(tag='delete', start_a=14, end_a=18, start_b=10, end_b=10),
 DiffChunk(tag='equal', start_a=18, end_a=23, start_b=10, end_b=15),
 DiffChunk(tag='delete', start_a=23, end_a=24, start_b=15, end_b=15)]
>>> Seq(None, *b).get_opcodes()
[DiffChunk(tag='delete', start_a=0, end_a=4, start_b=0, end_b=0),
 DiffChunk(tag='equal', start_a=4, end_a=10, start_b=0, end_b=6),
 DiffChunk(tag='delete', start_a=10, end_a=14, start_b=6, end_b=6),
 DiffChunk(tag='equal', start_a=14, end_a=23, start_b=6, end_b=15)]
>>>

Of note is the difference between

DiffChunk(tag='equal', start_a=4, end_a=14, start_b=0, end_b=10),
DiffChunk(tag='delete', start_a=14, end_a=18, start_b=10, end_b=10),

and

DiffChunk(tag='equal', start_a=4, end_a=10, start_b=0, end_b=6),
DiffChunk(tag='delete', start_a=10, end_a=14, start_b=6, end_b=6),

Using difflib's SequenceMatcher directly instead (which meld overrides) you get the correct

DiffChunk(tag='equal', start_a=4, end_a=10, start_b=0, end_b=6),
DiffChunk(tag='delete', start_a=10, end_a=14, start_b=6, end_b=6),
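To make the two results easier to compare, here is a minimal sketch that replays opcodes over the old text and marks deleted characters. It does not use Meld's code. The input strings are an assumption (four- and eight-space indents, which is what the reported offsets imply). `mark_deleted`, `LEADING` and `TRAILING` are hypothetical names, and `TRAILING` is adapted from the first result with the trailing-newline chunk dropped so that both lists apply to the same input.

```python
# Assumed inputs: four- and eight-space indents, as the reported offsets imply.
OLD = "    test1\n        test2"
NEW = "test1\n    test2"

# (tag, start_a, end_a, start_b, end_b), copied from the second result.
LEADING = [
    ("delete", 0, 4, 0, 0),
    ("equal", 4, 10, 0, 6),
    ("delete", 10, 14, 6, 6),
    ("equal", 14, 23, 6, 15),
]

# Adapted from the first result (final newline chunk dropped).
TRAILING = [
    ("delete", 0, 4, 0, 0),
    ("equal", 4, 14, 0, 10),
    ("delete", 14, 18, 10, 10),
    ("equal", 18, 23, 10, 15),
]


def mark_deleted(old, new, opcodes):
    """Render `old` with deleted characters as '-' and kept spaces as '.'."""
    out = []
    for tag, a0, a1, b0, b1 in opcodes:
        if tag == "equal":
            # Sanity check: an 'equal' chunk must really match in both texts.
            if old[a0:a1] != new[b0:b1]:
                raise ValueError("opcodes do not fit the given texts")
            out.append(old[a0:a1].replace(" ", "."))
        elif tag == "delete":
            out.append("-" * (a1 - a0))
        else:
            raise ValueError("unsupported tag: %r" % (tag,))
    return "".join(out).split("\n")


for name, ops in (("leading", LEADING), ("trailing", TRAILING)):
    print(name)
    for line in mark_deleted(OLD, NEW, ops):
        print("  " + line)
```

Both opcode lists describe the same edit. With `LEADING` the deleted spaces form a single column at the left margin, while with `TRAILING` the marked run on the second line sits after four kept spaces, which is the jumping effect described in the report.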
Additionally, I can't find the algorithm that's apparently used. The code links to http://research.janelia.org/myers/Papers/np_diff.pdf, which apparently no longer exists.
Try http://web.archive.org/web/20100415103528/http://research.janelia.org/myers/Papers/np_diff.pdf
Yup. This amounts to implementing some semantic alignment in Meld (see for example https://neil.fraser.name/writing/diff/, section 3.2.2). It would indeed be really nice to have.
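As a rough illustration of the idea, the sketch below slides a single deletion as far left as it can go without changing the resulting text. This is a deliberate simplification: it is neither the approach described at the link above nor anything from Meld's code, and a real implementation would need to weigh candidate positions rather than always taking the leftmost one. `slide_left` and the sample string are hypothetical.

```python
def slide_left(text, start, end):
    """Slide the deletion text[start:end] left while the result is unchanged.

    Moving the deleted range one step left leaves the output identical
    exactly when the character entering the range on the left equals the
    character leaving it on the right.
    """
    if end <= start:
        raise ValueError("empty deletion")
    while start > 0 and text[start - 1] == text[end - 1]:
        start -= 1
        end -= 1
    return start, end


# Assumed sample: second line indented by eight spaces, four of them removed.
OLD = "    test1\n        test2"

# Deletion reported at the end of the indentation (offsets 14..18).
shifted = slide_left(OLD, 14, 18)
print(shifted)

# Both placements produce the same text after the deletion is applied.
before = OLD[:14] + OLD[18:]
after = OLD[:shifted[0]] + OLD[shifted[1]:]
print(before == after)
```

Here the slide stops at the newline, so the deletion ends up covering the first four spaces of the line, matching the column the reporter asked for.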
Interesting read. :) It would indeed be fun to experiment with different strategies to achieve a good, human-readable alignment. I suspect the best way to start is to set up a test framework, to check that such changes do what they are supposed to do without breaking anything. Oh, I wish I had some time to spare for hacking on this. :)
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/meld/issues/65.