Bug 697812 - meld sometimes shows lines as completely different when they only partially differ
Status: RESOLVED FIXED
Product: meld
Classification: Other
Component: filediff
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: meld-maint
QA Contact: meld-maint
Depends on:
Blocks:
 
 
Reported: 2013-04-11 15:41 UTC by Adam Dingle
Modified: 2013-04-22 21:17 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
first log file (3.92 KB, application/octet-stream)
2013-04-11 15:41 UTC, Adam Dingle
second log file (3.92 KB, application/octet-stream)
2013-04-11 15:41 UTC, Adam Dingle

Description Adam Dingle 2013-04-11 15:41:33 UTC
Created attachment 241264 [details]
first log file

Sometimes meld indicates that two lines are completely different when they actually only differ in a few characters.  For example, use meld to compare the two attached files (excerpts from X server logs).  meld will show every line as being completely different, even though all the lines differ in their timestamps only.
Comment 1 Adam Dingle 2013-04-11 15:41:57 UTC
Created attachment 241265 [details]
second log file
Comment 2 Kai Willadsen 2013-04-11 21:20:04 UTC
This happens because, like diff and similar tools, we do a line-based comparison and then follow up with inline highlighting. However, because of the way inline highlighting works, it can be very slow on long chunks; so that very large chunks don't slow down the user experience too much, we bail on highlighting those.

In this case, the whole file is a single chunk, because there are *no* identical lines to line up on. You can verify this by inserting a couple of corresponding blank (or otherwise identical) lines on either side and you'll see that Meld starts highlighting again.
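
To illustrate the effect described above, here is a minimal sketch that uses Python's standard difflib as a stand-in for Meld's own matcher (an assumption; Meld's actual comparison code is not shown in this report). The log lines and the line_chunks helper are hypothetical.

```python
from difflib import SequenceMatcher

def line_chunks(a_lines, b_lines):
    """Return the changed (non-equal) chunks of a line-based comparison."""
    matcher = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

# Hypothetical log excerpts: every line differs, but only in its timestamp.
left = [
    "[ 10.001] Loading module",
    "[ 10.002] Module loaded",
    "[ 10.003] Using config",
    "[ 10.004] Done",
]
right = [
    "[ 22.501] Loading module",
    "[ 22.502] Module loaded",
    "[ 22.503] Using config",
    "[ 22.504] Done",
]

# No line is identical on both sides, so there is nothing to line up on
# and the whole file becomes one "replace" chunk.
no_anchor = line_chunks(left, right)

# Inserting one identical (blank) line on each side gives the matcher an
# anchor, which splits the single chunk into two smaller ones.
left_anchor = left[:2] + [""] + left[2:]
right_anchor = right[:2] + [""] + right[2:]
with_anchor = line_chunks(left_anchor, right_anchor)

print(len(no_anchor), len(with_anchor))
```

Smaller chunks are what bring each piece back under the size limit for inline highlighting.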

There are two ways to fix this. The first is to use a text filter (see preferences) to remove the timestamp so that Meld only sees the relevant text. The other is to manually add syncpoints, which we've added in what-will-soon-be-1.7.2. However, I've just noticed a bug: syncpoints don't actually fix your problem, because they won't split a block like this.
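
As a rough sketch of the text-filter idea, a regular expression can strip the timestamp before lines are compared. The timestamp format below (a leading bracketed "seconds.fraction" prefix) is an assumption, since the attached logs are not reproduced here; the pattern and helper name are hypothetical.

```python
import re

# Assumed format: a leading "[   12.345] " prefix on each log line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

def strip_timestamp(line):
    """Remove the leading timestamp so only the message text is compared."""
    return TIMESTAMP.sub("", line)

left = "[    10.001] (II) Loading module"
right = "[    22.501] (II) Loading module"

# With the timestamp filtered out, the two lines compare as identical.
print(strip_timestamp(left) == strip_timestamp(right))
```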

We could also probably up the limit for when we give up on calculating the inline highlighting; it's currently 800 characters between the two sides, but that's an old limit and we're quite a lot faster now.
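
A size-based cutoff of this kind might look like the following sketch. It reads "800 characters between the two sides" as the combined length of both sides, which is an interpretation; the constant and function names are hypothetical, and difflib stands in for Meld's own highlighter.

```python
from difflib import SequenceMatcher

MAX_INLINE_CHARS = 800  # figure quoted above; the name is hypothetical

def inline_changes(text_a, text_b, limit=MAX_INLINE_CHARS):
    """Return character-level changes, or None if the chunk is too large."""
    if len(text_a) + len(text_b) > limit:
        return None  # give up: the chunk is shown as completely different
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

small = inline_changes("[ 10.001] Done", "[ 22.501] Done")
large = inline_changes("a" * 500, "b" * 500)
print(small is not None, large is None)
```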
Comment 3 Adam Dingle 2013-04-12 10:35:24 UTC
Thanks for the explanation.  It's up to you how to fix this, but I'd love to see a solution that works automatically rather than requiring manual effort (such as a text filter or syncpoints) from the user.  When I was comparing the X log files in question, I would have been willing to wait even several seconds (with a progress bar, ideally) to be able to see inline highlighting since I really needed to see the differences on each line.  Perhaps you could add a preference which says to always perform inline highlighting.  That would put the user in control - they could have that enabled if they're willing to wait, or disabled for faster performance.
Comment 4 Kai Willadsen 2013-04-17 20:45:21 UTC
(In reply to comment #3)
> When I was comparing the X
> log files in question, I would have been willing to wait even several seconds
> (with a progress bar, ideally) to be able to see inline highlighting since I
> really needed to see the differences on each line.

So we're clear, we could easily be talking minutes on slow systems if we didn't bound the highlighting. In this case you only just hit the threshold, so certainly a bit longer wouldn't have been a problem, but it can be pretty bad.

The progress bar would be nice, but highlighting is done in separate processes, and getting anything useful back would be a fair bit more work than we currently do.

> Perhaps you could add a
> preference which says to always perform inline highlighting.  That would put
> the user in control - they could have that enabled if they're willing to wait,
> or disabled for faster performance.

I really don't want a preference. I think that the Right Way to fix this would be something like:
 * Remove the size-based limitation in favour of a time-based limitation; we won't spend any longer than 5 seconds highlighting any individual chunk. Most comparisons would never hit this.
 * When all of the highlighting is done (or aborted due to time constraints), show a 'we stopped cause it was slow' info bar with a 'no really, just keep going' option.
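
The time-based idea in the list above can be sketched as a cooperative deadline check. This is only an illustration under stated assumptions: it highlights a chunk one line pair at a time and checks the clock between pairs, which is a simplification rather than how Meld schedules its work, and all names are hypothetical.

```python
import time
from difflib import SequenceMatcher

CHUNK_BUDGET_SECONDS = 5.0  # budget proposed above; the name is hypothetical

def highlight_chunk(pairs, budget=CHUNK_BUDGET_SECONDS, clock=time.monotonic):
    """Highlight (left, right) line pairs until the time budget runs out.

    Returns (results, completed). completed is False when the deadline
    passed before every pair was processed.
    """
    deadline = clock() + budget
    results = []
    for text_a, text_b in pairs:
        if clock() >= deadline:
            return results, False  # caller would show the info bar
        ops = SequenceMatcher(None, text_a, text_b, autojunk=False).get_opcodes()
        results.append([op for op in ops if op[0] != "equal"])
    return results, True

pairs = [
    ("[ 10.001] Loading", "[ 22.501] Loading"),
    ("[ 10.002] Loaded", "[ 22.502] Loaded"),
    ("[ 10.003] Done", "[ 22.503] Done"),
]
results, completed = highlight_chunk(pairs)
print(len(results), completed)
```

Passing the clock in as a parameter keeps the cutoff testable without real delays.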
Comment 5 Adam Dingle 2013-04-17 22:46:41 UTC
That sounds fine to me.
Comment 6 Kai Willadsen 2013-04-20 20:39:12 UTC
The previous plan turned out to be annoyingly difficult due to the Python thread pool that we're using and other considerations. I went with the simpler option of prompting as soon as we skip a chunk, and relying on our comparison cache to not re-do too much work.
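
The simpler approach can be sketched roughly as follows: skip oversized chunks, report that highlighting is incomplete, and rely on a cache so that re-running without the limit only computes what was skipped. This is not the code from the commit; the class, its attributes, and the use of difflib are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

class InlineHighlighter:
    """Hypothetical highlighter with a size limit and a result cache."""

    def __init__(self, limit=800):
        self.limit = limit   # None means "no limit" (user chose to keep going)
        self.cache = {}      # (text_a, text_b) -> list of change opcodes
        self.computed = 0    # number of comparisons actually performed

    def highlight(self, text_a, text_b):
        """Return change opcodes, or None if the chunk was skipped."""
        key = (text_a, text_b)
        if key in self.cache:
            return self.cache[key]
        if self.limit is not None and len(text_a) + len(text_b) > self.limit:
            return None
        self.computed += 1
        ops = SequenceMatcher(None, text_a, text_b, autojunk=False).get_opcodes()
        self.cache[key] = [op for op in ops if op[0] != "equal"]
        return self.cache[key]

def highlight_all(highlighter, chunks):
    """Highlight every chunk; incomplete is True if any chunk was skipped."""
    results = [highlighter.highlight(a, b) for a, b in chunks]
    incomplete = any(r is None for r in results)
    return results, incomplete

chunks = [("short a", "short b"), ("x" * 500 + "a", "x" * 500 + "b")]
highlighter = InlineHighlighter()
first, first_incomplete = highlight_all(highlighter, chunks)

# The user asks to keep highlighting: drop the limit and run again.
# The small chunk comes from the cache; only the skipped one is computed.
highlighter.limit = None
second, second_incomplete = highlight_all(highlighter, chunks)
print(first_incomplete, second_incomplete, highlighter.computed)
```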

The change is in current git (https://git.gnome.org/browse/meld/commit/?id=7ef3a8) and will be in Meld 1.7.3. Thanks for your bug report.
Comment 7 Adam Dingle 2013-04-21 10:12:56 UTC
OK, this looks good.  On my machine when I use Meld to compare the two X log files I've attached here, I see the "Change highlighting incomplete" message, and then when I press "Keep highlighting" I see the full set of diffs instantly.  And my machine is not particularly fast (it's a 1.8 GHz Intel Core i5, to be exact).  This suggests to me that we could significantly increase the triggering threshold for this message, unless you know of some case where there would be a user-noticeable delay at or near the current threshold.
Comment 8 Kai Willadsen 2013-04-22 21:17:43 UTC
I do think you're right, so I've bumped it slightly. However, the case you're testing is one of the fastest cases for the highlighting algorithm (i.e., long identical stretches, interspersed with occasional contiguous mismatches). It can get much worse.

The most noticeable case is when you're editing a chunk that's near the threshold. When you're loading a file, a 500ms delay in highlighting is fine; when you're editing one, it's really not good.