GNOME Bugzilla – Bug 602933
Unicode U+2029 (PARAGRAPH SEPARATOR) causes meld to get out of sync
Last modified: 2010-11-23 10:57:29 UTC
Created attachment 148440: A file with a U+2029 symbol
If a file contains the PARAGRAPH SEPARATOR Unicode symbol U+2029, strange things happen: the wrong lines get marked as mismatches. I'll attach files 1.txt and 2.txt, which illustrate the problem when compared with Meld.
Created attachment 148441: A file almost identical to 1.txt but without the U+2029 symbol
Compare this to 1.txt with Meld.
The attached file 1.txt contains a U+2029 at the end of line 2 just before the line feed. Meld shows an empty line between lines 2 and 3, but colors the differences as if the empty line didn't exist.
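The thread doesn't pin down the cause. One plausible explanation, offered here as an assumption rather than a diagnosis of Meld's code, is that the text view and the diff engine disagree about what counts as a line break. GTK's text buffer treats U+2029 as a paragraph delimiter, but a diff that splits only on LF does not. Python's own `str.splitlines()` behaves like the text view in this respect, so it can stand in for the view in a minimal sketch. The sample text is hypothetical (modelled on the description of 1.txt), and the helper names are illustrative:

```python
# Hypothetical sample: U+2029 placed just before the LF that ends line 2,
# as described for attachment 1.txt (the real file contents are assumed).
SAMPLE = "line 1\nline 2\u2029\nline 3\nline 4\n"


def split_on_lf(text: str) -> list[str]:
    """Split only on LF, like a diff that knows nothing about U+2029.

    A trailing LF terminates the last line instead of starting a new one.
    """
    parts = text.split("\n")
    if parts and parts[-1] == "":
        parts.pop()
    return parts


def split_unicode(text: str) -> list[str]:
    """Split on every Unicode line boundary, including U+2028 and U+2029.

    This stands in for a text view that treats U+2029 as a line break.
    """
    return text.splitlines()


if __name__ == "__main__":
    print(len(split_on_lf(SAMPLE)), split_on_lf(SAMPLE))
    print(len(split_unicode(SAMPLE)), split_unicode(SAMPLE))
```

The LF-only split yields four lines and `splitlines()` yields five. From the extra empty line onward, every displayed line is therefore one position away from the line the diff thinks it is.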
What's the expected behaviour here? We could normalise things so that Meld interprets U+2029 as a simple line break (so Meld would just show an extra line inserted between lines 2 and 3). The other option is to treat it as an unknown arbitrary character, which is basically what we're doing now (though maybe we could display it better).
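Here is a minimal sketch of the first option. It assumes that normalising means mapping U+2029 to LF before both sides are split and diffed. U+2028 LINE SEPARATOR is also mapped, as an extra the thread doesn't raise. The sketch uses the standard-library `difflib` module, not Meld's own diff code, and the function names are hypothetical:

```python
import difflib

# Map U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to plain LF.
SEPARATOR_TABLE = str.maketrans({"\u2028": "\n", "\u2029": "\n"})


def normalise(text: str) -> str:
    """Turn Unicode line/paragraph separators into ordinary line feeds."""
    return text.translate(SEPARATOR_TABLE)


def diff_lines(left: str, right: str) -> list[tuple[str, int, int, int, int]]:
    """Diff two texts line by line after normalising both the same way."""
    a = normalise(left).split("\n")
    b = normalise(right).split("\n")
    return difflib.SequenceMatcher(None, a, b, autojunk=False).get_opcodes()


if __name__ == "__main__":
    for op in diff_lines("line 1\nline 2\u2029\nline 3\n", "line 1\nline 2\nline 3\n"):
        print(op)
```

With the hypothetical pair above, the only non-equal opcode is a single `delete` of the empty line on the left, which is one extra line between lines 2 and 3. As long as the view also shows the normalised text, display and highlighting cannot drift apart.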
(In reply to comment #3)
> What's the expected behaviour here?
I'd prefer some kind of indicator to be displayed for the special character. Whether it also makes sense to insert a line break depends on the use case. In any case, the highlighting of differences should not get out of sync with the content actually displayed.
I'll attach a screenshot of what Meld shows when comparing the two attachments I sent earlier. Notice that:
- on the right, the text "line 2" shouldn't be highlighted
- the empty line on the left should be marked as an extra line
- "line 4" on the left should be marked as an extra line
I'll also attach three mock-ups of different suggested behaviour.
Created attachment 148590: Screenshot of current incorrect behavior (on the left, line 2 has a trailing U+2029)
Created attachment 148591: Mock-up of U+2029 interpreted as a regular newline
Created attachment 148592: Mock-up of U+2029 drawn as a symbol without a newline
Created attachment 148593: Mock-up of U+2029 drawn as a symbol followed by a regular newline
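The mock-ups that draw U+2029 as a symbol correspond to the other option: keep the character as ordinary text for diffing and change only how it is rendered. Here is a minimal sketch. It assumes a `<U+XXXX>` text marker as the visible symbol, an arbitrary choice that is not necessarily what the mock-ups draw. It also uses the same LF-only split for diffing and for display, so the highlighting stays aligned with what is shown. The function names are hypothetical:

```python
import unicodedata


def visible(line: str) -> str:
    """Replace line/paragraph separators (categories Zl and Zp) with a
    visible <U+XXXX> marker so they are shown but never break the line."""
    out = []
    for ch in line:
        if unicodedata.category(ch) in ("Zl", "Zp"):
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


def display_lines(text: str) -> list[str]:
    """Split only on LF (the same rule the diff uses), then escape separators."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()
    return [visible(line) for line in lines]


if __name__ == "__main__":
    print(display_lines("line 1\nline 2\u2029\nline 3\n"))
```

Line 2 on the left becomes `line 2<U+2029>`, so it would be highlighted as a changed line instead of shifting everything below it. Drawing a real line break after the marker, as in the last mock-up, would also require the diff to count that break.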
Could you please test the patch I've just attached to bug 627940? It should fix the problem, though it doesn't do any special display of the different line break.
I've pushed that patch to head, so closing this bug. I've opened bug 635593 about taking into account newlines and showing newline differences. Thanks for your bug report.
I just checked out the git repository which already contains Kai's patch, and it works beautifully.