GNOME Bugzilla – Bug 578613
Conflict solver mode (auto merge with ancestor)
Last modified: 2012-11-09 09:33:20 UTC
This patch adds conflict solver mode to meld. The patch includes 2 new modules: matchers.py - alternative matchers: 1) MyersMatcher (seems to be the most suitable matcher for merging/conflict solving) 2) LCSMatcher (well, pretty useless :)) merge.py - enhanced Differ class that handles some conflicts automatically The patch also addresses some issues I have spotted in the code (see tickets #578303, #577092). There are as few UI-related changes as possible: only the middle pane is editable when running in conflict solver mode and icons and mouse actions are handled accordingly. In order to start meld in conflict solver mode it needs to be run with 4 file parameters as follows: meld myfile basefile theirfile mergedfile This is the initial version, but if such a functionality is welcome I am eager to participate in further development.
Created attachment 132475 [details] [review] Conflict solver mode patch
Hi Piotr, that's a feature several people have been asking for and new contributors are always welcome. BTW we're gearing up for a release soon so if you want early feedback, join the mailing list and post a link to here.
Created attachment 132641 [details] [review] 2nd version auto merge with bug fixes and improvements This is the 2nd version of the auto merge patch. What's new: - bug fixes, performance improvements and code clean-up - proper support for text filters - indicator of remaining unresolved conflicts (statusbar) - option to use additional text filters when for conflicting chunks (e.g. ignore whitespace changes)
Created attachment 132749 [details] [review] auto merge v.3 A couple of additional bug fixes and improvements: - fixed mouse actions - fix in Merger class - improved conflict size calculations The patch is now fully functional IMHO.
Created attachment 132894 [details] [review] auto merge v.3 patch re-based to ref f995505fcfcea8815dd106acbe0630f6e1aed50f after repository migration to git.
How soon will this code be available in a release? I'd like to test it and I'm willing to grab the source from git if necessary, but since I use meld for work then I prefer to run stable bits when possible :)
Created attachment 133051 [details] [review] auto merge v.4 2 changes: 1) mouse actions are fixed 2) the way conflicting lines are shown (marker + original text)
Created attachment 134085 [details] [review] auto merge v.5 Changes: - refactoring and clean-up - smarter conflict solver - Myers matcher performance improvements
Hi Piotr, thanks for the continuing patches. I have a few questions. Is the myers matcher much better/faster than the python difflib one? Or is there some other reason it's included? For the "merge" feature, don't we want to be able to auto merge at any time there is a 3 way merge open? i.e. just hook the operation up to a button. I'd aim to have a non modal interface where possible. I notice for instance merge-mode makes two panes non editable. How should the "4 file" (yours,base,mine,output) merge work? Should the user have to explicitly confirm that the merge was a) complete and b) successful when the tab is closed? I think currently tools currently regard any write to the base file as "success" which is not great.
Hi Stephen, Quite a lot of questions, but I'll try to address all of them :) 1) Myers diff. This is probably the best known diff algorithm and it is commonly used in almost all VCS. It is much slower than difflib.SequenceMatcher algorithm, but it is guaranteed to return the shortest edit script (or the longest common sequence, which means exactly the same) while SequenceMatcher does not guarantee anything but the fact that the diff will look nice in some cases. See http://piastucki.bdl.pl/matchers.html for comparison. Original paper by Myers: http://www.xmailserver.org/diff2.pdf. (BTW: nice reading about diff optimizations http://neil.fraser.name/writing/diff/) 2) 4 files. Actually, auto merge requires 3 source files and the fourth file name (name only!) is used as the name of the merged file. My implementation is based on the assumption that this mode is used to resolve conflicts reported by SVN, CVS or similar VCS. In this case there is no point in changing the source files, because they are just kind of "links": base - ancestor (some revision in the past in repository, cannot be changed) theirs - current version in the repository (cannot be changed either) mine - modified local copy before the conflict occurred (changing this file does not make any sense either as we can modify the merged one instead) In case of SVN the files are named as follows (NNNN - some revision number): mine - filename.ext.working base - filename.ext.merge-left.rNNNNN theirs - filename.ext.merge-right.rNNNNN merged - filename.ext That's why both panes are disabled. This is exactly my use case and I was desperately searching for a decent tool I could use to resolve SVN conflicts without the need to run TortoiseMerge under Wine. So far I managed to complete 3 simple (up to 10 conflicting files, up to 100 conflicting lines in a file) merges with meld :) 3) Auto merge in 3-way diff mode. If auto merge is invoked to resolve conflicts from 3-way diff then all panes should probably remain editable as they represent physical files in different folders for instance. Adding a "magic wand" button that will perform auto merge on demand in 3-way diff mode sounds reasonable too. 4) The way auto merge works. Technically speaking: we compute 3-way diff between base, mine and theirs. Then we apply the changes to the base file (all non-conflicting and all conflicting that we can resolve automatically). Finally, we compute 3-way diff between merged (base + changes), mine and theirs. There is a simple counter of unresolved conflicts in the statusbar. When the counter reaches 0 the user should be able to save the middle file, otherwise there should be a warning asking the user to confirm if the file with unresolved conflicts should really be saved. Moreover, there may be a "Resolve" button that will be enabled as soon as the counter reaches 0. 5) Missing features. I think there are at least 2 missing features: integration with VCS plugins (dbl click on conflicting file in VCS view should run auto merge) and buttons to jump to the next/previous conflict. Cheers, Piotr
1) Ah thats interesting. I'm surprised that the Myers diff is slower because thats the algoritm that gnu diff uses (albeit with major tweaks). And according to http://selenic.com/pipermail/mercurial/2005-December/006065.html it's still faster than a c implementation of the difflib algorithm (bdiff) The one thing which attracted me about the Myers algorithm is that it looks more friendly for incremental update. I've not seen any papers on it, but it looks like a "greedy local snake update" algorithm would handle the most common case for meld which is updating while interactively editing. For this patch, diffutil already has logic to locate conflicts. Isn't it enough to go through the list given by _merge_diffs in reverse order? 2) I really like the idea of providing an interface to explicitly confirm that a conflicts have been resolved. One thing about the editability though - sometimes a conflict will require manual editing which won't match either theirs or mine. Won't that leave a conflict marker? I often apply the merged changes into theirs or mine as a reminder to self that the conflict has been solved. One other implementation issue is that it would be great if to reduce the number of "if merge_mode" blocks to the absolute minimum. It might be easier to have a merge_output member instead of having to check len(files) and merge_mode.
Stephen, >For this patch, diffutil already has logic to locate conflicts. Isn't it >enough to go through the list given by _merge_diffs in reverse order? IMHO, it is not good enough. As mentioned before the edit script generated by difflib.SequenceMatcher is far from optimal in many cases. As a result the number of changes is higher than it needs to be (i.e. the number of changes returned by Myers diff). Obviously it results in more conflicts. I'd rather wait e.g. 200ms more when diff is computed than solve these additional conflicts manually. >Ah thats interesting. I'm surprised that the Myers diff is slower because >thats the algoritm that gnu diff uses Yes, there are some tweaks in gnu implementation. It may not yield the shortest edit script because of them sometimes, though it happens very rarely. For me the performance of Myers diff implementation included in this patch is satisfactory, but I can consider porting some of these tweaks from gnu diff if it is too slow. I cannot see any way Myers diff can be faster than difflib algorithm apart from situations when the two files are almost the same. If the number of differences is high Myers diff will need to perform much more iterations and string comparisons to find the middle snake. And if the files are completely different there is no way we can significantly speed up Myers diff. >One thing about the editability though - sometimes a conflict will require >manual... That's why I introduced the indicator or unresolved conflicts (it actually counts lines of code). The number of unresolved conflicts shown by this patch includes only the conflicts that could not be resolved automatically when the merged file was created. You're right that if there is an unresolved conflict that is replaced by a piece of code which does not match either theirs and mine it will be marked as a conflict, however, the number of unresolved conflicts will be lowered at the same time. > if merge_mode Yes, I fully agree this code may need some cleanup.
See http://piastucki.bdl.pl/matchers2.html on how to fool difflib easily :) >I've not seen any papers on it, but it >looks like a "greedy local snake update" algorithm would handle Currently I store the list of snakes locally inside a method, but the middle snake may be made turned into a class field and updated on text changes instead of recomputing the diff.
Created attachment 137671 [details] [review] auto merge v6 I have been testing and improving this patch for a couple of weeks and here are the results :) The auto merge logic is much smarter and more reliable then before. More info regarding the algorithm can be found here: http://piastucki.bdl.pl/meld/automerge.html Regards, Piotr
Created attachment 137880 [details] [review] auto merge v7 Another bunch of changes and improvements: 1) significant performance optimizations in MyersMatcher (pre-processing) 2) as suggested by Stephen, merge_mode flag has been removed and a new class called FileMerge has been added. A couple of methods have been introduced in FileDiff by extracting the existing code to allow FileMerge to override them: - _get_custom_status_text() - _get_custom_textbuffer() - _after_files_loaded() - _linkmap_draw_icon() - _linkmap_process_event()
This feature is obviously great, but the patch is *huge* and touches lots of files. It would be a whole lot easier to merge this if the patch was split up and had commit messages. There are quite a lot of changes that appear unrelated to auto-merge itself. Some of these changes look fine and can probably be merged straight away, whereas other bits, like the status bar changes, are IMO just taping over existing problems and shouldn't go in in their current state. As a start, I'd be happy to merge most of the diffutil.py changes if they were accompanied by real commit messages. Just commenting on the diffutil.py parts of the patch for the moment... + def _get_matcher(self, a, b): + return IncrementalSequenceMatcher(None, a, b) + The matcher class should probably be a class member variable, not an instance method. + def _auto_merge(self, using, texts): + l0, h0, l1, h1, l2, h2 = self._merge_blocks( using ) + if h0-l0 == h2-l2 and texts[0][l0:h0] == texts[2][l2:h2]: + if l1 != h1 and l0 == h0: + tag = "delete" This looks like a bugfix. If it is, then what does it fix, and can it be committed separately? + out0 = (tag, l1, h1, l0, h0) + out1 = (tag, l1, h1, l2, h2) + yield out0, out1 Why is _auto_merge a generator? + if seq0[0][LO] == seq1[0][LO]: + if seq0[0][0] == 'insert': + high_seq = 0 + elif seq1[0][0] == 'insert': + high_seq = 1 Another bugfix? Also, the rest of the file uses double quotes for string literals. + if high_mark == other_diff[LO] and (high_diff[0] != 'insert' or other_diff[0] != 'insert'): + break Unless I've messed something up, if high_mark == other_diff[LO] and not (high_diff[0] == other_diff[0] == "insert"): looks more natural to me. Is this another bugfix? Also, double quotes. Unrelatedly, the webpage explaining the auto-merge stuff is really nice. When this feature goes in, I'd suggest that maybe that page could live somewhere under http://live.gnome.org/Meld.
Created attachment 137932 [details] [review] auto merge v8 No new functionality, just some refactoring.
Created attachment 137933 [details] auto merge v8 split into smaller patches
Kai, As you wish :) Here is the patch divided into 8 smaller patches. Some explanation can be found below or (with some graphs) at http://piastucki.bdl.pl/meld/automerge_patches.html BTW: I am not a Python developer (90% Java, 10% C), so do not hesitate to change anything in my code if you find it awkward. 0001-Improve-conflict-detection-for-insert-chunks.patch Fix conflict detection between insert hunks and other hunks. A conflict was reported when replace or delete hunk started at the same position as insert. As for the change in FileDiff class, I do not see any reason why this code was there. Hunks should be highlighted according to the results returned by Differ as it is too late to change anything when it comes to drawing to the screen. BTW: IMHO, Blank lines should be removed before running diff algorithm not at highlighting. 0002-Improve-highlighting-of-conflicting-delete-hunks.patch Change the way the same overlapping delete hunks are highlighted and make it consistent witht the way non-overlapping delete hunks are shown. 0003-Extract-methods-in-Differ-class.patch Make room for extending Differ class needed by Merger class. _auto_merge() is a generator because Merger may return more than one chunk in a single call. 0004-Replace-get_iter_at_line-with-get_iter_at_line_or_eo.patch Minor fix for incorrect highlighting in some rare cases (EOF). Feel free to skip this patch when in doubt. 0005-Introduce-methods-dummy-that-will-be-overriden-lat.patch Now, add a bunch of methods that do not do anything useful in Differ, but they will be overridden in Merger to make the auto merge mode work. 0006-Extract-linkmap-drawing-methods.patch Extract some methods based on code related to drawing linkmap. They will be used by Merger to change the look of linkmap in auto merge mode. 0007-Add-auto-merge-classes.patch Just add all new classes. 0008-Accept-4-parameters-and-run-auto-merge-in-this-case.patch Change command line arguments check to accept 4 files and run auto merge mode.
*** Bug 169233 has been marked as a duplicate of this bug. ***
The first two patches look good to me, so I've pushed them. The get_iter_at_line_or_eof() patch looks reasonable to me as well, but I don't know how to reproduce the bug. Do you have a testcase? I'm still looking at the rest of the patches.
Looking at the pictures from the kink you sent, I wonder why for the picture for the first patch miss the blue highlighting on line 8 in the first pane, before the line 6 & 7 deletion, this looks like a bug. Is that bug a regression or is it already present ?
I do not quite see your point. Would you please clarify? The pictures called before and after refer to results before and after applying the patch. There is no difference between line 8 in left and middle pane so there is no highlighting.
Ah OK I thought the line could be highlighted blue with no inline diffs showing, but maybe it would be misleading, so it is fine as is.
Created attachment 137980 [details] [review] Fixed MyersMatcher optimizations There was a bug in MyersMatcher introduced with performance improvements. I am attaching fixed version of 0007-Add-auto-merge-classes.patch.
Created attachment 138013 [details] Rebased and slightly modified version of the v8 patchset This archive contains the remaining patches in the series from attachment #137933 [details], including the updated version of MyersMatcher from attachment #137980 [details]. I've removed the _get_matcher method in favour of a class member, and updated the later patch accordingly.
Created attachment 138628 [details] Slightly modified v8.1 (bugfixes + cleanup) A couple of minor fixes and improvements + some code clean-up
The lastest version doesn't apply against head. The first two patches have already been pushed; the fifth fails to apply. I'll push the get_iter_at_line_or_eof() patch if you can provide a test case. Speaking more generally, I wonder if it would be reasonable to add Merger to diffutil.py, and move IncrementalSequenceMatcher to matchers.py.
Created attachment 138667 [details] Auto merge v9 I have removed obsolete patches and fixed the one that did not apply against head. Regarding get_iter_at_line_or_eof(), the issue is reproducible only if there are 2 replace chunks that are longer than 8000 characters and they end at EOF. I suppose it happens very rarely, so the patch is not high priority. The issue is fixed by the following line: ends = [get_iter_at_line_or_eof(b, l) for b, l in zip(bufs, (c[2], c[4]))] The other 2 lines modified by the patch do not fix anything, they are changed just to make the code use get_iter_at_line_or_eof() consistently everywhere. See http://piastucki.bdl.pl/meld/automerge_patches.html for a screenshot. Moving all kinds all SequenceMatchers to matchers.py module - good idea, actually that's why I named the module like this. Adding Merger to diffutil.py - I am not sure, but it seems to be OK as long as the code in diffutil.py remains easy to read.
I'd be really keen to see this patch make its way in to Meld. Meld is a great diff tool but it'd be great if it functioned as a merge tool, as well - this would make it a complete graphical SVN solution I could use with RapidSVN. Any news on how long this might take to get in? The sooner, the better. :-)
This dropped off my radar for a while, but I too would love to get it in. Other than necessary minor fixes, there are some questionable additions to FileDiff. It feels like FileDiff has been hacked up a bit to fit in things for the new FileMerge class, and I'd much rather that any odd workarounds went into FileMerge. Specific things I'm not mad keen about include: * the _after_files_loaded() hook - it does nothing in FileDiff * the _get_custom_textbuffer() accessor - I'm not sure why this is even necessary * the extraction of paint_pixbuf_at() - I guess it's okay, but it should probably just be a top-level utility function * the status text changes - I'm working on a patch which changes the implementation of Meld's status bar so that adding stuff to the statusbar can be done in a less hacky way. However, these changes can just be left out and applied later anyway. In the quick review I did of the new classes, they looked pretty good. The challenge is minimising the collateral damage associated with getting them in.
Hi, I think it should be up to meld developers how, when and if this functionality is included, so the integration provided in the patch has merely 2 goals: - it should work - it should make as few changes to the existing code as possible (e.g. avoid any changes but method extraction) Of course, I will try to help if any additional info is needed. * _after_files_loaded() - I think having empty methods is acceptable as long as FileDiff is considered to be a base class that exposes some extension points to its children, otherwise it should be reworked * _get_custom_textbuffer() - kind of a hack. I added the 4th hidden buffer that is returned by this method to re-use existing logic that can load 3 files and to avoid showing one of them (common ancestor) at the same time. Currently I am trying to improve Myers matcher performance by removing some Python code (which is slow) and replacing it with operations on maps (which are fast).
Created attachment 146847 [details] Rebased patchset with improved matcher I am attaching a rebased version of the patch. Moreover, it contains a new version of matchers.py which includes 4 implementations of Myers algorithm. By default the fastest one (based on http://research.janelia.org/myers/Papers/np_diff.pdf) is used. The remaining implementation may be removed, I left them for the ones who are curious :)
Created attachment 146913 [details] Rebased patchset with matcher improvements This patchset contains an additional patch that removes all unneeded stuff from matchers.py and fixes a couple of issues I have spotted 0004-Enable-auto-merge-mode.patch needs to be reworked as it is just an ugly hack that allows users to pass 4 command line arguments. BTW: I have been looking at PatienceSequenceMatcher (both Python and C implementations included in Bazaar) which is supposed to simplify merges by making sure that no unique matching lines are missed. After a couple of tests I think it looks quite interesting and using it instead of Myers sequence matcher may also be an option. I wonder what bzr users think about it.
Created attachment 147079 [details] Changes in integration with FileDiff Here is another patch which tries to address some of the issues mentioned by Kai. Instead of introducing dummy methods in FileDiff, _set_files_internal() has been split.
I'll have a proper look when I have time (next week?) but from a quick glance this looks *much* better to me. Thanks for pushing through with this.
Created attachment 147265 [details] Fixed patch This patchset includes an additional patch that fixes linkmap actions and Myers matcher optimisations.
I've just pushed the first two patches. I'm attaching an updated patch set with formatting fixes for the classes added in patch 1 (was patch 3). Patches 1, 3-6 should all be squashed into a single commit, but I'm giving them individually here for convenience. There are PEP8-related formatting issues remaining in patch 2 (no spaces after commas) that need to be fixed. As you said above, this patch is also not really the right way to do it. I haven't thought about it much, but maybe a --merge command line flag would be appropriate? I've read through the main patch in a cursory fashion, and it looks reasonable to me. However, the lack of a good commit message is a real problem for this patch. The patch adds three new classes in 619 lines of code, but the commit message is just "Add auto merge classes". Could I ask that you please write a reasonably detailed commit message explaining what was added, what it does, and why? For a better explanation than I can give, see: http://lists.cairographics.org/archives/cairo/2008-September/015092.html As an imperfect example, look at the commit message I wrote for my (still-not-applied) cursor tracking patch, which is *much* smaller than this addition: http://mail.gnome.org/archives/meld-list/2009-September/msg00010.html (the big patch is 0002) Otherwise, this patch set is basically ready to apply.
Created attachment 148251 [details] Updated patch set
Created attachment 148484 [details] Updated patch with more elaborated commit message I have combined patches 1, 3-6 together and added some explanation to the commit message. Please let me know if more info may be needed.
I've committed these patches now. There are some remaining problems (see below) but they can be worked out post-commit. The big thing that should be fixed before a release is figuring out what the command-line interface should be; the four-argument version that's there now works, but it's not good. It would also be nice to have a way of starting an auto-merge from within Meld, but that's not essential. Questions, things that would be nice to fix, and future options: - find_common_prefix/find_common_suffix seem unnecessarily complicated. Unless it's actually faster to do the binary split (is it? I can't see why it would be) this should almost be a copy of os.path.commonprefix. - In MyersSequenceMatcher.__init__, there is the comment: "fields needed by preprocessor so that preprocessing may shared by more than 1 LCS algorithm" but there is only one LCS algorithm present. Is this a left-over from experiments with PatienceSequenceMatcher? - There are two pre-processing optimisations in MyersSequenceMatcher, and switches to turn them on and off. Why would anyone want to turn them off? If they're effective optimisations, can the options simply be removed? - In MyersSequenceMatcher.preprocess, we have: self.lines_discarded = m - j > 10 or n - k > 10 is the constant a threshold for whether it is worthwhile using the post-discard version? It would be nice to have an explanatory comment. - In FileMerge.__init__ takes a num_panes argument, but surely three is the only sensible value? - In FileMerge._merge_files, why do we create a second merge.Merger? - FileMerge.set_files looks like it deals with fewer than four files, but as far as I can see this shouldn't happen, and won't work even if it does. - The start of Merger._auto_merge duplicates Differ._auto_merge. Why not just call Differ._auto_merge first and then do a pass to resolve any 'conflict' blocks? - It would be nice to have some test files available. We don't currently have any regression tests or anything, but it would be reasonable to start keeping a few simple testcases, etc. A good start might be the example files you use for http://piastucki.bdl.pl/meld/automerge.html? Anyway - this is all pretty nice, and should definitely be featured in the release notes. Thanks for all the work.
1) prefix/suffix matching - yes, it is much faster to do the binary split, for more details see http://neil.fraser.name/news/2007/10/09/. Basically, Python built-in functions are much faster than the code written in Python so it is a good idea to replace loops in the code with calls to built-in stuff. That's why I would like to find a good way to use maps more heavily in my implementation of Myers diff, however, I have not found it yet :) 2) fields needed by preprocessor - yes, this is a left-over from the experiments 3) I cannot see any reason to switch them off, the switches are left just in case the optimizations cause any problems 4) is the constant a threshold for whether it is worthwhile using the post-discard version? Yes, it is. This is kind of a heuristic based on the tests I ran and it definitely requires a good comment. 5) Yes, that's right. 6) Hmmmm, good question. I remember that I had some issues when the highlighting code tried to call the main Merger instance while the files were being merged (AFAIR highlithing was broken). But there were a lot of changes in this code since then and maybe this is not the case any more... 7) FileMerge.set_files, it does happen when you click Refresh for instance and it actually seems to work. 8) Yes, calling Differ._auto_merge is a good idea.
Created attachment 149683 [details] [review] Add call to Differ_auto_merge from Merger I am attaching a patch that addresses issue #8 by adding a proper call to Differ._auto_merge in Merger._auto_merge.
Can someone clarify whether this patch would allow support for 2-way auto merge? The focus seems to have been for scm-style merging which takes 4 file arguments (3 inputs, one output). But an equally valid need exists for 2-way auto merge (2 inputs, one output). I use Gentoo, and it's package management system includes configuration file merging that uses a command format like the following: sdiff -s -o %merged %orig %new Using an alternative like xxdiff, the command becomes: xxdiff --decision --show-merged-pane --merged-filename %merged %orig %new Or with kdiff3, it's: kdiff3 --auto -m -o %merged %orig %new But since Meld doesn't provide any command-line options for output filename nor explicit auto-merge, it's unusable for this purpose. Thanks!
Created attachment 155959 [details] [review] Refactor auto merge code I am attaching an enhanced version of the previous patch. Apart from replacing some redundant code with a proper call to Differ._auto_merge it also extracts AutoMergeDiffer class from Merger class.
When invoking auto-merge mode with this patch applied, I get a traceback ending in:
+ Trace 220938
for panetext[1] in merger.merge_3_files():
I assume that this is supposed to be merge_3_way. Also, while the commit message itself is fine, please keep in mind line length; lines should be wrapped to 72 characters.
Created attachment 156082 [details] [review] Corrected refactoring patch Here is a fixed version of the previous patch.
Looks good, thanks. Pushed.
Created attachment 156717 [details] Two minor patches addressing issues 3 and 4 above These patches should address issues 3 and 4 that I mentioned above. Piotr, could you please have a look and check that they're okay?
Kai, both patches look fine.
Implemented in meld 1.7