After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 608800 - alt-dragging gimp windows crashes gnome-shell
alt-dragging gimp windows crashes gnome-shell
Status: RESOLVED FIXED
Product: mutter
Classification: Core
Component: general
unspecified
Other Linux
: Normal critical
: ---
Assigned To: mutter-maint
mutter-maint
Depends on:
Blocks:
 
 
Reported: 2010-02-02 14:24 UTC by Jean-François Fortin Tam
Modified: 2010-02-10 19:38 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Fix crash when struts change during grab operation (6.25 KB, patch)
2010-02-09 22:01 UTC, Owen Taylor
committed Details | Review

Description Jean-François Fortin Tam 2010-02-02 14:24:42 UTC
To reproduce on my end with the gnome shell jhbuild version:

1. start gnome-shell
2. right-click an image in nautilus, open with GIMP 2.6
3. press and hold Alt, drag the image window

gnome-shell then crashes on my end. To reproduce the crash again, you need to start gimp *after* gnome-shell, it seems.

I see this in the terminal:
mutter:ERROR:core/boxes.c:1161:meta_rectangle_edge_aligns: code should not be reached
Shell killed with signal 6
Comment 1 Raphael Freudiger 2010-02-02 17:24:42 UTC
I have the same problem. Shell even crashes without dragging the window but only open an image in gimp 2.6. When doing nothing gnome-shell crashes regularly.
I don't know how to get more detailed information what is going wrong. But if you could tell me how to do it, I will try it.
Comment 2 Jean-François Fortin Tam 2010-02-04 01:39:12 UTC
This seems to be fixed with today's updates...
Comment 3 Owen Taylor 2010-02-04 04:36:18 UTC
(In reply to comment #2)
> This seems to be fixed with today's updates...

Think you just got lucky today, no changes have been made that would have fixed this. (I've seen this crash too a few times.)
Comment 4 Owen Taylor 2010-02-04 04:37:04 UTC
Most likely a Mutter problem, reassigning
Comment 5 Jon Nettleton 2010-02-04 04:51:33 UTC
I traced this error back to line 726 in edge_resistance.c.  It is happening during the g_free called for each key of edges_to_be_freed as specified in g_hash_table_new_full. 

This may not be the fix, but right now it looks like the simplest extraneous piece to remove. 


diff --git a/src/core/edge-resistance.c b/src/core/edge-resistance.c
index 8409ff8..30a45d4 100644
--- a/src/core/edge-resistance.c
+++ b/src/core/edge-resistance.c
@@ -713,7 +713,7 @@ meta_display_cleanup_edges (MetaDisplay *display)
                * array that it is in.  So store it in a hash table for later
                * freeing.  Could also do this in a simple linked list.
                */
-              g_hash_table_insert (edges_to_be_freed, edge, edge);
+              g_hash_table_insert (edges_to_be_freed, edge, NULL);
             }
         }
     }
Comment 6 Jon Nettleton 2010-02-04 06:12:40 UTC
After looking at this further the code is failing in two places but on the same structure.  So somehow MetaEdge Edge is getting corrupted.  The one place where I see a MetaEdge->side_type getting set to something other than META_SIDE_* is in boxes.c where overlap->side_type is set to -1 because the comment is "We don't know what else to set it to"
Comment 7 Owen Taylor 2010-02-04 17:22:33 UTC
(In reply to comment #5)
> I traced this error back to line 726 in edge_resistance.c.  It is happening
> during the g_free called for each key of edges_to_be_freed as specified in
> g_hash_table_new_full. 

What do you mean by this? The assertion failure definitely isn't happening at line 726, which is, in my copy:

  g_array_free (edge_data->right_edges, TRUE);

> This may not be the fix, but right now it looks like the simplest extraneous
> piece to remove. 
> 
> 
> diff --git a/src/core/edge-resistance.c b/src/core/edge-resistance.c
> index 8409ff8..30a45d4 100644
> --- a/src/core/edge-resistance.c
> +++ b/src/core/edge-resistance.c
> @@ -713,7 +713,7 @@ meta_display_cleanup_edges (MetaDisplay *display)
>                 * array that it is in.  So store it in a hash table for later
>                 * freeing.  Could also do this in a simple linked list.
>                 */
> -              g_hash_table_insert (edges_to_be_freed, edge, edge);
> +              g_hash_table_insert (edges_to_be_freed, edge, NULL);
>              }
>          }
>      }

I can't see how this change makes a difference. The value inserted into the hash table should be irrelevant. (This code would make considerably more sense and be more efficient if the writer had taken their own advice and used "a simple linked list")
Comment 8 Owen Taylor 2010-02-04 19:40:17 UTC
Tracing backwards. The assertion failure means that:

gboolean
meta_rectangle_edge_aligns (const MetaRectangle *rect, const MetaEdge *edge)

was called with edge->side_type not one of TOP/BOTTOM/LEFT/RIGHT.

This function is called from exactly 3 places in edge-resistance.c:

from find_nearest_position() - here edge is one of the edges array passed in from apply_edge_snapping() (unless there is something wrong with the bsearch, but I don't see anything wrong there), and that will be one of edge_data->left_edges/right_edges/top_edges/bottom_edges.

from apply_edge_resistance() - here edge is one of the edges array passed in from apply_edge_resistance_to_each_side() (unless there is something wrong with the iteration - but it seems pretty safe), and that will be one of edge_data->left_edges/right_edges/top_edges/bottom_edges.

This arrays are created in edge-resistance.c:cache_edges() and never modified. In cache_edges(), they are created with the code:

      while (tmp)
        {
          MetaEdge *edge = tmp->data;
          switch (edge->side_type)
            {
            case META_SIDE_LEFT:
            case META_SIDE_RIGHT:
              g_array_append_val (edge_data->left_edges, edge);
              g_array_append_val (edge_data->right_edges, edge);
              break;
            case META_SIDE_TOP:
            case META_SIDE_BOTTOM:
              g_array_append_val (edge_data->top_edges, edge);
              g_array_append_val (edge_data->bottom_edges, edge);
              break;
            default:
              g_assert_not_reached ();
            }
          tmp = tmp->next;
        }

[ side note. It seems that the left/right and top/bottom edges arrays are always absolutely identical! ]

So, at the point that the edges were created, there cannot be anything in the edge array with side_type != -1.

So, my only guess from that analysis is that we have memory scribbling going on somewhere. There could also be potentially be a problem if the cached edges got invalidated while we were inside a grab operation, but that should trigger the assertion failure in apply_edge_resistance_to_each_side:

  g_assert (display->grab_edge_resistance_data != NULL);

and not a problem like this. I don't see anything that could happen *while* apply_edge_resistance_to_each_side() is running that could cause invalidation - all that code sticks to edge-resistance.c and boxes.c.
Comment 9 Owen Taylor 2010-02-04 19:58:02 UTC
Don't have much of an idea of what further to do here without being able to reliably reproduce. If someone finds a good recipe for that, please add it here.
Comment 10 Jon Nettleton 2010-02-07 02:15:08 UTC
Program received signal SIGABRT, Aborted.
0x00000034802326c5 in raise () from /lib64/libc.so.6
(gdb) bt
  • #0 raise
    from /lib64/libc.so.6
  • #1 abort
    from /lib64/libc.so.6
  • #2 g_assertion_message
    from /lib64/libglib-2.0.so.0
  • #3 meta_rectangle_edge_aligns
    at core/boxes.c line 1161
  • #4 apply_edge_resistance
    at core/edge-resistance.c line 391
  • #5 apply_edge_resistance_to_each_side
    at core/edge-resistance.c line 639
  • #6 meta_window_edge_resistance_for_move
    at core/edge-resistance.c line 1153
  • #7 update_move
    at core/window.c line 7607
  • #8 meta_window_handle_mouse_grab_op_event
    at core/window.c line 8000
  • #9 event_callback
    at core/display.c line 1935
  • #10 filter_func
    at ui/ui.c line 84
  • #11 ??
    from /usr/lib64/libgdk-x11-2.0.so.0
  • #12 ??
    from /usr/lib64/libgdk-x11-2.0.so.0
  • #13 ??
    from /usr/lib64/libgdk-x11-2.0.so.0
  • #14 g_main_context_dispatch
    from /lib64/libglib-2.0.so.0
  • #15 ??
    from /lib64/libglib-2.0.so.0
  • #16 g_main_loop_run
    from /lib64/libglib-2.0.so.0
  • #17 main
    at core/main.c line 686

Comment 11 Jon Nettleton 2010-02-07 02:16:45 UTC
I have been able to reproduce this crash by opening gscan2pdf and while scanning something grab the modal progress dialog and quickly move it top to bottom over the existing preferences modal box.
Comment 12 Owen Taylor 2010-02-09 21:39:55 UTC
Jon managed to create a reliable reproducer and got this very informative Valgrind trace:

==10584== Invalid read of size 4
==10584== at 0x43F1E6: find_index_of_edge_near_position (edge-resistance.c:108)
==10584== by 0x43F8B3: apply_edge_resistance (edge-resistance.c:371)
==10584== by 0x43FF87: apply_edge_resistance_to_each_side (edge-resistance.c:639)
==10584== by 0x440F26: meta_window_edge_resistance_for_move (edge-resistance.c:1153)
==10584== by 0x472E5E: update_move (window.c:7607)
==10584== by 0x47389A: meta_window_handle_mouse_grab_op_event (window.c:8000)
==10584== by 0x4382B6: event_callback (display.c:1935)
==10584== by 0x49A853: filter_func (ui.c:84)
==10584== by 0x348865D505: ??? (in /usr/lib64/libgdk-x11-2.0.so.0.1800.6)
==10584== by 0x348865F557: ??? (in /usr/lib64/libgdk-x11-2.0.so.0.1800.6)
==10584== by 0x348865FA8D: ??? (in /usr/lib64/libgdk-x11-2.0.so.0.1800.6)
==10584== by 0x3481A3920D: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2200.4)
==10584== Address 0xe86c114 is 4 bytes inside a block of size 24 free'd
==10584== at 0x4A04D72: free (vg_replace_malloc.c:325)
==10584== by 0x3481A36BFB: g_list_foreach (in /lib64/libglib-2.0.so.0.2200.4)
==10584== by 0x421A25: meta_rectangle_free_list_and_elements (boxes.c:770)
==10584== by 0x47667E: meta_workspace_invalidate_work_area (workspace.c:725)
==10584== by 0x477127: meta_workspace_set_builtin_struts (workspace.c:969)
==10584== by 0x3480E05DBB: ffi_call_unix64 (in /usr/lib64/libffi.so.5.0.6)
==10584== by 0x3480E05B43: ffi_call (in /usr/lib64/libffi.so.5.0.6)
==10584== by 0x51756D1: g_function_info_invoke (ginvoke.c:240)
==10584== by 0xD18EE75: gjs_invoke_c_function (function.c:676)
==10584== by 0xD18F62D: function_call (function.c:860)
==10584== by 0x348B650300: js_Invoke (in /usr/lib64/xulrunner-1.9.1/libmozjs.so)
==10584== by 0x348B641582: ??? (in /usr/lib64/xulrunner-1.9.1/libmozjs.so)

What it shows is that the problem is that when the work area is invalidated, the edges that are referenced from the edge resistance data are freed without recomputing the edge resistance data. Will work on a patch.
Comment 13 Owen Taylor 2010-02-09 22:01:43 UTC
Created attachment 153368 [details] [review]
Fix crash when struts change during grab operation

Since meta_workspace_invalidate_work_area() frees the edges
workspace->screen_edges and workspace->monitor_edges, we must clean up
our cached edge resistance data when the invalidate_work_area() is
called on the active workspace, or when the workspace changes.

Make the computation of the edge resistance data lazy so that it
will be recomputed the next time we try to access it.
meta_display_compute_resistance_and_snapping_edges() is made
private to edge-resistance.c

Invaliding the data when active workspace changes also will improve
correctness for edge resistance when the current workspace changes
during a grab operation. (Even with this fix we still don't try to
handle window positions changing during a grab operation; that can't
cause a crash since, unlike screen and monitor edges, the window edges
are freshly allocated, it will just cause slight oddness in that
corner case.)

Root cause tracked down due to much effort by Jon Nettleton.
Comment 14 Colin Walters 2010-02-10 15:48:51 UTC
Review of attachment 153368 [details] [review]:

::: src/core/workspace.c
@@ +482,3 @@
+  meta_display_cleanup_edges (workspace->screen->display);
+   * a current resize or move operation */
+  /* Free any cached pointers to the workspaces's edges from

Why does this one need a call?  For the move window case?  I guess it doesn't hurt too much to invalidate in all, but maybe if that's the case this call could be done under if (move_window != NULL)?
Comment 15 Colin Walters 2010-02-10 15:48:53 UTC
Review of attachment 153368 [details] [review]:

::: src/core/workspace.c
@@ +482,3 @@
+  meta_display_cleanup_edges (workspace->screen->display);
+   * a current resize or move operation */
+  /* Free any cached pointers to the workspaces's edges from

Why does this one need a call?  For the move window case?  I guess it doesn't hurt too much to invalidate in all, but maybe if that's the case this call could be done under if (move_window != NULL)?
Comment 16 Owen Taylor 2010-02-10 16:54:45 UTC
Review of attachment 153368 [details] [review]:

::: src/core/workspace.c
@@ +482,3 @@
+   * a current resize or move operation */
+  meta_display_cleanup_edges (workspace->screen->display);
+

I'm not 100% sure if Metacity/Mutter allows the old Enlightenment-style of switching workspaces while moving (if you have switch workspaces bound to Alt-F1/F2 this can be pretty handy), but I would like the code to be correct if that is allowed or is allowed in the future. 

The workspace could also be switch due to arbitrary reentrancy from plugin code. 

meta_display_cleanup_edges() is a no-op if you don't have edges cached, so there's no practical performance gain from moving it around. If the workspaces switches without moving the window being moved or resized along with then I'm sure we have bigger problems, but conceptually we'd still want to invalidate the edges, so I don't see it belonging in the move_window != NULL case.

Considering that cleanup_edges() is a no-op if we have no grab operation or the grab operation is something like a switch-workspaces grab, do you see any problem with leaving it like this?
Comment 17 Colin Walters 2010-02-10 17:05:41 UTC
This is a good explanation, thanks!  Looks good to go as-is then.
Comment 18 Owen Taylor 2010-02-10 19:38:09 UTC
Attachment 153368 [details] pushed as 2a823ef - Fix crash when struts change during grab operation