After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 611838 - expose sub-stage redraws by streaming raw updates to ClutterX11TexturePixmap
expose sub-stage redraws by streaming raw updates to ClutterX11TexturePixmap
Status: RESOLVED FIXED
Product: mutter
Classification: Core
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: mutter-maint
mutter-maint
Depends on:
Blocks:
 
 
Reported: 2010-03-04 20:00 UTC by Robert Bragg
Modified: 2010-06-03 23:45 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
mutter-window: stream raw updates to ClutterX11TexturePixmap (4.30 KB, patch)
2010-03-04 20:00 UTC, Robert Bragg
none Details | Review
mutter-window: stream raw updates to ClutterX11TexturePixmap (7.58 KB, patch)
2010-03-16 15:37 UTC, Robert Bragg
needs-work Details | Review
make sure we create a pixmap for all new mutter windows (1.28 KB, patch)
2010-03-19 19:51 UTC, Robert Bragg
committed Details | Review
mutter-window: stream raw updates to ClutterX11TexturePixmap (9.52 KB, patch)
2010-03-19 19:53 UTC, Robert Bragg
needs-work Details | Review
An experimental optimization to avoid being starved by damage events (5.26 KB, patch)
2010-03-22 15:59 UTC, Robert Bragg
rejected Details | Review
mutter-window: stream raw updates to ClutterX11TexturePixmap (12.48 KB, patch)
2010-04-12 17:32 UTC, Robert Bragg
accepted-commit_now Details | Review
mutter-window: optimization for damage handling (5.34 KB, patch)
2010-04-12 17:37 UTC, Robert Bragg
none Details | Review
mutter-window: stream raw updates to ClutterX11TexturePixmap (12.53 KB, patch)
2010-04-19 14:08 UTC, Robert Bragg
committed Details | Review
mutter-window: optimization for damage handling (5.28 KB, patch)
2010-04-19 14:10 UTC, Robert Bragg
none Details | Review
mutter-window: request BoundingBox damage changes to avoid DOS (3.07 KB, patch)
2010-05-14 13:52 UTC, Robert Bragg
none Details | Review
mutter-window: request BoundingBox damage changes to avoid DOS (3.12 KB, patch)
2010-06-03 22:52 UTC, Robert Bragg
committed Details | Review

Description Robert Bragg 2010-03-04 20:00:27 UTC
Created attachment 155259 [details] [review]
mutter-window: stream raw updates to ClutterX11TexturePixmap

Clutter 1.2 was recently released with some basic support for sub-stage region redraws in response to ClutterX11TexturePixmap updates, I've attached a patch for mutter-window.c that allows mutter to take advantage of this.

I'd be interested to get some review for this patch.
Comment 1 Robert Bragg 2010-03-16 15:37:12 UTC
Created attachment 156274 [details] [review]
mutter-window: stream raw updates to ClutterX11TexturePixmap

I've attached a replacement patch that fixes a bug whereby calling mutter_window_new for a Window that is already mapped wouldn't result in XCompositeNameWindowPixmap being called. Previously this worked because damage handling would call mutter_window_mark_for_repair resulting in a call to check_needs_repair that would name the pixmap.

This version of the patch also renames check_needs_repair and mutter_window_mark_for_repair to check_needs_pixmap and mutter_window_queue_create_pixmap respectively to better reflect what they now do.
Comment 2 Robert Bragg 2010-03-16 15:44:38 UTC
It seems worth also noting that besides being able to only redraw a small region of the stage in response to window damage the change to process raw damage events may reduce lag by avoiding multiple synchronous X requests for each window damaged each time we come to redraw.
Comment 3 Jason Smith 2010-03-17 06:12:13 UTC
I have done some testing with this patch. It has quite a large performance impact on, especially on larger screens. Very nice work.
Comment 4 Tomas Frydrych 2010-03-17 11:13:07 UTC
We have pushed this into Moblin for testing, so far looking good.
Comment 5 Owen Taylor 2010-03-18 18:30:18 UTC
Review of attachment 156274 [details] [review]:

Generally looks OK and like a better way of doing things - hopefully there are no obscure timing issues - damage can be tricky that way, though the raw rectangle mode is the simplest case. Some comments below. Also see http://bugzilla.openedhand.com/show_bug.cgi?id=2040 - if I'm right about the problems there that should cause various update artifacts.

::: src/compositor/mutter-window.c
@@ -825,2 +826,2 @@
 static void
-mutter_window_mark_for_repair (MutterWindow *self)
+mutter_window_queue_create_pixmap (MutterWindow *self)

Since you removed the queue_redraw() call it doesn't actually schedule anything to be done - it just sets a flag. So, the "queue" name is inappropriate. But why did you remove the queue_redraw() call?

@@ +896,1 @@
     clutter_actor_queue_redraw (priv->actor);

We don't really control whether we can freeze updates when the pixmap doesn't change - if there's a zero-copy TFP implementation then not rebinding the pixmap doesn't help. Still, I think it's uncomfortable that with this patch that during effects we freeze updating to a new pixmap, but still pass damage effects on to the TFP implementation. I think we probably should block damage while freezing as well, and then update the whole window when done.

@@ +1254,3 @@
   priv->mapped = meta_window_toplevel_is_mapped (priv->window);
+  if (priv->mapped)
+    mutter_window_queue_create_pixmap (self);

This looks plausible, but it needs to be a separate patch, it doesn't have anything apparent to do with this patch.

@@ +1613,3 @@
+  MutterWindowPrivate *priv = self->priv;
+  ClutterX11TexturePixmap *texture_x11 =
+    CLUTTER_X11_TEXTURE_PIXMAP (priv->actor);

Ugly line break - Mutter is only approximately line-wrapped to 80 characters not strictly line wrapped.
Comment 6 Tomas Frydrych 2010-03-19 14:23:57 UTC
(In reply to comment #5)
> @@ +1254,3 @@
>    priv->mapped = meta_window_toplevel_is_mapped (priv->window);
> +  if (priv->mapped)
> +    mutter_window_queue_create_pixmap (self);
> 
> This looks plausible, but it needs to be a separate patch, it doesn't have
> anything apparent to do with this patch.

This is necessary for this patch to work; in the old code were marking each window for repair when processing damage, but that no longer happens, so if the window is already mapped, we need to mark it at this point (otherwise things like gtk menus do not work).
Comment 7 drago01 2010-03-19 15:15:07 UTC
Just to add what I found out during testing:

It seems that a fast drawing app can cause rendering to stall completely with this patch applied.

A 14000fps drawing glxgears (on a gtx 285) results into no redraws for 10-30 seconds. Throttling glxgears (sync to vblank) seems to fix that i.e when it draws at 60 fps it is fine.

We probably should just not try to track damage for specific regions and just update the whole screen in this case (compiz uses a similar heuristic).

On a slower GPU (GM45) (where glxgears does not render _that_ many frames), the drawing of the shell does not stall but it is slower (as in the same situation without the patch).

OTOH a simple benchmark tool (that runs windowed) goes up from 18fps to 26fps when using this patch. (when using compiz or metacity it runs at 26-28fps).
Comment 8 Owen Taylor 2010-03-19 16:17:51 UTC
(In reply to comment #7)
> Just to add what I found out during testing:
> 
> It seems that a fast drawing app can cause rendering to stall completely with
> this patch applied.
> 
> A 14000fps drawing glxgears (on a gtx 285) results into no redraws for 10-30
> seconds. Throttling glxgears (sync to vblank) seems to fix that i.e when it
> draws at 60 fps it is fine.
> 
> We probably should just not try to track damage for specific regions and just
> update the whole screen in this case (compiz uses a similar heuristic).
> 
> On a slower GPU (GM45) (where glxgears does not render _that_ many frames), the
> drawing of the shell does not stall but it is slower (as in the same situation
> without the patch).
> 
> OTOH a simple benchmark tool (that runs windowed) goes up from 18fps to 26fps
> when using this patch. (when using compiz or metacity it runs at 26-28fps).

Can you find out (sysprof, or attaching to the process with gdb and hitting Control-c), what mutter is doing while the 14000fps drawing glxgears is running - is it doing anything other than simple processing of the damage events?
Comment 9 drago01 2010-03-19 16:33:34 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Just to add what I found out during testing:
> > 
> > It seems that a fast drawing app can cause rendering to stall completely with
> > this patch applied.
> > 
> > A 14000fps drawing glxgears (on a gtx 285) results into no redraws for 10-30
> > seconds. Throttling glxgears (sync to vblank) seems to fix that i.e when it
> > draws at 60 fps it is fine.
> > 
> > We probably should just not try to track damage for specific regions and just
> > update the whole screen in this case (compiz uses a similar heuristic).
> > 
> > On a slower GPU (GM45) (where glxgears does not render _that_ many frames), the
> > drawing of the shell does not stall but it is slower (as in the same situation
> > without the patch).
> > 
> > OTOH a simple benchmark tool (that runs windowed) goes up from 18fps to 26fps
> > when using this patch. (when using compiz or metacity it runs at 26-28fps).
> 
> Can you find out (sysprof, or attaching to the process with gdb and hitting
> Control-c), what mutter is doing while the 14000fps drawing glxgears is running
> - is it doing anything other than simple processing of the damage events?

When drawing halts it is stuck in meta_compositor_process_event / meta_error_trap_pop_internal .
Comment 10 Robert Bragg 2010-03-19 19:48:04 UTC
(In reply to comment #5)
> Review of attachment 156274 [details] [review]:
> 
> Generally looks OK and like a better way of doing things - hopefully there are
> no obscure timing issues - damage can be tricky that way, though the raw
> rectangle mode is the simplest case. Some comments below. Also see
> http://bugzilla.openedhand.com/show_bug.cgi?id=2040 - if I'm right about the
> problems there that should cause various update artifacts.
> 
> ::: src/compositor/mutter-window.c
> @@ -825,2 +826,2 @@
>  static void
> -mutter_window_mark_for_repair (MutterWindow *self)
> +mutter_window_queue_create_pixmap (MutterWindow *self)
> 
> Since you removed the queue_redraw() call it doesn't actually schedule anything
> to be done - it just sets a flag. So, the "queue" name is inappropriate. But
> why did you remove the queue_redraw() call?

I think this was just a mistake; I'll attach an updated patch, thanks.

> 
> @@ +896,1 @@
>      clutter_actor_queue_redraw (priv->actor);
> 
> We don't really control whether we can freeze updates when the pixmap doesn't
> change - if there's a zero-copy TFP implementation then not rebinding the
> pixmap doesn't help. Still, I think it's uncomfortable that with this patch
> that during effects we freeze updating to a new pixmap, but still pass damage
> effects on to the TFP implementation. I think we probably should block damage
> while freezing as well, and then update the whole window when done.

Right I was assuming there isn't really anything you can do at this level to guarantee the updates stop. If you need better guarantees you can e.g. take a snapshot of the window before you start applying an effect when it's possible the contents may become inconsistent. (e.g. when destroying)

Not rebinding the pixmap seems more reasonable as it may have changed size which I guess would more likely upset effects.

I'll look at updating the patch in this way; though I'm not really convinced this is the right way to tackle this issue.

> 
> @@ +1254,3 @@
>    priv->mapped = meta_window_toplevel_is_mapped (priv->window);
> +  if (priv->mapped)
> +    mutter_window_queue_create_pixmap (self);
> 
> This looks plausible, but it needs to be a separate patch, it doesn't have
> anything apparent to do with this patch.

I had this in a separate fix patch for meego, but you're right I should split this out.

> 
> @@ +1613,3 @@
> +  MutterWindowPrivate *priv = self->priv;
> +  ClutterX11TexturePixmap *texture_x11 =
> +    CLUTTER_X11_TEXTURE_PIXMAP (priv->actor);
> 
> Ugly line break - Mutter is only approximately line-wrapped to 80 characters
> not strictly line wrapped.

sorry, just used to the clutter style, I can change that.
Comment 11 Robert Bragg 2010-03-19 19:51:20 UTC
Created attachment 156581 [details] [review]
make sure we create a pixmap for all new mutter windows

Attaches a small fix that will be required for the follow up patch that changes hos damage is handled, though since it's not significant without this change it doesn't seem to warrant a separate bug.
Comment 12 Robert Bragg 2010-03-19 19:53:04 UTC
Created attachment 156582 [details] [review]
mutter-window: stream raw updates to ClutterX11TexturePixmap

Updates the patch considering Owen's feedback
Comment 13 drago01 2010-03-19 21:02:04 UTC
(In reply to comment #9)

> When drawing halts it is stuck in meta_compositor_process_event /
> meta_error_trap_pop_internal .

OK, it turned out that this is caused by the error trapping in meta_compositor_process_event ... removing them makes it work just fine, this patch just happened to trigger the problem but it did not cause it.
Comment 14 Robert Bragg 2010-03-22 15:59:35 UTC
Created attachment 156763 [details] [review]
An experimental optimization to avoid being starved by damage events

Since drago01 seemed to be seeing performance issues that might be due to mutter being overwhelmed with huge amounts of damage events I cooked up an experimental optimization to destroy the damage object for windows when > 100 raw damage regions have been received in a single window and simply promote to a full window damage + recreate the damage object for the next frame.

I haven't been able to test this myself, as I don't currently have a way to reproduce the problem, and I'm not sure from drago01's last comment if in fact the problem has now gone away, but it's possible this patch could give a bump in performance for some damage heavy clients.
Comment 15 drago01 2010-03-22 16:09:11 UTC
(In reply to comment #14)
> Created an attachment (id=156763) [details] [review]
> An experimental optimization to avoid being starved by damage events
> 
> Since drago01 seemed to be seeing performance issues that might be due to
> mutter being overwhelmed with huge amounts of damage events I cooked up an
> experimental optimization to destroy the damage object for windows when > 100
> raw damage regions have been received in a single window and simply promote to
> a full window damage + recreate the damage object for the next frame.
> 
> I haven't been able to test this myself, as I don't currently have a way to
> reproduce the problem, and I'm not sure from drago01's last comment if in fact
> the problem has now gone away, but it's possible this patch could give a bump
> in performance for some damage heavy clients.

As I wrote in the last comments removing the traps seems to fix the issue I was having (https://bugzilla.gnome.org/show_bug.cgi?id=613398) ... I will try with your patch (with and without the traps to check if there is any difference).
Comment 16 drago01 2010-03-22 16:14:26 UTC
(In reply to comment #14)
> Created an attachment (id=156763) [details] [review]
> An experimental optimization to avoid being starved by damage events
> 
> Since drago01 seemed to be seeing performance issues that might be due to
> mutter being overwhelmed with huge amounts of damage events I cooked up an
> experimental optimization to destroy the damage object for windows when > 100
> raw damage regions have been received in a single window and simply promote to
> a full window damage + recreate the damage object for the next frame.
> 
> I haven't been able to test this myself, as I don't currently have a way to
> reproduce the problem, and I'm not sure from drago01's last comment if in fact
> the problem has now gone away, but it's possible this patch could give a bump
> in performance for some damage heavy clients.

This patch seems to fix the performance (starving) issues even without removing the traps.
Comment 17 Owen Taylor 2010-04-05 21:05:21 UTC
Review of attachment 156581 [details] [review]:

Looks good
Comment 18 Owen Taylor 2010-04-05 21:44:49 UTC
Review of attachment 156582 [details] [review]:

While I agree that freezing the damage events isn't really that meaningful, it seems to work pretty cleanly in just a few lines of code, and (to me) anyways makes this code more self contained - the code is doing the best job it can to freeze updates, and the ultimate effect ends up being an approximation to that based on what the TFP extension actually can do. Rather than having a half-job of freezing updates which can only be explained with a long comment relating to how the TFP extension typically is implemented.

Some small comments, otherwise looks good.

::: src/compositor/mutter-window.c
@@ +67,3 @@
   guint             redecorating           : 1;
 
+  guint		    needs_fake_damage      : 1;

Don't like this name. There's nothing "fake" about the damage, it just hasn't been sent on to Clutter yet. have_pending_damange, perhaps.

@@ +1646,3 @@
+   * fake update so we need to queue a redraw to ensure the pre_paint
+   * will be hit in a finite amount of time. */
+  clutter_actor_queue_redraw (priv->actor);

You don't want to do this. the whole point of "fake damage" (pending damage) is that it isn't sent on to clutter yet, it's when the window is unfrozen, that the redraw needs to be queued. With this removed, I don't think ta separate function is needed, it can just be inlined into process_damage.
Comment 19 Owen Taylor 2010-04-05 22:03:54 UTC
Review of attachment 156763 [details] [review]:

Since the starvation problem seemed to be largely the result of doing a round trip to the X server on each event, let's skip this complexity unless it proves necessary.
Comment 20 drago01 2010-04-05 22:09:57 UTC
(In reply to comment #19)
> Review of attachment 156763 [details] [review]:
> 
> Since the starvation problem seemed to be largely the result of doing a round
> trip to the X server on each event, let's skip this complexity unless it proves
> necessary.

And do what instead? (afaik there is no other fix for this issue merged yet).
Comment 21 Owen Taylor 2010-04-06 18:20:01 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > Review of attachment 156763 [details] [review] [details]:
> > 
> > Since the starvation problem seemed to be largely the result of doing a round
> > trip to the X server on each event, let's skip this complexity unless it proves
> > necessary.
> 
> And do what instead? (afaik there is no other fix for this issue merged yet).

From comment 31, it sounded like removing the error traps as in bug 613398 was enough to fix things, and since that's definitely something we want to do - a big performance problem in general, that's the way I'd like to go. I've reviewed your patch and added an additional patch of my own there now.
Comment 22 drago01 2010-04-06 19:03:04 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > Review of attachment 156763 [details] [review] [details] [details]:
> > > 
> > > Since the starvation problem seemed to be largely the result of doing a round
> > > trip to the X server on each event, let's skip this complexity unless it proves
> > > necessary.
> > 
> > And do what instead? (afaik there is no other fix for this issue merged yet).
> 
> From comment 31, it sounded like removing the error traps as in bug 613398 was
> enough to fix things,

Yeah it indeed does.

> and since that's definitely something we want to do - a
> big performance problem in general, that's the way I'd like to go. I've
> reviewed your patch and added an additional patch of my own there now.

OK
Comment 23 Robert Bragg 2010-04-12 17:32:44 UTC
Created attachment 158510 [details] [review]
mutter-window: stream raw updates to ClutterX11TexturePixmap

I updated this patch last week according to some of the comments, though still haven't had a chance to test this patch and reply to the comments but here's the patch for now.
Comment 24 Robert Bragg 2010-04-12 17:37:34 UTC
Created attachment 158511 [details] [review]
mutter-window: optimization for damage handling

although we may not want this optimization for now I did update the damage handling patch in line with the new "mutter-window: stream raw updates to ClutterX11TexturePixmap" patch for reference in case it proves to be worthwhile later.
Comment 25 Robert Bragg 2010-04-12 17:59:50 UTC
(In reply to comment #18)
> Review of attachment 156582 [details] [review]:
> 
> While I agree that freezing the damage events isn't really that meaningful, it
> seems to work pretty cleanly in just a few lines of code, and (to me) anyways
> makes this code more self contained - the code is doing the best job it can to
> freeze updates, and the ultimate effect ends up being an approximation to that
> based on what the TFP extension actually can do. Rather than having a half-job
> of freezing updates which can only be explained with a long comment relating to
> how the TFP extension typically is implemented.
> 
> Some small comments, otherwise looks good.
> 
> ::: src/compositor/mutter-window.c
> @@ +67,3 @@
>    guint             redecorating           : 1;
> 
> +  guint            needs_fake_damage      : 1;
> 
> Don't like this name. There's nothing "fake" about the damage, it just hasn't
> been sent on to Clutter yet. have_pending_damange, perhaps.

this code was reworked a bit in my latest patch so there is now a needs_damage_all boolean that's referenced when we unfreeze a window. If it's true when unfreezing then we do a full pixmap update_area()

> 
> @@ +1646,3 @@
> +   * fake update so we need to queue a redraw to ensure the pre_paint
> +   * will be hit in a finite amount of time. */
> +  clutter_actor_queue_redraw (priv->actor);
> 
> You don't want to do this. the whole point of "fake damage" (pending damage) is
> that it isn't sent on to clutter yet, it's when the window is unfrozen, that
> the redraw needs to be queued. With this removed, I don't think ta separate
> function is needed, it can just be inlined into process_damage.

I've slightly formalized the freeze mechanism now with two new functions: _mutter_window_freeze and _mutter_window_thaw that get used for those effects that want to try and disable updates during their progress and this gives us a single place to do things when a window is unfrozen.

the patch now sets priv->needs_damage_all = TRUE if damage is received for a window while it's frozen. If we see priv->needs_damage_all == TRUE when the window is unfrozen we issue a full pixmap update_area() as noted above.
Comment 26 Owen Taylor 2010-04-13 13:17:00 UTC
Review of attachment 158510 [details] [review]:

Looks very good to me with a couple of tiny style points.

A side-note about the freeze/thaw for damage is that even with zero-copy TFP, it works if we *arent'* redrawing the stage for some other reason. Not a factor here if we are animating an effect, but see bug 587255 for an old bug where I added the ability to freeze updates on a MutterWindow to get better window updates when an application is drawing.

::: src/compositor/mutter-window.c
@@ +49,3 @@
   GdkRegion        *bounding_region;
 
+  guint             freeze;

I'd slightly prefer freeze_count

@@ +860,3 @@
+    }
+
+  if (self->priv->freeze)

Think it's slightly better to write > 0, see below for what that is consistent with.

@@ +1706,3 @@
+is_frozen (MutterWindow *self)
+{
+  return self->priv->freeze ? TRUE : FALSE;

Clearer as:

 return self->priv->freeze_count > 0;
Comment 27 Robert Bragg 2010-04-19 14:08:29 UTC
Created attachment 159078 [details] [review]
mutter-window: stream raw updates to ClutterX11TexturePixmap

I've updated the path with the minor change suggested by owen; renaming the .freeze member to .freeze_count.
Comment 28 Robert Bragg 2010-04-19 14:10:35 UTC
Created attachment 159080 [details] [review]
mutter-window: optimization for damage handling

I've rebased the damage optimization patch. (here just for reference in case it becomes desirable in the future)
Comment 29 Robert Bragg 2010-04-23 10:59:03 UTC
ok I've pushed the patches (except the optimization for damage handling) to master as commits cfa30f98768a2f4d and 0c146403526f though I forgot to add a bug reference to the commit messages; sorry about that.
Comment 30 Robert Bragg 2010-05-14 13:49:55 UTC
Apparently I don't have the permissions reopen this bug? Can someone please do that for me?

Our QA team for MeeGo recently reported some regressions in the compositor performance for some use cases which we have pin-pointed back to this change.

The case they were testing appear to be DOSing the compositor with huge amounts of damage events which were slowing down the compositor and sometimes completely freezing updates until it could catch up.

Full details and some crude benchmarks can be found here:
http://bugs.meego.com/show_bug.cgi?id=42

In the end, of the three tests I ran there were three different options working best:
x11perf -shmput10 works best with the original code
glxgears works best with patch0
gtkperf works best with patch1

Where the "original code" is the code before we applied anything to change the damage handling, patch0 is the experimental "optimization for damage handling" patch I attached here and patch1 is a new patch I wrote to request BoundingBox damage regions changes instead of Raw events.

I'll attach the new patch for review here.
Comment 31 Robert Bragg 2010-05-14 13:52:15 UTC
Created attachment 161056 [details] [review]
mutter-window: request BoundingBox damage changes to avoid DOS

This patch requests BoundingBox damage region changes instead of streaming Raw
damage events to the compositor. This greatly reduces the amount of events we
receive and avoids overwhelming the compositor which can result in frozen
window contents.
Comment 32 drago01 2010-05-14 13:54:19 UTC
(In reply to comment #30)
> Apparently I don't have the permissions reopen this bug? Can someone please do
> that for me?

Done.
Comment 33 Robert Bragg 2010-05-14 14:11:22 UTC
OK so apparently the particular category the meego bug is currently filed under isn't public yet, so I'll try and copy over the comments of interest here...

Most of the start comments are very hand wavy so I'll ignore them.

Here are the introductory performance differences they were seeing with x11perf -shmput10:

 zhengkui      2010-04-13 18:54:04 PDT 

ok, I tested both 0312 and 0319 image. the result as following:
0312: mesa-7.7.99-2.2, xserver-1.7.99.3-6.1, xorg-x11-drv-intel-2.10.0-7.2,
libdrm-2.4.19-1.1, kernel-2.6.33-8.1, clutter-1.1.14-2.7,
mutter-2.28.1_0.10-1.1
0319: mesa-7.7.99.901~git346298c765-2.1, xserver-1.7.99.901~git67a8c659f2,
xorg-x11-drv-intel-2.10.902-1.1, libdrm-2.4.19-2.1, kernel-2.6.33.1-4.1,
clutter-1.2.2-2.4, mutter-2.29.0_0.0-1.1
1). run test without window manager, we got the similar result either on 0312
or 0319
 800000 reps @   0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square
 800000 reps @   0.0106 msec ( 94100.0/sec): ShmPutImage 10x10 square
 800000 reps @   0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square
 800000 reps @   0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square
 800000 reps @   0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square
 4000000 trep @   0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square
2). run test with the window manager, we find there is a drop down:
0312: 
 240000 reps @   0.0220 msec ( 45500.0/sec): ShmPutImage 10x10 square
 240000 reps @   0.0216 msec ( 46400.0/sec): ShmPutImage 10x10 square
 240000 reps @   0.0218 msec ( 45900.0/sec): ShmPutImage 10x10 square
 240000 reps @   0.0213 msec ( 46900.0/sec): ShmPutImage 10x10 square
 240000 reps @   0.0214 msec ( 46700.0/sec): ShmPutImage 10x10 square
1200000 trep @   0.0216 msec ( 46300.0/sec): ShmPutImage 10x10 square
0319:
 160000 reps @   0.0346 msec ( 28900.0/sec): ShmPutImage 10x10 square
 160000 reps @   0.0334 msec ( 30000.0/sec): ShmPutImage 10x10 square
 160000 reps @   0.0370 msec ( 27000.0/sec): ShmPutImage 10x10 square
 160000 reps @   0.0333 msec ( 30000.0/sec): ShmPutImage 10x10 square
 160000 reps @   0.0333 msec ( 30100.0/sec): ShmPutImage 10x10 square
 800000 trep @   0.0343 msec ( 29200.0/sec): ShmPutImage 10x10 square

The Sole difference in mutter between those images turns out to be:
mmit fc4d3f292ca45e05ab8c2f571b537f62fc834c4a
Author: Robert Bragg <robert@linux.intel.com>
Date:   Tue Mar 2 18:02:28 2010 +0000

    mutter-window: stream raw updates to ClutterX11TexturePixmap
Comment 34 Robert Bragg 2010-05-14 14:12:15 UTC
 rib      2010-05-13 16:59:27 PDT

It seems to be that the compositor is being DOS'd by the Raw damage events that
my patch made mutter request from the server.

The advantage of switching to Raw events was that it avoided 2 non-synchronous
and 2 synchronous X requests per window with damage, per frame:

* 2 (non-sync) to create/destroy a temporary region to copy the damage region
into.
* 1 to request the server to copy the damage region into our given region
* 1 to fetch that region back into the client.

The disadvantage is that some 2D benchmarks can DOS the compositor and updates
can freeze until the compositor manages to catch up.

I have tried two approaches to fixing this, and it seems which ever way it's
done there will be cases where one is better than the other so I'll attach both
patches.

These are crude "performance" measurements I made:

x11perf -smhput10
original: 280,000
raw: 160,000 + window updates freeze
patch0: 200,000
patch1: 240,000

glxgears
original: 600
raw: 1382
patch0: 1475
patch1: 1240

original:
before my patch to take advantage of glXCopySubBuffer + stream Raw damage
events

raw:
with the culprit patch to take advantage of glXCopySubBuffer + stream Raw
damage events.

patch0:
A patch I wrote a while ago to delete the damage object when > 100 events are
received, and recreate it when we start the next frame. I attached it to this
upstream bug, but we agreed not to apply it until there was evidence that it
was needed: https://bugzilla.gnome.org/show_bug.cgi?id=611838

patch1:
A new patch that requests the server for BoundingBox changes so we only get
notified when the region bounding all outstanding damage changes. This reduces
the number of events we receive which should avoid DOS issues, but it requires
a synchronous request to clear the damage region each frame, for each window
with damage.

It can be seen that neither patch gets x11perf back to its original
performance, but glxgears also provides an example that is significantly slower
with the original code so it's not just a simple case of reverting the patch.

Also although patch0 provides the best performance for glxgears, patch0 isn't
as good as patch1 for x11perf -shmput10. 

I think it's important to note that x11perf or glxgears are not good benchmarks
so it may require further testing to decide which patch is best.

On a related side note we should also cherry-pick these two commits from the
upstream gnome repo into Mutter:

9350c586ecdccbe Fix accounting of frozen with maximize/unmaximize
c69f8e4dfe4d951 Don't trap XErrors in meta_compositor_process_event
Comment 35 Robert Bragg 2010-05-14 14:13:00 UTC
 rib      2010-05-13 17:50:31 PDT

I've also done some tests with gtkperf (GtkPerf 0.40) since that was mentioned
in comment 14:

Please refer to comment 17 for a summary of each configuration:

original:
GtkEntry - time:  0.15
GtkComboBox - time:  3.08
GtkComboBoxEntry - time:  2.56
GtkSpinButton - time:  0.35
GtkProgressBar - time:  0.24
GtkToggleButton - time:  0.33
GtkCheckButton - time:  0.27
GtkRadioButton - time:  0.43
GtkTextView - Add text - time:  1.18
GtkTextView - Scroll - time:  0.80
GtkDrawingArea - Lines - time:  4.65
GtkDrawingArea - Circles - time:  9.06
GtkDrawingArea - Text - time:  4.07
GtkDrawingArea - Pixbufs - time:  0.56
 --- 
Total time: 27.75

raw:
GtkEntry - time:  0.16
GtkComboBox - time:  2.94
GtkComboBoxEntry - time:  2.63
GtkSpinButton - time:  0.32
GtkProgressBar - time:  0.21
GtkToggleButton - time:  0.36
GtkCheckButton - time:  0.33
GtkRadioButton - time:  0.49
GtkTextView - Add text - time:  1.20
GtkTextView - Scroll - time:  0.72
GtkDrawingArea - Lines - time:  7.90
GtkDrawingArea - Circles - time: 14.14
GtkDrawingArea - Text - time:  4.49
GtkDrawingArea - Pixbufs - time:  1.57
 --- 
Total time: 37.47

patch0:
GtkEntry - time:  0.16
GtkComboBox - time:  3.07
GtkComboBoxEntry - time:  2.92
GtkSpinButton - time:  0.33
GtkProgressBar - time:  0.22
GtkToggleButton - time:  0.36
GtkCheckButton - time:  0.32
GtkRadioButton - time:  0.50
GtkTextView - Add text - time:  1.22
GtkTextView - Scroll - time:  0.86
GtkDrawingArea - Lines - time:  6.79
GtkDrawingArea - Circles - time: 11.45
GtkDrawingArea - Text - time:  4.02
GtkDrawingArea - Pixbufs - time:  0.64
 --- 
Total time: 32.87

patch1:
GtkEntry - time:  0.16
GtkComboBox - time:  3.02
GtkComboBoxEntry - time:  2.58
GtkSpinButton - time:  0.32
GtkProgressBar - time:  0.21
GtkToggleButton - time:  0.36
GtkCheckButton - time:  0.32
GtkRadioButton - time:  0.49
GtkTextView - Add text - time:  1.24
GtkTextView - Scroll - time:  0.68
GtkDrawingArea - Lines - time:  4.26
GtkDrawingArea - Circles - time:  8.78
GtkDrawingArea - Text - time:  3.50
GtkDrawingArea - Pixbufs - time:  0.57
 --- 
Total time: 26.49

So this means we have three tests each with a different option working best:
x11perf -shmput10 works best with the original code
glxgears works best with patch0
gtkperf works best with patch1

At least this clarifies that we should to do *something* since none of them
perform best with the current code.

I don't think switching back to the original code is the right thing to do
since we know there are cases that are faster with the newer code (and anything
is more compelling that x11perf), and we also expect a power saving from being
able to update small regions of the screen via glXCopySubBuffer which wasn't
supported with the original code.

I'll talk with tf tomorrow and hopefully we can pick an option.
Comment 36 Robert Bragg 2010-05-14 14:14:53 UTC
 tf      2010-05-14 03:28:02 PDT

(In reply to comment #20)
> I don't think switching back to the original code is the right thing to do

Agreed. I think the gtkperf test is the most real-life like test of these, and
hence the one we really care about, so I would be for the the BoundingBox
approach (patch1).
Comment 37 Robert Bragg 2010-05-14 14:15:17 UTC
 ebassi      2010-05-14 05:30:50 PDT

I tend to agree with tf: x11perf is an awful "benchmark" which tests all the
wrong things, unless you find yourself stranded with you Delorean in 1996.

GtkPerf is not a great benchmark but at least is testing modern code paths.

since patch[1] is the one that gives a boost to GtkPerf and doesn't really
affect much the performance of glxgears, I'd second the +1 vote to the
BoundingBox approach.

as a side note: it might be interesting to talk to the X11 team and figure out
an extension to the XDAMAGE protocol to make it easier for compositors to do
the right thing.
Comment 38 Owen Taylor 2010-05-14 16:02:48 UTC
Reopened (and added Robert to the "Mutter Developers" group)
Comment 39 Robert Bragg 2010-06-03 22:52:41 UTC
Created attachment 162706 [details] [review]
mutter-window: request BoundingBox damage changes to avoid DOS

I've attached a replacement for my "mutter-window: request BoundingBox damage changes to avoid DOS" patch which fixes a bug in mutter_window_process_damage where the received_damage boolean might not be updated if the window is frozen.
Comment 40 Owen Taylor 2010-06-03 23:16:09 UTC
Review of attachment 162706 [details] [review]:

So, since Clutter only repaints a bounding box currently, this should mostly be a pretty straight-up performance win.

Using non-raw damage events can be tricky because it makes it easier to get race conditions, but I think this is OK. It is possible to have sequences like:

 <damage to +0+0x1x1>     (A: reports damage to +1+1x1x1)
 <damage to +99+99x1x1>   (B: reports damage to +1+1x100x100)
                                                                We decide to repaint
 <damage to +00x1x1>      (C: reports damage to +0+0x100x100)
                                                                We XSubtractDamage
                                                                We repaint +1+1x100x100
                                                                We get event C
                                                                We XSubtractDamage
                                                                We repaint +0+0x100x100
        
So we can end up painting more than we would with raw damage event,s but I can't figure out a case when we would paint less. 

(What we found out with GTK+ is that damage event serials have to be treated with caution for non-raw damage, since the X server is suppressing damage events. Just because the last damage you got occurred at serial T doesn't mean that damage didn't occur at serial T + 1. But we aren't even looking at serials here at all.)

OK to commit if you fix up the wrong comment and commit message text about synchronous requests.
Comment 41 Robert Bragg 2010-06-03 23:44:51 UTC
ok thanks, I've pushed this to master as:

commit 91d82bf8c78ac7d2ae8a39f5ed619a413e2d6320
Author: Robert Bragg <robert@linux.intel.com>
Date:   Thu May 13 21:43:30 2010 +0100

    mutter-window: request DamageReportBoundingBox report level
    
    In commit d34ae764769 I switched mutter-window to ask for Raw rectangles
    from the X server. This avoided 2 non synchronous and 2 synchronous X
    requests per window with damage, per frame; 2 (non-sync) to
    create/destroy a temporary region to copy the damage region into, 1 to
    request the server to copy the damage region into a our given region and
    another to fetch that region back into the client. The problem with raw
    events though is that it's possible to DOS the compositor with them.
    
    Instead of receiving an event for every bit of damage this patch instead
    asks the server to only report BoundingBox changes to the damage region.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=611838