GNOME Bugzilla – Bug 611838
expose sub-stage redraws by streaming raw updates to ClutterX11TexturePixmap
Last modified: 2010-06-03 23:45:36 UTC
Created attachment 155259 [details] [review] mutter-window: stream raw updates to ClutterX11TexturePixmap Clutter 1.2 was recently released with some basic support for sub-stage region redraws in response to ClutterX11TexturePixmap updates, I've attached a patch for mutter-window.c that allows mutter to take advantage of this. I'd be interested to get some review for this patch.
Created attachment 156274 [details] [review] mutter-window: stream raw updates to ClutterX11TexturePixmap I've attached a replacement patch that fixes a bug whereby calling mutter_window_new for a Window that is already mapped wouldn't result in XCompositeNameWindowPixmap being called. Previously this worked because damage handling would call mutter_window_mark_for_repair resulting in a call to check_needs_repair that would name the pixmap. This version of the patch also renames check_needs_repair and mutter_window_mark_for_repair to check_needs_pixmap and mutter_window_queue_create_pixmap respectively to better reflect what they now do.
It seems worth also noting that besides being able to only redraw a small region of the stage in response to window damage the change to process raw damage events may reduce lag by avoiding multiple synchronous X requests for each window damaged each time we come to redraw.
I have done some testing with this patch. It has quite a large performance impact on, especially on larger screens. Very nice work.
We have pushed this into Moblin for testing, so far looking good.
Review of attachment 156274 [details] [review]: Generally looks OK and like a better way of doing things - hopefully there are no obscure timing issues - damage can be tricky that way, though the raw rectangle mode is the simplest case. Some comments below. Also see http://bugzilla.openedhand.com/show_bug.cgi?id=2040 - if I'm right about the problems there that should cause various update artifacts. ::: src/compositor/mutter-window.c @@ -825,2 +826,2 @@ static void -mutter_window_mark_for_repair (MutterWindow *self) +mutter_window_queue_create_pixmap (MutterWindow *self) Since you removed the queue_redraw() call it doesn't actually schedule anything to be done - it just sets a flag. So, the "queue" name is inappropriate. But why did you remove the queue_redraw() call? @@ +896,1 @@ clutter_actor_queue_redraw (priv->actor); We don't really control whether we can freeze updates when the pixmap doesn't change - if there's a zero-copy TFP implementation then not rebinding the pixmap doesn't help. Still, I think it's uncomfortable that with this patch that during effects we freeze updating to a new pixmap, but still pass damage effects on to the TFP implementation. I think we probably should block damage while freezing as well, and then update the whole window when done. @@ +1254,3 @@ priv->mapped = meta_window_toplevel_is_mapped (priv->window); + if (priv->mapped) + mutter_window_queue_create_pixmap (self); This looks plausible, but it needs to be a separate patch, it doesn't have anything apparent to do with this patch. @@ +1613,3 @@ + MutterWindowPrivate *priv = self->priv; + ClutterX11TexturePixmap *texture_x11 = + CLUTTER_X11_TEXTURE_PIXMAP (priv->actor); Ugly line break - Mutter is only approximately line-wrapped to 80 characters not strictly line wrapped.
(In reply to comment #5) > @@ +1254,3 @@ > priv->mapped = meta_window_toplevel_is_mapped (priv->window); > + if (priv->mapped) > + mutter_window_queue_create_pixmap (self); > > This looks plausible, but it needs to be a separate patch, it doesn't have > anything apparent to do with this patch. This is necessary for this patch to work; in the old code were marking each window for repair when processing damage, but that no longer happens, so if the window is already mapped, we need to mark it at this point (otherwise things like gtk menus do not work).
Just to add what I found out during testing: It seems that a fast drawing app can cause rendering to stall completely with this patch applied. A 14000fps drawing glxgears (on a gtx 285) results into no redraws for 10-30 seconds. Throttling glxgears (sync to vblank) seems to fix that i.e when it draws at 60 fps it is fine. We probably should just not try to track damage for specific regions and just update the whole screen in this case (compiz uses a similar heuristic). On a slower GPU (GM45) (where glxgears does not render _that_ many frames), the drawing of the shell does not stall but it is slower (as in the same situation without the patch). OTOH a simple benchmark tool (that runs windowed) goes up from 18fps to 26fps when using this patch. (when using compiz or metacity it runs at 26-28fps).
(In reply to comment #7) > Just to add what I found out during testing: > > It seems that a fast drawing app can cause rendering to stall completely with > this patch applied. > > A 14000fps drawing glxgears (on a gtx 285) results into no redraws for 10-30 > seconds. Throttling glxgears (sync to vblank) seems to fix that i.e when it > draws at 60 fps it is fine. > > We probably should just not try to track damage for specific regions and just > update the whole screen in this case (compiz uses a similar heuristic). > > On a slower GPU (GM45) (where glxgears does not render _that_ many frames), the > drawing of the shell does not stall but it is slower (as in the same situation > without the patch). > > OTOH a simple benchmark tool (that runs windowed) goes up from 18fps to 26fps > when using this patch. (when using compiz or metacity it runs at 26-28fps). Can you find out (sysprof, or attaching to the process with gdb and hitting Control-c), what mutter is doing while the 14000fps drawing glxgears is running - is it doing anything other than simple processing of the damage events?
(In reply to comment #8) > (In reply to comment #7) > > Just to add what I found out during testing: > > > > It seems that a fast drawing app can cause rendering to stall completely with > > this patch applied. > > > > A 14000fps drawing glxgears (on a gtx 285) results into no redraws for 10-30 > > seconds. Throttling glxgears (sync to vblank) seems to fix that i.e when it > > draws at 60 fps it is fine. > > > > We probably should just not try to track damage for specific regions and just > > update the whole screen in this case (compiz uses a similar heuristic). > > > > On a slower GPU (GM45) (where glxgears does not render _that_ many frames), the > > drawing of the shell does not stall but it is slower (as in the same situation > > without the patch). > > > > OTOH a simple benchmark tool (that runs windowed) goes up from 18fps to 26fps > > when using this patch. (when using compiz or metacity it runs at 26-28fps). > > Can you find out (sysprof, or attaching to the process with gdb and hitting > Control-c), what mutter is doing while the 14000fps drawing glxgears is running > - is it doing anything other than simple processing of the damage events? When drawing halts it is stuck in meta_compositor_process_event / meta_error_trap_pop_internal .
(In reply to comment #5) > Review of attachment 156274 [details] [review]: > > Generally looks OK and like a better way of doing things - hopefully there are > no obscure timing issues - damage can be tricky that way, though the raw > rectangle mode is the simplest case. Some comments below. Also see > http://bugzilla.openedhand.com/show_bug.cgi?id=2040 - if I'm right about the > problems there that should cause various update artifacts. > > ::: src/compositor/mutter-window.c > @@ -825,2 +826,2 @@ > static void > -mutter_window_mark_for_repair (MutterWindow *self) > +mutter_window_queue_create_pixmap (MutterWindow *self) > > Since you removed the queue_redraw() call it doesn't actually schedule anything > to be done - it just sets a flag. So, the "queue" name is inappropriate. But > why did you remove the queue_redraw() call? I think this was just a mistake; I'll attach an updated patch, thanks. > > @@ +896,1 @@ > clutter_actor_queue_redraw (priv->actor); > > We don't really control whether we can freeze updates when the pixmap doesn't > change - if there's a zero-copy TFP implementation then not rebinding the > pixmap doesn't help. Still, I think it's uncomfortable that with this patch > that during effects we freeze updating to a new pixmap, but still pass damage > effects on to the TFP implementation. I think we probably should block damage > while freezing as well, and then update the whole window when done. Right I was assuming there isn't really anything you can do at this level to guarantee the updates stop. If you need better guarantees you can e.g. take a snapshot of the window before you start applying an effect when it's possible the contents may become inconsistent. (e.g. when destroying) Not rebinding the pixmap seems more reasonable as it may have changed size which I guess would more likely upset effects. I'll look at updating the patch in this way; though I'm not really convinced this is the right way to tackle this issue. > > @@ +1254,3 @@ > priv->mapped = meta_window_toplevel_is_mapped (priv->window); > + if (priv->mapped) > + mutter_window_queue_create_pixmap (self); > > This looks plausible, but it needs to be a separate patch, it doesn't have > anything apparent to do with this patch. I had this in a separate fix patch for meego, but you're right I should split this out. > > @@ +1613,3 @@ > + MutterWindowPrivate *priv = self->priv; > + ClutterX11TexturePixmap *texture_x11 = > + CLUTTER_X11_TEXTURE_PIXMAP (priv->actor); > > Ugly line break - Mutter is only approximately line-wrapped to 80 characters > not strictly line wrapped. sorry, just used to the clutter style, I can change that.
Created attachment 156581 [details] [review] make sure we create a pixmap for all new mutter windows Attaches a small fix that will be required for the follow up patch that changes hos damage is handled, though since it's not significant without this change it doesn't seem to warrant a separate bug.
Created attachment 156582 [details] [review] mutter-window: stream raw updates to ClutterX11TexturePixmap Updates the patch considering Owen's feedback
(In reply to comment #9) > When drawing halts it is stuck in meta_compositor_process_event / > meta_error_trap_pop_internal . OK, it turned out that this is caused by the error trapping in meta_compositor_process_event ... removing them makes it work just fine, this patch just happened to trigger the problem but it did not cause it.
Created attachment 156763 [details] [review] An experimental optimization to avoid being starved by damage events Since drago01 seemed to be seeing performance issues that might be due to mutter being overwhelmed with huge amounts of damage events I cooked up an experimental optimization to destroy the damage object for windows when > 100 raw damage regions have been received in a single window and simply promote to a full window damage + recreate the damage object for the next frame. I haven't been able to test this myself, as I don't currently have a way to reproduce the problem, and I'm not sure from drago01's last comment if in fact the problem has now gone away, but it's possible this patch could give a bump in performance for some damage heavy clients.
(In reply to comment #14) > Created an attachment (id=156763) [details] [review] > An experimental optimization to avoid being starved by damage events > > Since drago01 seemed to be seeing performance issues that might be due to > mutter being overwhelmed with huge amounts of damage events I cooked up an > experimental optimization to destroy the damage object for windows when > 100 > raw damage regions have been received in a single window and simply promote to > a full window damage + recreate the damage object for the next frame. > > I haven't been able to test this myself, as I don't currently have a way to > reproduce the problem, and I'm not sure from drago01's last comment if in fact > the problem has now gone away, but it's possible this patch could give a bump > in performance for some damage heavy clients. As I wrote in the last comments removing the traps seems to fix the issue I was having (https://bugzilla.gnome.org/show_bug.cgi?id=613398) ... I will try with your patch (with and without the traps to check if there is any difference).
(In reply to comment #14) > Created an attachment (id=156763) [details] [review] > An experimental optimization to avoid being starved by damage events > > Since drago01 seemed to be seeing performance issues that might be due to > mutter being overwhelmed with huge amounts of damage events I cooked up an > experimental optimization to destroy the damage object for windows when > 100 > raw damage regions have been received in a single window and simply promote to > a full window damage + recreate the damage object for the next frame. > > I haven't been able to test this myself, as I don't currently have a way to > reproduce the problem, and I'm not sure from drago01's last comment if in fact > the problem has now gone away, but it's possible this patch could give a bump > in performance for some damage heavy clients. This patch seems to fix the performance (starving) issues even without removing the traps.
Review of attachment 156581 [details] [review]: Looks good
Review of attachment 156582 [details] [review]: While I agree that freezing the damage events isn't really that meaningful, it seems to work pretty cleanly in just a few lines of code, and (to me) anyways makes this code more self contained - the code is doing the best job it can to freeze updates, and the ultimate effect ends up being an approximation to that based on what the TFP extension actually can do. Rather than having a half-job of freezing updates which can only be explained with a long comment relating to how the TFP extension typically is implemented. Some small comments, otherwise looks good. ::: src/compositor/mutter-window.c @@ +67,3 @@ guint redecorating : 1; + guint needs_fake_damage : 1; Don't like this name. There's nothing "fake" about the damage, it just hasn't been sent on to Clutter yet. have_pending_damange, perhaps. @@ +1646,3 @@ + * fake update so we need to queue a redraw to ensure the pre_paint + * will be hit in a finite amount of time. */ + clutter_actor_queue_redraw (priv->actor); You don't want to do this. the whole point of "fake damage" (pending damage) is that it isn't sent on to clutter yet, it's when the window is unfrozen, that the redraw needs to be queued. With this removed, I don't think ta separate function is needed, it can just be inlined into process_damage.
Review of attachment 156763 [details] [review]: Since the starvation problem seemed to be largely the result of doing a round trip to the X server on each event, let's skip this complexity unless it proves necessary.
(In reply to comment #19) > Review of attachment 156763 [details] [review]: > > Since the starvation problem seemed to be largely the result of doing a round > trip to the X server on each event, let's skip this complexity unless it proves > necessary. And do what instead? (afaik there is no other fix for this issue merged yet).
(In reply to comment #20) > (In reply to comment #19) > > Review of attachment 156763 [details] [review] [details]: > > > > Since the starvation problem seemed to be largely the result of doing a round > > trip to the X server on each event, let's skip this complexity unless it proves > > necessary. > > And do what instead? (afaik there is no other fix for this issue merged yet). From comment 31, it sounded like removing the error traps as in bug 613398 was enough to fix things, and since that's definitely something we want to do - a big performance problem in general, that's the way I'd like to go. I've reviewed your patch and added an additional patch of my own there now.
(In reply to comment #21) > (In reply to comment #20) > > (In reply to comment #19) > > > Review of attachment 156763 [details] [review] [details] [details]: > > > > > > Since the starvation problem seemed to be largely the result of doing a round > > > trip to the X server on each event, let's skip this complexity unless it proves > > > necessary. > > > > And do what instead? (afaik there is no other fix for this issue merged yet). > > From comment 31, it sounded like removing the error traps as in bug 613398 was > enough to fix things, Yeah it indeed does. > and since that's definitely something we want to do - a > big performance problem in general, that's the way I'd like to go. I've > reviewed your patch and added an additional patch of my own there now. OK
Created attachment 158510 [details] [review] mutter-window: stream raw updates to ClutterX11TexturePixmap I updated this patch last week according to some of the comments, though still haven't had a chance to test this patch and reply to the comments but here's the patch for now.
Created attachment 158511 [details] [review] mutter-window: optimization for damage handling although we may not want this optimization for now I did update the damage handling patch in line with the new "mutter-window: stream raw updates to ClutterX11TexturePixmap" patch for reference in case it proves to be worthwhile later.
(In reply to comment #18) > Review of attachment 156582 [details] [review]: > > While I agree that freezing the damage events isn't really that meaningful, it > seems to work pretty cleanly in just a few lines of code, and (to me) anyways > makes this code more self contained - the code is doing the best job it can to > freeze updates, and the ultimate effect ends up being an approximation to that > based on what the TFP extension actually can do. Rather than having a half-job > of freezing updates which can only be explained with a long comment relating to > how the TFP extension typically is implemented. > > Some small comments, otherwise looks good. > > ::: src/compositor/mutter-window.c > @@ +67,3 @@ > guint redecorating : 1; > > + guint needs_fake_damage : 1; > > Don't like this name. There's nothing "fake" about the damage, it just hasn't > been sent on to Clutter yet. have_pending_damange, perhaps. this code was reworked a bit in my latest patch so there is now a needs_damage_all boolean that's referenced when we unfreeze a window. If it's true when unfreezing then we do a full pixmap update_area() > > @@ +1646,3 @@ > + * fake update so we need to queue a redraw to ensure the pre_paint > + * will be hit in a finite amount of time. */ > + clutter_actor_queue_redraw (priv->actor); > > You don't want to do this. the whole point of "fake damage" (pending damage) is > that it isn't sent on to clutter yet, it's when the window is unfrozen, that > the redraw needs to be queued. With this removed, I don't think ta separate > function is needed, it can just be inlined into process_damage. I've slightly formalized the freeze mechanism now with two new functions: _mutter_window_freeze and _mutter_window_thaw that get used for those effects that want to try and disable updates during their progress and this gives us a single place to do things when a window is unfrozen. the patch now sets priv->needs_damage_all = TRUE if damage is received for a window while it's frozen. If we see priv->needs_damage_all == TRUE when the window is unfrozen we issue a full pixmap update_area() as noted above.
Review of attachment 158510 [details] [review]: Looks very good to me with a couple of tiny style points. A side-note about the freeze/thaw for damage is that even with zero-copy TFP, it works if we *arent'* redrawing the stage for some other reason. Not a factor here if we are animating an effect, but see bug 587255 for an old bug where I added the ability to freeze updates on a MutterWindow to get better window updates when an application is drawing. ::: src/compositor/mutter-window.c @@ +49,3 @@ GdkRegion *bounding_region; + guint freeze; I'd slightly prefer freeze_count @@ +860,3 @@ + } + + if (self->priv->freeze) Think it's slightly better to write > 0, see below for what that is consistent with. @@ +1706,3 @@ +is_frozen (MutterWindow *self) +{ + return self->priv->freeze ? TRUE : FALSE; Clearer as: return self->priv->freeze_count > 0;
Created attachment 159078 [details] [review] mutter-window: stream raw updates to ClutterX11TexturePixmap I've updated the path with the minor change suggested by owen; renaming the .freeze member to .freeze_count.
Created attachment 159080 [details] [review] mutter-window: optimization for damage handling I've rebased the damage optimization patch. (here just for reference in case it becomes desirable in the future)
ok I've pushed the patches (except the optimization for damage handling) to master as commits cfa30f98768a2f4d and 0c146403526f though I forgot to add a bug reference to the commit messages; sorry about that.
Apparently I don't have the permissions reopen this bug? Can someone please do that for me? Our QA team for MeeGo recently reported some regressions in the compositor performance for some use cases which we have pin-pointed back to this change. The case they were testing appear to be DOSing the compositor with huge amounts of damage events which were slowing down the compositor and sometimes completely freezing updates until it could catch up. Full details and some crude benchmarks can be found here: http://bugs.meego.com/show_bug.cgi?id=42 In the end, of the three tests I ran there were three different options working best: x11perf -shmput10 works best with the original code glxgears works best with patch0 gtkperf works best with patch1 Where the "original code" is the code before we applied anything to change the damage handling, patch0 is the experimental "optimization for damage handling" patch I attached here and patch1 is a new patch I wrote to request BoundingBox damage regions changes instead of Raw events. I'll attach the new patch for review here.
Created attachment 161056 [details] [review] mutter-window: request BoundingBox damage changes to avoid DOS This patch requests BoundingBox damage region changes instead of streaming Raw damage events to the compositor. This greatly reduces the amount of events we receive and avoids overwhelming the compositor which can result in frozen window contents.
(In reply to comment #30) > Apparently I don't have the permissions reopen this bug? Can someone please do > that for me? Done.
OK so apparently the particular category the meego bug is currently filed under isn't public yet, so I'll try and copy over the comments of interest here... Most of the start comments are very hand wavy so I'll ignore them. Here are the introductory performance differences they were seeing with x11perf -shmput10: zhengkui 2010-04-13 18:54:04 PDT ok, I tested both 0312 and 0319 image. the result as following: 0312: mesa-7.7.99-2.2, xserver-1.7.99.3-6.1, xorg-x11-drv-intel-2.10.0-7.2, libdrm-2.4.19-1.1, kernel-2.6.33-8.1, clutter-1.1.14-2.7, mutter-2.28.1_0.10-1.1 0319: mesa-7.7.99.901~git346298c765-2.1, xserver-1.7.99.901~git67a8c659f2, xorg-x11-drv-intel-2.10.902-1.1, libdrm-2.4.19-2.1, kernel-2.6.33.1-4.1, clutter-1.2.2-2.4, mutter-2.29.0_0.0-1.1 1). run test without window manager, we got the similar result either on 0312 or 0319 800000 reps @ 0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square 800000 reps @ 0.0106 msec ( 94100.0/sec): ShmPutImage 10x10 square 800000 reps @ 0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square 800000 reps @ 0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square 800000 reps @ 0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square 4000000 trep @ 0.0106 msec ( 94200.0/sec): ShmPutImage 10x10 square 2). run test with the window manager, we find there is a drop down: 0312: 240000 reps @ 0.0220 msec ( 45500.0/sec): ShmPutImage 10x10 square 240000 reps @ 0.0216 msec ( 46400.0/sec): ShmPutImage 10x10 square 240000 reps @ 0.0218 msec ( 45900.0/sec): ShmPutImage 10x10 square 240000 reps @ 0.0213 msec ( 46900.0/sec): ShmPutImage 10x10 square 240000 reps @ 0.0214 msec ( 46700.0/sec): ShmPutImage 10x10 square 1200000 trep @ 0.0216 msec ( 46300.0/sec): ShmPutImage 10x10 square 0319: 160000 reps @ 0.0346 msec ( 28900.0/sec): ShmPutImage 10x10 square 160000 reps @ 0.0334 msec ( 30000.0/sec): ShmPutImage 10x10 square 160000 reps @ 0.0370 msec ( 27000.0/sec): ShmPutImage 10x10 square 160000 reps @ 0.0333 msec ( 30000.0/sec): ShmPutImage 10x10 square 160000 reps @ 0.0333 msec ( 30100.0/sec): ShmPutImage 10x10 square 800000 trep @ 0.0343 msec ( 29200.0/sec): ShmPutImage 10x10 square The Sole difference in mutter between those images turns out to be: mmit fc4d3f292ca45e05ab8c2f571b537f62fc834c4a Author: Robert Bragg <robert@linux.intel.com> Date: Tue Mar 2 18:02:28 2010 +0000 mutter-window: stream raw updates to ClutterX11TexturePixmap
rib 2010-05-13 16:59:27 PDT It seems to be that the compositor is being DOS'd by the Raw damage events that my patch made mutter request from the server. The advantage of switching to Raw events was that it avoided 2 non-synchronous and 2 synchronous X requests per window with damage, per frame: * 2 (non-sync) to create/destroy a temporary region to copy the damage region into. * 1 to request the server to copy the damage region into our given region * 1 to fetch that region back into the client. The disadvantage is that some 2D benchmarks can DOS the compositor and updates can freeze until the compositor manages to catch up. I have tried two approaches to fixing this, and it seems which ever way it's done there will be cases where one is better than the other so I'll attach both patches. These are crude "performance" measurements I made: x11perf -smhput10 original: 280,000 raw: 160,000 + window updates freeze patch0: 200,000 patch1: 240,000 glxgears original: 600 raw: 1382 patch0: 1475 patch1: 1240 original: before my patch to take advantage of glXCopySubBuffer + stream Raw damage events raw: with the culprit patch to take advantage of glXCopySubBuffer + stream Raw damage events. patch0: A patch I wrote a while ago to delete the damage object when > 100 events are received, and recreate it when we start the next frame. I attached it to this upstream bug, but we agreed not to apply it until there was evidence that it was needed: https://bugzilla.gnome.org/show_bug.cgi?id=611838 patch1: A new patch that requests the server for BoundingBox changes so we only get notified when the region bounding all outstanding damage changes. This reduces the number of events we receive which should avoid DOS issues, but it requires a synchronous request to clear the damage region each frame, for each window with damage. It can be seen that neither patch gets x11perf back to its original performance, but glxgears also provides an example that is significantly slower with the original code so it's not just a simple case of reverting the patch. Also although patch0 provides the best performance for glxgears, patch0 isn't as good as patch1 for x11perf -shmput10. I think it's important to note that x11perf or glxgears are not good benchmarks so it may require further testing to decide which patch is best. On a related side note we should also cherry-pick these two commits from the upstream gnome repo into Mutter: 9350c586ecdccbe Fix accounting of frozen with maximize/unmaximize c69f8e4dfe4d951 Don't trap XErrors in meta_compositor_process_event
rib 2010-05-13 17:50:31 PDT I've also done some tests with gtkperf (GtkPerf 0.40) since that was mentioned in comment 14: Please refer to comment 17 for a summary of each configuration: original: GtkEntry - time: 0.15 GtkComboBox - time: 3.08 GtkComboBoxEntry - time: 2.56 GtkSpinButton - time: 0.35 GtkProgressBar - time: 0.24 GtkToggleButton - time: 0.33 GtkCheckButton - time: 0.27 GtkRadioButton - time: 0.43 GtkTextView - Add text - time: 1.18 GtkTextView - Scroll - time: 0.80 GtkDrawingArea - Lines - time: 4.65 GtkDrawingArea - Circles - time: 9.06 GtkDrawingArea - Text - time: 4.07 GtkDrawingArea - Pixbufs - time: 0.56 --- Total time: 27.75 raw: GtkEntry - time: 0.16 GtkComboBox - time: 2.94 GtkComboBoxEntry - time: 2.63 GtkSpinButton - time: 0.32 GtkProgressBar - time: 0.21 GtkToggleButton - time: 0.36 GtkCheckButton - time: 0.33 GtkRadioButton - time: 0.49 GtkTextView - Add text - time: 1.20 GtkTextView - Scroll - time: 0.72 GtkDrawingArea - Lines - time: 7.90 GtkDrawingArea - Circles - time: 14.14 GtkDrawingArea - Text - time: 4.49 GtkDrawingArea - Pixbufs - time: 1.57 --- Total time: 37.47 patch0: GtkEntry - time: 0.16 GtkComboBox - time: 3.07 GtkComboBoxEntry - time: 2.92 GtkSpinButton - time: 0.33 GtkProgressBar - time: 0.22 GtkToggleButton - time: 0.36 GtkCheckButton - time: 0.32 GtkRadioButton - time: 0.50 GtkTextView - Add text - time: 1.22 GtkTextView - Scroll - time: 0.86 GtkDrawingArea - Lines - time: 6.79 GtkDrawingArea - Circles - time: 11.45 GtkDrawingArea - Text - time: 4.02 GtkDrawingArea - Pixbufs - time: 0.64 --- Total time: 32.87 patch1: GtkEntry - time: 0.16 GtkComboBox - time: 3.02 GtkComboBoxEntry - time: 2.58 GtkSpinButton - time: 0.32 GtkProgressBar - time: 0.21 GtkToggleButton - time: 0.36 GtkCheckButton - time: 0.32 GtkRadioButton - time: 0.49 GtkTextView - Add text - time: 1.24 GtkTextView - Scroll - time: 0.68 GtkDrawingArea - Lines - time: 4.26 GtkDrawingArea - Circles - time: 8.78 GtkDrawingArea - Text - time: 3.50 GtkDrawingArea - Pixbufs - time: 0.57 --- Total time: 26.49 So this means we have three tests each with a different option working best: x11perf -shmput10 works best with the original code glxgears works best with patch0 gtkperf works best with patch1 At least this clarifies that we should to do *something* since none of them perform best with the current code. I don't think switching back to the original code is the right thing to do since we know there are cases that are faster with the newer code (and anything is more compelling that x11perf), and we also expect a power saving from being able to update small regions of the screen via glXCopySubBuffer which wasn't supported with the original code. I'll talk with tf tomorrow and hopefully we can pick an option.
tf 2010-05-14 03:28:02 PDT (In reply to comment #20) > I don't think switching back to the original code is the right thing to do Agreed. I think the gtkperf test is the most real-life like test of these, and hence the one we really care about, so I would be for the the BoundingBox approach (patch1).
ebassi 2010-05-14 05:30:50 PDT I tend to agree with tf: x11perf is an awful "benchmark" which tests all the wrong things, unless you find yourself stranded with you Delorean in 1996. GtkPerf is not a great benchmark but at least is testing modern code paths. since patch[1] is the one that gives a boost to GtkPerf and doesn't really affect much the performance of glxgears, I'd second the +1 vote to the BoundingBox approach. as a side note: it might be interesting to talk to the X11 team and figure out an extension to the XDAMAGE protocol to make it easier for compositors to do the right thing.
Reopened (and added Robert to the "Mutter Developers" group)
Created attachment 162706 [details] [review] mutter-window: request BoundingBox damage changes to avoid DOS I've attached a replacement for my "mutter-window: request BoundingBox damage changes to avoid DOS" patch which fixes a bug in mutter_window_process_damage where the received_damage boolean might not be updated if the window is frozen.
Review of attachment 162706 [details] [review]: So, since Clutter only repaints a bounding box currently, this should mostly be a pretty straight-up performance win. Using non-raw damage events can be tricky because it makes it easier to get race conditions, but I think this is OK. It is possible to have sequences like: <damage to +0+0x1x1> (A: reports damage to +1+1x1x1) <damage to +99+99x1x1> (B: reports damage to +1+1x100x100) We decide to repaint <damage to +00x1x1> (C: reports damage to +0+0x100x100) We XSubtractDamage We repaint +1+1x100x100 We get event C We XSubtractDamage We repaint +0+0x100x100 So we can end up painting more than we would with raw damage event,s but I can't figure out a case when we would paint less. (What we found out with GTK+ is that damage event serials have to be treated with caution for non-raw damage, since the X server is suppressing damage events. Just because the last damage you got occurred at serial T doesn't mean that damage didn't occur at serial T + 1. But we aren't even looking at serials here at all.) OK to commit if you fix up the wrong comment and commit message text about synchronous requests.
ok thanks, I've pushed this to master as: commit 91d82bf8c78ac7d2ae8a39f5ed619a413e2d6320 Author: Robert Bragg <robert@linux.intel.com> Date: Thu May 13 21:43:30 2010 +0100 mutter-window: request DamageReportBoundingBox report level In commit d34ae764769 I switched mutter-window to ask for Raw rectangles from the X server. This avoided 2 non synchronous and 2 synchronous X requests per window with damage, per frame; 2 (non-sync) to create/destroy a temporary region to copy the damage region into, 1 to request the server to copy the damage region into a our given region and another to fetch that region back into the client. The problem with raw events though is that it's possible to DOS the compositor with them. Instead of receiving an event for every bit of damage this patch instead asks the server to only report BoundingBox changes to the damage region. https://bugzilla.gnome.org/show_bug.cgi?id=611838