Bug 763350 – drawing performance worse than X

Summary:	drawing performance worse than X


Status:	RESOLVED FIXED

Product:	gtk+
Classification:	Platform
Component:	Backend: Wayland
Version:	3.19.x
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtk-bugs
QA Contact:	gtk-bugs

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2016-03-09 00:04 UTC by Christian Hergert
Modified:	2016-04-17 21:11 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
gnuplot graphs comparing GdkFrameClock timings (95.62 KB, application/x-tar) 2016-03-14 02:45 UTC, Christian Hergert		Details
graph of predicted frame timing shell vs weston (19.03 KB, image/png) 2016-03-14 13:20 UTC, Christian Hergert		Details
captures for mutter and weston (163.18 KB, application/x-bzip2) 2016-03-15 14:52 UTC, Christian Hergert		Details
sysprof captures with rudimentary pooling (154.05 KB, application/x-bzip2) 2016-03-15 17:08 UTC, Christian Hergert		Details
wip patch to propagate presentation time (4.11 KB, patch) 2016-03-15 19:08 UTC, Christian Hergert	none	Details \| Review
wip for mutter presentation time (3.50 KB, patch) 2016-03-15 19:11 UTC, Christian Hergert	none	Details \| Review
avoid dropping our shm-based surface on configure-event (1003 bytes, patch) 2016-03-17 06:39 UTC, Christian Hergert	committed	Details \| Review
predicted presentation times (22.13 KB, image/png) 2016-03-17 08:19 UTC, Christian Hergert		Details
updated cogl patch for clock discovery (5.40 KB, patch) 2016-03-17 09:40 UTC, Christian Hergert	none	Details \| Review
updated mutter patch to pass cogl timing info to clients (4.92 KB, patch) 2016-03-17 09:41 UTC, Christian Hergert	none	Details \| Review
mutter: hack in some timings (20.99 KB, patch) 2016-03-17 14:12 UTC, Ray Strode [halfline]	none	Details \| Review
gdk-wayland: hack in some timings (15.13 KB, patch) 2016-03-17 14:13 UTC, Ray Strode [halfline]	none	Details \| Review

Description Christian Hergert 2016-03-09 00:04:59 UTC

When changing GtkRevealer:reveal-child, I'm noticing a bunch of size allocates to the child. In particular, this seems to happen when min-req < nat-req.

The side-effect of this is that if you have a centered label in the descendant hierarchy, it will re-position itself on each frame. It looks rather unpleasant.

Secondly, it causes more resize work during each frame. Changing sizing and dropping pixel-cache contents increases the difficulty of reaching 60 fps animations.

Thirdly, it prevents reusing the existing window contents for pixel cache in some cases. (Notably textview/listbox).

What I did to fix this performance in Builder was to avoid using GtkRevealer for the panels, but I'd like to change that for 3.22.

My thought for this is to clamp down the child allocation size during the animation and rely on clipping to get the same effect. I'm not sure what the unintended side effects might be of child input windows at the edge of the screen during this animation though.

Comment 1 Christian Hergert 2016-03-09 00:09:16 UTC

I should add that the situation on (F24) Wayland is significantly worse than on Xorg. Using GtkRevealer for a panel on Xorg is rather smooth, and the same thing on Wayland is choppy (probably 20 FPS at best).

My hunch is that we are creating new memfd's on every frame due to the size changes, but I need to go dig up evidence to prove that is out of the ordinary. (I'd expect some sort of fd pooling/reuse).

Comment 2 Ray Strode [halfline] 2016-03-09 15:47:38 UTC

we currently only reuse a shm buffer if the size is the same and no out of band drawing comes in while the frame is being processed by the compositor. If the size changes we allocate a new buffer and discard the old one when it's finished.

I don't know if that's the cause of the frame rate drop. do you have a test case? what about the output of GDK_DEBUG=frames ?

The wayland protocol does have the notion of a shm pool, but it's actually more like a buffer atlas: you allocate one big chunk of memory and get sub buffers from it.  If it turns out the memfd_create / mmap dance is slow we could potentially start using that feature, though I think it makes sense to profile first and figure out what the cause of the slow down is before making changes.  

Still, I agree going through the resize machinery for a widget that's just getting visually translated is maybe unnecessarily heavy handed.

not sure I understand the child input window problem. They should get clipped by their parent too right?

Comment 3 Christian Hergert 2016-03-09 22:10:11 UTC

> I don't know if that's the cause of the frame rate drop. do you have a test case? what about the output of GDK_DEBUG=frames ?

My current test case is the new panel system I'm working on for Builder.

  git clone https://github.com/chergert/panel-gtk.git
  cd panel-gtk
  ./autogen.sh
  make
  cd src
  GDK_DEBUG=frames ./test-panel

If I'm reading the output right, I'm seeing intervals in the mid 20s to mid 40s.

If I keep the panels relatively small, it can get close to 60fps, but as they increase in size it gets worse.
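
For reference, the intervals printed by GDK_DEBUG=frames can be turned into a frame rate with a one-line conversion. This is a minimal sketch, assuming the reported intervals are in milliseconds; the helper name is made up for illustration and is not part of GDK.

```c
#include <assert.h>

/* Convert a frame-to-frame interval in milliseconds into frames per second.
 * Hypothetical helper for reading GDK_DEBUG=frames output. */
static double interval_ms_to_fps(double interval_ms)
{
    assert(interval_ms > 0.0);
    return 1000.0 / interval_ms;
}
```

By this conversion a 25 ms interval is 40 fps and a 45 ms interval is roughly 22 fps, against the 16.7 ms needed for 60 fps.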

> not sure I understand the child input window problem. They should get clipped by their parent too right?

It doesn't look like that is happening for GtkOverlay. I have some widgets at the edge and it is causing the GtkWindow resize input window to no longer work. Hence my trepidation.

Comment 4 Ray Strode [halfline] 2016-03-09 22:27:11 UTC

indeed, I think your intuition is spot on, running your test case through sysprof shows most of the time spent handling page faults on the shm segments

Comment 5 Ray Strode [halfline] 2016-03-10 15:42:45 UTC

so the obvious thing I didn't piece together yesterday was we aren't actually resizing the window, just things in the window. So, in theory we should be able to reuse the surface buffer and avoid allocating a new one. The fact that we aren't reusing the buffer is worth investigating.

It could be we're drawing before the compositor is done with the buffer, so we're having to allocate a new buffer to accommodate out of band drawing. If that's the case, maybe we can stop doing that.

If allocating the extra buffer / cairo surface is hard to prevent, one easy fix might be to allocate it up front. We aren't changing the size of the window, after all. Then if we end up needing the second buffer, it will only need to page fault into the compositor the first time it's used, and not the subsequent times it's reused.

A more involved fix would entail breaking our 1-to-1 relationship between wl_shm_pool objects (mmapped regions) and wl_buffer objects (sub buffers in those regions). Right now we allocate a pool the size of the buffer we need, and then allocate exactly one buffer from the pool. Instead, we could allocate a bigger pool up front and always source new buffers from that pool. So the pool would stick around but buffers would be transient. We'd probably need a new heap data structure to do the management of buffers from the pool. We could be smart about when the window shrinks, too, and continue to use buffers from the same pool. Of course if the window gets bigger we'd either have to allocate a new pool or resize the existing pool.
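
The bookkeeping for the "bigger pool, transient buffers" idea can be sketched without any Wayland calls. The code below is only the offset arithmetic, and it swaps the heap data structure mentioned above for a much simpler fixed two-slot layout. All names are hypothetical, and a 4-byte-per-pixel format is assumed. In a real client the returned offset and stride would be the values handed to wl_shm_pool_create_buffer().

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 2   /* assumption: double buffering only */

struct pool_layout {
    size_t slot_size;  /* bytes reserved for each buffer slot */
    size_t pool_size;  /* total bytes to back with one shared mapping */
};

/* Bytes per row, assuming a 32-bit pixel format such as ARGB8888. */
static size_t buffer_stride(int width)
{
    return (size_t)width * 4;
}

/* Size the pool once, for the largest window we expect to draw. */
static struct pool_layout pool_layout_for(int max_width, int max_height)
{
    struct pool_layout layout;

    assert(max_width > 0 && max_height > 0);
    layout.slot_size = buffer_stride(max_width) * (size_t)max_height;
    layout.pool_size = layout.slot_size * POOL_SLOTS;
    return layout;
}

/* Offset of a slot for a buffer of the given size, or (size_t)-1 when the
 * buffer no longer fits and the pool would have to be resized or replaced. */
static size_t slot_offset(const struct pool_layout *layout, int slot,
                          int width, int height)
{
    size_t needed = buffer_stride(width) * (size_t)height;

    if (slot < 0 || slot >= POOL_SLOTS || needed > layout->slot_size)
        return (size_t)-1;
    return (size_t)slot * layout->slot_size;
}
```

A shrinking window keeps getting offsets from the same pool, while a growing one gets (size_t)-1, which is the point where a new or resized pool is needed.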

Comment 6 Christian Hergert 2016-03-10 22:23:22 UTC

(In reply to Ray Strode [halfline] from comment #5)
> We'd
> probably need a new heap data structure to do the management of buffers from
> the pool.  We could be smart about when the window shrinks, too, and
> continue to use buffers from the same pool. Of course if the window gets
> bigger we'd either have to allocate a new pool or resize the existing pool.

This sounds like what GtkPixelCache does when dealing with size changes.

Comment 7 Matthias Clasen 2016-03-11 15:09:02 UTC

One hint from irc discussion: Benjamin seemed to find that the slowdown is due to lack of clipping, causing the sliding out content to get into the shadow area of the window, and making us redraw the window shadow all the time.

Comment 8 Christian Hergert 2016-03-13 01:03:38 UTC

I've updated the panel-gtk code with 2 important things that should fix this (although I'm not sure how to verify the shadow drawing).

First, PnlDockBin now creates its own GdkWindow. Secondly, PnlDockOverlay is now an eventbox which should also be clipping all of the children. Side window handles now work again, so that does indicate to me it is clipping properly now.

Performance is still poor on the wayland backend.

Comment 9 Christian Hergert 2016-03-13 23:29:32 UTC

I'm running weston under X to test this, and it looks like the number of frames rendered is about half that of the same code on Xorg.

I was curious about the raw memfd costs, so put together an informative test:

  - memfd_create()
  - ftruncate(2*page_size)
  - mmap()
  - page fault all pages
  - munmap()
  - close()

The cost for all of the above is about .06 msec. So not really that bad. Interestingly, half of the time is spent on close()!

Comment 10 Christian Hergert 2016-03-14 02:45:25 UTC

Created attachment 323835 [details]
gnuplot graphs comparing GdkFrameClock timings

We might want to update the summary of the bug, since this is getting off-topic from Revealer to general xorg-vs-wayland draw performance. (And the test case no longer uses revealer to simplify the workload).

I've attached some gnuplot graphs comparing the frame timings from one run on a native wayland (gnome-shell) with up to date F24 system vs a F23 xorg (with git gtk+) session.

Interesting to see the differences here.

Comment 11 Christian Hergert 2016-03-14 07:45:21 UTC

I modified gdkdisplay-wayland.c to use a single wl_shm_pool by increasing the buffer size and flipping between two buffers. Not generally useful, but useful enough to show that the memfd_create() overhead was not enough to cause the slowdowns.

Comment 12 Christian Hergert 2016-03-14 10:42:47 UTC

After digging into this a bit more, I'm starting to wonder if the issue is in the frame clock synchronization.

In gdkwindow-wayland.c we are expecting timing information from Weston. The timing from weston seems to be about the time of the vblank.

The timing information we get back from mutter is the time in ClutterStage::after-paint, which, as far as I can tell, is not any sort of predictable epoch to base our timings on. It just tells us when the frame was finished drawing, not when it has been presented, or even when the vblank occurs.

Comment 13 Christian Hergert 2016-03-14 12:06:23 UTC

It would appear that we can move the code in mutter that calls wl_callback_send_done() from ::after-paint into the frame callback from Cogl.

However, it seems that the frame callback from cogl is not reliable on EGL/kms to match the vblank.

One more layer down the stack...

Comment 14 Christian Hergert 2016-03-14 13:20:49 UTC

Created attachment 323867 [details]
graph of predicted frame timing shell vs weston

I finally got around to testing on weston with native drm, and everything was smooth. So this strengthens my suspicion that the issue is frame clock related, specifically in how mutter and weston each send their callback.

Comment 15 Christian Hergert 2016-03-14 13:47:49 UTC

cogl-winsys-egl-kms.c specifies a page_flip drm event handler, but not vblank_handler (which is what weston uses).

Even if we add this handler, I'm not sure we have the proper API to propagate it in the same fashion back to the calling API in Mutter (who is waiting for callbacks via the CoglOnscreen).

Comment 16 Emmanuele Bassi (:ebassi) 2016-03-14 13:56:51 UTC

(In reply to Christian Hergert from comment #15)
> cogl-winsys-egl-kms.c specifies a page_flip drm event handler, but not
> vblank_handler (which is what weston uses).
> 
> Even if we add this handler, I'm not sure we have the proper API to
> propagate it in the same fashion back to the calling API in Mutter (who is
> waiting for callbacks via the CoglOnscreen).

The usual way to achieve this is to populate the CoglFrameInfo on page flip; the frame info timing can then be used by higher layers, like Mutter. This is what already happens with Cogl using the GLX winsys.

Comment 17 Ray Strode [halfline] 2016-03-14 14:20:27 UTC

(In reply to Christian Hergert from comment #9)
> I was curious about the raw memfd costs, so put together an informative test:
> 
>   - memfd_create()
>   - ftruncate(2*page_size)
>   - mmap()
>   - page fault all pages
>   - munmap()
>   - close()
> 
> The cost for all of the above is about .06 msec. So not really that bad.
> Interestingly, half of the time is spent on close()!

are you accounting for faulting in the compositor too? if you run,

$ ./test-panel &
$ sysprof-cli -p $! sysprof-output

then open and close the panels like a billion times, then switch back to the terminal and hit ctrl-c 

then run

$ sysprof sysprof-output

what's at the top of your profile ? I see most of the time spent in sse2_fill/page_fault/shmem_fault, but for me the animations are 60 fps anyway. I'm curious what your sysprof profile looks like.

Comment 18 Ray Strode [halfline] 2016-03-14 15:39:32 UTC

(In reply to Christian Hergert from comment #15)
> cogl-winsys-egl-kms.c specifies a page_flip drm event handler, but not
> vblank_handler (which is what weston uses).

Note page flips are synced to vblank by the kernel.  DRM vblank events are an older, deprecated api.  There's a patch in bug 746042 comment 10 to make mutter use them, but it had problems with hotplug so we just fall back to tearing for old drivers that don't support page flipping.

Comment 19 Matthias Clasen 2016-03-15 00:56:02 UTC

See krh's comment here: https://lists.freedesktop.org/archives/wayland-bugs/2016-March/007542.html

Comment 20 Matthias Clasen 2016-03-15 01:08:57 UTC

(In reply to Christian Hergert from comment #9)
> I'm running weston under X to test this, and it looks like the number of
> frames rendered is about half that of the same code on Xorg.
> 
> I was curious about the raw memfd costs, so put together an informative test:
> 
>   - memfd_create()
>   - ftruncate(2*page_size)
>   - mmap()
>   - page fault all pages
>   - munmap()
>   - close()
> 
> The cost for all of the above is about .06 msec. So not really that bad.
> Interestingly, half of the time is spent on close()!

2 pages is pretty small - if you have a maximized window, say 1024x768 pixels, that is ~1000 pages at 1k pixels/page

Comment 21 Matthias Clasen 2016-03-15 01:15:59 UTC

Jonas points out this code as an example for use of a large pool in wayland clients:

https://cgit.freedesktop.org/wayland/weston/tree/clients/window.c#n1077

Comment 22 Christian Hergert 2016-03-15 01:27:13 UTC

(In reply to Matthias Clasen from comment #21)
> Jonas points out this code as an example for use of a large pool in wayland
> clients:
> 
> https://cgit.freedesktop.org/wayland/weston/tree/clients/window.c#n1077

While this seems like a good idea long term, I don't think it is in the way of us getting better draw performance with the 1:1 mapping. My crappy hack did a 48 MB map and rotated through the items and it had very little effect.

Although, once the frame clock stuff is more in line with weston, maybe it will be a bit more obvious that it is needed too?

Comment 23 Matthias Clasen 2016-03-15 01:47:22 UTC

(In reply to Christian Hergert from comment #22)
> 
> Although, once the frame clock stuff is more inline with weston, maybe it
> will be a bit more obvious that is needed too?

yeah, we may be looking at multiple overlapping issues

Comment 24 Ray Strode [halfline] 2016-03-15 14:10:55 UTC

it'd be great if you could post your scratch patch and sysprof output !

Comment 25 Ray Strode [halfline] 2016-03-15 14:22:45 UTC

(In reply to Matthias Clasen from comment #20)
> > The cost for all of the above is about .06 msec. So not really that bad.
> > Interestingly, half of the time is spent on close()!
> 2 pages is pretty small - if you have a maximized window, say 1024x768
> pixels, that is ~1000 pages at 1k pixels/page

Right, if it takes .06msec to do 2 pages, then that's like ~20 to ~30 msec to do a full window ! That pretty closely matches what you report in comment 3.
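
The back-of-the-envelope numbers in this exchange can be reproduced as follows. This is a rough sketch that assumes 4 bytes per pixel, 4096-byte pages, and a cost that scales linearly with the page count; the last assumption probably overstates things, since fixed costs such as close() do not grow per page. Function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages needed to back a width x height buffer, rounded up. */
static size_t pages_for_buffer(int width, int height,
                               size_t bytes_per_pixel, size_t page_size)
{
    size_t bytes;

    assert(width > 0 && height > 0 && page_size > 0);
    bytes = (size_t)width * (size_t)height * bytes_per_pixel;
    return (bytes + page_size - 1) / page_size;
}

/* Scale a cost measured for two pages up to an arbitrary page count. */
static double extrapolated_cost_ms(size_t pages, double ms_per_two_pages)
{
    return (double)pages / 2.0 * ms_per_two_pages;
}
```

For a 1024x768 window this gives 768 pages and, at .06 msec per two pages, about 23 msec, which falls inside the ~20 to ~30 msec estimate.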

Comment 26 Christian Hergert 2016-03-15 14:29:33 UTC

(In reply to Ray Strode [halfline] from comment #25)
> Right, if it takes .06msec to do 2 pages, then that's like ~20 to ~30 msec
> to do a full window ! That pretty closely matches what you report in comment
> 3.

But why doesn't this happen on weston?

Comment 27 Ray Strode [halfline] 2016-03-15 14:37:56 UTC

maybe weston releases the buffer sooner, so gtk is able to reuse it? hard to say.  can you get sysprof reports for both ?

Comment 28 Christian Hergert 2016-03-15 14:52:00 UTC

Created attachment 324001 [details]
captures for mutter and weston

sure, here are the sysprof captures.

this actually ran a bit smoother than my experimental patches based on @ebassi's quick hack to add drm page_flip information to cogl.

I've noticed some really weird things like the animation running smoother under heavy mouse movement.

Comment 29 Daniel Stone 2016-03-15 14:54:28 UTC

(In reply to Christian Hergert from comment #28)
> I've noticed some really weird things like the animation running smoother
> under heavy mouse movement.

Mouse movement will flush events out from server -> client, so maybe there is something in the delayed buffer release theory.

Comment 30 Ray Strode [halfline] 2016-03-15 15:27:32 UTC

so with both mutter and weston you're spending a great deal of time page faulting.  can you do a sysprof run with your scratch patch applied that reuses buffers? I know it doesn't fix frame rate, but i'm curious what peak pops up next when we eliminate this peak.

Comment 31 Christian Hergert 2016-03-15 17:08:13 UTC

Created attachment 324033 [details]
sysprof captures with rudimentary pooling

I verified a single memfd_create() with strace -e memfd_create before capturing.

This is using weston from git, and mutter from F24 package.

Comment 32 Christian Hergert 2016-03-15 17:18:10 UTC

Pekka wrote up some information for us on the wayland mailing list:

  https://lists.freedesktop.org/archives/wayland-devel/2016-March/027465.html

Comment 33 Ray Strode [halfline] 2016-03-15 18:04:04 UTC

Okay so with those sysprof profiles, most of the time is spent drawing (for both weston and mutter): gtk_widget_draw_internal and libpixman (though that's missing debug symbols, but it's probably drawing related), which is what you'd expect during an animation I guess.  So if we're not hitting the framerate we want, then assuming drawing takes less time than a frame (which is a reasonable assumption since it's fluid on weston and the drawing isn't different there), we must not be drawing frequently enough.  I think these profiles suggest, indeed, the timings are wonky.

Can you instrument maybe_start_idle in gdkframeclockidle to print out what min_interval  it's using for mutter versus weston ?

Does your version of mutter have this commit, by the way? https://git.gnome.org/browse/mutter/commit/?id=50099c4c10c9311c6d0deb978d21637c9437c027

Comment 34 Christian Hergert 2016-03-15 18:59:20 UTC

(In reply to Ray Strode [halfline] from comment #33)
> Can you instrument maybe_start_idle in gdkframeclockidle to print out what
> min_interval  it's using for mutter versus weston ?

I'll do this.

> Does your version of mutter have this commit, by the way?
> https://git.gnome.org/browse/mutter/commit/
> ?id=50099c4c10c9311c6d0deb978d21637c9437c027

Not sure about the system installed one, but it was suspect on my git checkout. I have local patches that change this to pass frame information from page_flip_handler, but the performance is actually slightly worse than the current situation.

Comment 35 Christian Hergert 2016-03-15 19:08:46 UTC

Created attachment 324038 [details] [review]
wip patch to propagate presentation time

Figured I'd upload what I'm playing with to get presentation timing. Here is the cogl patch.

Comment 36 Christian Hergert 2016-03-15 19:11:11 UTC

Created attachment 324039 [details] [review]
wip for mutter presentation time

use cogl presentation time to notify clients

As Pekka pointed out, we probably want to implement presentation-time extension rather than using this, but clearly we have better performance on Weston with wl_callback so it should be possible.

Comment 37 Christian Hergert 2016-03-15 19:26:41 UTC

(In reply to Ray Strode [halfline] from comment #33)
> Can you instrument maybe_start_idle in gdkframeclockidle to print out what
> min_interval  it's using for mutter versus weston ?

I tried 3 setups:

 - mutter from F24
 - mutter/cogl/clutter from git, using above patches
 - weston git

All have min_interval consistently of 0.

Comment 38 Ray Strode [halfline] 2016-03-15 20:13:03 UTC

so I guess what's happening is min_next_frame_time is never getting initialized in gdk_frame_clock_paint_idle because the gdkwindow-wayland after paint handler is freezing the clock ?

Comment 39 Christian Hergert 2016-03-17 06:39:43 UTC

Created attachment 324154 [details] [review]
avoid dropping our shm-based surface on configure-event

Here is an easy one.

The memfd_create() was getting called not because the committed buffer had not yet been returned, but simply because a configure-event came in due to animation widget resizing. We unconditionally dropped the surfaces when we could reuse the existing ones if sizing did not actually change.

Comment 40 Christian Hergert 2016-03-17 08:19:26 UTC

Created attachment 324156 [details]
predicted presentation times

I'm going to add some updated patches for cogl and mutter, but I'm unconvinced that we need them yet. Simply applying the gdk-wayland patch above gets us almost all the way there.

The reason I'm suggesting not using the cogl and mutter patches will be obvious by looking at the attached graph.

Notice how the predicted timing is 25msec across the board. This is what happens when GdkFrameClock can't predict the timing and so it guesses "half-way between frames": 16.7 + 8.35 = 25.05.

I've been playing around with various tweaks in mutter and cogl and so far have been unable to improve that. I thought that was the original problem and so I focused on it far longer than I'm happy to admit.

Comment 41 Christian Hergert 2016-03-17 08:21:28 UTC

Additionally, it is still unclear to me why Weston could deal with the memfd_create() on every frame better than Mutter. Perhaps it just has less code and so the extra mapping/shootdowns weren't a big enough issue yet.

Comment 42 Christian Hergert 2016-03-17 09:40:07 UTC

Created attachment 324158 [details] [review]
updated cogl patch for clock discovery

It still requires investigation to determine why this patch for cogl and the mutter patch cause frame timings to be invalid. But, figured I'd update this for future reference.

Comment 43 Christian Hergert 2016-03-17 09:41:25 UTC

Created attachment 324159 [details] [review]
updated mutter patch to pass cogl timing info to clients

Again, I don't think this is ready for committing without understanding why it has the effect it does on the GdkFrameTimings. But here it is for future reference.

Comment 44 Ray Strode [halfline] 2016-03-17 13:20:59 UTC

Review of attachment 324154 [details] [review]:

Makes sense.  If we can avoid allocating the extra buffer in the first place, since the size isn't changing, then that seems like the right way to go to me!

Comment 45 Ray Strode [halfline] 2016-03-17 13:28:30 UTC

(In reply to Christian Hergert from comment #40)
> I'm going to add some updated patches for cogl and mutter, but I'm
> unconvinced that we need them yet. Simply applying the gdk-wayland patch
> above gets us almost all the way there.
> 
> The reason I'm suggesting not using the cogl and mutter patches will be
> obvious by looking at the attached graph.
> 
> Notice how the predicted timing is 25msec across the board. This is what
> happens when GdkFrameClock can't predict the timing and so it guesses
> "half-way between frame". 16.7+8.35=25.5
So one thing I don't understand is, doesn't comment 37 / comment 38 imply the timings aren't actually getting used at all? Granted, I haven't investigated in detail with gdb/etc, but from your comment and briefly looking at the code it seems like we freeze the clock any time we draw (on_frame_clock_after_paint), and we thaw it when we get a frame callback.  As soon as we thaw it we call maybe_start_idle which immediately dispatches the frame clock with a min_interval of 0. I don't think the frame clock timings are used (or am I missing something?)?

Comment 46 Ray Strode [halfline] 2016-03-17 13:30:12 UTC

One other thing I don't quite understand is why your scratch patch to allocate the 2nd buffer up front didn't help.

Comment 47 Ray Strode [halfline] 2016-03-17 13:51:13 UTC

(In reply to Christian Hergert from comment #42)
> Created attachment 324158 [details] [review] [review]
> updated cogl patch for clock discovery
> 
> It still requires investigation to determine why this patch for cogl and the
> mutter patch cause frame timings to be invalid. But, figured I'd update this
> for future reference.

maybe CLOCK_MONOTONIC_RAW versus CLOCK_MONOTONIC ?  why doesn't it just use g_get_monotonic_time instead of calling clock_gettime directly btw?

Comment 48 Ray Strode [halfline] 2016-03-17 14:07:49 UTC

So i did a scratch patch (will attach it) just to get some timings with and without christian's fix:

a time to draw: 8.5ms
b time from wl_event_loop_dispatch until wl_surface_commit is called: 4.4ms
c time spent allocating the texture in mutter: 6ms
d time spent copying the buffer to the texture in mutter: 5ms

with patch

a time to draw: 3.8ms
b time from wl_event_loop_dispatch until wl_surface_commit is called: .055ms
c time spent allocating the texture in mutter: .004ms
d time spent copying the buffer to the texture in mutter: 5.7ms

Comment 49 Ray Strode [halfline] 2016-03-17 14:09:55 UTC

so a is 5ms shorter because we don't have to allocate the buffer before drawing
   b is 4ms shorter because the compositor doesn't have to import the new buffer
   c is a noop with the patch since the texture is already set
   d is about the same (which makes sense)

Comment 50 Ray Strode [halfline] 2016-03-17 14:12:08 UTC

Created attachment 324185 [details] [review]
mutter: hack in some timings

Comment 51 Ray Strode [halfline] 2016-03-17 14:13:13 UTC

Created attachment 324186 [details] [review]
gdk-wayland: hack in some timings

Comment 52 Ray Strode [halfline] 2016-03-17 14:24:10 UTC

two additional points, then i'll stop spamming this bug:

1) in comment 17 i said i was seeing 60 fps anyway, but turns out I just wasn't maximizing the window

2) the data in comment 48 shows why the timings were off.  Even though gtk+ was taking less than a frame to draw, mutter was taking 2/3 of a frame to process the new buffer, and we only had half a frame left

Comment 53 Matthias Clasen 2016-03-17 14:26:04 UTC

(In reply to Ray Strode [halfline] from comment #46)
> One other thing I don't quite understand is why your scratch patch to
> allocate the 2nd buffer up front didn't help.

this may just be because we were unconditionally throwing away both buffers in the configure event ?

Comment 54 Christian Hergert 2016-03-17 14:37:23 UTC

(In reply to Ray Strode [halfline] from comment #47)
> > It still requires investigation to determine why this patch for cogl and the
> > mutter patch cause frame timings to be invalid. But, figured I'd update this
> > for future reference.
> 
> maybe CLOCK_MONOTONIC_RAW versus CLOCK_MONOTONIC ?  why doesn't it just use
> g_get_monotonic_clock instead of calling clock_gettime directly btw?

I played with this about 20 different ways. Every clock type, fudging values, etc etc.

Comment 55 Ray Strode [halfline] 2016-03-17 19:01:59 UTC

(In reply to Matthias Clasen from comment #53)
> this may just be because we were unconditionally throwing away both buffers
> in the configure event ?

I guess that's possible, but sysprof showed the shmem fault overhead going away after his patch.

Comment 56 Ray Strode [halfline] 2016-03-17 19:06:03 UTC

(In reply to Ray Strode [halfline] from comment #45)
> I don't think the frame clock timings are used

So just to follow up here. We had a discussion with Owen on IRC.  The frame timings are intentionally unused by the frame clock for regular updates. frame rate is throttled by the compositor, so we don't need separate timers client side.  The frame timings are mainly for video player type applications such as this:

https://git.gnome.org/browse/gtk+/tree/tests/video-timer.c

Comment 57 Matthias Clasen 2016-03-22 02:08:44 UTC

Comment on attachment 324154 [details] [review]
avoid dropping our shm-based surface on configure-event

Leaving this bug open for some of the outstanding timing work

Comment 58 Matthias Clasen 2016-04-17 21:11:13 UTC

just going to close this after all