GNOME Bugzilla – Bug 712809
d3dvideosink: Shows corrupted output on NVIDIA GPUs due to wrong stride
Last modified: 2014-07-04 12:57:33 UTC
Created attachment 260431 [details] Screenshot of the corrupted image The D3D video sink shows a corrupted image with some media types. The files are always played this way, so it's not a race condition. Codec and container does not seem to matter. Happens to WebM and QuickTime containers. Most files don't work. JPEG never works, PNG always works. Tested with gst-play. Media files: http://ftp.nluug.nl/ftp/graphics/blender/apricot/trailer/sintel_trailer-480p.ogv http://video.blendertestbuilds.de/download.blender.org/peach/trailer_400p.ogg Works: $ ./gst-launch-1.0.exe videotestsrc ! d3dvideosink $ ./gst-play-1.0.exe sintel_trailer-480p.ogv $ ./gst-discoverer-1.0.exe sintel_trailer-480p.ogv Topology: container: Ogg audio: Vorbis video: Theora Properties: Duration: 0:00:52.250000000 Seekable: yes Tags: title: Sintel Trailer artist: Durian Open Movie Team copyright: (c) copyright Blender Foundation | durian.blender.org license: Creative Commons Attribution 3.0 license application name: ffmpeg2theora-0.24 encoder: Xiph.Org libtheora 1.1 20090822 (Thusnelda) video codec: Theora encoder version: 3 container format: Ogg audio codec: Vorbis nominal bitrate: 80000 bitrate: 80000 Does not work: $ ./gst-play-1.0.exe trailer_400p.ogg $ ./gst-discoverer-1.0.exe trailer_400p.ogg Topology: container: Ogg audio: Vorbis video: Theora Properties: Duration: 0:00:32.996875000 Seekable: yes Tags: application name: ffmpeg2theora 0.19 encoder: Xiph.Org libTheora I 20060526 3 2 0 video codec: Theora encoder version: 3 container format: Ogg audio codec: Vorbis nominal bitrate: 80000 bitrate: 80000
I understand this might be graphics-driver related. Is yours with an Nvidia driver by any chance?
Yes I have the 327.23 NVIDIA driver on GeForce GTX 570. This occurred on Windows 8.1 and 7.
I discovered that every video in I420 colorspace is displayed corrupted if the width is larger than 512. With UYVY, YUY2, NV12, BGRx, BGRA, RGB16, RGB15 videos it does however work. On the other hand those video formats do not work as well: YV12. That is quiet relevant since common decoders (such as avdec_h264 or vp8dec) do negotiate in I420 format be default. Does work: gst-launch-1.0 videotestsrc ! video/x-raw, format=I420, width=512 ! d3dvideosink gst-launch-1.0 videotestsrc ! video/x-raw, format=UYVY, width=513 ! d3dvideosink Does NOT work: gst-launch-1.0 videotestsrc ! video/x-raw, format=I420, width=513 ! d3dvideosink I have 314.22 NNVIDIA driver on GeForce GTX 580. (Direct3D API version 11)
The error does not occur on my notebook with GStreamer 1.2.2 on Windows 7 with 259.57 on GeForce GT230M. Can't test any recent driver on this hardware, since Sony does not ship one. Will try the 512 pixel theory on my desktop soon.
I just filed a bug on nvidia feedback platform, unfortunately I had to create an account and there is no public link. Some additional info: I noticed that the problem was happening by intervals, for example a width of 1->256 included will exhibit the stride issue, then 257->512 included will work, then 513->768 included will exhibit the problem etc.. My graphic card model is nvidia nvs 5100m, the version of the driver is 307.83, I can reproduce on both 32 and 64-bit windows ultimate, but interestingly enough not in a virtual box with the drivers included in the vbox guest additions iso. Setting status to NEW, but we might even want to close it as I don't think this is our bug.
Just a shot in the dark: https://gist.github.com/ndufresne/37133b0ebc64cfd7e272 For the reference if someone need it: msdn.microsoft.com/en-us/library/windows/desktop/dd206750(v=vs.85).aspx Everything is pretty standard, so I don't see why we where trying to do our own stride extrapolation. Let me know what that does (you might need to fix compilation).
(In reply to comment #6) > Just a shot in the dark: > https://gist.github.com/ndufresne/37133b0ebc64cfd7e272 > > For the reference if someone need it: > msdn.microsoft.com/en-us/library/windows/desktop/dd206750(v=vs.85).aspx > > Everything is pretty standard, so I don't see why we where trying to do our own > stride extrapolation. Let me know what that does (you might need to fix > compilation). The pitch is in bytes, the padding in the alignment is in pixels. I think your calculations mix those two units in incompatible ways. You divide by the pitch by the pixel stride, which gives you the number of pixels but will fail if the pitch is not a multiple of the pixel stride due to rounding errors. Consider the case of 3-byte RGB and a pitch of 4 (for width=1) as a simple case. I don't think you can safely use the alignment API here. But you're right that the problem is most likely in that lines of code. Someone should go through those calculations and compare them to the MS documentation.
The size calculations for the packed formats in the old code are wrong for one, but that shouldn't be the problem here. The combined I420/YV12 case is correct, including the plane swapping. The only potential problem I currently see here is that for NV12 the documentation talks about an *even* number of lines for the Y plane. Otherwise this all looks correct to me. Can someone give some pitch (and width, height, video format) values for the working and non-working cases?
(In reply to comment #7) > > The pitch is in bytes, the padding in the alignment is in pixels. I think your > calculations mix those two units in incompatible ways. You divide by the pitch > by the pixel stride, which gives you the number of pixels but will fail if the > pitch is not a multiple of the pixel stride due to rounding errors. Consider > the case of 3-byte RGB and a pitch of 4 (for width=1) as a simple case. I'm not mixing it, just being lazy, as anyway I can't test anything here. > > I don't think you can safely use the alignment API here. But you're right that > the problem is most likely in that lines of code. Someone should go through > those calculations and compare them to the MS documentation. No fully correct, we can check what power of two the alignment the stride uses, and set the stride alignment in the same structure. This way you don't need pixel/bytes conversion.
(In reply to comment #8) > The size calculations for the packed formats in the old code are wrong for one, > but that shouldn't be the problem here. > > The combined I420/YV12 case is correct, including the plane swapping. Where do you see in the documentation that U and V should be swapped ? > > The only potential problem I currently see here is that for NV12 the > documentation talks about an *even* number of lines for the Y plane. Otherwise > this all looks correct to me. VideoInfo adds some height padding for the odd case.
Ah yes, the stride alignment could be used here if the pitch is a power-of-two alignment (which it probably should, but who knows... it's Windows after all). Not sure how the alignment would help much here other than hiding the calculations though, the calculations themselves should be more or less the same. So would in any case be good if we understood first why it fails :) Note that I can't test it here either, for me all widths work just fine with I420.
(In reply to comment #10) > (In reply to comment #8) > > The size calculations for the packed formats in the old code are wrong for one, > > but that shouldn't be the problem here. > > > > The combined I420/YV12 case is correct, including the plane swapping. > > Where do you see in the documentation that U and V should be swapped ? YV12 is Y, V, U and I420 is Y, U, V. Windows supports YV12, so we swap here accordingly to support both. About that part I'm pretty sure, it works ;) > > > > The only potential problem I currently see here is that for NV12 the > > documentation talks about an *even* number of lines for the Y plane. Otherwise > > this all looks correct to me. > > VideoInfo adds some height padding for the odd case. It adds another *line* to the Y component if the height is odd? Really? That would be a weird default
(In reply to comment #12) > YV12 is Y, V, U and I420 is Y, U, V. Windows supports YV12, so we swap here > accordingly to support both. About that part I'm pretty sure, it works ;) Oh, didn't realized they had YV12 but not I420, sorry for the confusion. > It adds another *line* to the Y component if the height is odd? Really? That > would be a weird default At least this is my reading of it, would have to ask Wim, but I've seen the same in all V4L2 drivers. For the power of two thing, I would not guaranty everything will work in GStreamer we don't don't apply to some basic alignment rules, like an alignment being a power of 2 ;-P info->offset[1] = info->stride[0] * GST_ROUND_UP_2 (height); info->offset[2] = info->offset[1] + info->stride[1] * (GST_ROUND_UP_2 (height) / 2);
Ok, now I've realized I have slept on too hypothetical basis and am hiding the most important question with all my comments, my apologies. If someone could do that it would greatly help understanding what's going on (from Sebastian): Can someone give some pitch (and width, height, video format) values for the working and non-working cases?
Someone would be me I suppose :) I'll work on compiling the d3dvideosink on Windows next week, and provide some figures.
*** Bug 732109 has been marked as a duplicate of this bug. ***
Mathieu, any new insights here? We should really get this fixed before 1.4.0 if possible :) It's a regression from 1.0.
I had the same bug (https://bugzilla.gnome.org/show_bug.cgi?id=732109). I don't have the source so I cannot recompile it, but I master API Hooking so I can change its functionality, so no problem. You buddies talk about colorspace issue but I'm not familiarized with this but I'm with DirectX. Does anyone know where's this bug included? Any news here? If a patch is done when will be it released? Thanks!
(In reply to comment #18) > but I master API Hooking so I can change its functionality, so no problem. Not sure what you mean there? > Does anyone know where's this bug included? Not sure what you mean here either, the bug happens with some NVidia cards on Windows, it doesn't happen on virtual box. > If a patch is done when will be it released? Thanks! There are no patches for that bug, if the bug is indeed in gstreamer the patch will be submitted here, as you commented on the bug you are now in the CC list so you receive mail updates when a patch is proposed / a solution is found.
(In reply to comment #19) > (In reply to comment #18) > > but I master API Hooking so I can change its functionality, so no problem. > > Not sure what you mean there? > > > Does anyone know where's this bug included? > > Not sure what you mean here either, the bug happens with some NVidia cards on > Windows, it doesn't happen on virtual box. > > > If a patch is done when will be it released? Thanks! > > There are no patches for that bug, if the bug is indeed in gstreamer the patch > will be submitted here, as you commented on the bug you are now in the CC list > so you receive mail updates when a patch is proposed / a solution is found. First of all I've resolved the Bug, I don't wanna share it because I need the plugin sources/binaries for Windows in order to make some tests, but It works in all the computers that show corrupted video frames. I talk about API hooking because I don't have the plugin's source, so with Reversing skills I've hooked some relevant functions and now works like a charm. OF course I want to share the fix with you all, so if anyone could share the binaries with me? I don't wanna HOOK any function if I can change the plugins functionality from its roots. Thanks! Neither the pitch, nor width, nor height, loook at StretchRect ;)
You can get the binaries from http://gstreamer.freedesktop.org/data/pkg/windows/1.3.90/ and the sources from http://cgit.freedesktop.org/gstreamer/gst-plugins-bad/tree/sys/d3dvideosink What's wrong with StretchRect? :)
Well it isn't StretchRect it was my mistake, I build many patches for many functions and API, and I was excited I fixed it. Now with more calm I have debugged it well, also deactivated some hooks and I've found it. Here's the bug: function: gst_d3dsurface_buffer_pool_alloc_buffer module: d3dhelpers.c In the instruction IDirect3DSurface9_LockRect(surface, &lr, NULL, D3DLOCK_READONLY); instead of D3DLOCK_READONLY pass D3DLOCK_DISCARD and it will fix it. This is crazy because I don't have the source of libgstd3dvideosink.dll so I debugged it with OllyDbg and got the function pointer (0x6D103CB0) of gst_d3dsurface_buffer_pool_alloc_buffer so I reimplement that function in runtime via inline hooking, so finally I got it working :) That does not mean that will work for all of you, idk. Now I've compiled the source of d3dvideosink but when gst tries to load the plugin, it throws me an error saying that the plugin could not be loaded. Hope your help me, I have some ideas for gaining performance on this.
Hmm, in that function we only lock the surface to get the pitch. So maybe the problem is that the pitch is different depending on the flags that are used, or that a surface that was once locked READONLY can't later be locked with WRITE support. The first case seems more likely right now and should be easy to confirm. In both cases we should probably always lock the surface with the same flags, i.e. 0.
Created attachment 279725 [details] Patched d3dvideosink plugin DLL This fix includes the patch of IDirect3DSurface9::LockRect D3DLOCK param issue.
Created attachment 279726 [details] ollydbg patch screenshot
Well today I built an asm patch with OllyDbg that fixes the error. When playing a mp4 video with gst-launch-1.0 with d3dvideosink ORIGINAL plugin, output is corrupted. But when played with PATCHED plugin it behaves correctly. Tested on 2 Windows 7 SP1 (first is x86 and second x64) machines with the same bug, it appears that belongs to GT familiy NVidia cards. Both machines now can play mp4 videos without problems and without hooks ;) I leave the DLL patch here as long as the OllyDbg patch proofs: OLD: Address Command 6D103E6A MOV DWORD PTR SS:[ESP+0C],10 NEW: Address Command 6D103E6A MOV DWORD PTR SS:[ESP+0C],2000 OllyDbg Screenshot -> https://bugzilla.gnome.org/attachment.cgi?id=279726 Patched libgstd3dvideosink.dll -> https://bugzilla.gnome.org/attachment.cgi?id=279725 Please confirm if this fixes the issue permanently. P.D = I've compiled d3dvideosink sources but when gst-launch tries to load the plugin it throws me the following error: (gst-launch-1.0:3516): GStreamer-WARNING **: Plugin description for 'C:\gstreame r\1.0\x86\lib\gstreamer-1.0\libgstd3dvideosink.dll' has no valid origin field. HELP ME :)
Created attachment 279738 [details] libgstd3dvideosink.debug.dll Someone who can reproduce this issue please replace libgstd3dvideosink.dll from lib/gstreamer-1.0 with this one here and then run gst-launch-1.0 videotestsrc num-buffers=1 ! "video/x-raw,..." ! d3dvideosink Once with a width that works, once with one that doesn't. And the provide the output it prints to stdout here (some "map lock" and "alloc lock" lines).
Created attachment 279739 [details] libgstd3dvideosink.noreadonly.debug.dll And the same here, this one might work in all cases. Note: these are 64 bit binaries. If someone needs 32 bit binaries I can also provide them.
Sebastian I wanna try your debug d3dvideosink DLLs on my 32 bit Win7 machine, so please submit an attachment with both fixes as fast as you can. What fixes libgstd3dvideosink.debug.dll? Anything new? The other DLL includes my fix, doesn't it? Also, I'll be so glad if you have a Win32 VC project of this plugin :P I keep getting this message: GStreamer-WARNING **: Plugin description for 'C:\gstreame r\1.0\x86\lib\gstreamer-1.0\libgstd3dvideosink.dll' has no valid origin field.
> Also, I'll be so glad if you have a Win32 VC project of this plugin :P I keep > getting this message: GStreamer-WARNING **: Plugin description for 'C:\gstreame > r\1.0\x86\lib\gstreamer-1.0\libgstd3dvideosink.dll' has no valid origin field. This is what happens when trying to load my d3dvidesoink compiled dll.
(In reply to comment #29) > Sebastian I wanna try your debug d3dvideosink DLLs on my 32 bit Win7 machine, > so please submit an attachment with both fixes as fast as you can. > > What fixes libgstd3dvideosink.debug.dll? Anything new? The other DLL includes > my fix, doesn't it? The first one just prints some debug output to stdout, the second includes your fix (not completely though, it uses 0 instead of DISCARD as DISCARD seems semantically wrong here) and also prints some debug output. I would like to see that debug output for both versions as that will hopefully allow to understand what exactly the problem here is, and also I would like to know if the second fixes the problem.
(In reply to comment #30) > > Also, I'll be so glad if you have a Win32 VC project of this plugin :P I keep > > getting this message: GStreamer-WARNING **: Plugin description for 'C:\gstreame > > r\1.0\x86\lib\gstreamer-1.0\libgstd3dvideosink.dll' has no valid origin field. > > This is what happens when trying to load my d3dvidesoink compiled dll. This sounds like it doesn't include win32/common/config.h properly, which is needed to set origin and some other fields of the plugin. I'm not compiling with MSVC.
(In reply to comment #31) >I would like to see that debug output for both versions as that will hopefully > allow to understand what exactly the problem here is, and also I would like to > know if the second fixes the problem. Upload both 32 bit version of the DLL so I can test on my x86 machine. (In reply to comment #32) > This sounds like it doesn't include win32/common/config.h properly, which is > needed to set origin and some other fields of the plugin. I'm not compiling > with MSVC. Thanks man! that was it (config.h). Now I can change whatever I want without inline hooking. I'm going to give some performance on this, you made my day.
Created attachment 279818 [details] libgstd3dvideosink.debug.32.dll
Created attachment 279819 [details] libgstd3dvideosink.noreadonly.debug.32.dll There you go. Would be great if someone could take some minutes to test these so we can finally get rid of this bug :)
Both uploaded DLL are named equal -> libgstd3dvideosink.NOREADONLY.debug.32.dll Please upload the libgstd3dvideosink.debug.32.dll Also the noreadonly.dll doesn't work properly. It throws the following error when trying to reproduce a mp4 file with gst-launch (note that I've renamed it to libgstd3dvideosink.dll): (gst-launch-1.0:5844): GStreamer-WARNING **: Failed to load plugin 'C:\gstreamer \1.0\x86\lib\gstreamer-1.0\libgstd3dvideosink.dll': 'C:\gstreamer\1.0\x86\lib\gs treamer-1.0\libgstd3dvideosink.dll': Could not find the specified process. WARNING: erroneous pipeline: no element "d3dvideosink". I would recommend to you to update the plugin to the 9Ex version, but I imagine that you want the plugin running on Win XP. I'm working with C++ and .NET (WPF) interop for rendering the backbuffer over there, so I need to set D3DCREATE_MULTITHREADED in d3dvideosink device creation or it crashes (that's explained on the msdn - main thread issues -). Thanks Sebastian you help me a lot ;)
Created attachment 279848 [details] Crash when trying to load libgstd3dvideosink.noreadonly.debug.32
You are apparently not using GStreamer 1.3.90 :) That's why it fails and why all the uploaded .dlls will fail. Would you consider opening another bug for the D3DCREATE_MULTITHREADED issue? We're considering to drop Windows XP support for the next major release, 1.6.
I can reproduce the issue on my Dell XPS 15 on Win 8.1 with Optimus graphics (GeForce GT 750M) with gst-launch-1.0.exe videotestsrc ! video/x-raw,width=637,height=480 ! d3dvideosink and dropping in the noreadonly.dll version fixes it for me. The format affected and negotiated is I420 in this case.
Created attachment 279855 [details] log file with libgstd3dvideosink.debug.dll (broken)
Created attachment 279856 [details] log file with libgstd3dvideosink.noreadonly.debug.dll (works)
Thanks for testing and confirming my suspicion. And thanks to voskater15 for giving the hint where to look for the problem! commit c134930dbed62453fa8ab5b4713db84bfc209de0 Author: Sebastian Dröge <sebastian@centricular.com> Date: Thu Jul 3 19:10:26 2014 +0200 d3dvideosink: Always lock the D3D surfaces in write mode Locking them in readonly mode can give different stride to mapping in write mode, which then causes rendering to be broken. Happened on all (many?) NVIDIA GPUs. Thanks to voskater15@gmail.com for hinting at the problem. https://bugzilla.gnome.org/show_bug.cgi?id=712809
With the working noreadonly DLL GStreamer format=YV12 is broken though (it's I420 with chroma planes swapped), all other formats are fine. Should be an easy fix (and internally it maps I420 to D3DFMT_YV12 anyway).
commit 73c40a3132b09adaa6abf0c0a1c10e3522983d51 Author: Sebastian Dröge <sebastian@centricular.com> Date: Thu Jul 3 19:05:22 2014 +0200 d3dhelpers: Swap UV planes properly for YV12 as compared to I420 If we only do it in one place colors will look funny.
Glad to see that the issue has been solved. I've upgraded to 9Ex and I've gained performance that translates into less CPU consume (around 40-50%), tested on 3 machines so I think that is going to be a good point when you guys upgrade to 9Ex.
Can you provide some details about what you did? Maybe even provide a patch? :)