GNOME Bugzilla – Bug 795462
video: Add support for the 10bit yuv pixel format from the Rockchip SoC platform
Last modified: 2018-05-19 17:37:35 UTC
Created attachment 371229 [details] [review] Only unpack work version This pixel format is a variant of P010_10LE, without any filled bits between pixels in a stride. You can download the raw dump here, and i420_10le from the software decoder. Also the P010_10LEC to i420_10le results are uploaded. https://drive.google.com/drive/folders/1VceFGn67E4e_zzq5---85caotY_h3REp?usp=sharing
Review of attachment 371229 [details] [review]: Ok, so this is fully packet 10bit (no padding). Please correct me, but for the Y plane you'll have, 4 pixels per 5bytes: 5bytes: Y1 0-9, Y2 10-19, Y3 20-29, Y4 20-39 5bytes: Y5 ... And for UV plane, you'll have 2 pixels per 5 bytes: 5bytes: U1V1: 0-19, U2V2: 20-39 5bytes: U3V3 ... If that is correct, my suggestion is to create a state machine with 5 states, and then combine to reduce the algo. That's what I did for the 32bit packed variants. I didn't unroll the loops though, as it's just for debugging. Then the state machine can be revered for the pack function. ::: gst-libs/gst/video/video-format.c @@ +5348,3 @@ GST_MAKE_FOURCC ('X', 'V', '2', '0'), DPTH10_10_10, PSTR0, PLANE011, OFFS001, SUB422, PACK_NV16_10LE32), + MAKE_YUV_C_LE_FORMAT (P010_10LEC, "raw video", 0x00000000, DPTH10_10_10, For the fourcc, as you can see for the Xilinx variant, we have used the fourcc used by the vendor (X is for Xilinx). May I suggest using something like RV20, or RK20, or if you prefer it could be RV12, as long as it's not already used. I like the idea to use two letter RK.., makes the origin pretty clear ;-P ::: gst-libs/gst/video/video-format.h @@ +114,3 @@ * @GST_VIDEO_FORMAT_Y444_12BE: planar 4:4:4 YUV, 12 bits per channel (Since: 1.12) * @GST_VIDEO_FORMAT_Y444_12LE: planar 4:4:4 YUV, 12 bits per channel (Since: 1.12) + * @GST_VIDEO_FORMAT_P010_10LEC: planar 4:2:0 YUV with interleaved UV plane, 10 bits per channel, a compat variant of P010 I'd use the "Fully packed variant of .." when describing this. Though, P010 is already a variant of NV12, so may I proposed the following name ? NV12_10LE40 Which reads NV12 layout (one Y plane and one UV plane), 10bit per component, packed over 40bits (hence no padding). ::: gst-libs/gst/video/video-info.c @@ +1030,3 @@ break; + case GST_VIDEO_FORMAT_P010_10LEC: + info->stride[0] = GST_ROUND_UP_64 (width * 10 / 8); Shouldn't this roundup to 5, while rounding up to 64 is hardware specific ? Extra alignment could be used by default if it more performant for the pack/unpack functions. Though, these function are likely just for debugging, the HW will take care of it in normal usage. @@ +1034,3 @@ + info->offset[0] = 0; + info->offset[1] = info->stride[0] * GST_ROUND_UP_16 (height); + cr_h = GST_ROUND_UP_16 (height) / 2; CbCr plane (or UV plane) only need to be ROUND_UP_2 (so the odd last line of the Y plane have a line of UV), 16 seems like an H264/H265 specific (16x16 macro blocks).
Created attachment 371243 [details] [review] A state machine implementation sample I wonder whether it would get a better performance with the helper of the state machine, but the code looks better. About the packing function, unless writing 4 pixel at the same time, you need to read back the data written.
Review of attachment 371243 [details] [review]: It's nicer, but not very efficient if I read correctly, since you are reading each 16bit short twice.
Created attachment 371312 [details] [review] Both pack and unpack work Would you accept this version?
Review of attachment 371312 [details] [review]: ::: gst-libs/gst/video/video-info.c @@ +1030,3 @@ break; + case GST_VIDEO_FORMAT_NV12_10LE40: + info->stride[0] = ((width * 5 >> 2) + 4) & (~4); As per IRC, you said you wanted to round up by 4, but this roundup technique only applies to power of two. You'll need * / instead.
Review of attachment 371312 [details] [review]: ::: gst-libs/gst/video/video-format.h @@ +114,3 @@ * @GST_VIDEO_FORMAT_Y444_12BE: planar 4:4:4 YUV, 12 bits per channel (Since: 1.12) * @GST_VIDEO_FORMAT_Y444_12LE: planar 4:4:4 YUV, 12 bits per channel (Since: 1.12) + * @GST_VIDEO_FORMAT_NV12_10LE40: Fully packed variant of NV12_10LE32 As you made me notice, (Since 1.16) please.
Created attachment 371346 [details] [review] The final version?
Review of attachment 371346 [details] [review]: Looks good, thanks.
commit 35d0783fca504997e1be30cfebfa8de042ea49ab (HEAD -> master, origin/master, origin/HEAD) Author: ayaka <ayaka@soulik.info> Date: Sun Mar 26 04:54:42 2017 +0800 video: Add NV12_10LE40 pixel format This pixel format is a fully packed variant of NV12, a luma pixel would take 10bits in memory, without any filled bits between pixels in a stride. The color range follows the BT.2020 standard. In order to get a performance in hardware memory operation, it may expend the stride, append zero data at the end of echo lines. Signed-off-by: ayaka <ayaka@soulik.info> https://bugzilla.gnome.org/show_bug.cgi?id=795462
I forget to run make check, it fails the generic unit test: commit 9a3ee4838f2ab3af85ac003eab22be27f4c0392d (HEAD -> master, origin/master, origin/HEAD) Author: Nicolas Dufresne <nicolas.dufresne@collabora.com> Date: Tue Apr 24 17:05:17 2018 -0400 Revert "video: Add NV12_10LE40 pixel format" This reverts commit 35d0783fca504997e1be30cfebfa8de042ea49ab.
Created attachment 371357 [details] [review] Fix the compiler warnings
0:02:07.808667125 11220 0x646430 ERROR video-info video-info.c:694:fill_planes: Frame size 2147483647x1073741823 would overflow Breakpoint 1, fill_planes (info=0x7fffffffd5f0) at video-info.c:694 694 GST_ERROR ("Frame size %ux%u would overflow", info->width, info->height); (gdb) print info->finfo->format $7 = GST_VIDEO_FORMAT_ARGB (gdb) continue Continuing. 0:02:10.782590916 11220 0x646430 ERROR video-info video-info.c:694:fill_planes: Frame size 1073741823x1073741823 would overflow Breakpoint 1, fill_planes (info=0x7fffffffd5f0) at video-info.c:694 694 GST_ERROR ("Frame size %ux%u would overflow", info->width, info->height); (gdb) print info->finfo->format $8 = GST_VIDEO_FORMAT_ARGB (gdb) continue Continuing. 0:02:13.972354418 11220 0x646430 ERROR video-info video-info.c:694:fill_planes: Frame size 2147483647x214748364$ would overflow Breakpoint 1, fill_planes (info=0x7fffffffd5f0) at video-info.c:694 694 GST_ERROR ("Frame size %ux%u would overflow", info->width, info->height); (gdb) print info->finfo->format $9 = GST_VIDEO_FORMAT_ARGB (gdb) continue Continuing. 0:02:17.165945971 11220 0x646430 ERROR video-info video-info.c:694:fill_planes: Frame size 2147483647x214748364$ would overflow Breakpoint 1, fill_planes (info=0x7fffffffd5f0) at video-info.c:694 694 GST_ERROR ("Frame size %ux%u would overflow", info->width, info->height); (gdb) print info->finfo->format $10 = GST_VIDEO_FORMAT_ARGB (gdb) continue Continuing. 0:02:22.672242129 11220 0x646430 ERROR video-info video-info.c:694:fill_planes: Frame size 1073741824x1 would o$erflow Does log show that it is not my pixel format cause the bug?
These are expected errors in the test. What fails is a binary diff, and it dumps the values.
Created attachment 371507 [details] [review] video: Add NV12_10LE40 pixel format This pixel format is a fully packed variant of NV12_10LE32, a luma pixel would take 10bits in memory, without any filled bits between pixels in a stride. The color range follows the BT.2020 standard. In order to get a better performance in hardware memory operation, it may expend the stride, append zero data at the end of echo lines. Pack function by Nicolas Dufresne <nicolas@ndufresne.ca>
Review of attachment 371507 [details] [review]: 4989 if (!(flags & GST_VIDEO_PACK_FLAG_TRUNCATE_RANGE)) { 4990 Yn |= Yn >> 10; 4991 Un |= Un >> 10; 4992 Vn |= Vn >> 10; 4993 } In the case 1 and 3, the Yn has modified with preivous operation, then first 10 bits may be modified again. That is why comment those code when I sent it to you, I want solve this problem later. About pack function part, it would still generate a different result than hardware decoder and I convert from a i420_10le manually. Maybe it won't be a problem in a small resolution.
Created attachment 371987 [details] [review] Fix the unpack function, it won't shift chroma value twice The way to fill truncate range is strange. Yn |= Yn >> 10; It would make the 4bit from MSB to and with the 4bit above LSB. That is why the unpack and pack manual test never match the result from the software decoder. But it looks all the previous pixel format are ok with that. So I won't fix it this time. And it passes the test.
Review of attachment 371987 [details] [review]: I believe this is ready now, thanks a lot!
commit 388b48511e90e008138e1842640b76934bd891dc (HEAD -> master) Author: ayaka <ayaka@soulik.info> Date: Sat Apr 28 20:22:31 2018 -0400 video: Add NV12_10LE40 pixel format This pixel format is a fully packed variant of NV12_10LE32, a luma pixel would take 10bits in memory, without any filled bits between pixels in a stride. The color range follows the BT.2020 standard. In order to get a better performance in hardware memory operation, it may expend the stride, append zero data at the end of echo lines. Pack function by Nicolas Dufresne. https://bugzilla.gnome.org/show_bug.cgi?id=795462 Signed-off-by: Nicolas Dufresne <nicolas@ndufresne.ca> Signed-off-by: ayaka <ayaka@soulik.info>