GNOME Bugzilla – Bug 700377
video: add NV16 pixel format support
Last modified: 2013-05-27 09:55:33 UTC
Created attachment 244298
Add NV16 pixel format support

Some hardware decoders use NV16 instead of NV12 for 4:2:2 sampling. The attached patch implements support for this pixel format.
commit af24e238802112ae7072a8246a05c5df2ccfcc7c
Author: Arnaud Vrac <avrac@freebox.fr>
Date:   Mon Nov 26 16:37:22 2012 +0100

    video: add NV16 format

    This format is usually used by hardware video decoders for 4:2:2 sampling

    https://bugzilla.gnome.org/show_bug.cgi?id=700377
I get build errors after this commit (presumably because I don't have a recent enough version of Orc installed). See bug #700400.
'make check' results in two failures related to this commit here:

video_orc_unpack_NV16:
dest array 0 bad
0 0: 78 2bd6 -> d6d678ff 2bd678ff *
1 0: 2c 2bcb -> 2b2b2cff 2bd62cff *
2 0: c0 fef9 -> cbcbc0ff 2bcbc0ff *
3 0: 06 a19b -> 2b2b06ff 2bcb06ff *
4 0: 52 82cb -> f9f952ff fef952ff *
5 0: a3 11aa -> fefea3ff fef9a3ff *
6 0: 15 94d2 -> 9b9b15ff a19b15ff *
7 0: 70 b8da -> a1a170ff a19b70ff *
8 0: fc e45b -> cbcbfcff 82cbfcff *
9 0: 2d 2bb0 -> 82822dff 82cb2dff *
10 0: eb 736a -> aaaaebff 11aaebff *
11 0: be 332b -> 1111beff 11aabeff *
12 0: 7b cd68 -> d2d27bff 94d27bff *
13 0: 4a 6e47 -> 94944aff 94d24aff *
14 0: 80 068b -> dada80ff b8da80ff *
15 0: b4 a650 -> b8b8b4ff b8dab4ff *
16 0: eb 7505 -> cbcbebff e45bebff *
17 0: 1a f0e7 -> 82821aff e45b1aff *
18 0: 4f 33ec -> aaaa4fff 2bb04fff *
19 0: df 85d5 -> 1111dfff 2bb0dfff *
20 0: 3d 52ec -> d2d23dff 736a3dff *
21 0: 66 6093 -> 949466ff 736a66ff *
22 0: 2e ddae -> dada2eff 332b2eff *
23 0: 86 9b95 -> b8b886ff 332b86ff *
24 0: b2 9407 -> 5b5bb2ff cd68b2ff *
25 0: 6a 8a5a -> e4e46aff cd686aff *
26 0: 27 64e1 -> b0b027ff 6e4727ff *
27 0: 50 9435 -> 2b2b50ff 6e4750ff *
28 0: 16 a8ea -> 6a6a16ff 068b16ff *
29 0: 64 d5d9 -> 737364ff 068b64ff *
30 0: 16 22fe -> 2b2b16ff a65016ff *
31 0: b2 76a6 -> 3333b2ff a650b2ff *
32 0: bd b755 -> 5b5bbdff 7505bdff *
33 0: dd c7b5 -> e4e4ddff 7505ddff *
34 0: 06 ee6c -> b0b006ff f0e706ff *
35 0: 1e 21a5 -> 2b2b1eff f0e71eff *
36 0: 3f 27b5 -> 6a6a3fff 33ec3fff *
37 0: ba 7e23 -> 7373baff 33ecbaff *
38 0: 4c 9bfe -> 2b2b4cff 85d54cff *
39 0: ed d53b -> 3333edff 85d5edff *
40 0: f2 17a1 -> 6868f2ff 52ecf2ff *
41 0: 02 3662 -> cdcd02ff 52ec02ff *
42 0: 61 7871 -> 474761ff 609361ff *
43 0: 08 aa3f -> 6e6e08ff 609308ff *
44 0: 2c e55f -> 8b8b2cff ddae2cff *
45 0: 6c c83d -> 06066cff ddae6cff *
46 0: 81 cff0 -> 505081ff 9b9581ff *
47 0: 5a 17d2 -> a6a65aff 9b955aff *
48 0: 35 aa5f -> 686835ff 940735ff *
49 0: b0 278c -> cdcdb0ff 9407b0ff *
50 0: 04 6891 -> 474704ff 8a5a04ff *
51 0: f6 6ee0 -> 6e6ef6ff 8a5af6ff *
52 0: 0e babd -> 8b8b0eff 64e10eff *
53 0: 8a e3b2 -> 06068aff 64e18aff *
54 0: 7f 06d7 -> 50507fff 94357fff *
55 0: 07 5ca3 -> a6a607ff 943507ff *
56 0: e5 26c1 -> 0505e5ff a8eae5ff *
57 0: 8e a81e -> 75758eff a8ea8eff *
58 0: 9a e631 -> e7e79aff d5d99aff *
59 0: 7b 6c22 -> f0f07bff d5d97bff *
60 0: 53 3c60 -> ecec53ff 22fe53ff *
61 0: af becd -> 3333afff 22feafff *
62 0: af 4578 -> d5d5afff 76a6afff *
63 0: 6f 83ac -> 85856fff 76a66fff *
64 0: 5b 04bb -> 05055bff b7555bff *
65 0: 8d 89c4 -> 75758dff b7558dff *
66 0: 28 d973 -> e7e728ff c7b528ff *

.global video_orc_unpack_NV16
.p2align 4
video_orc_unpack_NV16:
# 1: loadpb
# loading constant -1 0xffffffff
# LOOP SHIFT 2
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
# LOOP SHIFT 2
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
# LOOP SHIFT 1
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
# LOOP SHIFT 0
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
pcmpeqb %xmm0, %xmm0
movl 8(%rdi), %ecx
mov %ecx, %eax
sar $3, %ecx
movl %ecx, 16(%rdi)
and $7, %eax
movl %eax, 20(%rdi)
mov 24(%rdi), %rax
mov 56(%rdi), %rdx
mov 64(%rdi), %rsi
1:
cmp $0, 16(%rdi)
jz 3f
movl 16(%rdi), %r8d
.p2align 4
2:
movd 0(%rsi), %xmm1
punpcklwd %xmm1, %xmm1
movd 0(%rdx), %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movdqu %xmm3, 0(%rax)
movd 4(%rsi), %xmm1
punpcklwd %xmm1, %xmm1
movd 4(%rdx), %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movdqu %xmm3, 16(%rax)
leaq 32(%rax), %rax
leaq 8(%rdx), %rdx
leaq 8(%rsi), %rsi
add $-1, %r8d
jnz 2b
3:
testl $4, 20(%rdi)
jz 10f
movd 0(%rsi), %xmm1
punpcklwd %xmm1, %xmm1
movd 0(%rdx), %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movdqu %xmm3, 0(%rax)
leaq 16(%rax), %rax
leaq 4(%rdx), %rdx
leaq 4(%rsi), %rsi
10:
testl $2, 20(%rdi)
jz 9f
pinsrw $0, 0(%rsi), %xmm1
punpcklwd %xmm1, %xmm1
pxor %xmm2, %xmm2
pinsrw $0, 0(%rdx), %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movq %xmm3, 0(%rax)
leaq 8(%rax), %rax
leaq 2(%rdx), %rdx
leaq 2(%rsi), %rsi
9:
testl $1, 20(%rdi)
jz 8f
movzx 0(%rsi), %ecx
movd %ecx, %xmm1
punpcklwd %xmm1, %xmm1
movzx 0(%rdx), %ecx
movd %ecx, %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movd %xmm3, 0(%rax)
leaq 4(%rax), %rax
leaq 1(%rdx), %rdx
leaq 1(%rsi), %rsi
8:
retq

dest array 0 bad
0 0: 70 b6d5 -> d5d570ff b6d570ff *
1 0: 39 addc -> b6b639ff b6d539ff *
2 0: 60 d8dc -> dcdc60ff addc60ff *
3 0: f7 bcd7 -> adadf7ff addcf7ff *
4 0: 50 1e87 -> dcdc50ff d8dc50ff *
5 0: de 83e1 -> d8d8deff d8dcdeff *
6 0: 75 a8f4 -> d7d775ff bcd775ff *
7 0: 0b 8f76 -> bcbc0bff bcd70bff *
8 0: 97 3d14 -> 878797ff 1e8797ff *
9 0: c9 2769 -> 1e1ec9ff 1e87c9ff *
10 0: e2 1a26 -> e1e1e2ff 83e1e2ff *
11 0: 58 f2c7 -> 838358ff 83e158ff *
12 0: 05 7cf7 -> f4f405ff a8f405ff *
13 0: 76 656b -> a8a876ff a8f476ff *
14 0: ab efdb -> 7676abff 8f76abff *
15 0: 3a f261 -> 8f8f3aff 8f763aff *
16 0: fb 5569 -> 8787fbff 3d14fbff *
17 0: a2 f2fc -> 1e1ea2ff 3d14a2ff *
18 0: 9e b5fa -> e1e19eff 27699eff *
19 0: e3 4eb5 -> 8383e3ff 2769e3ff *
20 0: 80 c9e1 -> f4f480ff 1a2680ff *
21 0: 60 f1ca -> a8a860ff 1a2660ff *
22 0: 40 4667 -> 767640ff f2c740ff *
23 0: d2 ee93 -> 8f8fd2ff f2c7d2ff *
24 0: 11 e98d -> 141411ff 7cf711ff *
25 0: fd 70a2 -> 3d3dfdff 7cf7fdff *
26 0: 79 4681 -> 696979ff 656b79ff *
27 0: 14 65a2 -> 272714ff 656b14ff *
28 0: 2c 50dc -> 26262cff efdb2cff *
29 0: 95 eaec -> 1a1a95ff efdb95ff *
30 0: ff a7a1 -> c7c7ffff f261ffff *
31 0: 3e 75ea -> f2f23eff f2613eff *
32 0: 9b a4f6 -> 14149bff 55699bff *
33 0: 72 c42c -> 3d3d72ff 556972ff *
34 0: 82 26a0 -> 696982ff f2fc82ff *
35 0: 25 894c -> 272725ff f2fc25ff *
36 0: 81 1b40 -> 262681ff b5fa81ff *
37 0: 28 d282 -> 1a1a28ff b5fa28ff *
38 0: 9c cd4f -> c7c79cff 4eb59cff *
39 0: 64 3a06 -> f2f264ff 4eb564ff *
40 0: 26 f2da -> f7f726ff c9e126ff *
41 0: 71 d229 -> 7c7c71ff c9e171ff *
42 0: 77 72fe -> 6b6b77ff f1ca77ff *
43 0: 8b cc32 -> 65658bff f1ca8bff *
44 0: 82 f623 -> dbdb82ff 466782ff *
45 0: c9 edf9 -> efefc9ff 4667c9ff *
46 0: 3c 35f8 -> 61613cff ee933cff *
47 0: 27 b145 -> f2f227ff ee9327ff *
48 0: 8b fd32 -> f7f78bff e98d8bff *
49 0: c8 3be7 -> 7c7cc8ff e98dc8ff *
50 0: 3f 0404 -> 6b6b3fff 70a23fff *
51 0: 76 0393 -> 656576ff 70a276ff *
52 0: 3e 6c5d -> dbdb3eff 46813eff *
53 0: 43 3e81 -> efef43ff 468143ff *
54 0: eb 18e4 -> 6161ebff 65a2ebff *
55 0: e7 0cc8 -> f2f2e7ff 65a2e7ff *
56 0: 6f b1b4 -> 69696fff 50dc6fff *
57 0: 23 6475 -> 555523ff 50dc23ff *
58 0: 6f 76d7 -> fcfc6fff eaec6fff *
59 0: 53 bf6e -> f2f253ff eaec53ff *
60 0: 52 c884 -> fafa52ff a7a152ff *
61 0: 00 8709 -> b5b500ff a7a100ff *
62 0: 25 7118 -> b5b525ff 75ea25ff *
63 0: fe 3e6b -> 4e4efeff 75eafeff *

.global video_orc_unpack_NV16
.p2align 4
video_orc_unpack_NV16:
# 1: loadpb
# loading constant -1 0xffffffff
# LOOP SHIFT 2
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
# LOOP SHIFT 2
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
# LOOP SHIFT 1
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
# LOOP SHIFT 0
# 0: loadupdb
# 2: loadb
# 3: mergebw
# 4: mergewl
# 5: storel
pcmpeqb %xmm0, %xmm0
movl 8(%rdi), %ecx
mov %ecx, %eax
sar $3, %ecx
movl %ecx, 16(%rdi)
and $7, %eax
movl %eax, 20(%rdi)
mov 24(%rdi), %rax
mov 56(%rdi), %rdx
mov 64(%rdi), %rsi
1:
cmp $0, 16(%rdi)
jz 3f
movl 16(%rdi), %r8d
.p2align 4
2:
movd 0(%rsi), %xmm1
punpcklwd %xmm1, %xmm1
movd 0(%rdx), %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movdqu %xmm3, 0(%rax)
movd 4(%rsi), %xmm1
punpcklwd %xmm1, %xmm1
movd 4(%rdx), %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movdqu %xmm3, 16(%rax)
leaq 32(%rax), %rax
leaq 8(%rdx), %rdx
leaq 8(%rsi), %rsi
add $-1, %r8d
jnz 2b
3:
testl $4, 20(%rdi)
jz 10f
movd 0(%rsi), %xmm1
punpcklwd %xmm1, %xmm1
movd 0(%rdx), %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movdqu %xmm3, 0(%rax)
leaq 16(%rax), %rax
leaq 4(%rdx), %rdx
leaq 4(%rsi), %rsi
10:
testl $2, 20(%rdi)
jz 9f
pinsrw $0, 0(%rsi), %xmm1
punpcklwd %xmm1, %xmm1
pxor %xmm2, %xmm2
pinsrw $0, 0(%rdx), %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movq %xmm3, 0(%rax)
leaq 8(%rax), %rax
leaq 2(%rdx), %rdx
leaq 2(%rsi), %rsi
9:
testl $1, 20(%rdi)
jz 8f
movzx 0(%rsi), %ecx
movd %ecx, %xmm1
punpcklwd %xmm1, %xmm1
movzx 0(%rdx), %ecx
movd %ecx, %xmm2
movdqu %xmm0, %xmm3
punpcklbw %xmm2, %xmm3
punpcklwd %xmm1, %xmm3
movd %xmm3, 0(%rax)
leaq 4(%rax), %rax
leaq 1(%rdx), %rdx
leaq 1(%rsi), %rsi
8:
retq

video_orc_pack_NV16:
backup function : PASSED
compiled function: PASSED

and

Running suite(s): videoscale
** (lt-videoscale:7357): ERROR **: videoconvert doesn't support format 'NV16'
98%: Checks: 51, Failures: 0, Errors: 1
elements/videoscale.c:75:E:general:test_template_formats:0: (after this point) Received signal 5 (Trace/breakpoint trap)
FAIL: elements/videoscale

I'll fix the videoconvert one, but it would be nice if people ran 'make check' from time to time..
Sorry about that, I'll fix the issue if you want. It's odd that the test does not pass, since I do use videoconvert to convert to and from NV16.
It was a bug in the videoscale unit test, and a misleading warning (code probably copied from the videoconvert unit test):

commit 1bc94d4aa3babbfa76da0599d93005755ae21748
Author: Tim-Philipp Müller <tim.muller@collabora.co.uk>
Date:   Thu May 16 11:09:11 2013 +0100

    tests: ignore new NV16 format in videoscale unit test

    https://bugzilla.gnome.org/show_bug.cgi?id=700377

As for the unpack failure - maybe an orc bug?
As far as I understand, this seems to be a bug in the orc emulation code. The test reports for the first line:

0 0: 70 b6d5 -> d5d570ff b6d570ff *

The source is y=70 uv=b6d5. The compiled code yields b6d570ff (AYUV), while the emulation code yields d5d570ff, which is wrong.
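To make the expected values concrete, here is a minimal sketch in Python (not the GStreamer or Orc code) of unpacking one NV16 row to AYUV, using the first four pixels of the second failing run. The function names are hypothetical. It assumes the log prints each output pixel as a little-endian 32-bit word, so the memory order A, Y, U, V reads back as VVUUYYAA. The second function only illustrates that the first rows of the left-hand column are consistent with duplicating each chroma byte rather than each U/V pair; it is a guess at the pattern, not a description of what Orc actually does internally.

```python
import struct

def unpack_nv16_row(y_row, uv_row):
    """Unpack one NV16 row to AYUV, returned as little-endian 32-bit words."""
    out = []
    for i, y in enumerate(y_row):
        pair = 2 * (i // 2)              # each U/V pair is shared by two pixels
        u, v = uv_row[pair], uv_row[pair + 1]
        # Memory order is A, Y, U, V; read back as one little-endian word.
        out.append(struct.unpack("<I", bytes([0xFF, y, u, v]))[0])
    return out

def unpack_bytewise_dup(y_row, uv_row):
    """Illustration only: duplicate each chroma byte instead of each U/V pair."""
    out = []
    for i, y in enumerate(y_row):
        c = uv_row[i]
        out.append(struct.unpack("<I", bytes([0xFF, y, c, c]))[0])
    return out

# y = 70 39 60 f7 and uv words b6d5 addc, taken from the log rows 0 to 3.
y_row = bytes([0x70, 0x39, 0x60, 0xF7])
uv_row = struct.pack("<2H", 0xB6D5, 0xADDC)   # bytes in memory: d5 b6 dc ad

good = [f"{w:08x}" for w in unpack_nv16_row(y_row, uv_row)]
bad = [f"{w:08x}" for w in unpack_bytewise_dup(y_row, uv_row)]
print(good)  # ['b6d570ff', 'b6d539ff', 'addc60ff', 'addcf7ff']
print(bad)   # ['d5d570ff', 'b6b639ff', 'dcdc60ff', 'adadf7ff']
```

The first list matches the right-hand column of the log for pixels 0 to 3, and the second matches the left-hand column for the same pixels.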
Fails (colors are wrong):

ORC_CODE=emulate gst-launch-1.0 videotestsrc ! 'video/x-raw, format=NV16' ! videoconvert ! 'video/x-raw, format=I420' ! xvimagesink

Passes:

ORC_CODE=backup gst-launch-1.0 videotestsrc ! 'video/x-raw, format=NV16' ! videoconvert ! 'video/x-raw, format=I420' ! xvimagesink
This should fix the pack/unpack errors. I believe the unpack function was wrong, and the pack function was the same as the NV12 one.

commit 97784b1563786533f4056327088c10cc4e9384ee
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Mon May 27 11:53:27 2013 +0200

    video-format: fix NV16 unpack

    We can just use the NV12 functions, the only difference is the vertical
    subsampling.
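The point about vertical subsampling can be sketched as follows. This is an illustrative Python model, not the GStreamer implementation: the helper names are hypothetical, and it assumes tightly packed planes with no stride padding. Within a row, NV12 and NV16 store chroma identically (interleaved U/V pairs, one pair per two pixels), which is why a per-row pack/unpack routine can be shared. Only the mapping from image line to chroma row differs.

```python
def chroma_row(line, fmt):
    """Index of the chroma row that image line `line` reads from."""
    if fmt == "NV12":
        return line // 2      # 4:2:0, one chroma row per two image lines
    if fmt == "NV16":
        return line           # 4:2:2, one chroma row per image line
    raise ValueError(f"unsupported format: {fmt}")

def chroma_plane_size(width, height, fmt):
    """Bytes in the interleaved UV plane, assuming no stride padding."""
    row_bytes = 2 * ((width + 1) // 2)    # one U and one V byte per pixel pair
    rows = (height + 1) // 2 if fmt == "NV12" else height
    return rows * row_bytes

for fmt in ("NV12", "NV16"):
    print(fmt, [chroma_row(n, fmt) for n in range(4)], chroma_plane_size(4, 4, fmt))
# NV12 [0, 0, 1, 1] 8
# NV16 [0, 1, 2, 3] 16
```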