GNOME Bugzilla – Bug 731442
bytereader: optimize _scan_for_start_code() using pointer access
Last modified: 2014-06-11 00:06:16 UTC
Created attachment 278185 [details] [review] 001_gstbytereader_improved_by_pointer_access.patch This is a sub-thread of the following: https://bugzilla.gnome.org/show_bug.cgi?id=730783 Currently the scan uses a Boyer-Moore method and its performance is good, but it can be optimized from an implementation point of view. The original scan code is implemented with a byte array and index-based access. In _scan_for_start_code(), the index increases from start to end, and the base address of the byte array is referenced when computing the return value. In this case, index-based access can be replaced by pointer access, which improves performance by removing index-related operations. Performance improves by approximately 8% on ARM-based embedded devices. Although the change seems trivial, it can affect overall performance because _scan_for_start_code() is called very often when H.264/H.265 video is played. In addition, the technique applies to all architectures and is good for readability and maintainability. I have attached a patch file. If you have a problem or question, let me know. After reviewing it, please apply it to the git repository.
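For readers following along, the index-to-pointer change described above can be sketched roughly as follows. This is an illustration only, not the attached patch: the function names scan_indexed and scan_pointer, the use of <stdint.h> types instead of GLib types, and the explicit size < 4 guard are all assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index-based scan for the 00 00 01 start code prefix.
 * Returns offset + position of the match, or -1 if none is found.
 * A match needs at least 4 bytes remaining (prefix plus one byte). */
static int
scan_indexed (const uint8_t *data, unsigned offset, unsigned size)
{
  unsigned i = 0;

  if (size < 4)                 /* assumption: guard against underflow */
    return -1;

  while (i <= size - 4) {
    if (data[i + 2] > 1)        /* third byte cannot end a prefix here */
      i += 3;
    else if (data[i + 1])       /* second byte is non-zero */
      i += 2;
    else if (data[i] || data[i + 2] != 1)
      i++;
    else
      return (int) (offset + i);
  }
  return -1;
}

/* Same scan with a moving pointer: the base address is only needed
 * once, to turn the match position back into an offset. */
static int
scan_pointer (const uint8_t *data, unsigned offset, unsigned size)
{
  const uint8_t *p = data;
  const uint8_t *end;

  if (size < 4)                 /* assumption: guard against underflow */
    return -1;

  end = data + size - 4;        /* last position where a match can start */

  while (p <= end) {
    if (p[2] > 1)
      p += 3;
    else if (p[1])
      p += 2;
    else if (p[0] || p[2] != 1)
      p++;
    else
      return (int) ((ptrdiff_t) offset + (p - data));
  }
  return -1;
}
```

Both functions skip ahead by the same amounts; the pointer variant simply avoids recomputing data + i for each byte test. Whether that yields a measurable gain depends on the compiler and target.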
Interesting. I would have expected the compiler to do that optimization. Anyway, it also seems to produce a decent speedup on x86-64 (-O2).
Review of attachment 278185 [details] [review]: Looks good to me, I indeed thought the compiler was smarter.
Comment on attachment 278185 [details] [review] 001_gstbytereader_improved_by_pointer_access.patch commit d3b2f6e4b85a45213ed2878f411f7c66ebb15e33 Author: Sungho Bae <baver.bae@lge.com> Date: Tue Jun 10 09:35:38 2014 -0400 bytereader: Use pointer instead of index access Currently the scan uses a Boyer-Moore method and its performance is good, but it can be optimized from an implementation point of view. The original scan code is implemented with a byte array and index-based access. In _scan_for_start_code(), the index increases from start to end, and the base address of the byte array is referenced when computing the return value. In this case, index-based access can be replaced by pointer access, which improves performance by removing index-related operations. Performance improves by approximately 8% on ARM-based embedded devices. Although the change seems trivial, it can affect overall performance because _scan_for_start_code() is called very often when H.264/H.265 video is played. In addition, the technique applies to all architectures and is good for readability and maintainability. https://bugzilla.gnome.org/show_bug.cgi?id=731442
Thanks for your patch and time. It would be appreciated if you commit locally and use "git format-patch -1" next time. This will generate a patch with the commit log and author set correctly. It saves us a bit of time.
(In reply to comment #1) > Interesting. I would have expected the compiler to do that optimization. > > Anyway, it also seems to produce a decent speedup on x86-64 (-O2). I also expected the compiler to do it, but it does not support this yet. Thanks for your interest and for testing on x86-64. (In reply to comment #4) > Thanks for your patch and time. It would be appreciated if you commit locally > and use "git format-patch -1" next time. This will generate a patch with the > commit log and author set correctly. It saves us a bit of time. Thanks for your advice and for reviewing my patch. I will use "git format-patch -1" next time.