Bug 675772 – byte_reader + adapter: _masked_scan_uin32() could be optimized

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 675772 - byte_reader + adapter: _masked_scan_uin32() could be optimized


Summary:	byte_reader + adapter: _masked_scan_uin32() could be optimized


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gstreamer (core)
Version:	1.x
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.2.0
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2012-05-09 18:56 UTC by Nicolas Dufresne (ndufresne)
Modified:	2014-12-22 15:16 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Nicolas Dufresne (ndufresne) 2012-05-09 18:56:02 UTC

Currently the scan will rotate byte into the state and recheck the state if it match the masked and value.

While this ensure that no more then 1 load is done per-byte we can do better then that. If one byte does not match mask in any of word from it's current location to the first, it means we can directly skip over that byte, allowing bigger step. This would mean we would do fewer load, but increase the bit masked + check. As bit operation are much cheaper then load, it should give a good speed up. Note keeping in mind that adding many branches could make it as bad on certain CPU.

Comment 1 Nicolas Dufresne (ndufresne) 2013-06-18 21:50:11 UTC

It was proposed in bug #702357 to simply add few optimized patch, like the mpeg case, where mask is 0xFFFFFF00 and value 0x00000100. I think this is a simple, and safe solution.

It is likely that there exist a generalisation for the algorithm found in mpegvideoparse, but I didn't got time to look into it, yet.

Comment 2 Wim Taymans 2013-06-19 15:00:24 UTC

(In reply to comment #1)
> It is likely that there exist a generalisation for the algorithm found in
> mpegvideoparse, but I didn't got time to look into it, yet.

It is using a specialization of http://en.wikipedia.org/wiki/Boyer_moore

Comment 3 Nicolas Dufresne (ndufresne) 2013-06-21 02:36:08 UTC

I just learned that by reading gstrtph264pay next_start_code function. Are you having a look, or shall I take care ?

Comment 4 Wim Taymans 2013-06-21 08:11:21 UTC

(In reply to comment #3)
> I just learned that by reading gstrtph264pay next_start_code function. Are you
> having a look, or shall I take care ?

Please go ahead, but I don't think it makes a lot of sense to implement Boyer-moore there...

Comment 5 Nicolas Dufresne (ndufresne) 2013-06-21 14:37:15 UTC

(In reply to comment #4)
> (In reply to comment #3)
> > I just learned that by reading gstrtph264pay next_start_code function. Are you
> > having a look, or shall I take care ?
> 
> Please go ahead, but I don't think it makes a lot of sense to implement
> Boyer-moore there...

Agreed, the cost of generating the tables is probably too high, I had a simplier version in mind, just need to implement it.

Comment 6 Nicolas Dufresne (ndufresne) 2013-08-10 23:11:36 UTC

I have experimented with ageneric solution:

see http://cgit.collabora.com/git/user/nicolas/gstreamer.git/log/?h=boyer-moore

But this endup being slower. Instead it was decided to optimize certain know case. The H264/MPEG/VC1 case has been fixed:

commit de75bca9b387c404a29cb057046d0db1d601794c
Author: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Date:   Thu Aug 8 12:08:31 2013 +0200

    bytereader: Accelerate MPEG/H264 start code scanning

    Accelerate MPEG/H264 start code scanning using Boyer-Moor bad character
    heuristic.

    https://bugzilla.gnome.org/show_bug.cgi?id=702357

I think we can also optimizing the cases with mask 0xFFFFFFFF and 0xFFFFFF00 (found in Matroska), where generating table of offset would be possible since we don't need to mask each bytes. Thus I'll keep this open until I'm done experimenting.

p.s. the Adapter version has not been optimized since it's not used for 0xffffff00/0x00000100 start codes.

Comment 7 Nicolas Dufresne (ndufresne) 2014-12-22 15:16:44 UTC

Nearly a year without any more complaint on performance. I think it's fine to close this.