GNOME Bugzilla – Bug 631200
flacparse: major performance improvements
Last modified: 2010-10-23 09:05:33 UTC
These are the patches in gst-maemo-xiph that look like the can be somewhat acceptable in GStreamer in one form or the other. I wasn't really intending push them here, so there might be some horrible things for GStreamer guidelines. Also, I didn't test these on top of the latest g-p-bad, so take these with a grain of salt.
Created attachment 171594 [details] [review] trivial caps fix
Created attachment 171595 [details] [review] delay tags push
Created attachment 171596 [details] [review] fix picture parsing
Created attachment 171597 [details] [review] get rid of compilation warnings
Created attachment 171598 [details] [review] remove useless stuff You would probably not like it.
Created attachment 171599 [details] [review] refactor into frame_header_is_valid()
Created attachment 171600 [details] [review] rewrite sync matching
Created attachment 171601 [details] [review] tidying up
Created attachment 171602 [details] [review] simplify frame_header_is_valid
Created attachment 171603 [details] [review] fix some warnings
Created attachment 171604 [details] [review] we don't need no headers
BTW. It took me a full 10 minutes to create this bug report; git send-email would have taken a few seconds.
Thanks for the patches, I'll take a look at them later. You could use git-bz to create bugreports like this, this should only take a few seconds too ;) One thing that should be done additionally to these patches here is, to optimize GstBitReader but converting the get_bits functions to macros and to provide unchecked variants. This would be similar to what was done with GstByteReader some time ago.
(In reply to comment #13) > Thanks for the patches, I'll take a look at them later. You could use git-bz to > create bugreports like this, this should only take a few seconds too ;) Not really. Maybe it would help if it actually worked: http://felipec.wordpress.com/2010/01/19/why-bugzilla-sucks-for-handling-patches/
Well, let's not argue about that here (and FWIW I'm successfully using git-bz). From a short look at your patches you're still parsing the complete FLAC frames. Didn't you say yesterday on IRC that you're only looking for a) a sync marker and b) that a valid frame header follows and then simply push everything until the next valid sync marker/frame header downstream? And what's the reason for dropping the optional CRC16 frame check (which defaults to FALSE)? I added it some time ago and planned to post a message on the bus for broken frames to provide a tool for checking FLAC files for corruptions. Other than that the patches look good from a first glance... will merge them later and do a deeper review then. After optimizing GstBitReader a bit :)
(In reply to comment #15) > Well, let's not argue about that here (and FWIW I'm successfully using git-bz). Yes, some project's bugzillas are easier to setup than others. But just the though of re-basing my patches in case I need configurations not supported scares me away. > From a short look at your patches you're still parsing the complete FLAC > frames. Didn't you say yesterday on IRC that you're only looking for a) a sync > marker and b) that a valid frame header follows and then simply push everything > until the next valid sync marker/frame header downstream? Yes: (In reply to comment #7) > Created an attachment (id=171600) [details] [review] > rewrite sync matching See: * get_next_sync -> frame_header_is_valid > And what's the reason for dropping the optional CRC16 frame check (which > defaults to FALSE)? I added it some time ago and planned to post a message on > the bus for broken frames to provide a tool for checking FLAC files for > corruptions. I was porting the parsing code and I didn't want to port that, I don't find it useful anyway. As I said, I wasn't intending to post the patches.
This might be interesting for you: commit dd762eb49f8e04f9f6fee84d9a00265e0410798f Author: Sebastian Dröge <sebastian.droege@collabora.co.uk> Date: Sun Oct 3 15:27:37 2010 +0200 bitreader: Add inlined and unchecked versions of the important functions API: gst_bit_reader_skip_unchecked API: gst_bit_reader_skip_to_byte_unchecked API: gst_bit_reader_get_bits_uint16_unchecked API: gst_bit_reader_get_bits_uint32_unchecked API: gst_bit_reader_get_bits_uint64_unchecked API: gst_bit_reader_get_bits_uint8_unchecked API: gst_bit_reader_peek_bits_uint16_unchecked API: gst_bit_reader_peek_bits_uint32_unchecked API: gst_bit_reader_peek_bits_uint64_unchecked API: gst_bit_reader_peek_bits_uint8_unchecked This alone makes flacparse about 3 times faster. Next I'll get to your patches :)
Comment on attachment 171595 [details] [review] delay tags push This was already fixed some time ago
(In reply to comment #4) > Created an attachment (id=171597) [details] [review] > get rid of compilation warnings What are the warnings you're trying to fix here?
Comment on attachment 171598 [details] [review] remove useless stuff Yes, I don't :) The property was added to add a tool for checking if a FLAC file is valid and it's disabled by default. The requested_frame_size calculations everywhere are there for giving baseparse a better estimate of the possible framesize by looking at the minimum/maximum frame sizes in the flac streaminfo, etc. The gotos are there to keep the error cases outside the normal code flow.
Comment on attachment 171604 [details] [review] we don't need no headers The header buffers must be pushed to allow pipelines like ... ! flacparse ! filesink Otherwise we're dropping information from the flac file
Comment on attachment 171599 [details] [review] refactor into frame_header_is_valid() This patch uses C99 features but the idea of not parsing every frame twice is good. I'll commit something similar
(In reply to comment #19) > (In reply to comment #4) > > Created an attachment (id=171597) [details] [review] [details] [review] > > get rid of compilation warnings > > What are the warnings you're trying to fix here? -Wall -Wextra -Wno-unused-parameter
(In reply to comment #20) > (From update of attachment 171598 [details] [review]) > Yes, I don't :) The property was added to add a tool for checking if a FLAC > file is valid and it's disabled by default. Which is useless most of the time. I would have a separate tool for that, a separate branch, or at least add G_UNLIKELY. > The requested_frame_size > calculations everywhere are there for giving baseparse a better estimate of the > possible framesize by looking at the minimum/maximum frame sizes in the flac > streaminfo, etc. It's not used at all, anywhere. Not on flacparse, not on baseparse... nowhere. > The gotos are there to keep the error cases outside the normal code flow. They do two things, print a message on EOS, and need streaminfo, and return -1 or -2 depending on that, but then, that FALSE is returned in gst_flac_parse_check_valid_frame() regardless. In my resulting code EOS is still printed, so nothing is lost, only the "need streaminfo" message, which is useless IMO; I never saw that, can be done much more easily with a single 'if' at the top, and I don't think it matters after the reorganization.
(In reply to comment #21) > (From update of attachment 171604 [details] [review]) > The header buffers must be pushed to allow pipelines like > ... ! flacparse ! filesink > > Otherwise we're dropping information from the flac file That information is already in the caps.
(In reply to comment #18) > (From update of attachment 171595 [details] [review]) > This was already fixed some time ago Where? In 65f620a? From what I can see you pushed that about the same time you made that comment. And that commit message is very bad; it doesn't answer the question 'why is this needed?' It also doesn't mention any attribution... where did you get the idea from? And did you test this?
(In reply to comment #26) > (In reply to comment #18) > > (From update of attachment 171595 [details] [review] [details]) > > This was already fixed some time ago > > Where? In 65f620a? From what I can see you pushed that about the same time you > made that comment. > > And that commit message is very bad; it doesn't answer the question 'why is > this needed?' It also doesn't mention any attribution... where did you get the > idea from? And did you test this? No idea when this was fixed but in latest GIT (before 65f620a) the tags were already pushed not immediately but after the newsegment event and after the caps were known. I only changed it to push the tags *before* the header buffers.
commit 812075dc5db7c479a367815e6f386bf63d84504d Author: Sebastian Dröge <sebastian.droege@collabora.co.uk> Date: Thu Oct 7 23:37:36 2010 +0200 flacparse: Simplify frame header parsing by using lookup tables Based on a patch by Felipe Contreras. See bug #631200. commit 64407ca94b3295bf88dd491d79eeb3aae9b4686a Author: Sebastian Dröge <sebastian.droege@collabora.co.uk> Date: Thu Oct 7 23:28:08 2010 +0200 flacparse: Don't parse the complete FLAC frames but only look for valid fram Thanks to Felipe Contreras for the suggestion. This is partially based on his patches and makes flacparse more than 3.5 times faster. Looking for valid frame headers is unlikely to give false positives because every frame header is at least 9 bytes long, contains a 14 bit sync code and a 8 bit checksum over the first 8 bytes. Fixes bug #631200.
Comment on attachment 171597 [details] [review] get rid of compilation warnings Except the GValue initializations this is fixed now. I don't see what's wrong with these initializations and GLib and GStreamer is doing that *everywhere*
Comment on attachment 171600 [details] [review] rewrite sync matching Something similar is committed now.
Comment on attachment 171603 [details] [review] fix some warnings I don't see anything wrong with this code either. a) Every list node always contains a valid buffer reference b) Casting gst_mini_object_unref() to a GFunc is valid and also passing the buffer and NULL to the function is valid because the C calling conventions say that the caller has to clean up the function parameters.
(In reply to comment #29) > (From update of attachment 171597 [details] [review]) > Except the GValue initializations this is fixed now. I don't see what's wrong > with these initializations and GLib and GStreamer is doing that *everywhere* That doesn't make it right. See bug #577231.
(In reply to comment #31) > (From update of attachment 171603 [details] [review]) > I don't see anything wrong with this code either. > > a) Every list node always contains a valid buffer reference Not true, there are some error conditions where this doesn't happen. I know because I saw the warnings of gst_mini_object_unref(), which why I bothered writing the patch. BTW. gst_mini_object_unref has a silly API, should be: void gst_mini_object_unref(void *mini_object); And should accept NULL, that would make it really useful, similar to free(), and g_object_unref().
For the record: % git log --author=felipe.contreras -- gst/audioparsers Shows basically nothing. Way to incentivize contributors. I have more luck contributing patches to the linux kernel, where they: a) actually review the patches b) comment the exact changes needed, or what in general would they do c) give a chance to the contributor to do the changes themselves d) leave the original author in the commit So, in case this wasn't clear; this process was not rewarding at all.
(In reply to comment #32) > (In reply to comment #29) > > (From update of attachment 171597 [details] [review] [details]) > > Except the GValue initializations this is fixed now. I don't see what's wrong > > with these initializations and GLib and GStreamer is doing that *everywhere* > > That doesn't make it right. > > See bug #577231. Everything after the first 0 will be initialized to 0 too, that warning only means that you didn't explicitely initialize the remaining struct members. (In reply to comment #33) > (In reply to comment #31) > > (From update of attachment 171603 [details] [review] [details]) > > I don't see anything wrong with this code either. > > > > a) Every list node always contains a valid buffer reference > > Not true, there are some error conditions where this doesn't happen. Which cases? > I know because I saw the warnings of gst_mini_object_unref(), which why I > bothered writing the patch. > > BTW. gst_mini_object_unref has a silly API, should be: > void gst_mini_object_unref(void *mini_object); There's a gst_buffer_unref() for the correct parameter type. > And should accept NULL, that would make it really useful, similar to free(), > and g_object_unref(). g_object_unref() does not accept NULL, neither does free() (in general). g_free() accepts NULL, yes. (In reply to comment #34) > For the record: > % git log --author=felipe.contreras -- gst/audioparsers > > Shows basically nothing. Way to incentivize contributors. I committed two of your patches which could be included. The caps fix and the picture parsing fix. I rejected other patches because they were not necessary (as explained above) or the large rewrite of the parsing because you added unrelated changes and also used C99 features everywhere. Instead I've done the rewrite of the parsing myself based on your idea, which was done faster than splitting your patches, cleaning things up, etc. But if you look at the commit message of that change you'll see your name. > I have more luck contributing patches to the linux kernel, where they: > a) actually review the patches Not sure why you think nobody reviewed your patches > b) comment the exact changes needed, or what in general would they do ... and told you what was wrong with them or why I think they were not necessary. > c) give a chance to the contributor to do the changes themselves You complained that the patch submitting process wasted too much time for you and that you didn't plan to submit the patches at all and also thought that they might not be good for inclusion yet. I assumed that you don't plan to improve them and as such I've simply fixed your patches myself (or have done them in a way that is acceptable for inclusion) > d) leave the original author in the commit And if you took at look at the commit message you would see that the rewritten change still says that it's based on your patch
(In reply to comment #35) > (In reply to comment #32) > > (In reply to comment #29) > > > (From update of attachment 171597 [details] [review] [details] [details]) > > > Except the GValue initializations this is fixed now. I don't see what's wrong > > > with these initializations and GLib and GStreamer is doing that *everywhere* > > > > That doesn't make it right. > > > > See bug #577231. > > Everything after the first 0 will be initialized to 0 too, that warning only > means that you didn't explicitely initialize the remaining struct members. I know that, but -Wextra is pretty essential. > (In reply to comment #33) > > (In reply to comment #31) > > > (From update of attachment 171603 [details] [review] [details] [details]) > > > I don't see anything wrong with this code either. > > > > > > a) Every list node always contains a valid buffer reference > > > > Not true, there are some error conditions where this doesn't happen. > > Which cases? I don't know, invalid clips? > > I know because I saw the warnings of gst_mini_object_unref(), which why I > > bothered writing the patch. > > > > BTW. gst_mini_object_unref has a silly API, should be: > > void gst_mini_object_unref(void *mini_object); > > There's a gst_buffer_unref() for the correct parameter type. Yeah, but that's why g_object_unref accepts a void, so you don't need the correct parameter type. > > And should accept NULL, that would make it really useful, similar to free(), > > and g_object_unref(). > > g_object_unref() does not accept NULL, neither does free() (in general). > g_free() accepts NULL, yes. 1) g_object_unref() receives a void 2) free accepts NULL See the C standard on "7.20.3.2 The free function": If ptr is a null pointer, no action occurs. > (In reply to comment #34) > > For the record: > > % git log --author=felipe.contreras -- gst/audioparsers > > > > Shows basically nothing. Way to incentivize contributors. > > I committed two of your patches which could be included. The caps fix and the > picture parsing fix. > I rejected other patches because they were not necessary (as explained above) > or the large rewrite of the parsing because you added unrelated changes and > also used C99 features everywhere. > > Instead I've done the rewrite of the parsing myself based on your idea, which > was done faster than splitting your patches, cleaning things up, etc. But if > you look at the commit message of that change you'll see your name. That's not enough. I would have expected you to add review comments so that I could do the changes and remain the author of the patches, that would give the appropriate attribution. Now you remain as the author of the patch. Even though I'm mentioned on the commit message as making the "suggestion", nobody will really see my name, not on the log, or statistics of any kind: https://www.ohloh.net/p/gstreamer/contributors > > I have more luck contributing patches to the linux kernel, where they: > > a) actually review the patches > > Not sure why you think nobody reviewed your patches By "review", I mean go through the code, and as you go, make comments about it. Not keep the comments to yourself, and afterwards explain why you rewrote, or dropped the patch. > > b) comment the exact changes needed, or what in general would they do > > ... and told you what was wrong with them or why I think they were not > necessary. Mentioning what was wrong after rewriting the code is not the same as comment on my patch itself. > > c) give a chance to the contributor to do the changes themselves > > You complained that the patch submitting process wasted too much time for you > and that you didn't plan to submit the patches at all and also thought that > they might not be good for inclusion yet. I assumed that you don't plan to > improve them and as such I've simply fixed your patches myself (or have done > them in a way that is acceptable for inclusion) Well, I did send the patches, didn't I? Maybe I was giving a chance to the process. Moreover, that's not an excuse to not follow a process that keeps good attribution (if you even had such process in place). I would have expected you to follow this process, and only if I say I'm not interested on doing these changes myself you go ahead. Just like you should do with everybody else. In fact, even if you rewrite the patch, in a good review process you would post the patch here, and only after I had reviewed the patch too, you commit it with my signed-off-by. > > d) leave the original author in the commit > > And if you took at look at the commit message you would see that the rewritten > change still says that it's based on your patch "Thanks to Felipe Contreras for the suggestion." Right, that gives a lot of credit. That's very far from giving me a chance to be the author of the patch, which in a way, I am. Anyway, I have complained about the patch review and contribution process, and what happened here is a good example of why it doesn't work. All I'm saying is that FWIW, this was not rewarding, and that's a fact.