GNOME Bugzilla – Bug 609801
[volume] Use sample accurate property values if a controller is used
Last modified: 2010-03-16 14:28:06 UTC
Hi, the attached patches add some helper API to libgstaudio for splitting audio buffers into specific intervals. This is usually a good idea if GstController is used to have smoother changes of the controlled properties. The current implementation of it counts the samples and makes sure that every N samples the properties are synced, while balancing the accumulative rounding errors. Buffer timestamps are then interpolated from the current buffer's start timestamp but are clamped to the current buffer's end timestamp. Future implementations can change this behaviour, for example by making sure that the timestamps are always at x, 2x, 3x, 4x, ... The second patch uses this in the volume element.
Created attachment 153677 [details] [review] API: Add helper function to split audio buffers into specific intervals
Created attachment 153678 [details] [review] volume: Use audio buffer splitting helper functions
Created attachment 153680 [details] [review] volume: Use audio buffer splitting helper functions
we really don't want to do that.. If you want sample accurate interpollation we don't want to split the buffer on each sample. We should use the controller to generate an array of values for each sample that we need to modify (with some interpolation) and then combine the arrays (this also allows us to use efficient matrix multiply with sse or something).
Created attachment 154061 [details] [review] volume: If a controller is used, use sample accurate property values Fixes bug #609801.
(In reply to comment #5) > Created an attachment (id=154061) [details] [review] > volume: If a controller is used, use sample accurate property values > > Fixes bug #609801. What about this instead? ;)
Created attachment 154062 [details] [review] volume: If a controller is used, use sample accurate property values Fixes bug #609801.
Created attachment 154063 [details] [review] volume: If a controller is used, use sample accurate property values Fixes bug #609801.
Created attachment 154066 [details] [review] volume: If a controller is used, use sample accurate property values Fixes bug #609801.
(In reply to comment #4) > we really don't want to do that.. If you want sample accurate interpollation we > don't want to split the buffer on each sample. We should use the controller to > generate an array of values for each sample that we need to modify (with some > interpolation) and then combine the arrays (this also allows us to use > efficient matrix multiply with sse or something). We want sample accurate interpollation, but not a new control value per sample. The controller is for control curves and those cause to high overhead if done on sample level. Having a max controlrate of e.g. 50 Hz is covering already exotic use cases. Most use cases would still be fine with 2-10Hz.
(In reply to comment #10) > (In reply to comment #4) > > we really don't want to do that.. If you want sample accurate interpollation we > > don't want to split the buffer on each sample. We should use the controller to > > generate an array of values for each sample that we need to modify (with some > > interpolation) and then combine the arrays (this also allows us to use > > efficient matrix multiply with sse or something). > > We want sample accurate interpollation, but not a new control value per sample. > The controller is for control curves and those cause to high overhead if done > on sample level. Having a max controlrate of e.g. 50 Hz is covering already > exotic use cases. Most use cases would still be fine with 2-10Hz. I agree that always doing sample accurate interpollation is a bit over the top for most use cases. The granularity should somehow be a decision of the application using the controller. At the volume level, however, I believe it should be possible to request an equal amount of control points as you have samples so that you can later optimize the multiplication (in volume case) with some straightforward sse/mmx paralel multiplies (as the latest patch seems to do). I don't think the volume element should try to split buffers or mess with the controller too much. How the controller then generates those points would depend on its configuration (it could duplicate the same value N times or it could do a simple linear interpollation between the calculated control points or...). It can't imagine that rendering a controller curve for N samples should be such an expensive operation. Even less so when the algorithm is allowed to take shortcuts based on the configured granularity. I might be wrong, though...
Now more comments on the actual patch - it is nicely done. But honestly I don't think we want to do it like this for audio. For video we can sync on each frame. For audio we want to sync at a similar interval. process_controlled_* ... for (i = 0; i < num_samples; i++) { for (j = 0; j < channels; j++) { if (*mute) { *data++ = 0; } else { gdouble val = *data * (*volume) + 0.5; *data++ = (gint8) CLAMP (val, VOLUME_MIN_INT8, VOLUME_MAX_INT8); } } volume++; mute++; } lets at least move the "if (*mute)" check out of the channel loop. Otherwise this is okay. Compared to the uncontrolled loops we have the overhead for nested loops and interating over 3 arrays. In addition we have the overhead for filling the 2 control value arrays all the time. SO the controller overhead is larger than the actual operation. I see the control-value array functionality more suitable for fetching the control curse for display purposes in the UI. in volume_before_transform: * shouldn't the GstControlSource object be taken in e.g. volume_setup() - this doesn't change while running. * also filling the self->mutes/volumes with default values needs to be done when the size has changes and not on every buffer.
In reply to comment #11. I agree that if the inner loop is multiplying the samples with a const value or with one vector, then the speed difference might be acceptable. To get there we would need: * process_controlled() to take only one volume array * thus if mute is also controlled, merge both: volume=(1-mute)*volume; * some means to avoid updating control value array unless there is a change
(In reply to comment #12) > Now more comments on the actual patch - it is nicely done. But honestly I don't > think we want to do it like this for audio. For video we can sync on each > frame. For audio we want to sync at a similar interval. > > process_controlled_* > ... > for (i = 0; i < num_samples; i++) { > for (j = 0; j < channels; j++) { > if (*mute) { > *data++ = 0; > } else { > gdouble val = *data * (*volume) + 0.5; > > *data++ = (gint8) CLAMP (val, VOLUME_MIN_INT8, VOLUME_MAX_INT8); > } > } > volume++; > mute++; > } > > lets at least move the "if (*mute)" check out of the channel loop. Otherwise > this is okay. Compared to the uncontrolled loops we have the overhead for > nested loops and interating over 3 arrays. In addition we have the overhead for > filling the 2 control value arrays all the time. SO the controller overhead is > larger than the actual operation. Yes, that can be optimized of course. We could even get rid of the mute array by first combining it with the volume array. > I see the control-value array functionality more suitable for fetching the > control curse for display purposes in the UI. > > in volume_before_transform: > > * shouldn't the GstControlSource object be taken in e.g. volume_setup() - this > doesn't change while running. It doesn't? It can change at every point in time unfortunately, it could even change while we're processing a buffer. Nothing is preventing the application from doing that. > * also filling the self->mutes/volumes with default values needs to be done > when the size has changes and not on every buffer. Because the control source can change for every buffer it is necessary unfortunately. For one buffer both could be there, for the next only the volume csource could be there. (In reply to comment #13) > In reply to comment #11. I agree that if the inner loop is multiplying the > samples with a const value or with one vector, then the speed difference might > be acceptable. To get there we would need: > * process_controlled() to take only one volume array > * thus if mute is also controlled, merge both: volume=(1-mute)*volume; Yes, that would probably be a good idea, see above ;) If the generic approach here is accepted I can optimize this kind of things > * some means to avoid updating control value array unless there is a change the GstController API definitely needs some improvements and cleanups, yes ;) Most of that can only be done in 0.11 though... Another way of lowering the overhead would be to generate a property value for every second sample or for every n-samples. That could be combined with the control-rate of the GstController somehow. But it's definitely better to do this way than to sync the properties (including notify signals and stuff) multiple times for every buffer, that has a bigger overhead ;) (In reply to comment #11) > It can't imagine that rendering a controller curve for N samples should be > such an expensive operation. Even less so when the algorithm is allowed to > take shortcuts based on the configured granularity. I might be wrong, though... Well, it's quite expensive. Let's take the GstInterpolationControlSource in linear interpolation mode for example (cubic is even more expensive of course, same goes for the GstLFOControlSource). For every position it currently searches inside a binary tree (GSequence to be exact), gets the current and next value, interpolates, returns it. Doing that for every sample is probably not the best idea :) Yes, for generating a row of values there could be a single binary tree search and incremental searches for the next value but that's still quite expensive.
Created attachment 154119 [details] [review] volume: If a controller is used, use sample accurate property values Fixes bug #609801.
(In reply to comment #15) > Created an attachment (id=154119) [details] [review] > volume: If a controller is used, use sample accurate property values > > Fixes bug #609801. This optimizes a bit already. Still I don't know how this should be properly optimized by MMX/SSE because there's a variable number of channels and double-to-something conversions are needed. Wim, Stefan?
This looks nicer. For the future, I wonder if we can add some shortcut to the controller api where it would tell, that there is no control-change in the requested array (and thus all fields would have the same value).
Ok, shall I push this after 0.10.27 release or do you want some things fixed first? Like optimizing the get_value_array() of the interpolation control source?
Sebastian, okay to push from my side. If its not asked too much could you do a before and after performance meassurement, so that we get an idea of huw big the overhead is?
Created attachment 154331 [details] test.c Ok, here's a small benchmark that processes 400s of audio, runs this 5 times and calculates the average and standard deviation. It's about 15.5 times slower now :) 0.400000 -> 6.198000
94% of the time is spent in gst_control_source_get_value_array(), most of them time in g_sequence_search(). That's easy to optimize and I'll do that before pushing this.
Ok, I have it down to 1.37 times slower for none interpolation. I guess that's acceptable :) I'll optimize it everywhere else and do some benchmarks for other interpolation modes.
Created attachment 154404 [details] [review] interpolationcontrolsource: Optimize get_value_array() This makes it >10x faster if more than a single value is requested by not searching in the GSequence for every value and converting the value from GValue to the real value type.
Created attachment 154426 [details] [review] lfocontrolsource: Optimize get_value_array() Don't convert from GValue to the actual type for every single value.
Created attachment 154427 [details] [review] interpolationcontrolsource: Optimize get_value_array() This makes it >10x faster if more than a single value is requested by not searching in the GSequence for every value and converting the value from GValue to the real value type.