Bug 336366 – Fast factor 2 resampling

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 336366 - Fast factor 2 resampling


Summary:	Fast factor 2 resampling


Status:	RESOLVED FIXED

Product:	beast
Classification:	Other
Component:	bse
Version:	SVN trunk
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	Stefan Westerfeld
QA Contact:	Beast Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2006-03-28 16:12 UTC by Stefan Westerfeld
Modified:	2006-10-21 23:52 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Header of standalone test program. (14.09 KB, text/x-c++hdr) 2006-03-28 16:14 UTC, Stefan Westerfeld		Details
Coefficient file for standalone test (2.49 KB, text/x-objchdr) 2006-03-28 16:15 UTC, Stefan Westerfeld		Details
Standalone test program (11.26 KB, text/x-c++src) 2006-03-28 16:17 UTC, Stefan Westerfeld		Details
Shell script to run a bunch of tests (166 bytes, application/x-shellscript) 2006-03-28 16:18 UTC, Stefan Westerfeld		Details
Resampling datahandle first prototype (7.02 KB, text/x-c++src) 2006-03-28 16:19 UTC, Stefan Westerfeld		Details
Resampling datahandle header (1.21 KB, text/x-diff) 2006-03-28 16:20 UTC, Stefan Westerfeld		Details
Resampling datahandle testcode (2.44 KB, text/x-c++src) 2006-03-28 16:25 UTC, Stefan Westerfeld		Details
Upsampling datahandle mostly complete (11.10 KB, text/x-c++src) 2006-04-04 16:39 UTC, Stefan Westerfeld		Details
Resampling datahandle new header (1.23 KB, text/x-diff) 2006-04-04 16:41 UTC, Stefan Westerfeld		Details
Resampling datahandle new testcode (4.94 KB, text/x-c++src) 2006-04-04 16:43 UTC, Stefan Westerfeld		Details
C++ Resampler code header (14.17 KB, text/x-c++hdr) 2006-04-10 16:14 UTC, Stefan Westerfeld		Details
C++ Resampler code implementation (2.14 KB, text/x-c++src) 2006-04-10 16:15 UTC, Stefan Westerfeld		Details
New standalone test program (11.32 KB, text/x-c++src) 2006-04-10 16:16 UTC, Stefan Westerfeld		Details
New upsampling datahandle (11.00 KB, text/x-c++src) 2006-04-10 16:18 UTC, Stefan Westerfeld		Details
C++ Resampler code header (14.69 KB, text/x-c++hdr) 2006-04-12 14:48 UTC, Stefan Westerfeld		Details
C++ Resampler code implementation (2.12 KB, text/x-c++src) 2006-04-12 14:49 UTC, Stefan Westerfeld		Details
Standalone test program (11.34 KB, text/x-c++src) 2006-04-12 14:49 UTC, Stefan Westerfeld		Details
Resampler integration patch into CVS head (6.92 KB, patch) 2006-05-05 22:03 UTC, Stefan Westerfeld	none	Details \| Review
Resampling datahandle testcode (7.04 KB, text/x-c++src) 2006-05-05 22:06 UTC, Stefan Westerfeld		Details
C++ Resampler code header (3.98 KB, text/x-c++hdr) 2006-05-05 22:11 UTC, Stefan Westerfeld		Details
C++ Resampler code implementation: TCC file (15.61 KB, text/x-c++src) 2006-05-05 22:32 UTC, Stefan Westerfeld		Details
C++ Resampler code implementation: CC file (5.49 KB, text/x-c++src) 2006-05-05 22:34 UTC, Stefan Westerfeld		Details
"Standalone" test program (11.52 KB, text/x-c++src) 2006-05-05 22:36 UTC, Stefan Westerfeld		Details
Resampler C API testcode (2.47 KB, text/x-csrc) 2006-05-05 22:38 UTC, Stefan Westerfeld		Details
Upsampling datahandle (7.79 KB, text/x-c++src) 2006-05-05 22:43 UTC, Stefan Westerfeld		Details

Description Stefan Westerfeld 2006-03-28 16:12:03 UTC

I have worked quite a bit now on writing code for fast factor two resampling 
based on FIR filters. These are be useful for implementing factor 2 
up-/downsampling datahandles; maybe for other uses in BEAST, too.

Comment 1 Stefan Westerfeld 2006-03-28 16:14:21 UTC

Created attachment 62218 [details]
Header of standalone test program.

The actual template implementation of the SSE resampling code.

In BSE, the identical file can be used, but will be called bsessefir.hh.

Comment 2 Stefan Westerfeld 2006-03-28 16:15:08 UTC

Created attachment 62219 [details]
Coefficient file for standalone test

Comment 3 Stefan Westerfeld 2006-03-28 16:17:04 UTC

Created attachment 62220 [details]
Standalone test program

Compile me with

g++ -O3 -funroll-loops -o ssefir ssefir.cc

You may need extra flags for enabling architecture specific SSE instructions (the above line works on AMD64, because there is no AMD64 without SSE).

Comment 4 Stefan Westerfeld 2006-03-28 16:18:08 UTC

Created attachment 62221 [details]
Shell script to run a bunch of tests

Comment 5 Stefan Westerfeld 2006-03-28 16:19:26 UTC

Created attachment 62222 [details]
Resampling datahandle first prototype

Requires ssefir.h to be inside the bse directory of your BEAST checkout, named bsessefir.hh.

Comment 6 Stefan Westerfeld 2006-03-28 16:20:20 UTC

Created attachment 62225 [details]
Resampling datahandle header

Comment 7 Stefan Westerfeld 2006-03-28 16:25:27 UTC

Created attachment 62226 [details]
Resampling datahandle testcode

This test is currently only testing reading the whole data from beginning to end linearily once, because thats the only thing the first implementation can do.

Comment 8 Stefan Westerfeld 2006-04-04 16:39:57 UTC

Created attachment 62744 [details]
Upsampling datahandle mostly complete

The upsampling datahandle is now mostly complete. Seeking, more than 1 channel and different precisions are supported.

The file contains the class SseFir::Resampler2 too, which is a second try to design an easy public API for the resampling classes, and some implementation bits as well as coefficients. All these parts should probably go into seperate files.

Comment 9 Stefan Westerfeld 2006-04-04 16:41:30 UTC

Created attachment 62746 [details]
Resampling datahandle new header

Added precision argument to the API.

Comment 10 Stefan Westerfeld 2006-04-04 16:43:10 UTC

Created attachment 62747 [details]
Resampling datahandle new testcode

Tests reading the handle twice (not only once) now, and some random seeks. Supports testing difference precisions, currently 8 and 16 bit.

Comment 11 Tim Janik 2006-04-07 10:47:08 UTC

hm, lots of attachments and for two different things. this is a bit of a mess.
lets focus on getting the FIR code into BSE first, and only after that tackle the datahandles (maybe even in a different bug report), ok?

as i understand, you want to add ssefir-4.h and coeffs-3.h to stock BSE, right?
so, a few comments on those headers first:
- C++ header files are named .hh throughout the beast project
- coeffs-3.h lacks proper namespacing and #ifdef guards. if it's contents aren't included into ssefir-4.h, it should be setup as a normal header files. is there any reason to not include it with the FIR code?
- the "sse" syllable used all over the place is actually a gross misnomer. i don't see anything in the code that makes it SSE instruction set dependent (and that is good the way it is ;). instead, the code uses the GCC vectorizer, so can be compiled into multiple instruction sets. consequently, the files, API and namespaces should be renamed, e.g. to Bse::Resampler.
- please honour the general coding style rules of the beast project, namely to put spaces around mathematical operators like +,/,-,*,%,&,| etc.
- also, opening paranthesis are space prefixed like this: "foo ();"
- in general, normal C code is written in lower case, typenames in mixed case and enum values (some constants) and macros in upper case. your function fir_process_one_sample() should thus take an argument "n", not "N".
- fir_process_one_sample() sums into a double accumulator but returns a float. that's a loss of precision if double output is actually wanted, and incurres extra double->float conversions if the function call is properly inlined and e.g. it's return value reused in double precision context. the best way here is probably to make it a template that requires an explicit <float> or <double> specification upon funciton calls.
- why does the array argument to fir_compute_taps_sse() have a "no" prefix? is it not an sse_taps array? 
- assertions used in beast code should be g_assert() or BIRNET_ASSERT(). in non-public header files and .c/.cc files, the shorthand ASSERT() may also be used. assert.h or cassert should *not* be included.
- malloc() also should not be used in BEAST, g_malloc(), g_slice_new() and new/delete are available instead. 
- the whole notion of an AlignedMem class in the FIR code seems wrong to me anyway. this needs to be moved to some C++ utility files (probably in birnet), and should get special casing to use g_slice if possible or memalign.
- use of memcpy() requires inclusion of string.h
- for portability reasons, g_memmove should be used instead of memmove().
- /* please write 1line comments like this */
- and only for documentation comments that start with "/**" should the first line of a comment be left blank.

there maybe other small issues with the code but that can be handled later. for a start it'd be good to have the above comments integrated/fixed, and then get the code reviewed again and hopefully included.

Comment 12 Stefan Westerfeld 2006-04-10 16:07:14 UTC

(In reply to comment #11)
> hm, lots of attachments and for two different things. this is a bit of a mess.
> lets focus on getting the FIR code into BSE first, and only after that tackle
> the datahandles (maybe even in a different bug report), ok?

well, the datahandles code provided me with ideas on how to refactor the other
code; also the fact that we need two seperate object files, one containing SSE
instructions and one without, should be taken into account when committing the
code. finally the discussion on the list has led us into the assumption that
we can get away with 5 sets of coefficient sets; this again offers
possibilities for constraining the API (for instance using enums for specifying
which coefficient set to use).

to sum it up: if we only consider the FIR code without its applications, we may
include it, and later we'll find out that what we have included should have
been done differently; that is why I suggest at least considering the
requirements of the datahandle code a bit before deciding how to proceed for
getting everything in the CVS

> as i understand, you want to add ssefir-4.h and coeffs-3.h to stock BSE, right?
> so, a few comments on those headers first:
> - C++ header files are named .hh throughout the beast project
no objections here - when I originally wrote the code I wrote it as standalone
program; but for including I would suggest bsessefir.hh (we come to the
problems with that name later)

> - coeffs-3.h lacks proper namespacing and #ifdef guards. if it's contents
> aren't included into ssefir-4.h, it should be setup as a normal header files.
> is there any reason to not include it with the FIR code?

I think that it would be better to define the coefficients in a C(++) file
rather than a header file. A factory pattern may be useful for getting a
resampler with certain coefficient sets. Look at the datahandle code, at the
Resampler2Factory::create method. That would be my current idea on how
to proceed with coefficients:

 * provide an abstract Resampler2 interface (I did this in the recent
   version of the code)
 * provide a factory which can create resamplers using well-defined
   requirements; actually for implementing this, we may need two
   internal factory helper methods, on in an object file compiled with
   -msse, and one in an object file compiled without the flag

> - the "sse" syllable used all over the place is actually a gross misnomer. i
> don't see anything in the code that makes it SSE instruction set dependent (and
> that is good the way it is ;). instead, the code uses the GCC vectorizer, so
> can be compiled into multiple instruction sets. consequently, the files, API
> and namespaces should be renamed, e.g. to Bse::Resampler.

Well, the code can be divided into three or four levels, like this:

1. basic vectorized FIR code (and utils like AlignedMem)
2. Upsampler2 and Downsampler2, the actual implementations
   (which need to be compiled into two seperate object files, one with SSE
    instructions, and one without)
3. Resampler2, the abstract resampler interface; and a corresponding
   factory for creating resamplers
4. The resampling datahandle

1) can be used for purposes beyond factor 2 resampling; and even for purposes
beyond resampling at all (for instance it could be used to implement an
equalizer)

2) can only be used for streaming fast factor2 resampling (under some
constraints)

3) is a simple interface to it

4) finally a C interface

As for the name SSE:

While you are right that in theory other chipsets could benefit from
what I do with GCCs vector operations, in practice the code may not turn
out to be faster. The reason is that I use a coefficient array and a
number of temporary variables which match the capabilities of SSE more
or less ideally. Thats what I have benchmarked.

That means that on non-SSE SIMD instruction sets, the code may turn out
slower than the FPU code. Or - less extreme - a different way of
scrambeling the coefficient array may produce faster code than the one
optimized for SSE.

But bseresampler is fine with me. However, I am not sure about removing
SSE and replacing it with a more neutral term (like SIMD or VECTORIZED),
because for a new platform with a different vector instruction set, a
new version may need to be written anyway.

> - please honour the general coding style rules of the beast project, namely to
> put spaces around mathematical operators like +,/,-,*,%,&,| etc.

I'll upload a new version which is more like the beast coding style,
although I can't promise that my spaces are exactly at the same places
where you would have put them.

> - also, opening paranthesis are space prefixed like this: "foo ();"

I often omit the space for functions without arguments, and we had a
discussion about it quite some time ago (where you agreed that its ok
in BEAST). The reason for doing it is that in C++ functions without
arguments are often accessors, and as such used similar to variables in
plain C code.

* Example of C code:

struct Rect {
  /*...*/
  int width;
  int height;
} rect;

printf ("pixels to paint = %d\n", rect.width * rect.height);

* The same written in C++:

class Rect {
  /*...*/
  int width() { return m_width; }
  int height() { return m_height; }
} rect;

printf ("pixels to paint = %d\n", rect.width() * rect.height());

If you put extra spaces at the places before opening parentheses, it
gets harder to see what are "variables" in the computation, and what are
"operators", because the spaces don't help associating width and height
with rect.

printf ("pixels to paint = %d\n", rect.width () * rect.height ());

But in any case, I am trying to be consistent to the rule:
 * zero args: optional space before opening parentheses
 * one or more args: space before opening parentheses

> - in general, normal C code is written in lower case, typenames in mixed case
> and enum values (some constants) and macros in upper case. your function
> fir_process_one_sample() should thus take an argument "n", not "N".

Well, it used to be a template argument. What about these? I capitalize
template arguments (unless they are type names), because they are const.

And what about

  const int ORDER = no_sse_taps.size();

is that ok, or does that need fixing, too?

> - fir_process_one_sample() sums into a double accumulator but returns a float.
> that's a loss of precision if double output is actually wanted, and incurres
> extra double->float conversions if the function call is properly inlined and
> e.g. it's return value reused in double precision context. the best way here is
> probably to make it a template that requires an explicit <float> or <double>
> specification upon funciton calls.

Like that?

template<class Accumulator> inline Accumulator
fir_process_one_sample (const float *input, const float *taps, const int n_taps)
{
  Accumulator out = 0;
  for (int i = 0; i < n_taps; i++)
    out += input[i] * taps[i];
  return out;
}

> - why does the array argument to fir_compute_taps_sse() have a "no" prefix? is
> it not an sse_taps array?

It has a no prefix because it is not an sse_taps array. I extended the
source code comment now (new are the first three lines):

/**
 * fir_compute_taps_sse takes a normal vector of FIR taps as argument and
 * computes a specially scrambled version of these taps, ready to be used
 * for SSE operations (by fir_process_4samples_sse).
 *
 * we require a special ordering of the FIR taps, to get maximum benefit of the SSE SIMD operations
 *
 * example: suppose the FIR taps are [ x1 x2 x3 x4 x5 x6 x7 x8 x9 ], then the SSE taps become
 *
 * [ x1 x2 x3 x4   0 x1 x2 x3   0  0 x1 x2   0  0  0 x1      <- for input[0]
 *   x5 x6 x7 x8  x4 x5 x6 x7  x3 x4 x5 x6  x2 x3 x4 x5      <- for input[1]
 *   x9  0  0  0  x8 x9  0  0  x7 x8 x9  0  x6 x7 x8 x9 ]    <- for input[2]
 * \------------/\-----------/\-----------/\-----------/
 *    for out0     for out1      for out2     for out3
 *
 * so that we can compute out0, out1, out2 and out3 simultaneously
 * from input[0]..input[2]
 */
vector<float>
fir_compute_taps_sse (const vector<float>& no_sse_taps)
{
  const int T = no_sse_taps.size();
  vector<float> sse_taps ((T + 6) / 4 * 16);

  for (int j = 0; j < 4; j++)
    for (int i = 0; i < T; i++)
      {
        int k = i + j;
        sse_taps[(k / 4) * 16 + (k % 4) + j * 4] = no_sse_taps[i];
      }

  return sse_taps;
}

The question still is, whether fir_compute_taps_sse is public API. If
yes, then the comment should start with /**. But then, I am not sure if
the example should be part of the source code documentation, or not.

> - assertions used in beast code should be g_assert() or BIRNET_ASSERT(). in
> non-public header files and .c/.cc files, the shorthand ASSERT() may also be
> used. assert.h or cassert should *not* be included.

I tried to write dependancy-less code for the standalone example. This
made development easier. But ok, I'll add a glib-2.0 dependancy. Thats
not overly complex and I can use g_assert, then.

> - malloc() also should not be used in BEAST, g_malloc(), g_slice_new() and
> new/delete are available instead.
Ok.

> - the whole notion of an AlignedMem class in the FIR code seems wrong to me
> anyway. this needs to be moved to some C++ utility files (probably in birnet),
> and should get special casing to use g_slice if possible or memalign.

Yes. I don't like it either; I just wanted to get the main problems
solved first. What you're suggesting (pushing aligned allocation further
down in the dependancy stack) sounds good. Let me add that AlignedMem is
currently a strange mixture of a minimal reimplementation of
std::vector, which also performs aligned memory allocation.

Ideally, I think we should have an aligned STL allocator, so that the
normal std::vector implementation can be used, but proper alignment of
the memory the vector uses can still be relied upon. I briefly looked
into this before writing the AlignedMem class, but I didn't find
reasonable documentation about how to write allocators.

> - use of memcpy() requires inclusion of string.h

Did I use it? Usually I use std::copy. Anyway, grepping the most recent
version of my code for memcpy gets no results, so I still don't include
string.h.

> - for portability reasons, g_memmove should be used instead of memmove().

Ok, I changed that one.

> - /* please write 1line comments like this */

Ok.

> - and only for documentation comments that start with "/**" should the first
> line of a comment be left blank.

Ok.

> there maybe other small issues with the code but that can be handled later. for
> a start it'd be good to have the above comments integrated/fixed, and then get
> the code reviewed again and hopefully included.

I'll upload a new version now.

Comment 13 Stefan Westerfeld 2006-04-10 16:14:06 UTC

Created attachment 63147 [details]
C++ Resampler code header

Comment 14 Stefan Westerfeld 2006-04-10 16:15:11 UTC

Created attachment 63148 [details]
C++ Resampler code implementation

Comment 15 Stefan Westerfeld 2006-04-10 16:16:02 UTC

Created attachment 63149 [details]
New standalone test program

Comment 16 Stefan Westerfeld 2006-04-10 16:18:12 UTC

Created attachment 63150 [details]
New upsampling datahandle

Comment 17 Tim Janik 2006-04-10 19:43:40 UTC

> However, I am not sure about removing
> SSE and replacing it with a more neutral term (like SIMD or VECTORIZED),
> because for a new platform with a different vector instruction set, a
> new version may need to be written anyway.

to the best of my judgement, if we write code that depends on
gcc vectorization/simd features and on those alone, we should
name the files that way as well. if new versions are written
we can still decide what to do (use rthe same file, rename
files, etc.), that scenario is to vague to really consider it
atm.

> I'll upload a new version which is more like the beast coding style,
> although I can't promise that my spaces are exactly at the same places
> where you would have put them.

thanks, and you're of course not expected to match all requirements
out of the blue ;) that's what review is for.

> I often omit the space for functions without arguments, and we had a
> discussion about it quite some time ago (where you agreed that its ok
> in BEAST). The reason for doing it is that 
[...]

if you seriously start to discuss coding style issues, then please
move this out of technical bug reports (whre it's off topic) and
e.g. start a thread on the mailing list.

as for:
> const int ORDER = no_sse_taps.size();

if this was a global enum value or macro, yes, you'd upper case it.
a global constant is somewhat of an arguable border case i'd say,
inside a function i'd prolly most often use the lower case variant
like an ordinary variable name.
in any case, function arguments should be lower case, template
arguments can be upper case if they equate global constants.
(fwiw, i asked mitch about the gimp coding style on this, and he'd
strictly lower case everyhting except macros and enum values, which
i'm of course fine with as well).

> template<class Accumulator> inline Accumulator
> fir_process_one_sample (const float *input, const float *taps, const int

this looks good to me, except the missing newlines after each ',',
and i'd probably have called the accu AccuType.

> vector<float> fir_compute_taps_sse (const vector<float>& no_sse_taps)

using explicit negation in naming is most often a bad idea.
here, if i understood your comment correctly, you could better
use something like "unordered_taps" and i gues what you return
would be "sse_ordered_taps", right?

> const int T = no_sse_taps.size();

this, inside the function should be lower case, and a descriptive
variable name would also help occasional readers, e.g. "n_taps" or
"n_unordered_taps".

the spacing looks fine to me in that function now btw.

as for using "/**" comments. basically, they can be used everywhere
throughout the code. the "public API" distinction is accomplished
by a more generic BSE mechanism anyway, namely the glue layer and
a dedicated tool to generate API docs, i.e. bseautodoc.

> Ideally, I think we should have an aligned STL allocator, so that the
> normal std::vector implementation can be used,
[...]

i don't think this is at all suitable, because afaik, the allocator
functionality/interface isn't super stable in libstdc++ across gcc-3.3
up to gcc-4.2 yet. for the moment, something similar to what you
did might suffice.

Comment 18 Tim Janik 2006-04-10 22:08:57 UTC

the exact mechnism to implement SSE specialization in the BSE core and in the plugins has yet to be worked out, so i'll comment on your resampling implementation later on.
two things up front anyway:
- in your code:
  /* see: http://ds9a.nl/gcc-simd/ */
  typedef float v4sf __attribute__ ((vector_size (16))); //((mode(V4SF))); // vector of four single floats
  union F4Vector
  {
    v4sf v;
    float f[4];
  };
  typenames should always be named in MixedCase, here e.g. Vector4SingleFloat.
  however, grepping the rest of the file, the typedef doesn't need to be neccessary at all, can't you just inline its definition for the union member?
- on thje file name. the API to the resample will (with SSE/auto-vectorization in the core) probably be nothing more than a single/few simple function calls. and then resample.[hc] would make more sense i guess. however as i said i'll come back with more details anyway as i can imagine that we'll have special prefixes for files which are compiled with optimization flags in the BSE core.

Comment 19 Stefan Westerfeld 2006-04-12 14:48:30 UTC

Created attachment 63311 [details]
C++ Resampler code header

Comment 20 Stefan Westerfeld 2006-04-12 14:49:19 UTC

Created attachment 63312 [details]
C++ Resampler code implementation

Comment 21 Stefan Westerfeld 2006-04-12 14:49:56 UTC

Created attachment 63313 [details]
Standalone test program

Comment 22 Stefan Westerfeld 2006-04-12 15:07:48 UTC

(In reply to comment #17)
> > However, I am not sure about removing
> > SSE and replacing it with a more neutral term (like SIMD or VECTORIZED),
> > because for a new platform with a different vector instruction set, a
> > new version may need to be written anyway.
> 
> to the best of my judgement, if we write code that depends on
> gcc vectorization/simd features and on those alone, we should
> name the files that way as well. if new versions are written
> we can still decide what to do (use rthe same file, rename
> files, etc.), that scenario is to vague to really consider it
> atm.

Ok, I renamed variables and source code comments from SSE to SIMD now.
 
> as for:
> > const int ORDER = no_sse_taps.size();
> 
> if this was a global enum value or macro, yes, you'd upper case it.
> a global constant is somewhat of an arguable border case i'd say,
> inside a function i'd prolly most often use the lower case variant
> like an ordinary variable name.

Ok, I lowercased it.

> > template<class Accumulator> inline Accumulator
> > fir_process_one_sample (const float *input, const float *taps, const int
> 
> this looks good to me, except the missing newlines after each ',',
> and i'd probably have called the accu AccuType.

Ok, the new version of the code has lots new of newlines for function/method arguments.

> > vector<float> fir_compute_taps_sse (const vector<float>& no_sse_taps)
> 
> using explicit negation in naming is most often a bad idea.
> here, if i understood your comment correctly, you could better
> use something like "unordered_taps" and i gues what you return
> would be "sse_ordered_taps", right?

I don't like the idea of calling it unordered taps, because you can hardly call the array [ x1 x2 x3 x4 x5 x6 ] "unordered". If anything is unordered, then it is the simd_taps. So I consistently use the following in the source code how:

 * taps: filter impulse response ("ordered", formerly called no_sse_taps)
 * simd_taps: scrambled filter impulse response (for simd instructions,   
   formerly called sse_taps)

> > const int T = no_sse_taps.size();
> 
> this, inside the function should be lower case, and a descriptive
> variable name would also help occasional readers, e.g. "n_taps" or
> "n_unordered_taps".

Yes, I am using "order" now consistently throughout the source. The reason for not using n_taps and n_unordered_taps (rather n_simd_taps) is that these two numbers differ; and you can not reasonably call the template argument for the filters N_TAPS. Thus ORDER (or order) is probably a better name for achieving consistency throughout the source.

> > Ideally, I think we should have an aligned STL allocator, so that the
> > normal std::vector implementation can be used,
> [...]
> 
> i don't think this is at all suitable, because afaik, the allocator
> functionality/interface isn't super stable in libstdc++ across gcc-3.3
> up to gcc-4.2 yet. for the moment, something similar to what you
> did might suffice.

Are you sure that what you're speaking about is not the allocator *implementation* in libstdc++? Even if the allocator implementation has changed in the past (I know it has), and maybe will continue to change in the future, the *interface* of the STL allocator is specified in the C++ Standard. Thus it should never change in libstdc++ (the same way you would not expect std::string to have a different interface after an upgrade to gcc-4.2), unless the STL allocator *interface* was not implemented properly before.

But if we just move the AlignedMem class to some different header, its also ok for me.

Comment 23 Stefan Westerfeld 2006-04-12 15:30:25 UTC

(In reply to comment #18)
> the exact mechnism to implement SSE specialization in the BSE core and in the
> plugins has yet to be worked out, so i'll comment on your resampling
> implementation later on.
> two things up front anyway:
> - in your code:
>   /* see: http://ds9a.nl/gcc-simd/ */
>   typedef float v4sf __attribute__ ((vector_size (16))); //((mode(V4SF))); //
> vector of four single floats
>   union F4Vector
>   {
>     v4sf v;
>     float f[4];
>   };
>   typenames should always be named in MixedCase, here e.g. Vector4SingleFloat.
>   however, grepping the rest of the file, the typedef doesn't need to be
> neccessary at all, can't you just inline its definition for the union member?

Ok, the new version inlines it now, so the offending v4sf type is gone.

> - on thje file name. the API to the resample will (with SSE/auto-vectorization
> in the core) probably be nothing more than a single/few simple function calls.
> and then resample.[hc] would make more sense i guess.

Well, of course you can C-ify the Resampler2 interface, like this:

typedef enum {...} BseResampler2Mode;
typedef enum {...} BseResampler2Precision;
typedef ... BseResampler2;

BseResampler2* bse_resampler2_create (BseResampler2Mode      mode,
                                      BseResampler2Precision precision);
void           bse_resampler2_process_block (BseResampler2  *resampler2,
                                             const gfloat   *input,
                                             guint           n_input_samples,
                                             gfloat         *output);
guint          bse_resampler2_order         (BseResampler2  *resampler2);
void           bse_resampler2_free          (BseResampler2  *resampler2);

However, as this C interface can be trivially implemented on top of corresponding C++ classes, we could simply provide bseresampler.hh, bseresampler.cc and an additional bseresampler.h (which can then also be implemented in bseresampler.cc, as wrappers around the C++ interface).

> however as i said i'll
> come back with more details anyway as i can imagine that we'll have special
> prefixes for files which are compiled with optimization flags in the BSE core.

Ok.

Comment 24 Tim Janik 2006-05-01 21:14:01 UTC

things have changed quite a bit in CVS since my last comment, since beast got extended to support SSE and auto vectorization now.
in particular we now have bse/bseblockutils.hh (also contains C digestible prototypes) and bse/bseblockutils.cc for time critical operations on float and integer blocks.
processor optimized versions of these can be added to plugins/bseblockutils.cc.

can you rework your proposal in terms of addign the resampling funcitonality to those files (including a C API)?

Comment 25 Stefan Westerfeld 2006-05-05 22:03:07 UTC

Created attachment 64889 [details] [review]
Resampler integration patch into CVS head

Here is a patch that twiddles makefiles and hooks up the Resampler code into bseblockutils.

Comment 26 Stefan Westerfeld 2006-05-05 22:06:59 UTC

Created attachment 64890 [details]
Resampling datahandle testcode

Testcode for the datahandle; tests both, FPU and SSE version; uses the birnet test framework where possible.

Comment 27 Stefan Westerfeld 2006-05-05 22:11:36 UTC

Created attachment 64891 [details]
C++ Resampler code header

The C++ (and C) API to the resampler, including the factory API for creating a resampler for some specification.

What changed? Everything that is not needed for code using the Resampler API disappeared into the .tcc file.

Comment 28 Stefan Westerfeld 2006-05-05 22:32:11 UTC

Created attachment 64892 [details]
C++ Resampler code implementation: TCC file

This contains the actual resample filter implementation, helper classes, helper functions and so on. By template argument, this code can become instantiate both: a plain FPU implementation or a SSE implementation.

This .tcc file has been seperated from the public header, so that the "implementation bits" are not visible to those who use the .hh file, but, still the code can be used by the factories defined in libbse and the SSE plugin in the plugins directories.

Comment 29 Stefan Westerfeld 2006-05-05 22:34:21 UTC

Created attachment 64893 [details]
C++ Resampler code implementation: CC file

This contains the implementation bits for bseresampler.hh, mostly related to factory code, and the coefficient sets for the filters.

Comment 30 Stefan Westerfeld 2006-05-05 22:36:37 UTC

Created attachment 64894 [details]
"Standalone" test program

This is a standalone commandline test utility for the resampler.

Note: a problem here is that the program needs to be compiled with SSE as it uses the .tcc file directly; however, it resides in a part of BSE where SSE can not be guaranteed. I currently know no easy solution, but still want to keep the test as it covers aspects that the other test programs I propose don't.

Comment 31 Stefan Westerfeld 2006-05-05 22:38:56 UTC

Created attachment 64895 [details]
Resampler C API testcode

This is testcode (the only file I propose adding to the CVS deliberately written in plain C) that tests the C API of the Resampler.

Comment 32 Stefan Westerfeld 2006-05-05 22:43:00 UTC

Created attachment 64897 [details]
Upsampling datahandle

This implements an upsampling datahandle on top of the resampler. It will automagically use whatever upsampler is provided by the factory, and thereby be faster when SSE support is loaded via plugin into BSE.

The factory code has been moved out in this version (and was added to the public API in bseresampler.hh instead); this also makes the coefficient tables go away.

Comment 33 Stefan Westerfeld 2006-10-21 23:52:55 UTC

I think this bug can be closed now that the sources are in the SVN (reopen if you think there is something relevant I missed).