/ray/src/lib/numerics/3d/vector3.h File Reference

3d vector template for 3D applications. More...

#include <lib/numerics/num_math.h>

Include dependency graph for vector3.h:

This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Namespaces

namespace NUM

Defines

#define _LIB_NUMERICS_3D_VECTOR3_H_ 1

Detailed Description

3d vector template for 3D applications.

Author:: Wolfgang Wieser ] wwieser (a) gmx <*> de [

So, here comes the vector class. Everybody knows that the main problem with all this number crunching numerics is speed. So, using C++ features like a class and operator overloading seems to contradict this goal in the first place. But on the other hand, I also want some convenience and not end up writing this old-fashioned C-style code like:

    vector_t r;
    assign_vect(r,a);
    mul_vect_scalar(r,3);
    add_vect(r,b);
    vector_t tmp;
    assign_vect(tmp,c);
    mul_vect_scalar(tmp,3);
    add_vect(r,tmp);
    normalize_vect(r);

when I could do things as simple and well-readable as that:

    Vector r=normalize(3*a+b+2*c);

But after all we want speed, don't we?

So, I ran some tests to see the difference between directly inlined C-style code (no loops, all vector_t as arrays and coding in C-style in the fastest way I could imagine) and (compiler-inlined) vector class code. Of course, the main problem would be the overhead due to temporary objects being created and destroyed by the C++ compiler when using the C++ variant.

It turned out in my experiments, that when carefully coding the C++ variant it was in a lot of cases possible to get comparable speed as with the C variant. However, I could not get the right clue on general design considerations which would ensure good performance.
For example I saw the following odd things:

    // One would expect the second version to be faster since no 
    // assignment is involved and the vector v can modify itself. However...
    v=normalize(v);    // <-- measurement: _faster_
    v.normalize();     // <-- measurement: slower
    
    // On the right side: Times in 1/100 of a second for repeated execution 
    // of the code on the left side. See text!!
    // C++ versions:             //  A    B
    Vector bb=a+b+2*c;           // 543  118    (1)
    Vector bb=c*2+b+a;           // 523  135    (2)
    Vector bb=c*2; bb+=a+b;      // 456  120    (3)
    Vector bb=c*2; bb+a; bb+=b;  // 479  122    (4)
    
    dbl l=len(bb);   // <-- "offending code"
    
    // C versions:                                                     //  A    B
    dbl bb[3]={c[0]*2+a[0]+b[0],c[1]*2+a[1]+b[1],c[2]*2+a[2]+b[2]}; // 406  147
    dbl bb[3]={a[0]+b[0]+c[0]*2,a[1]+b[1]+c[1]*2,a[2]+b[2]+c[2]*2}; // 393  143
    
    dbl l=sqrt(bb[0]*bb[0]+bb[1]*bb[1]+bb[2]*bb[2]);  // <-- "offending code"

I first made the measurements with 1e8 repetitions of the above code; each time using one of the listed possibilities. (The loop contained also the initialisation of the 3 vectors a,b,c and a statement to add the entries of bb to a summing dbl (which was then the return value of main()) in order to prevent things from being optimized away.) So, the first measurements resulted in the timings in the A column. Espcially note that the C version is somewhat faster than all the C++ versions and that there are nice differences in the timimgs for the different C++ versions. It seems, (3) is faster then (1) because there are fewer temporaries but why should then (4) be again slower?

But wait, it gets better: Next I noticed that the time was dominated by a left-over line which was commented with "offending code" in the listing above (both in the C and the C++ version). So, I thought that the sqrt() call in the length calculation would probably dominate the timings and hence removed the line in order to be able to show a greater effect of variable re-ordering and also to show that C was still faster than C++.

However, after removing the lines, the timings were those in the columns called B and strangely, now the C++ version is faster than the C version and also variant (1) (previously the slowest one) turned out to be the fastest one out of the C++ versions!!
So I asked myself, what the hell was going on here!

Conclusion

When not doing too fancy and over-engineered C++-style operations, both versions can run at a comparable speed.
The actual runtime of the code then depends to a quite large extent on the used compiler, the architecture and/or the code context and it is not clear (at least not to me) in advance which version would run faster and why. (For those interested, I used GCC 3.4.2 20040902 (prerelease) on an AthlonXP but optimized with -march=i686 since that proved to be faster than -march=athlon-xp. For optimization, I chose -O3 -ffast-math. Timings were on idle system.)

Definition in file vector3.h.

Define Documentation

#define _LIB_NUMERICS_3D_VECTOR3_H_ 1

Definition at line 18 of file vector3.h.

Generated on Sat Feb 19 22:34:18 2005 for Ray by

1.3.5


Namespaces
namespace	NUM
Defines
#define	_LIB_NUMERICS_3D_VECTOR3_H_ 1