#include <lib/numerics/num_math.h>
Include dependency graph for vector3.h:
This graph shows which files directly or indirectly include this file:
Go to the source code of this file.
Namespaces | |
namespace | NUM |
Defines | |
#define | _LIB_NUMERICS_3D_VECTOR3_H_ 1 |
vector_t r; assign_vect(r,a); mul_vect_scalar(r,3); add_vect(r,b); vector_t tmp; assign_vect(tmp,c); mul_vect_scalar(tmp,3); add_vect(r,tmp); normalize_vect(r);
Vector r=normalize(3*a+b+2*c);
But after all we want speed, don't we?
So, I ran some tests to see the difference between directly inlined C-style code (no loops, all vector_t as arrays and coding in C-style in the fastest way I could imagine) and (compiler-inlined) vector class code. Of course, the main problem would be the overhead due to temporary objects being created and destroyed by the C++ compiler when using the C++ variant.
It turned out in my experiments, that when carefully coding the C++ variant it was in a lot of cases possible to get comparable speed as with the C variant. However, I could not get the right clue on general design considerations which would ensure good performance.
For example I saw the following odd things:
// One would expect the second version to be faster since no // assignment is involved and the vector v can modify itself. However... v=normalize(v); // <-- measurement: _faster_ v.normalize(); // <-- measurement: slower // On the right side: Times in 1/100 of a second for repeated execution // of the code on the left side. See text!! // C++ versions: // A B Vector bb=a+b+2*c; // 543 118 (1) Vector bb=c*2+b+a; // 523 135 (2) Vector bb=c*2; bb+=a+b; // 456 120 (3) Vector bb=c*2; bb+a; bb+=b; // 479 122 (4) dbl l=len(bb); // <-- "offending code" // C versions: // A B dbl bb[3]={c[0]*2+a[0]+b[0],c[1]*2+a[1]+b[1],c[2]*2+a[2]+b[2]}; // 406 147 dbl bb[3]={a[0]+b[0]+c[0]*2,a[1]+b[1]+c[1]*2,a[2]+b[2]+c[2]*2}; // 393 143 dbl l=sqrt(bb[0]*bb[0]+bb[1]*bb[1]+bb[2]*bb[2]); // <-- "offending code"
I first made the measurements with 1e8 repetitions of the above code; each time using one of the listed possibilities. (The loop contained also the initialisation of the 3 vectors a,b,c and a statement to add the entries of bb to a summing dbl (which was then the return value of main()) in order to prevent things from being optimized away.) So, the first measurements resulted in the timings in the A column. Espcially note that the C version is somewhat faster than all the C++ versions and that there are nice differences in the timimgs for the different C++ versions. It seems, (3) is faster then (1) because there are fewer temporaries but why should then (4) be again slower?
But wait, it gets better: Next I noticed that the time was dominated by a left-over line which was commented with "offending code" in the listing above (both in the C and the C++ version). So, I thought that the sqrt() call in the length calculation would probably dominate the timings and hence removed the line in order to be able to show a greater effect of variable re-ordering and also to show that C was still faster than C++.
However, after removing the lines, the timings were those in the columns called B and strangely, now the C++ version is faster than the C version and also variant (1) (previously the slowest one) turned out to be the fastest one out of the C++ versions!!
So I asked myself, what the hell was going on here!
Conclusion
When not doing too fancy and over-engineered C++-style operations, both versions can run at a comparable speed.
The actual runtime of the code then depends to a quite large extent on the used compiler, the architecture and/or the code context and it is not clear (at least not to me) in advance which version would run faster and why. (For those interested, I used GCC 3.4.2 20040902 (prerelease) on an AthlonXP but optimized with -march=i686 since that proved to be faster than -march=athlon-xp. For optimization, I chose -O3 -ffast-math. Timings were on idle system.)
Definition in file vector3.h.
|
|