2012-01-26 17:19:17 vForce (vvsqrt, vvsin) Performance Testing
I ran some vvsin and vvsqrt tests (on doubles) against a randomized dataset of about 100k elements. I broke the work into chunks of varying size, from 10-element to 50,000-element arrays, and averaged the results over multiple runs.
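To make the chunking concrete, here is a minimal sketch of the idea, not the actual test program. It assumes a kernel with the same calling convention as vvsin and vvsqrt (output pointer, input pointer, pointer to an element count). The names `vec_kernel`, `square_kernel`, and `apply_in_chunks` are made up for illustration, and `square_kernel` is only a stand-in so the sketch builds without Accelerate.

```c
#include <stddef.h>

/* Same calling convention as vvsin/vvsqrt: output, input, pointer to count. */
typedef void (*vec_kernel)(double *out, const double *in, const int *n);

/* Hypothetical stand-in kernel so the sketch builds without Accelerate. */
void square_kernel(double *out, const double *in, const int *n) {
    for (int i = 0; i < *n; i++)
        out[i] = in[i] * in[i];
}

/* Run `kernel` over `total` elements in chunks of at most `size` elements.
   Returns the number of kernel calls made (0 if `size` is not positive). */
int apply_in_chunks(vec_kernel kernel, double *out, const double *in,
                    int total, int size) {
    int calls = 0;
    if (size <= 0)
        return 0;
    for (int start = 0; start < total; start += size) {
        /* The last chunk is shorter when size does not divide total. */
        int n = (total - start < size) ? (total - start) : size;
        kernel(out + start, in + start, &n);
        calls++;
    }
    return calls;
}
```

On a Mac with Accelerate linked, passing `vvsin` or `vvsqrt` as the kernel should work the same way, since they share this signature. The short final chunk matters only when the chunk size does not divide the dataset evenly.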

On my 3 GHz Xeon system, vvsqrt was consistently about twice as fast as a sqrt loop for any array size.

Green lines represent a standard C loop for the math function.
Blue lines are several runs of the vForce library in question.
Array chunk sizes are along the x-axis, time in milliseconds along the y-axis.

Lin-log plot of vvsqrt



On the next graph, I plotted the ratio of vForce time to libm time. The green line marks a 1:1 ratio. The blue line is from a 3 GHz Xeon Mac Pro (two 4-core CPUs) and the reddish line is from a 2.8 GHz i7 iMac (one 4-core CPU). There is a bump of bad performance right at 1024 elements, followed by greater gains from there on, as vvsin turns on GCD for multicore machines. I missed this bump when I first started testing with data sets whose sizes were powers of 10.
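Here is a small sketch of why a powers-of-10 sweep can miss the bump. It assumes the early sweep used exactly 10, 100, 1000, and so on, which is a simplification. The helper names `geometric_sizes` and `samples_range` are made up for this illustration.

```c
/* Fill `out` with base^1, base^2, ... not exceeding `limit`.
   Returns how many sizes were written (at most `cap`). */
int geometric_sizes(int base, int limit, int *out, int cap) {
    int count = 0;
    if (base < 2)
        return 0;
    for (long long v = base; v <= limit && count < cap; v *= base)
        out[count++] = (int)v;
    return count;
}

/* Return 1 if any size in the sweep falls inside [lo, hi], else 0. */
int samples_range(const int *sizes, int count, int lo, int hi) {
    for (int i = 0; i < count; i++)
        if (sizes[i] >= lo && sizes[i] <= hi)
            return 1;
    return 0;
}
```

With a limit of 131072, the base-10 sweep has no sample between 1001 and 9999, so anything that happens near 1024 goes unseen, while the base-2 sweep lands on 1024 exactly.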

Lin-log plot of vvsin



( c )
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mach/mach_time.h>
#include <Accelerate/Accelerate.h>

#define TOTAL 131072
#define TRIAL_SIZE 5
#define BOUNDS(a) ((sizeof(a))/(sizeof((a)[0])))

int main() {
    uint64_t t0, t1, t2;
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);
    double ticksToNanoseconds = (double)timebase.numer / timebase.denom;
    srandom(time(NULL));
    double x[TOTAL], y[TOTAL], z[TOTAL];

    // Chunk sizes are the powers of two from 2 to 131072.
    for (int l = 1; l < 18; l++) {
        int size = 1 << l;
        int chunks = TOTAL / size;
        double time1 = 0, time2 = 0;

        for (int k = 0; k < TRIAL_SIZE; k++) {
            // Fresh random input in [0, 1] for every trial.
            for (int i = 0; i < TOTAL; i++)
                x[i] = (double)random() / (double)RAND_MAX;

            // Standard C loop over libm.
            t0 = mach_absolute_time();
            for (int j = 0; j < chunks; j++)
                for (int i = 0; i < size; i++)
                    // y[i + size * j] = sqrt(x[i + size * j]);
                    y[i + size * j] = sin(x[i + size * j]);
            t1 = mach_absolute_time();

            // vForce, one call per chunk.
            int num = size;
            for (int j = 0; j < chunks; j++) {
                double *zz = &z[size * j], *xx = &x[size * j];
                // vvsqrt(zz, xx, &num);
                vvsin(zz, xx, &num);
            }
            t2 = mach_absolute_time();

            time1 += (t1 - t0) * ticksToNanoseconds;
            time2 += (t2 - t1) * ticksToNanoseconds;
        }
        // Chunk size, then the vForce:libm time ratio (below 1 means vForce is faster).
        printf("%6i, %g\n", size, time2 / time1);
    }
    return 0;
}

Compiled via: clang -o vForceTest vForceTest.m -framework Accelerate
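The timing arithmetic in the listing is easy to get backwards, so here is a minimal sketch of it on its own. It assumes the timebase fraction has already been read with `mach_timebase_info`. The helpers `ticks_to_ns` and `timing_ratio` are made-up names, and they take plain integers so the sketch builds without the Mach headers.

```c
#include <stdint.h>

/* Convert a tick delta to nanoseconds given the timebase fraction numer/denom.
   This is the same arithmetic as (t1 - t0) * ticksToNanoseconds in the listing. */
double ticks_to_ns(uint64_t ticks, uint32_t numer, uint32_t denom) {
    return (double)ticks * ((double)numer / (double)denom);
}

/* Ratio of vForce time to libm-loop time.
   Values below 1 mean vForce was faster. */
double timing_ratio(double vforce_ns, double libm_ns) {
    return vforce_ns / libm_ns;
}
```

Since the program prints only the ratio time2/time1, the timebase factor cancels out of the printed value. The conversion matters only if you want absolute times, as in the millisecond plot.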

Let me know how this could be improved.
  • Ian Ollmann (Thu, January 26th, 2012, 7:14pm UTC)
    The jump occurs at the crossover where we invoke GCD to multithread the work. Apparently the crossover is not set right for your hardware.
