══════════════════════════════════════════════════════════════════ STANDARD DEVIATION BENCHMARK: Native J vs Futhark ══════════════════════════════════════════════════════════════════ stddev(x) = sqrt( mean( (x - mean(x))^2 ) ) Operations per element: - Compute mean (reduction over all elements) - Subtract mean from each element - Square each difference - Sum all squared differences (reduction) - Divide by n and take sqrt This is MORE COMPUTE-INTENSIVE than dot product because: - Multiple passes over data - More arithmetic operations per element - Better opportunity for parallel speedup ══════════════════════════════════════════════════════════════════ Backend: multicore Threads: 12 Size J (ms) Futhark (ms) Speedup Match ---------- -------- ------------ --------- ----- 1000 0.00 0.02 0.25x OK 10000 0.01 0.12 0.12x OK 100000 0.20 0.35 0.57x OK 500000 3.30 1.76 1.87x OK 1000000 7.19 6.40 1.12x OK 2000000 25.47 13.33 1.91x OK 5000000 42.12 38.91 1.08x OK 10000000 262.87 133.73 1.97x OK 20000000 479.80 276.26 1.74x OK 50000000 850.79 441.40 1.93x OK 100000000 1108.66 732.17 1.51x OK ══════════════════════════════════════════════════════════════════ RESULTS ANALYSIS ══════════════════════════════════════════════════════════════════ Crossover point: ~500000 elements Futhark becomes faster than J at this size Maximum speedup: 1.97x at 10000000 elements Average speedup: 1.28x Key observations: - Speedup > 1.0x means Futhark is faster - Standard deviation is compute-bound (multiple ops per element) - Futhark can parallelize both reductions and element-wise ops - J's implementation is sequential but highly optimized ══════════════════════════════════════════════════════════════════