=================================================================
       DOT PRODUCT BENCHMARK: Native J vs Futhark
=================================================================
  dot(a,b) = sum(a * b)
  J:       +/ a * b
  Futhark: reduce (+) 0 (map2 (*) a b)
=================================================================
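
For reference, the definition above can be sketched in plain Python. This is not part of the benchmark; the function name `dot` simply mirrors the formula, and the length check is an added assumption.

```python
def dot(a, b):
    """Dot product as defined above: the sum of elementwise products."""
    if len(a) != len(b):
        raise ValueError("dot: vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 1*4 + 2*5 + 3*6 = 32.0
```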

Backend: multicore  Threads: 12

Size          J (ms)   Futhark (ms)   Speedup   Match
---------   --------   ------------   -------   -----
100000          0.20           0.75     0.27x   OK
500000          2.06           5.15     0.40x   OK
1000000         4.29          13.48     0.32x   OK
2000000        10.03          29.74     0.34x   OK
5000000        30.22          76.79     0.39x   OK
10000000       71.24         179.27     0.40x   OK
100000000     573.31        2465.81     0.23x   OK

Speedup = J time / Futhark time (below 1.00x: Futhark is slower)
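
The Speedup column can be re-derived from the two timing columns. The sketch below copies the timings from the table and assumes the column is computed as J time divided by Futhark time, which agrees with every printed row; the names `rows` and `speedup` are illustrative only.

```python
# Timings copied from the table: (size, J ms, Futhark ms).
rows = [
    (100_000, 0.20, 0.75),
    (500_000, 2.06, 5.15),
    (1_000_000, 4.29, 13.48),
    (2_000_000, 10.03, 29.74),
    (5_000_000, 30.22, 76.79),
    (10_000_000, 71.24, 179.27),
    (100_000_000, 573.31, 2465.81),
]

def speedup(j_ms, futhark_ms):
    """Futhark speedup relative to J; values below 1.0 mean Futhark is slower."""
    return j_ms / futhark_ms

for size, j_ms, futhark_ms in rows:
    print(f"{size:<12d}{speedup(j_ms, futhark_ms):.2f}x")
```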

=================================================================
ANALYSIS:

  Dot product is a SIMPLE, MEMORY-BOUND operation:
    - Only 2 FLOPs per element (multiply + add)
    - Performance is limited by memory bandwidth, not CPU speed
    - J's native +/@:* (tacit form of +/ a * b) is highly
      optimized for this
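
  The memory-bound argument can be made concrete with a little arithmetic. The sketch below assumes 8-byte floating-point elements, which the benchmark output does not state; the constant and function names are illustrative.

```python
# Assumption: 8-byte floating-point elements; the benchmark output does not
# state the element type.
BYTES_PER_ELEMENT = 8
FLOPS_PER_ELEMENT = 2   # one multiply and one add, as listed above
LOADS_PER_ELEMENT = 2   # one value from a, one from b

def arithmetic_intensity():
    """FLOPs performed per byte read from memory."""
    return FLOPS_PER_ELEMENT / (LOADS_PER_ELEMENT * BYTES_PER_ELEMENT)

def effective_bandwidth_gb_s(n, elapsed_ms):
    """Input bytes read per second (in GB/s) for a dot product of length n."""
    bytes_read = n * LOADS_PER_ELEMENT * BYTES_PER_ELEMENT
    return bytes_read / (elapsed_ms / 1000.0) / 1e9

print(arithmetic_intensity())  # 0.125 FLOP/byte
# Largest J row in the table: 100000000 elements in 573.31 ms.
print(effective_bandwidth_gb_s(100_000_000, 573.31))
```

  Under this assumption each byte loaded supports only 0.125 FLOPs, which is the usual signature of a memory-bound kernel. The second helper converts any table row into an effective input bandwidth for comparison with the machine's memory bandwidth.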

  Futhark overhead includes:
    - FFI boundary crossing
    - Memory allocation and data copying
    - Thread synchronization

  For simple operations like dot product, native J wins: it is
  2.5x to 4.3x faster than Futhark at every size tested.
  Futhark excels at COMPUTE-BOUND operations where the
  computation per element is much higher (e.g., stencils,
  complex reductions, nested parallelism).

  Try the stddev benchmark for a more complex operation
  where Futhark can better amortize its overhead.
=================================================================