Futhark-J Bridge - Technical Documentation

Table of Contents

  1. Introduction
  2. Architecture Overview
  3. Compilation Pipeline
  4. FFI Layer
  5. Type System
  6. Memory Management
  7. The fut Adverb
  8. Platform Support
  9. Performance Analysis
  10. Internals Reference

Introduction

The Futhark-J Bridge enables J programs to execute Futhark code for high-performance parallel computations. Unlike traditional FFI approaches that require separate compilation steps, this bridge compiles Futhark code on-demand from string literals or multiline 0 : 0 blocks embedded in J source files.

Design Goals

  1. Zero-friction integration: Write Futhark inline, use immediately
  2. Immediate compilation: Code compiles at definition time, not first use
  3. Persistent context: Futhark context created once, reused for all calls
  4. Automatic caching: Compile once, reuse forever
  5. Type safety: Automatic type detection and conversion
  6. Parallel performance: Full multicore execution

How It Works (High Level)

J Source File                    At Definition Time
─────────────────                ─────────────────────

my_verb =: '...' fut  ────────▶  1. Hash Futhark code
                                 2. Check cache
NB. or multiline:                3. Compile if needed
my_src =: 0 : 0                  4. Create Futhark context
  entry f ...                    5. Return bound verb
)
my_verb =: my_src fut
                                 At Execution Time
                                 ─────────────────────
my_verb 1 2 3 4 5    ────────▶  6. Execute via FFI (reuses context)
                                 7. Return result

Key improvement in v1.4.0: The Futhark context is created at definition time and persists for the lifetime of the verb. This eliminates the “warmup” overhead on first execution.

Key improvement in v1.6.0: Added support for scalar inputs (not just arrays) and improved error reporting with full compiler output on failures.

Key improvement in v1.7.0: Fixed parsing of fixed-size array dimensions ([][32]i64) and correct detection of entry point when helper functions (def) are present.

Architecture Overview

Component Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        J Runtime                                 │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐  │
│  │  fut adverb  │  │ fut_compile  │  │ fut_exec_scalar/array│  │
│  │  (1 : 0)     │──│              │──│                      │  │
│  └──────────────┘  └──────────────┘  └──────────────────────┘  │
│         │                 │                    │                │
│         │                 │                    │                │
├─────────┼─────────────────┼────────────────────┼────────────────┤
│         │                 │                    │                │
│         ▼                 ▼                    ▼                │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                    J FFI (cd verb)                       │   │
│  │                    Memory: mema/memf/memr/memw           │   │
│  └─────────────────────────────────────────────────────────┘   │
│                              │                                  │
└──────────────────────────────┼──────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Operating System                              │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐  │
│  │   Futhark    │  │     GCC      │  │   Shared Library     │  │
│  │   Compiler   │──│   Compiler   │──│      (.so)           │  │
│  └──────────────┘  └──────────────┘  └──────────────────────┘  │
│                                              │                  │
│                                              │                  │
│  ┌───────────────────────────────────────────┼──────────────┐  │
│  │              /tmp/futhark_j/              │              │  │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─┴───────┐     │  │
│  │  │ .fut    │  │ .c      │  │ .h      │  │ .so     │     │  │
│  │  │ source  │  │ code    │  │ header  │  │ library │     │  │
│  │  └─────────┘  └─────────┘  └─────────┘  └─────────┘     │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Data Flow

J Array (1 2 3 4 5)
        │
        ▼ (J stores as contiguous 64-bit floats)
┌───────────────────────────────────────┐
│  Memory: [1.0][2.0][3.0][4.0][5.0]    │
└───────────────────────────────────────┘
        │
        ▼ (FFI passes pointer + length)
┌───────────────────────────────────────┐
│  futhark_new_f64_1d(ctx, ptr, n)      │
│  Creates opaque Futhark array         │
└───────────────────────────────────────┘
        │
        ▼ (Futhark processes in parallel)
┌───────────────────────────────────────┐
│  futhark_entry_f(ctx, out, in)        │
│  Executes map/reduce/scan             │
└───────────────────────────────────────┘
        │
        ▼ (Copy results back)
┌───────────────────────────────────────┐
│  futhark_values_f64_1d(ctx, arr, buf) │
│  Copies to J-allocated memory         │
└───────────────────────────────────────┘
        │
        ▼ (J reads result)
┌───────────────────────────────────────┐
│  memr buf, 0, n, 8                    │
│  Returns J array                      │
└───────────────────────────────────────┘
        │
        ▼
J Array (2 4 6 8 10)

Compilation Pipeline

Step 1: Hashing

Source code is hashed using J’s built-in SHA-256 (128!:6):

hash =. 16 {. 3 (128!:6) code   NB. First 16 hex chars of SHA-256
id =. 'fut_', hash

This produces a 16-character hexadecimal string (~64 bits of entropy) that serves as a unique identifier. SHA-256 provides cryptographic collision resistance, making hash collisions virtually impossible even for very similar code strings.

Example:

"entry f (xs: []f64) : f64 = reduce (+) 0 xs"  →  fut_a3b8f2c1e9d04567
"entry f (xs: []f64) : f64 = reduce (*) 1 xs"  →  fut_7c2e1a9b5f803d42

Why SHA-256? - Native C implementation via 128!:6 (faster than interpreted J code) - Cryptographic quality distribution (no collision risk) - Standard, well-understood algorithm

Step 2: Cache Lookup

Files are checked in order: 1. fut_<hash>.so - If exists, return immediately 2. fut_<hash>.c - If exists, compile to .so 3. Neither - Full compilation needed

Step 3: Futhark Compilation

cd /tmp/futhark_j
futhark multicore --library fut_<hash>.fut

This generates: - fut_<hash>.c - C source with parallel implementation - fut_<hash>.h - Header with API declarations - fut_<hash>.json - Metadata about entry points

Step 4: Platform Patches

For FreeBSD with multicore/ispc backends:

# Add FreeBSD CPU detection
sed -i.bak 's/#elif __EMSCRIPTEN__/#elif defined(__FreeBSD__)\n  return sysconf(_SC_NPROCESSORS_ONLN);\n#elif __EMSCRIPTEN__/' file.c

# Add missing headers
(echo "#include <sys/resource.h>"; echo "#include <unistd.h>"; cat file.c) > temp.c
mv temp.c file.c

Step 5: GCC Compilation

gcc -O3 -fPIC -shared -o fut_<hash>.so fut_<hash>.c -lm -lpthread

Flags explained: - -O3 - Maximum optimization - -fPIC - Position-independent code (required for .so) - -shared - Build shared library - -lm - Link math library - -lpthread - Link pthreads (for multicore)

FFI Layer

J FFI Basics

J’s FFI uses the cd verb (15!:0):

result =: 'library function_signature' cd arguments

Function Signatures

Format: ' function_name return_type arg1_type arg2_type ...'

Type codes: | Code | Type | Size | |——|——|——| | n | void | 0 | | c | char | 1 | | s | short | 2 | | i | int | 4 | | x | int64 | 8 | | f | float | 4 | | d | double | 8 | | *t | pointer to t | 8 | | > t | return type | - |

Futhark C API Calls

Context Management:

NB. Create config
cfg =. (so, ' futhark_context_config_new > x') cd ''

NB. Set threads
(so, ' futhark_context_config_set_num_threads > n x i') cd cfg; 12

NB. Create context
ctx =. (so, ' futhark_context_new > x x') cd cfg

Array Operations:

NB. Create array from J data
arr =. (so, ' futhark_new_f64_1d > x x *d x') cd ctx; data; n

NB. Copy values out
(so, ' futhark_values_f64_1d > i x x x') cd ctx; arr; buf

NB. Free array
(so, ' futhark_free_f64_1d > i x x') cd ctx; arr

Entry Point Calls:

NB. Scalar return
rc =. (so, ' futhark_entry_f > i x x x') cd ctx; out_ptr; in_arr

NB. Array return (pointer-to-pointer)
rc =. (so, ' futhark_entry_f > i x x x') cd ctx; ptr_space; in_arr

Type System

Futhark to J Mapping

Futhark J FFI memr code Size J representation
i8 c 1 1 integer
i16 s - 2 integer
i32 i 2 4 integer
i64 x 4 8 integer
u8 c 1 1 integer
u16 s - 2 integer
u32 i 2 4 integer
u64 x 4 8 integer
f32 f 6 4 float
f64 d 8 8 float

Type Detection

The parser examines Futhark code to detect types:

fut_parse_dtype =: 3 : 0
  code =. y
  if. +./ '[]f64' E. code do. 'f64'
  elseif. +./ '[]f32' E. code do. 'f32'
  elseif. +./ '[]i64' E. code do. 'i64'
  elseif. +./ '[]i32' E. code do. 'i32'
  elseif. +./ ': f64' E. code do. 'f64'
  NB. ... etc
  elseif. do. 'f64'  NB. default
  end.
)

Return Type Detection

fut_returns_array =: 3 : 0
  code =. y
  paren_pos =. I. ')' E. code
  if. 0 < # paren_pos do.
    rest =. (1 + {. paren_pos) }. code
    +./ ': []' E. rest
  else.
    0
  end.
)

Example:

"entry f (xs: []f64) : f64 = ..."     →  0 (scalar)
"entry f (xs: []f64) : []f64 = ..."   →  1 (array)

Memory Management

J Memory Functions

Function Purpose
mema n Allocate n bytes, return address
memf addr Free allocated memory
memr addr,off,n,type Read n values of type from addr+off
memw data addr,off,n,type Write n values to addr+off

Allocation Pattern

For scalar returns:

out_mem =. mema 8              NB. Allocate 8 bytes for double
NB. ... call Futhark ...
result =. 0 { memr out_mem, 0, 1, 8  NB. Read one double
memf out_mem                   NB. Free

For array returns:

ptr_space =. mema 8            NB. For pointer-to-pointer
NB. ... call Futhark ...
arr_out =. 0 { memr ptr_space, 0, 1, 4  NB. Read pointer
memf ptr_space

data_mem =. mema n * 8         NB. For actual data
NB. ... futhark_values ...
result =. memr data_mem, 0, n, 8
memf data_mem

Resource Cleanup Order

Resources must be freed in reverse allocation order:

NB. Allocation order:
1. cfg (config)
2. ctx (context)
3. arr_in (input array)
4. ptr_space (output pointer)
5. arr_out (output array)
6. data_mem (result buffer)

NB. Free order:
6. memf data_mem
5. futhark_free arr_out
4. memf ptr_space
3. futhark_free arr_in
2. futhark_context_free ctx
1. futhark_context_config_free cfg

The fut Adverb

Why an Adverb?

In J, verbs cannot return verbs. However, adverbs can produce verbs as their result. The fut adverb:

  1. Takes Futhark code as operand m
  2. Compiles code and extracts metadata
  3. Creates persistent Futhark context
  4. Returns a verb bound with & to the executor

Implementation (v1.4.0+)

fut =: 1 : 0
  code =. m

  NB. Compile to shared library
  so_file =. fut_compile code

  NB. Create Futhark context immediately (no warmup needed later)
  ctx =. fut_init_context so_file

  NB. Extract metadata
  entry =. fut_parse_entry code
  in_dtype =. fut_input_dtype code
  out_dtype =. fut_output_dtype code
  in_rank =. fut_input_rank code
  out_rank =. fut_output_rank code

  NB. Return bound verb with pre-created context
  if. out_rank = 0 do.
    (so_file;ctx;entry;in_dtype;out_dtype;in_rank)&fut_exec_scalar
  else.
    (so_file;ctx;entry;in_dtype;out_dtype;in_rank;out_rank)&fut_exec_nd
  end.
)

Context Initialization

The fut_init_context function creates a Futhark context at definition time:

fut_init_context =: 3 : 0
  so =. y
  cfg =. (so, ' futhark_context_config_new > x') cd ''
  (so, ' futhark_context_config_set_num_threads > n x i') cd cfg; FUTHARK_THREADS
  ctx =. (so, ' futhark_context_new > x x') cd cfg
  ctx
)

This context is bound into the returned verb and reused for every subsequent call, eliminating the overhead of context creation/destruction on each invocation.

Binding with &

The expression (so_file;ctx;entry;...)&fut_exec_nd creates a monadic verb where: - The left argument (so_file;ctx;entry;...) including the context is baked in - The user’s input becomes the right argument

This is equivalent to:

my_verb =: 3 : '(so_file;ctx;entry;...) fut_exec_nd y'

Platform Support

Linux

Full support with no modifications needed.

FreeBSD

Requires automatic patches for multicore/ispc backends:

  1. Missing Headers
  2. CPU Detection

macOS

Should work (untested). May need: - Adjust .dylib paths - Handle different sed syntax

Performance Analysis

Overhead Sources

  1. Definition time: ~1-2 seconds (compilation, one-time)
  2. Execution time: ~0.1-1ms (FFI overhead per call)
  3. Data transfer: O(n) for copying arrays

When Futhark Wins

When J Wins

Optimization Strategies

  1. Batch operations: Combine in Futhark
  2. Persistent context: Context is reused automatically (v1.4.0+)
  3. Match types: Avoid conversions

Internals Reference

Global Variables

Variable Default Description
FUTHARK_CACHE /tmp/futhark_j/ Cache directory
FUTHARK_BACKEND multicore Compilation backend
FUTHARK_THREADS 12 Thread count

Internal Functions

Function Purpose
fut_type_to_j Futhark type → J FFI code
fut_type_to_memtype Futhark type → memr code
fut_type_size Futhark type → byte size
fut_compile Compile Futhark to .so
fut_init_context Create persistent Futhark context
fut_parse_entry Extract entry point name
fut_parse_dtype Detect data type (legacy)
fut_returns_array Detect return type (legacy)
fut_parse_return_type Extract full return type string
fut_parse_input_type Extract input parameter type string
fut_count_brackets Count array dimensions in type
fut_extract_basetype Remove [] from type to get base type
fut_input_rank Get input array rank (1=1D, 2=2D)
fut_output_rank Get output array rank
fut_input_dtype Get input base type (f64, i32, etc.)
fut_output_dtype Get output base type
fut_exec_scalar Execute array-in, scalar-out function (with context)
fut_exec_nd Execute array-in, array-out function (with context)
fut_exec_scalar_dyad Execute dyadic scalar-returning function (with context)
fut_exec_nd_dyad Execute dyadic array-returning function (with context)
fut_exec_scalar_in_array_out Execute scalar-in, array-out function (v1.6.0)
fut_exec_scalar_in_scalar_out Execute scalar-in, scalar-out function (v1.6.0)
fut_make_new_sig Build FFI signature for futhark_new_*
fut_make_new_args Build argument list for array creation

Public API

Function Purpose
fut Adverb: compile and create verb
fut_clear_cache Clear compiled libraries
fut_set_threads Set thread count
fut_version Display version info

Error Handling

Errors are reported via echo: - Futhark compilation failures (with full compiler error output) - GCC compilation failures - Runtime execution failures (non-zero return codes)

Failed compilation returns empty string; fut returns identity verb as fallback.

Improved in v1.6.0: Compilation errors are now captured to a file and displayed in full, including the source file path for debugging:

Futhark compilation failed:
Error at fut_abc123.fut:10:6-9:
Unknown name "acc"
...
Source file: /tmp/futhark_j/fut_abc123.fut

Scalar Input Support (v1.6.0)

Overview

Version 1.6.0 adds support for Futhark functions that take scalar inputs instead of arrays. This enables functions like prime factorization that take a single number and return an array of results.

Futhark C API for Scalar Inputs

For scalar inputs, Futhark passes the value directly rather than wrapping it in an array struct:

// Array input (traditional)
int futhark_entry_f(ctx, out**, const struct futhark_f64_1d *in);

// Scalar input (new in v1.6.0)
int futhark_entry_f(ctx, out**, const int64_t in);

Usage Example

NB. Prime factorization: scalar in, array out
NB. Use 0 : 0 to define multiline Futhark, then apply fut separately
factors_src =: 0 : 0
  entry f (n: i64) : []i64 =
    let (_, _, result) =
      loop (x, i, acc) = (n, 2i64, ([] : []i64))
      while x > 1 do
        if x % i == 0
        then (x / i, i, acc ++ [i])
        else (x, i + 1, acc)
    in result
)
factors =: factors_src fut

factors 12      NB. Returns: 2 2 3
factors 100     NB. Returns: 2 2 5 5

Supported Combinations

Input Output Executor
Array Scalar fut_exec_scalar
Array Array fut_exec_nd
Scalar Scalar fut_exec_scalar_in_scalar_out
Scalar Array fut_exec_scalar_in_array_out

The fut adverb automatically detects scalar vs array input by checking if in_rank = 0.

Fixed-Size Array Dimensions (v1.7.0)

Overview

Version 1.7.0 fixes support for Futhark’s fixed-size array dimensions and improves parsing accuracy when helper functions are present.

Fixed-Size Dimension Syntax

Futhark allows fixed-size dimensions using [n] syntax:

-- Dynamic outer dimension, fixed inner dimension of 3
entry f (ns: []i64) : [][3]i64 = ...

-- Dynamic outer dimension, fixed inner dimension of 32
entry f (ns: []i64) : [][32]i64 = ...

Bug Fixes in v1.7.0

1. Bracket counting for fixed dimensions

Prior to v1.7.0, fut_count_brackets counted [] patterns only: - [][]f64 → correctly returned 2 - [][3]i64 → incorrectly returned 1 (should be 2)

Fixed by counting [ characters instead of [] pairs:

NB. Before (wrong):
fut_count_brackets =: +/ '[]' E. y

NB. After (correct):
fut_count_brackets =: +/ '[' E. y

2. Base type extraction for fixed dimensions

Prior to v1.7.0, fut_extract_basetype only removed [] prefixes: - [][]f64 → correctly returned f64 - [][32]i64 → incorrectly returned [32]i64

Fixed by removing all bracket-delimited prefixes:

NB. After (correct):
fut_extract_basetype =: 3 : 0
  type =. y
  while. '[' = {. type do.
    close_pos =. type i. ']'
    type =. (close_pos + 1) }. type
  end.
  type
)

3. Entry point parsing with helper functions

When Futhark code contains helper functions using def, the parser could mistakenly find the helper’s parameters instead of the entry point’s:

def helper (x: i64) : i64 = x * 2   -- Parser found this (wrong)

entry f (ns: []i64) : []i64 = ...   -- Should find this

Fixed by first locating entry and then searching for parameters from that position forward:

fut_parse_input_type =: 3 : 0
  code =. y
  NB. Find 'entry ' and look for parameters after it
  entry_pos =. I. 'entry ' E. code
  if. 0 = # entry_pos do. '[]f64' return. end.
  NB. Get code starting from entry point
  code_from_entry =. ({. entry_pos) }. code
  NB. ... rest of parsing
)

Usage Example

NB. Batch factorization with helper function
batch_factor =: 0 : 0
  def factorize (n: i64) : [32]i64 =
    let (_, _, factors, _) =
      loop (x, i, acc, c) = (n, 2i64, replicate 32 0i64, 0i64)
      while x > 1 && c < 32 do
        if x % i == 0
          then (x / i, i, acc with [c] = i, c + 1)
          else (x, i + 1, acc, c)
    in factors

  entry f (ns: []i64) : [][32]i64 =
    map factorize ns
) fut

NB. This correctly parses:
NB.   Input: []i64 (1D array)
NB.   Output: [][32]i64 (2D array with fixed inner dimension)

batch_factor 12 100 30
   2 2 3 0 0 ...
   2 2 5 5 0 ...
   2 3 5 0 0 ...

Technical Documentation for Futhark-J Bridge v1.7.0