Increasing J Thread Pool Limit from 63 to 128 Threads

Problem

The J programming language’s threading system was limited to a maximum of 63 worker threads per thread pool. This limitation prevented full utilization of modern systems with high core counts (e.g., 128-core FreeBSD systems).

Attempting to create one worker thread per core, minus one for the master thread (e.g., 127 worker threads on a 128-core system), failed with:

|limit error, executing dyad T.
|a system limit was exceeded
|   0     T.0

Root Cause

The limitation stemmed from the locking mechanism used in the job queue system. The implementation stored the lock counter in the 6 least significant bits (LSBs) of the job queue head pointer (jobq->ht[0]). Since pointers to job structures are aligned on 64-byte boundaries (ABDY=64), these 6 bits are guaranteed to be zero in valid pointers and could be repurposed.
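A minimal C sketch of the packing trick, assuming a 64-byte-aligned allocation as with ABDY=64. `ALIGN_BYTES`, `LOCK_MASK`, `pack_head`, `head_ptr`, and `head_count` are hypothetical names for illustration. The real code updates `jobq->ht[0]` atomically; this sketch omits the atomics and only shows how pointer and counter share one word.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: ALIGN_BYTES mirrors ABDY=64; LOCK_MASK covers the 6 low bits. */
#define ALIGN_BYTES 64u
#define LOCK_MASK ((uintptr_t)(ALIGN_BYTES - 1)) /* 0x3F: bits that are always 0 in an aligned pointer */

/* Combine a 64-byte-aligned pointer with a small lock count in one word. */
static uintptr_t pack_head(void *job, unsigned count) {
    uintptr_t p = (uintptr_t)job;
    assert((p & LOCK_MASK) == 0); /* alignment guarantees the low 6 bits are free */
    assert(count <= LOCK_MASK);   /* only counts 0..63 fit */
    return p | count;
}

/* Recover the pointer by clearing the low 6 bits. */
static void *head_ptr(uintptr_t word) { return (void *)(word & ~LOCK_MASK); }

/* Recover the lock count from the low 6 bits. */
static unsigned head_count(uintptr_t word) { return (unsigned)(word & LOCK_MASK); }
```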

However, this design limited the lock counter to 6 bits, allowing a maximum value of 63 (2^6 - 1). If 64 or more threads tried to acquire the lock simultaneously, the counter would overflow into bit 6, corrupting the pointer.
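The overflow can be reproduced with plain integer arithmetic. This sketch assumes each contending thread adds 1 to the shared head word, standing in for the atomic fetch-and-add; `pointer_after_increments` is a hypothetical helper. After 63 increments the pointer bits are intact; the 64th carries into bit 6, and the recovered pointer moves by 64 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define LOCK_MASK ((uintptr_t)63) /* low 6 bits hold the lock count */

/* Start from a clean 64-byte-aligned pointer value, apply n "+1" updates
   (one per contending thread), and return the pointer part that a reader
   would recover by clearing the low 6 bits. */
static uintptr_t pointer_after_increments(uintptr_t aligned_ptr, unsigned n) {
    assert((aligned_ptr & LOCK_MASK) == 0);
    uintptr_t word = aligned_ptr;
    for (unsigned i = 0; i < n; i++)
        word += 1; /* stand-in for one thread's atomic fetch-and-add */
    return word & ~LOCK_MASK;
}
```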

Solution

The fix involved separating the lock counter from the pointer storage by:

  1. Adding a dedicated lock counter field to the JOBQ structure
  2. Modifying the locking macros to use the new counter instead of pointer LSBs
  3. Increasing the thread limits to support more threads

Implementation Details

Changes Made

The patch modifies three files:

1. jsrc/j.h - Thread limit constants

2. jsrc/jt.h - JOBQ structure modification

Added a new field to the job queue structure:

UI lockcount;  // Lock counter for this job queue. 0 = unlocked, >0 = number of threads trying to acquire lock

3. jsrc/ct.c - Lock implementation changes
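The ct.c diff itself is not reproduced here. As a hedged illustration of the idea, here is a minimal C11 sketch of a lock built on a dedicated counter whose meaning follows the `lockcount` comment above (0 = unlocked, >0 = threads holding or trying to acquire). `jobq_sketch`, `jobq_trylock`, `jobq_lock`, and `jobq_unlock` are hypothetical names, not J's macros, and the plain spin offers no fairness guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for JOBQ. */
typedef struct {
    void *ht[2];                /* head/tail pointers: no lock bits packed in */
    _Atomic uint64_t lockcount; /* 0 = unlocked, >0 = threads holding or trying */
} jobq_sketch;

/* One attempt: a single fetch-and-add; the lock is ours if the count was 0 before. */
static bool jobq_trylock(jobq_sketch *q) {
    if (atomic_fetch_add_explicit(&q->lockcount, 1, memory_order_acquire) == 0)
        return true;
    /* Someone else holds it: undo our increment. */
    atomic_fetch_sub_explicit(&q->lockcount, 1, memory_order_relaxed);
    return false;
}

/* Blocking acquire: retry after the holder and other contenders drain the count. */
static void jobq_lock(jobq_sketch *q) {
    while (!jobq_trylock(q)) {
        while (atomic_load_explicit(&q->lockcount, memory_order_relaxed) != 0)
            ; /* busy-wait */
    }
}

/* Release: drop the holder's own increment. */
static void jobq_unlock(jobq_sketch *q) {
    uint64_t prev = atomic_fetch_sub_explicit(&q->lockcount, 1, memory_order_release);
    assert(prev > 0); /* releasing an unheld lock would be a bug */
    (void)prev;
}
```

Because the counter has its own 64-bit word, 127 or more simultaneous contenders only raise `lockcount`; locking never modifies `ht[0]`.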

Technical Constraints

While the immediate limit was raised to 128 threads per pool and 255 total threads, the actual technical limits are:

  1. Per-pool limit: 65,535 threads (limited by the nthreads field, whose type US is an unsigned short)
  2. Total system limit: Constrained by static allocation of JTT threaddata[MAXTHREADS] array
  3. Memory considerations: Each thread requires a JTT structure (several KB), so 255 threads use significant memory

The choice of 255/256 provides a reasonable balance between capability and memory usage while staying within the bounds where the system uses efficient 8-bit atomic operations for certain locks (WLOCKBIT behavior changes at 256).
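Both numbers come down to field-width arithmetic. A minimal sketch, assuming the per-pool count is an `unsigned short` (as `US nthreads` is) and modeling an 8-bit counter with `uint8_t`; `add_thread_us` and `add_thread_u8` are hypothetical helpers, and WLOCKBIT itself is not reproduced.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* An unsigned short count tops out at USHRT_MAX (65,535 on mainstream
   platforms); one more increment wraps it to 0. */
static unsigned short add_thread_us(unsigned short nthreads) {
    return (unsigned short)(nthreads + 1);
}

/* An 8-bit counter tops out at 255; the 256th increment wraps it to 0. */
static uint8_t add_thread_u8(uint8_t count) {
    return (uint8_t)(count + 1);
}
```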

Performance Impact

The new locking mechanism maintains the same performance characteristics:

  - Fast path (uncontended): a single atomic fetch-and-add operation
  - Contended path: spin-wait with exponential backoff
  - No change to the lock-free fast path for job queue operations
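A minimal C11 sketch of the two paths on a standalone counter. It assumes the contender backs out its increment before waiting; `BACKOFF_MAX`, `wait_for_zero`, `lock_with_backoff`, and `unlock_counter` are hypothetical names, and the idle-loop delay is a placeholder for whatever pause or yield the real code uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BACKOFF_MAX 1024u /* hypothetical cap on idle spins between polls */

/* Contended path: poll until the counter reads 0, doubling the idle spin
   between polls (exponential backoff) up to BACKOFF_MAX.
   Returns how many polls found the lock still busy. */
static unsigned wait_for_zero(_Atomic uint64_t *lockcount) {
    unsigned delay = 1, busy_polls = 0;
    while (atomic_load_explicit(lockcount, memory_order_relaxed) != 0) {
        busy_polls++;
        for (volatile unsigned i = 0; i < delay; i++)
            ; /* idle spin; real code might issue a CPU pause hint here */
        if (delay < BACKOFF_MAX)
            delay *= 2;
        assert(delay <= BACKOFF_MAX);
    }
    return busy_polls;
}

/* Fast path: one fetch-and-add. On contention, back out, wait with backoff, retry. */
static void lock_with_backoff(_Atomic uint64_t *lockcount) {
    while (atomic_fetch_add_explicit(lockcount, 1, memory_order_acquire) != 0) {
        atomic_fetch_sub_explicit(lockcount, 1, memory_order_relaxed);
        wait_for_zero(lockcount);
    }
}

static void unlock_counter(_Atomic uint64_t *lockcount) {
    atomic_fetch_sub_explicit(lockcount, 1, memory_order_release);
}
```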

Testing

On a 128-core FreeBSD system:

   8 T. ''
128 255

   {{0 T.0}}^:127 ''
127

   2 T. ''
127 0 127

The system successfully creates 127 worker threads (plus 1 master thread = 128 total) and reports them correctly in the thread pool.

Patch Download

The complete patch is available here: thread-limit-increase.patch

To apply this patch to your J source:

cd /path/to/jsource
patch -p1 < j-thread-limit-increase.patch
make clean
make

Future Considerations

Further increases beyond 255 threads would require:

  1. Converting from static to dynamic allocation of thread data structures
  2. Changing WLOCKBIT behavior (currently optimized for fewer than 256 threads)
  3. Evaluating practical benefits vs. memory overhead
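As a rough sketch of the first item, a static `threaddata[MAXTHREADS]` array could be replaced by a heap allocation sized at startup. `jtt_sketch` and `alloc_threaddata` are hypothetical; the real JTT layout differs, and only its "several KB" size is mimicked here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for JTT; only the several-KB size matters here. */
typedef struct {
    char scratch[4096];
    int thread_index;
} jtt_sketch;

/* Allocate per-thread data for nthreads threads instead of a fixed array.
   calloc zero-fills, matching the zero initialization static storage gets. */
static jtt_sketch *alloc_threaddata(size_t nthreads) {
    assert(nthreads > 0);
    jtt_sketch *td = calloc(nthreads, sizeof *td);
    if (td == NULL)
        return NULL; /* caller reports the allocation failure */
    for (size_t i = 0; i < nthreads; i++)
        td[i].thread_index = (int)i;
    return td;
}
```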

The current implementation provides sufficient threading capability for modern high-core-count systems while maintaining backward compatibility and keeping changes to the existing codebase minimal.