The J programming language’s threading system was limited to a maximum of 63 worker threads per thread pool. This limitation prevented full utilization of modern systems with high core counts (e.g., 128-core FreeBSD systems).
When attempting to create threads equal to the number of available cores minus one (e.g., 127 worker threads on a 128-core system), the system would fail with:
|limit error, executing dyad T.
|a system limit was exceeded
| 0 T.0
The limitation stemmed from the locking mechanism used in the job
queue system. The implementation stored the lock counter in the 6 least
significant bits (LSBs) of the job queue head pointer
(jobq->ht[0]). Since pointers to job structures are
aligned on 64-byte boundaries (ABDY=64), these 6 bits are guaranteed to
be zero in valid pointers and could be repurposed.
However, this design limited the lock counter to 6 bits, allowing a maximum value of 63 (2^6 - 1). If 64 or more threads tried to acquire the lock simultaneously, the counter would overflow into bit 6, corrupting the pointer.
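The overflow described above can be reproduced with plain integer arithmetic. The following is a minimal sketch, not the J source: the `DEMO_` macros and `demo_` helpers are hypothetical names introduced for illustration, and the only facts taken from the text are the 64-byte alignment and the 6-bit counter stored in the low bits of the head pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: 64-byte alignment leaves the low 6 bits free. */
#define DEMO_ALIGN     64u
#define DEMO_LOCK_MASK ((uintptr_t)(DEMO_ALIGN - 1))   /* 0x3f */

/* Start a packed word from an aligned pointer; the counter begins at 0. */
static inline uintptr_t demo_pack(uintptr_t ptr) {
  assert((ptr & DEMO_LOCK_MASK) == 0);   /* pointer must be 64-byte aligned */
  return ptr;
}

/* Recover the pointer bits of a packed word. */
static inline uintptr_t demo_ptr_part(uintptr_t word) {
  return word & ~DEMO_LOCK_MASK;
}

/* Recover the 6-bit lock counter of a packed word. */
static inline unsigned demo_count_part(uintptr_t word) {
  return (unsigned)(word & DEMO_LOCK_MASK);
}

/* Model n threads each adding 1 to the packed word. */
static inline uintptr_t demo_add_waiters(uintptr_t word, unsigned n) {
  return word + n;
}
```

With a pointer value of 0x1000, adding 63 waiters leaves the pointer bits at 0x1000 and the counter at 63. Adding 64 waiters carries into bit 6: the counter reads 0 and the pointer bits become 0x1040, which no longer refers to the original job block.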
The fix separates the lock counter from the pointer storage, so the counter is no longer confined to the 6 spare bits of jobq->ht[0].
The patch modifies three files. The changes are:
- MAXTHREADS: Increased from 63 to 255 (maximum total threads system-wide)
- MAXTHREADSRND: Increased from 64 to 256 (MAXTHREADS+1 rounded to power of 2)
- MAXTHREADSINPOOL: Increased from 63 to 128 (maximum threads per pool)
Added a new field to the job queue structure:
UI lockcount; // Lock counter for this job queue. 0 = unlocked, >0 = number of threads trying to acquire lock
- JOBLOCK macro to use jobq->lockcount instead of LSBs of jobq->ht[0]
- JOBUNLOCK macro to reset lockcount to 0
- joblock() function to work with the separate counter
While the immediate limit was raised to 128 threads per pool and 255 total threads, the actual technical limits are:
- US nthreads field - unsigned short)
- JTT threaddata[MAXTHREADS] array
The choice of 255/256 provides a reasonable balance between capability and memory usage while staying within the bounds where the system uses efficient 8-bit atomic operations for certain locks (WLOCKBIT behavior changes at 256).
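The relationship between MAXTHREADS and MAXTHREADSRND (MAXTHREADS+1 rounded up to a power of 2) can be checked with a few lines of arithmetic. This is an illustrative sketch only: `demo_round_up_pow2` is a hypothetical helper, and how the J source actually defines the constant is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of 2 that is >= x.
   Valid for 1 <= x <= 2^31 so the shift cannot wrap to 0. */
static inline uint32_t demo_round_up_pow2(uint32_t x) {
  assert(x != 0 && x <= 0x80000000u);
  uint32_t p = 1;
  while (p < x) p <<= 1;   /* double until p reaches or passes x */
  return p;
}
```

Under this rule the old setting gives 63+1 = 64 and the new one gives 255+1 = 256, matching the values quoted for MAXTHREADSRND. A MAXTHREADS of 256 would already round up to 512.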
The new locking mechanism maintains the same performance characteristics:
- Fast path (uncontended): Single atomic fetch-and-add operation
- Contended path: Spin-wait with exponential backoff
- No change to the lock-free fast path for job queue operations
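A lock with these characteristics might look roughly like the sketch below, written with C11 atomics. It is an assumption-laden illustration and not the patched JOBLOCK/JOBUNLOCK code: the `demo_` names, the memory orderings, and the backoff cap of 1024 iterations are all hypothetical. Only the counter semantics (0 = unlocked, >0 = threads trying to acquire, unlock resets to 0) come from the text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the job queue: only the separate counter. */
typedef struct {
  atomic_uint lockcount;   /* 0 = unlocked, >0 = threads trying to acquire */
} demo_jobq;

static inline void demo_jobq_init(demo_jobq *q) {
  atomic_init(&q->lockcount, 0);
}

/* Fast path: one fetch-and-add; seeing the old value 0 means we own the lock. */
static inline bool demo_try_lock(demo_jobq *q) {
  return atomic_fetch_add_explicit(&q->lockcount, 1, memory_order_acquire) == 0;
}

/* Contended path: wait for the counter to return to 0, backing off
   exponentially (capped), then retry the fetch-and-add. */
static inline void demo_lock(demo_jobq *q) {
  unsigned spins = 1;
  while (!demo_try_lock(q)) {
    while (atomic_load_explicit(&q->lockcount, memory_order_relaxed) != 0) {
      for (volatile unsigned i = 0; i < spins; ++i) { /* busy-wait */ }
      if (spins < 1024) spins <<= 1;
    }
  }
}

/* Unlock resets the counter to 0, discarding the waiters' increments. */
static inline void demo_unlock(demo_jobq *q) {
  assert(atomic_load_explicit(&q->lockcount, memory_order_relaxed) != 0);
  atomic_store_explicit(&q->lockcount, 0, memory_order_release);
}

/* Single-threaded usage example; returns 1 if the counter behaves as described. */
static inline int demo_selfcheck(void) {
  demo_jobq q;
  demo_jobq_init(&q);
  demo_lock(&q);                        /* uncontended: 0 -> 1 */
  bool second = demo_try_lock(&q);      /* fails while held: 1 -> 2 */
  unsigned seen = atomic_load(&q.lockcount);
  demo_unlock(&q);                      /* back to 0 */
  return !second && seen == 2 && atomic_load(&q.lockcount) == 0;
}
```

Because the counter lives in its own word, its range is bounded by the integer width rather than by pointer alignment, so contention from more than 63 threads no longer risks corrupting the queue head.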
On a 128-core FreeBSD system:
8 T. ''
128 255
{{0 T.0}}^:127 ''
127
2 T. ''
127 0 127
The system successfully creates 127 worker threads (plus 1 master thread = 128 total) and reports them correctly in the thread pool.
The complete patch is available here: thread-limit-increase.patch
To apply this patch to your J source:
cd /path/to/jsource
patch -p1 < j-thread-limit-increase.patch
make clean
make
Further increases beyond 255 threads would require:
1. Converting from static to dynamic allocation of thread data structures
2. Changing WLOCKBIT behavior (currently optimized for <256 threads)
3. Evaluating practical benefits vs. memory overhead
The current implementation provides sufficient threading capability for modern high-core-count systems while maintaining backward compatibility and minimal changes to the existing codebase.