Packed Strings (PSTR) in Flowlog

Flowlog implements an internal “packed string” term kind (PL_TERM_PSTR) to represent lists of Unicode characters compactly, while remaining fully list-equivalent at the Prolog level (SECTION 11).

This document explains:

Code references (section markers in flowlog.c):

High-level semantics

PL_TERM_PSTR represents a value that is list-equivalent to a Prolog list of character atoms:

  "ab" == [a,b].  % when read with double_quotes=chars
  functor("ab", '.', 2).
  arg(1, "ab", a).
  arg(2, "ab", [b]).

Important: PL_TERM_PSTR is not a new user-visible type. It is an internal optimization that must behave like '.'/2 lists under unification and all ISO list/term operations (SECTIONS 11/12/14/15).

Memory layout

A packed string is stored inline in the term allocation:

  +--------------------+
  | pl_term header     |  (tag = PL_TERM_PSTR)
  +--------------------+
  | bytes[0..n-1]      |  UTF-8 bytes
  | 0                  |  NUL terminator (byte 0)
  | padding '\0' ...   |  to align the next pointer
  +--------------------+
  | pl_term* tail      |  continuation (see below)
  +--------------------+

The pl_term field t->v.pstr.bytes points to the first UTF-8 byte of the inline sequence.

The tail pointer is located by:

  1. scanning to the NUL terminator (byte 0)
  2. skipping the terminator
  3. rounding up to sizeof(void*) alignment
  4. reading the pl_term* at that aligned slot
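
The four steps above can be sketched in C as follows. This is a minimal illustration, not the code from flowlog.c: the helper name pstr_tail_slot_sketch is hypothetical, and pl_term is left as an opaque type because its real definition is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pl_term; /* opaque in this sketch; only pointers to it are stored */

/* Hypothetical helper (not the name used in flowlog.c): given the first
 * inline byte of a packed run, return the address of its tail slot. */
static struct pl_term **pstr_tail_slot_sketch(const unsigned char *bytes)
{
    assert(bytes != NULL);

    /* 1. Scan to the NUL terminator. */
    const unsigned char *p = bytes;
    while (*p != 0)
        p++;

    /* 2. Skip the terminator. */
    p++;

    /* 3. Round the address up to a multiple of sizeof(void *). */
    uintptr_t addr = (uintptr_t)p;
    uintptr_t rem = addr % sizeof(void *);
    if (rem != 0)
        addr += sizeof(void *) - rem;

    /* 4. The pl_term * is stored at that aligned address. */
    return (struct pl_term **)addr;
}
```

Because the slot is found by scanning, the cost of reaching the tail is linear in the byte length of the run.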

This is implemented by (SUBSECTION 11.1):

Tail meaning

The tail cell (pl_term*) is the continuation of the list-like structure. Typical tails:

Construction and normalization

Allocating a packed run

make_pstr_rt(rt_ctrl, dbg_dat_ptr, bytes, len, tail) copies UTF-8 bytes[0..len-1] into the packed run and stores tail in the aligned tail slot (SUBSECTION 10.5, using layout helpers from SUBSECTION 11.1).
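
As an illustration of that layout, here is a small sketch of a constructor over a toy term type. Everything in it is an assumption made for the example: toy_term, toy_make_pstr, toy_tail_offset and toy_pstr_tail are hypothetical names, the header has only two fields, and plain calloc stands in for whatever allocator make_pstr_rt really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the real term header; the fields are assumptions. */
typedef struct toy_term {
    int tag;
    unsigned char *bytes; /* points at the inline bytes, like v.pstr.bytes */
} toy_term;

enum { TOY_TERM_PSTR = 1 };

/* Offset of the tail slot from the start of the allocation. */
static size_t toy_tail_offset(size_t len)
{
    size_t end = sizeof(toy_term) + len + 1; /* header + bytes + NUL */
    size_t a = sizeof(void *);
    return (end + a - 1) / a * a;            /* round up to pointer size */
}

/* Hypothetical constructor: header, bytes, NUL, padding, then tail. */
static toy_term *toy_make_pstr(const unsigned char *src, size_t len,
                               toy_term *tail)
{
    /* This sketch assumes src holds no zero bytes, since the tail is
     * later found by scanning for the terminator. */
    assert(len == 0 || memchr(src, 0, len) == NULL);

    size_t tail_off = toy_tail_offset(len);
    /* calloc zero-fills, which supplies the terminator and the padding. */
    toy_term *t = calloc(1, tail_off + sizeof(toy_term *));
    if (t == NULL)
        return NULL;

    t->tag = TOY_TERM_PSTR;
    t->bytes = (unsigned char *)t + sizeof(toy_term);
    if (len > 0)
        memcpy(t->bytes, src, len);
    *(toy_term **)((unsigned char *)t + tail_off) = tail;
    return t;
}

/* Read the tail back: scan for the terminator, then apply the offset. */
static toy_term *toy_pstr_tail(const toy_term *t)
{
    const unsigned char *base = (const unsigned char *)t;
    size_t len = strlen((const char *)t->bytes);
    return *(toy_term *const *)(base + toy_tail_offset(len));
}
```

The sketch computes offsets from the start of the allocation, which matches address-based rounding only because calloc returns suitably aligned memory.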

Notable behavior:

Empty-run peeling

Empty packed runs should be treated as transparent (equivalent to their tail).

Flowlog centralizes this in (SUBSECTION 11.3 and SUBSECTION 17.1):

Many engine operations “peel” a term before doing further work, so that empty PSTR segments never reach the logic that follows.
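
A peeling loop can be sketched as below. The model is deliberately simplified and hypothetical: toy_term keeps an explicit tail field instead of the inline tail slot, and toy_peel is not the name of the real helper.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_NIL, TOY_PSTR } toy_tag;

/* Simplified toy term: the tail is an ordinary field in this sketch. */
typedef struct toy_term {
    toy_tag tag;
    const char *bytes;     /* NUL-terminated UTF-8 (TOY_PSTR only) */
    struct toy_term *tail; /* continuation (TOY_PSTR only) */
} toy_term;

/* Skip over empty packed runs and return the first term that is either
 * a non-empty run or not a packed run at all. */
static toy_term *toy_peel(toy_term *t)
{
    while (t != NULL && t->tag == TOY_PSTR) {
        assert(t->bytes != NULL);
        if (t->bytes[0] != '\0')
            break;         /* non-empty run: nothing left to peel */
        t = t->tail;       /* empty run: equivalent to its tail */
    }
    return t;
}
```

Calling the peel once at the top of an operation means the rest of that operation only has to handle non-empty runs and ordinary terms.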

Term-view interface (preferred)

The goal is for most of the engine to treat packed strings through a small “term-view” API that hides the representation details.

Key helpers (SUBSECTION 11.3, with term predicates in SUBSECTION 17.1):

If you are implementing a built-in that should work on lists, prefer these helpers instead of checking t->tag == PL_TERM_PSTR directly.

Fast paths (where PSTR is referenced outside the core)

Even with a term-view layer, a few hot paths intentionally special-case PSTR to avoid per-element overhead.

length/2 fast path

length/2 uses a PSTR run-scan path that counts codepoints from UTF-8 bytes and jumps by tail pointers instead of repeatedly unconsing:
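
The run-scan idea can be illustrated with the sketch below. It is not the flowlog.c code: the toy_term model (with an explicit tail field), toy_utf8_count and toy_pstr_length are hypothetical, and the byte runs are assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_NIL, TOY_PSTR, TOY_OTHER } toy_tag;

/* Simplified toy term: the tail is an ordinary field in this sketch. */
typedef struct toy_term {
    toy_tag tag;
    const char *bytes;           /* NUL-terminated UTF-8 (TOY_PSTR only) */
    const struct toy_term *tail; /* continuation (TOY_PSTR only) */
} toy_term;

/* Count codepoints in one run: every byte that is not a continuation
 * byte (bit pattern 10xxxxxx) starts a new codepoint. */
static size_t toy_utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Returns the list length, or -1 if the chain does not end in TOY_NIL
 * (for example a partial list, which a real engine handles separately). */
static long toy_pstr_length(const toy_term *t)
{
    long total = 0;
    while (t != NULL && t->tag == TOY_PSTR) {
        assert(t->bytes != NULL);
        total += (long)toy_utf8_count(t->bytes);
        t = t->tail;             /* jump a whole run at a time */
    }
    return (t != NULL && t->tag == TOY_NIL) ? total : -1;
}
```

The loop touches each byte once and allocates nothing, whereas an uncons-based walk would build one intermediate term per character.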

atom_chars/2 and number_chars/2 fast paths

These predicates often consume a whole list of characters. If the list is a PSTR whose tail chain is properly terminated (it ends in []), Flowlog converts directly from packed byte runs into a UTF-8 C string without expanding into per-character terms:
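
The direct conversion can be pictured with the following sketch, which concatenates the byte runs of a chain into a caller-supplied buffer. The names toy_term and toy_pstr_flatten are hypothetical, the tail is modelled as an explicit field, and the fixed-capacity buffer is a simplification chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_NIL, TOY_PSTR, TOY_OTHER } toy_tag;

/* Simplified toy term: the tail is an ordinary field in this sketch. */
typedef struct toy_term {
    toy_tag tag;
    const char *bytes;           /* NUL-terminated UTF-8 (TOY_PSTR only) */
    const struct toy_term *tail; /* continuation (TOY_PSTR only) */
} toy_term;

/* Copy every run into out (capacity cap, counting the final NUL).
 * Returns the number of bytes written before the NUL, or -1 if the
 * chain does not end in TOY_NIL or the buffer is too small. */
static long toy_pstr_flatten(const toy_term *t, char *out, size_t cap)
{
    size_t used = 0;
    assert(out != NULL && cap > 0);

    while (t != NULL && t->tag == TOY_PSTR) {
        size_t n = strlen(t->bytes);
        if (n >= cap - used)
            return -1;           /* no room for bytes plus terminator */
        memcpy(out + used, t->bytes, n);
        used += n;
        t = t->tail;
    }
    if (t == NULL || t->tag != TOY_NIL)
        return -1;               /* not a properly terminated list */

    out[used] = '\0';
    return (long)used;
}
```

Returning -1 for a chain that is not properly terminated leaves the caller free to fall back to a generic, element-by-element path.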

The fast path is used for:

The fast path intentionally rejects the *_codes/2 predicates when they are given a non-empty PSTR, because a PSTR is a list of characters, not of integer character codes.

subsumes_term/2 fast path

subsumes_term/2 (and its internal unifier) can be very allocation-heavy when implemented via repeated uncons on a packed run (because each tail step would otherwise allocate a PL_TERM_PSTR slice header).

Flowlog therefore has a scan-fast list-like subsumption path that:

Entry points:

Where the representation still leaks

A few places still mention PL_TERM_PSTR explicitly, either for fast paths or because they predate the term-view layer.

Common categories:

When adding new features, prefer to:

  1. keep PL_TERM_PSTR-specific logic in a small number of helper functions
  2. expose list-like behavior via the term-view helpers above
  3. add explicit fast paths only when they materially reduce allocations or asymptotic cost

Current limitations / notes