Flowlog implements an internal “packed string” term kind
(PL_TERM_PSTR) to represent lists of Unicode characters
compactly, while remaining fully list-equivalent at the
Prolog level (SECTION 11).
This document explains:
Code references (section markers in flowlog.c):
- length/2, atom_chars/2, number_chars/2, subsumes_term/2: SECTIONS 22/23/24 (interp/WAM/WAMVM)

PL_TERM_PSTR represents a value that is list-equivalent
to a Prolog list of character atoms:
"ab" == [a,b]. % when read with double_quotes=chars
functor("ab", '.', 2).
arg(1, "ab", a).
arg(2, "ab", [b]).

Important: PL_TERM_PSTR is not a new
user-visible type. It is an internal optimization that must behave like
'.'/2 lists under unification and all ISO list/term
operations (SECTIONS 11/12/14/15).
A packed string is stored inline in the term allocation:
+--------------------+
| pl_term header | (tag = PL_TERM_PSTR)
+--------------------+
| bytes[0..n-1] | UTF-8 bytes
| 0 | NUL terminator (byte 0)
| padding '\0' ... | to align the next pointer
+--------------------+
| pl_term* tail | continuation (see below)
+--------------------+
The pl_term field t->v.pstr.bytes points
to the first UTF-8 byte of the inline sequence.
The tail pointer is located by:
- finding the NUL terminator (byte 0)
- rounding the address just past it up to sizeof(void*) alignment
- reading the pl_term* at that aligned slot

This is implemented by (SUBSECTION 11.1):
- pstr_tail_term_from_nul()
- pstr_tail_term_from_bytes()
- pstr_run_len_bytes()

The tail cell (pl_term*) is the continuation of the
list-like structure. Typical tails:
- [] atom: complete string
- PL_TERM_PSTR: concatenation by chaining packed runs
- '.'/2 term: transition back to a “real” list representation

make_pstr_rt(rt_ctrl, dbg_dat_ptr, bytes, len, tail)
copies UTF-8 bytes[0..len-1] into the packed run and stores
tail in the aligned tail slot (SUBSECTION 10.5, using
layout helpers from SUBSECTION 11.1).
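The layout and constructor described above can be sketched in a few lines of C. This is a minimal illustration, not the code in flowlog.c: toy_term, toy_nil, toy_tail_slot(), toy_make_pstr() and toy_pstr_tail() are hypothetical names, the header is reduced to a single tag, and the runtime/debug arguments of make_pstr_rt() are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pl_term: a tag followed by inline storage. */
enum { TOY_NIL = 0, TOY_PSTR = 1 };

typedef struct toy_term {
    int tag;
    unsigned char bytes[];   /* PSTR: UTF-8 bytes, NUL, padding, tail slot */
} toy_term;

static toy_term toy_nil = { TOY_NIL };   /* stands in for the [] atom */

/* Round n up to a multiple of a (a must be a power of two). */
static uintptr_t align_up(uintptr_t n, uintptr_t a) {
    return (n + a - 1) & ~(a - 1);
}

/* Locate the tail slot: find the NUL, step past it, align the address. */
static toy_term **toy_tail_slot(const toy_term *t) {
    const unsigned char *nul = t->bytes + strlen((const char *)t->bytes);
    return (toy_term **)align_up((uintptr_t)(nul + 1), sizeof(void *));
}

/* Read the continuation stored after the packed bytes. */
static toy_term *toy_pstr_tail(const toy_term *t) {
    return *toy_tail_slot(t);
}

/* Build a packed run from bytes[0..len-1]; an empty run is just its tail. */
static toy_term *toy_make_pstr(const char *bytes, size_t len, toy_term *tail) {
    if (len == 0) return tail;                 /* no allocation */
    assert(memchr(bytes, 0, len) == NULL);     /* a run cannot contain NUL */

    /* Offset of the tail slot from the start of the allocation. malloc-family
     * memory is suitably aligned, so aligning the offset aligns the address. */
    size_t tail_off = (size_t)align_up(offsetof(toy_term, bytes) + len + 1,
                                       sizeof(void *));
    toy_term *t = calloc(1, tail_off + sizeof(toy_term *));
    if (t == NULL) return NULL;

    t->tag = TOY_PSTR;
    memcpy(t->bytes, bytes, len);   /* calloc already zeroed NUL and padding */
    *toy_tail_slot(t) = tail;
    return t;
}
```

Because the tail slot is found by scanning for the NUL, the run needs no stored length field; the cost is that a run can never contain byte 0.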
Notable behavior:
- If len == 0, it returns tail directly (no allocation).

Empty packed runs should be treated as transparent (equivalent to their tail).
Flowlog centralizes this in (SUBSECTION 11.3 and SUBSECTION 17.1):
- term_peel_pstr_empty_const()
- deref_term_and_peel_pstr_empty() / deref_term_wam_and_peel_pstr_empty()

Many engine operations “peel” before doing further work so empty PSTR segments do not leak into logic.
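As a rough illustration of what peeling means, the sketch below skips empty runs (and variable bindings) until it reaches a term that carries content. The toy_term struct, its fields, and the toy_deref()/toy_deref_and_peel() names are assumptions made for this example; the real helpers operate on pl_term and the engine's own binding representation.

```c
#include <assert.h>
#include <stddef.h>

enum toy_tag { TOY_NIL, TOY_VAR, TOY_PSTR };

/* Hypothetical simplified term: not the real pl_term layout. */
typedef struct toy_term {
    enum toy_tag tag;
    const char *bytes;             /* TOY_PSTR: NUL-terminated UTF-8 run */
    const struct toy_term *next;   /* TOY_PSTR: tail; TOY_VAR: binding or NULL */
} toy_term;

/* Follow variable bindings until an unbound variable or a non-variable. */
static const toy_term *toy_deref(const toy_term *t) {
    while (t->tag == TOY_VAR && t->next != NULL) t = t->next;
    return t;
}

/* Dereference, then step over empty packed runs; repeat until neither applies. */
static const toy_term *toy_deref_and_peel(const toy_term *t) {
    assert(t != NULL);
    for (;;) {
        t = toy_deref(t);
        if (t->tag != TOY_PSTR || t->bytes[0] != '\0') return t;
        t = t->next;   /* an empty run is equivalent to its tail */
    }
}
```

Callers that peel first never have to handle the "PSTR with zero characters" case separately.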
The goal is for most of the engine to treat packed strings through a small “term-view” API that hides the representation details.
Key helpers (SUBSECTION 11.3, with term predicates in SUBSECTION 17.1):
- term_is_list_pair(t) for real '.'/2
- term_is_pstr_nonempty(t) for non-empty PSTR
- term_is_list_like_pair(t) for either
- term_list_like_uncons_view() (const “view” uncons; no allocations)
- term_list_like_uncons_rt() (runtime uncons; may allocate a slice term)
- term_list_like_run_len_and_tail_const() (works for both list pairs and PSTR)
- term_pstr_run_len_and_tail_peeled_const() (PSTR-only, assumes already at a non-empty PSTR)
- term_functor_arity_rt() returns '.'/2 for list-like terms (including PSTR)
- term_compound_arg_rt() returns head/tail for list-like terms (including PSTR)

If you are implementing a built-in that should work on lists, prefer
these helpers instead of checking t->tag == PL_TERM_PSTR
directly.
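To make the "view" idea concrete, here is one way an allocation-free uncons over list-like terms could look. It is only a sketch under stated assumptions: toy_term, toy_cursor and toy_uncons_view() are invented for this example, the head is returned as a byte span rather than a character atom, and the real term_list_like_uncons_view() may represent the remaining tail quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_tag { TOY_NIL, TOY_PAIR, TOY_PSTR };

/* Hypothetical simplified term: not the real pl_term layout. */
typedef struct toy_term {
    enum toy_tag tag;
    const char *bytes;            /* PAIR: head char as UTF-8; PSTR: the run */
    const struct toy_term *tail;
} toy_term;

/* A position inside a list-like term: the term plus a byte offset into a run. */
typedef struct toy_cursor { const toy_term *term; size_t off; } toy_cursor;

/* Length of a UTF-8 sequence from its lead byte (assumes valid UTF-8). */
static size_t utf8_len(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead >> 5) == 0x6) return 2;
    if ((lead >> 4) == 0xE) return 3;
    return 4;
}

/* Returns 1 and fills head/head_len/rest if cur is at a list-like pair,
 * 0 otherwise. Nothing is allocated: the rest is just another cursor.
 * Empty runs are not peeled here; a caller would peel first. */
static int toy_uncons_view(toy_cursor cur, const char **head,
                           size_t *head_len, toy_cursor *rest) {
    const toy_term *t = cur.term;
    assert(head != NULL && head_len != NULL && rest != NULL);
    if (t->tag == TOY_PAIR) {
        *head = t->bytes;
        *head_len = strlen(t->bytes);
        rest->term = t->tail;
        rest->off = 0;
        return 1;
    }
    if (t->tag == TOY_PSTR && t->bytes[cur.off] != '\0') {
        size_t n = utf8_len((unsigned char)t->bytes[cur.off]);
        *head = t->bytes + cur.off;
        *head_len = n;
        if (t->bytes[cur.off + n] == '\0') {   /* run exhausted: go to tail */
            rest->term = t->tail;
            rest->off = 0;
        } else {                               /* stay inside the same run */
            rest->term = t;
            rest->off = cur.off + n;
        }
        return 1;
    }
    return 0;
}
```

A built-in written against such a view handles '.'/2 cells and packed runs with the same loop, which is the point of preferring the helpers over a direct tag check.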
Even with a term-view layer, a few hot paths intentionally special-case PSTR to avoid per-element overhead.
length/2 fast path

length/2 uses a PSTR run-scan path that counts codepoints from UTF-8 bytes and jumps by tail pointers instead of repeatedly unconsing:
- term_pstr_run_len_and_tail_peeled_const() (SUBSECTION 11.3)
- term_is_pstr_nonempty() + the above helper (SECTIONS 22/23/24)

atom_chars/2 and number_chars/2 fast paths

These predicates often consume a whole list of characters. If the list is a PSTR (and properly terminated), Flowlog converts directly from packed byte runs into a UTF-8 C string without expanding into per-character terms:
- pstr_list_to_cstr_env() (interpreter/env mode, SUBSECTION 11.3)
- pstr_list_to_cstr_wam() (WAM/WAMVM mode, SUBSECTION 11.3)

The fast path is used for:
- atom_chars/2 (List -> Atom direction, SECTIONS 22/23/24)
- number_chars/2 (List -> Number direction, SECTIONS 22/23/24)

It intentionally rejects *_codes/2 when given a
non-empty PSTR, since a PSTR is a list of characters,
not integers.
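Both run-scanning fast paths described above (counting for length/2, flattening for the *_chars/2 conversions) follow the same shape: walk the chain of packed runs, do work per run rather than per character, and accept only chains that end in []. The sketch below illustrates that shape with hypothetical names (toy_term, toy_pstr_length(), toy_pstr_to_cstr()); it ignores variables, dereferencing, and mixed '.'/2 cells, all of which the real helpers have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum toy_tag { TOY_NIL, TOY_PSTR, TOY_OTHER };

/* Hypothetical simplified term: not the real pl_term layout. */
typedef struct toy_term {
    enum toy_tag tag;
    const char *bytes;            /* TOY_PSTR: NUL-terminated UTF-8 run */
    const struct toy_term *tail;
} toy_term;

/* Count code points in one run: every byte that is not a UTF-8
 * continuation byte (10xxxxxx) starts a new code point. */
static size_t run_codepoints(const char *bytes) {
    size_t n = 0;
    for (; *bytes != '\0'; bytes++)
        if (((unsigned char)*bytes & 0xC0) != 0x80) n++;
    return n;
}

/* Returns 1 and stores the list length if the chain ends in [], else 0. */
static int toy_pstr_length(const toy_term *t, size_t *out) {
    size_t n = 0;
    assert(t != NULL && out != NULL);
    for (; t->tag == TOY_PSTR; t = t->tail) n += run_codepoints(t->bytes);
    if (t->tag != TOY_NIL) return 0;    /* partial or non-list tail */
    *out = n;
    return 1;
}

/* Concatenate all runs into a malloc'd C string; NULL if the chain is not
 * terminated by [] or allocation fails. The caller frees the result. */
static char *toy_pstr_to_cstr(const toy_term *t) {
    const toy_term *p;
    size_t total = 0;
    assert(t != NULL);
    for (p = t; p->tag == TOY_PSTR; p = p->tail) total += strlen(p->bytes);
    if (p->tag != TOY_NIL) return NULL;

    char *s = malloc(total + 1);
    if (s == NULL) return NULL;
    char *w = s;
    for (p = t; p->tag == TOY_PSTR; p = p->tail) {
        size_t n = strlen(p->bytes);
        memcpy(w, p->bytes, n);         /* copy a whole run at once */
        w += n;
    }
    *w = '\0';
    return s;
}
```

Neither function creates a per-character term, which is where the savings over a generic uncons loop come from.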
subsumes_term/2 fast path

subsumes_term/2 (and its internal unifier) can be very
allocation-heavy when implemented via repeated uncons on a
packed run (because each tail step would otherwise allocate a
PL_TERM_PSTR slice header).
Flowlog therefore has a scan-fast list-like subsumption path that:
Entry points:
- unify_terms_subsumes_listlike_fast() (interp/env mode, SECTION 14)
- unify_terms_subsumes_wam_tr_listlike_fast() (WAM/WAMVM mode, SECTION 15)

For correctness, there are still a few places that explicitly mention
PL_TERM_PSTR (either for fast paths or because they predate
the term-view layer).
Common categories:
- list_to_vec_iso*() expands PSTR into a vector of character atoms (slow path, SUBSECTION 11.3)

When adding new features, prefer to:
- keep PL_TERM_PSTR-specific logic in a small number of helper functions

- Packed runs are NUL-terminated (byte 0), so U+0000 is not representable inside a run.
- When input may contain NUL bytes (e.g. file_to_chars/2), Flowlog splits packed runs and inserts an explicit '.'/2 list cell with integer 0 for each NUL byte (SUBSECTION 11.4).
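The splitting step can be pictured as a planning pass over the input buffer: each maximal NUL-free span becomes one packed run, and each NUL byte becomes its own explicit list cell. The sketch below only computes that plan; toy_seg and toy_split_at_nul() are hypothetical names, and building the actual terms (tail first, presumably) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { SEG_RUN, SEG_NUL_CELL } seg_kind;

/* One planned piece of the resulting list (hypothetical type). */
typedef struct toy_seg {
    seg_kind kind;
    size_t start;   /* offset into the input buffer */
    size_t len;     /* byte count (always 1 for SEG_NUL_CELL) */
} toy_seg;

/* Split buf[0..len-1] into NUL-free runs and single-NUL cells.
 * Writes at most cap segments to out and returns how many the
 * input needs in total. */
static size_t toy_split_at_nul(const unsigned char *buf, size_t len,
                               toy_seg *out, size_t cap) {
    size_t n = 0, i = 0;
    assert(out != NULL || cap == 0);
    while (i < len) {
        if (buf[i] == 0) {
            /* A NUL cannot live inside a run: plan an explicit cell. */
            if (n < cap) out[n] = (toy_seg){ SEG_NUL_CELL, i, 1 };
            n++;
            i++;
        } else {
            /* Extend the run up to the next NUL or the end of input. */
            const unsigned char *z = memchr(buf + i, 0, len - i);
            size_t run = z ? (size_t)(z - (buf + i)) : len - i;
            if (n < cap) out[n] = (toy_seg){ SEG_RUN, i, run };
            n++;
            i += run;
        }
    }
    return n;
}
```

For the five bytes 'a', 'b', 0, 'c', 0 this plans a two-byte run, a NUL cell, a one-byte run, and a final NUL cell.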