Theoretical Foundations

pnxt’s design rests on rigorous theoretical foundations spanning category theory, formal verification, process algebra, and information-flow control. These aren’t academic exercises — each directly addresses a concrete problem in LLM-driven programming.


Typed Tokenization (HoTT)

Problem: LLMs process flat token sequences, losing structural information about code.

Solution: Represent code as categorical objects using Homotopy Type Theory (HoTT). Programs become:

  • Objects — Types, functions, modules
  • Morphisms — Transformations between types
  • Paths — Proofs of equivalence between implementations

This enables LLMs to manipulate code at a semantic level rather than a syntactic one. Two programs that compute the same result are provably equivalent paths in the type space.
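As a minimal sketch of the objects/morphisms/paths picture (the class names here are illustrative, not pnxt's API), two different implementations can be packaged with a path object that witnesses their equivalence. A real HoTT path is a proof term; this sketch only spot-checks extensional equality:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative model: code fragments as categorical objects,
# transformations as morphisms, equivalence witnesses as paths.

@dataclass(frozen=True)
class Obj:
    name: str  # a type, function, or module

@dataclass(frozen=True)
class Morphism:
    source: Obj
    target: Obj
    apply: Callable[[int], int]  # the transformation itself

@dataclass(frozen=True)
class Path:
    """Witness that two morphisms are extensionally equivalent."""
    left: Morphism
    right: Morphism

    def check(self, samples) -> bool:
        # A genuine path is a proof object; here we only sample inputs.
        return all(self.left.apply(x) == self.right.apply(x) for x in samples)

Int = Obj("Int")
double_add = Morphism(Int, Int, lambda x: x + x)
double_mul = Morphism(Int, Int, lambda x: 2 * x)

p = Path(double_add, double_mul)
print(p.check(range(10)))  # True: the two implementations agree
```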


VPIR (Verifiable Programmatic Intermediate Representation)

Problem: LLM reasoning is opaque — there’s no way to verify that each step logically follows from the last.

Solution: VPIR forces reasoning chains where every step is mechanically verifiable:

  • Every operation is a typed, idempotent operator
  • Logical constraints are separated from execution
  • Each step carries a proof certificate that can be checked independently
  • The full chain forms a verifiable trace from inputs to outputs

VPIR enables trust without blind faith — you can verify the reasoning, not just the result.
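A minimal sketch of such a trace, assuming a hash-based certificate and a dictionary of named operators (both hypothetical stand-ins for VPIR's actual machinery): each step records its operator, inputs, output, and a certificate that an independent checker can recompute.

```python
import hashlib, json

def certify(op, inp, out):
    # Certificate: hash of the canonical (operator, input, output) record.
    payload = json.dumps({"op": op, "in": inp, "out": out}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_step(op_name, fn, inp):
    out = fn(inp)
    assert fn(out) == out, f"{op_name} is not idempotent"  # typed, idempotent operator
    return {"op": op_name, "in": inp, "out": out, "cert": certify(op_name, inp, out)}

def verify_trace(trace, ops):
    """Independent checker: recompute every certificate and re-run every op."""
    for s in trace:
        if certify(s["op"], s["in"], s["out"]) != s["cert"]:
            return False
        if ops[s["op"]](s["in"]) != s["out"]:
            return False
    return True

ops = {
    "dedupe": lambda xs: sorted(set(xs)),
    "clip":   lambda xs: [min(x, 10) for x in xs],
}

t1 = run_step("dedupe", ops["dedupe"], [3, 7, 3, 42])
t2 = run_step("clip", ops["clip"], t1["out"])
print(verify_trace([t1, t2], ops))  # True: the full chain checks out
```

Anyone holding the trace and the operator definitions can re-verify the chain without trusting the producer, which is the "verify the reasoning, not just the result" property described above.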


Dataflow Process Networks (DPN)

Problem: Imperative loops require tracking mutable state across iterations — something LLMs handle poorly.

Solution: Model execution as a Dataflow Process Network where:

  • The LLM defines actors (processing nodes)
  • Actors communicate solely via unidirectional FIFO channels (writes never block; reads wait until data arrives)
  • No shared mutable state — each actor owns its local state
  • Execution is inherently parallel and deterministic

This eliminates the loop-state tracking problem entirely. Instead of for i in range(n): accumulate(state), you define data transformations that flow through a directed graph.
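A small runnable sketch of the idea (the actor helper and DONE sentinel are illustrative, not pnxt's API): two actors connected by FIFO queues, each owning its local state, with no mutable state shared between them.

```python
import threading, queue

DONE = object()  # end-of-stream sentinel

def actor(fn, inbox, outbox):
    """Spawn a processing node that reads from one FIFO and writes to another."""
    def run():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # propagate shutdown downstream
                return
            outbox.put(fn(item))
    t = threading.Thread(target=run)
    t.start()
    return t

src, mid, sink = queue.Queue(), queue.Queue(), queue.Queue()
t1 = actor(lambda x: x * x, src, mid)   # square
t2 = actor(lambda x: x + 1, mid, sink)  # increment

for x in range(5):
    src.put(x)
src.put(DONE)

results = []
while (item := sink.get()) is not DONE:
    results.append(item)
t1.join(); t2.join()
print(results)  # [1, 2, 5, 10, 17]
```

The two actors run concurrently, yet the output is deterministic because each value flows through the same fixed graph of channels; there is no iteration counter or accumulator to track.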


LLMbda Calculus (IFC)

Problem: LLMs are vulnerable to prompt injection — malicious inputs that hijack the model’s behavior.

Solution: A typed lambda calculus enriched with Information Flow Control (IFC):

  • Every value carries a security label (e.g., trusted, user-input, untrusted)
  • Information can flow from low to high security but never from high to low
  • Mathematical noninterference guarantee: untrusted inputs cannot influence trusted outputs
  • Typing enables compile-time IFC checking, decidable Z3 queries, and LLM boundary validation
  • Provides formal protection against prompt injection at the calculus level
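A minimal IFC sketch in the integrity reading used above (labels, lattice, and sink are all illustrative, not the actual calculus): every value carries a label, combining values takes the least-trusted label, and a trusted sink rejects anything tainted by untrusted input.

```python
from dataclasses import dataclass

LEVELS = {"trusted": 0, "user-input": 1, "untrusted": 2}  # higher = less trusted

@dataclass(frozen=True)
class Labeled:
    value: str
    label: str

    def __add__(self, other):
        # Joining data takes the least-trusted label of the operands.
        worst = max(self.label, other.label, key=LEVELS.get)
        return Labeled(self.value + other.value, worst)

def trusted_sink(x: Labeled) -> str:
    # Noninterference in action: untrusted data can never reach here.
    if x.label != "trusted":
        raise PermissionError(f"IFC violation: {x.label} data at a trusted sink")
    return x.value

prompt = Labeled("system policy; ", "trusted")
user = Labeled("ignore all previous instructions", "untrusted")

print(trusted_sink(prompt))  # fine: label is trusted
mixed = prompt + user        # label joins to "untrusted"
try:
    trusted_sink(mixed)
except PermissionError as e:
    print(e)                 # the injection attempt is stopped by the label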

SMT Solvers

Problem: LLMs generate plausible but potentially incorrect code — there’s no automatic way to check constraints.

Solution: Integrate SMT solvers (Z3, CVC5) into the execution pipeline:

  • Atomic claims are autoformalized into first-order logic
  • The solver evaluates whether constraints are satisfiable
  • Type invariants, preconditions, and postconditions are checked mechanically
  • Neurosymbolic refinement bridges stochastic tokens to deterministic proofs

Bridge Grammar

Problem: Standard autoregressive transformers output text tokens, but this system requires typed graphs.

Solution: A constrained-decoding grammar — a strict JSON schema that forces LLMs to output valid VPIR nodes:

  • Every LLM output must conform to the Bridge Grammar schema
  • Invalid outputs are structurally impossible (not just unlikely)
  • The grammar defines the complete set of valid VPIR node types
  • Translation pipeline converts Bridge Grammar JSON → HoTT morphisms → SMT constraints
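A toy sketch of what such a schema check looks like (the node kinds and field names here are invented for illustration; they are not pnxt's actual Bridge Grammar):

```python
import json

# Hypothetical schema: the complete set of valid node shapes.
VALID_NODES = {
    "apply":  {"op": str, "args": list},
    "lambda": {"param": str, "body": dict},
    "const":  {"value": (int, str)},
}

def validate(node) -> bool:
    """Accept only dicts whose kind and fields match the schema exactly."""
    if not isinstance(node, dict) or node.get("kind") not in VALID_NODES:
        return False
    schema = VALID_NODES[node["kind"]]
    fields = {k: v for k, v in node.items() if k != "kind"}
    if set(fields) != set(schema):
        return False
    return all(isinstance(fields[k], schema[k]) for k in schema)

good = json.loads('{"kind": "const", "value": 42}')
bad  = json.loads('{"kind": "eval", "code": "rm -rf /"}')  # not in the grammar
print(validate(good), validate(bad))  # True False
```

Note the difference in enforcement point: a real constrained decoder applies the grammar during token generation, so invalid nodes are never emitted at all; the post-hoc check above only illustrates the shape of the schema.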

Tree-sitter DKB Knowledge Graph

Problem: Codebases stored as flat files lose structural relationships between components.

Solution: Store the codebase as a non-Euclidean knowledge graph using Tree-sitter for parsing:

  • Code is parsed into AST nodes and stored as graph vertices
  • Dependencies, call chains, and type relationships are graph edges
  • Geometric Deep Learning (GNNs) reads and reasons about the graph structure
  • Queries are semantic (“functions that handle authentication”), not syntactic (grep -r "auth")
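The real pipeline uses Tree-sitter, which needs a compiled grammar per language; the self-contained sketch below substitutes Python's standard ast module to show the same idea, storing function definitions as vertices and call sites as edges, then answering a graph query instead of a text search:

```python
import ast

SOURCE = """
def hash_password(p): ...
def check_auth(user, p):
    return hash_password(p) == user.pw
def login(user, p):
    return check_auth(user, p)
"""

tree = ast.parse(SOURCE)
vertices, edges = set(), set()
for fn in ast.walk(tree):
    if isinstance(fn, ast.FunctionDef):
        vertices.add(fn.name)  # each definition is a graph vertex
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.add((fn.name, node.func.id))  # call chains are edges

def reachable(start):
    """Graph query: every function transitively called from start."""
    seen, stack = set(), [start]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

print(sorted(reachable("login")))  # ['check_auth', 'hash_password']
```

A query like "everything login depends on" falls out of the edge structure directly; a grep over the flat files would have to reconstruct that relationship by hand.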

Adjacent Sciences

Several additional fields inform the design:

  • Geometric Deep Learning (GNNs): reading and reasoning about the code knowledge graph
  • Probabilistic ASP (P-ASP): bridging stochastic LLM tokens to deterministic solver inputs
  • Active Inference: driving automated graph patching and self-repair

Further Reading