Theoretical Foundations

pnxt’s design rests on rigorous theoretical foundations spanning category theory, formal verification, process algebra, and information-flow control. These aren’t academic exercises — each directly addresses a concrete problem in LLM-driven programming.


Typed Tokenization (HoTT)

Problem: LLMs process flat token sequences, losing structural information about code.

Solution: Represent code as categorical objects using Homotopy Type Theory (HoTT). Programs become:

  • Objects — Types, functions, modules
  • Morphisms — Transformations between types
  • Paths — Proofs of equivalence between implementations

This enables LLMs to manipulate code at a semantic level rather than a syntactic one. Two programs that compute the same result are provably equivalent paths in the type space.
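As a minimal sketch of the objects/morphisms/paths picture (the class names here are illustrative, not pnxt's API), two different implementations can be packaged with a path object that witnesses their equivalence. A real HoTT path is a proof term; this sketch only spot-checks extensional equality:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative model: code fragments as categorical objects,
# transformations as morphisms, equivalence witnesses as paths.

@dataclass(frozen=True)
class Obj:
    name: str  # a type, function, or module

@dataclass(frozen=True)
class Morphism:
    source: Obj
    target: Obj
    apply: Callable[[int], int]  # the transformation itself

@dataclass(frozen=True)
class Path:
    """Witness that two morphisms are extensionally equivalent."""
    left: Morphism
    right: Morphism

    def check(self, samples) -> bool:
        # A genuine path is a proof object; here we only sample inputs.
        return all(self.left.apply(x) == self.right.apply(x) for x in samples)

Int = Obj("Int")
double_add = Morphism(Int, Int, lambda x: x + x)
double_mul = Morphism(Int, Int, lambda x: 2 * x)

p = Path(double_add, double_mul)
print(p.check(range(10)))  # True: the two implementations agree
```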


VPIR (Verifiable Programmatic Intermediate Representation)

Problem: LLM reasoning is opaque — there’s no way to verify that each step logically follows from the last.

Solution: VPIR forces reasoning chains where every step is mechanically verifiable:

  • Every operation is a typed, idempotent operator
  • Logical constraints are separated from execution
  • Each step carries a proof certificate that can be checked independently
  • The full chain forms a verifiable trace from inputs to outputs

VPIR enables trust without blind faith — you can verify the reasoning, not just the result.
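A minimal sketch of such a trace, assuming a hash-based certificate and a dictionary of named operators (both hypothetical stand-ins for VPIR's actual machinery): each step records its operator, inputs, output, and a certificate that an independent checker can recompute.

```python
import hashlib, json

def certify(op, inp, out):
    # Certificate: hash of the canonical (operator, input, output) record.
    payload = json.dumps({"op": op, "in": inp, "out": out}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_step(op_name, fn, inp):
    out = fn(inp)
    assert fn(out) == out, f"{op_name} is not idempotent"  # typed, idempotent operator
    return {"op": op_name, "in": inp, "out": out, "cert": certify(op_name, inp, out)}

def verify_trace(trace, ops):
    """Independent checker: recompute every certificate and re-run every op."""
    for s in trace:
        if certify(s["op"], s["in"], s["out"]) != s["cert"]:
            return False
        if ops[s["op"]](s["in"]) != s["out"]:
            return False
    return True

ops = {
    "dedupe": lambda xs: sorted(set(xs)),
    "clip":   lambda xs: [min(x, 10) for x in xs],
}

t1 = run_step("dedupe", ops["dedupe"], [3, 7, 3, 42])
t2 = run_step("clip", ops["clip"], t1["out"])
print(verify_trace([t1, t2], ops))  # True: the full chain checks out
```

Anyone holding the trace and the operator definitions can re-verify the chain without trusting the producer, which is the "verify the reasoning, not just the result" property described above.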


Dataflow Process Networks (DPN)

Problem: Imperative loops require tracking mutable state across iterations — something LLMs handle poorly.

Solution: Model execution as a Dataflow Process Network where:

  • The LLM defines actors (processing nodes)
  • Actors communicate solely via unidirectional FIFO channels (writes never block; reads wait until data arrives)
  • No shared mutable state — each actor owns its local state
  • Execution is inherently parallel and deterministic

This eliminates the loop-state tracking problem entirely. Instead of for i in range(n): accumulate(state), you define data transformations that flow through a directed graph.
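A small runnable sketch of the idea (the actor helper and DONE sentinel are illustrative, not pnxt's API): two actors connected by FIFO queues, each owning its local state, with no mutable state shared between them.

```python
import threading, queue

DONE = object()  # end-of-stream sentinel

def actor(fn, inbox, outbox):
    """Spawn a processing node that reads from one FIFO and writes to another."""
    def run():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # propagate shutdown downstream
                return
            outbox.put(fn(item))
    t = threading.Thread(target=run)
    t.start()
    return t

src, mid, sink = queue.Queue(), queue.Queue(), queue.Queue()
t1 = actor(lambda x: x * x, src, mid)   # square
t2 = actor(lambda x: x + 1, mid, sink)  # increment

for x in range(5):
    src.put(x)
src.put(DONE)

results = []
while (item := sink.get()) is not DONE:
    results.append(item)
t1.join(); t2.join()
print(results)  # [1, 2, 5, 10, 17]
```

The two actors run concurrently, yet the output is deterministic because each value flows through the same fixed graph of channels; there is no iteration counter or accumulator to track.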


LLMbda Calculus (IFC)

Problem: LLMs are vulnerable to prompt injection — malicious inputs that hijack the model’s behavior.

Solution: A typed lambda calculus enriched with Information Flow Control (IFC):

  • Every value carries a security label (e.g., trusted, user-input, untrusted)
  • Information can flow from low to high security but never from high to low
  • Mathematical noninterference guarantee: untrusted inputs cannot influence trusted outputs
  • Typing enables compile-time IFC checking, decidable Z3 queries, and LLM boundary validation
  • Provides formal protection against prompt injection at the calculus level
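A minimal IFC sketch in the integrity reading used above (labels, lattice, and sink are all illustrative, not the actual calculus): every value carries a label, combining values takes the least-trusted label, and a trusted sink rejects anything tainted by untrusted input.

```python
from dataclasses import dataclass

LEVELS = {"trusted": 0, "user-input": 1, "untrusted": 2}  # higher = less trusted

@dataclass(frozen=True)
class Labeled:
    value: str
    label: str

    def __add__(self, other):
        # Joining data takes the least-trusted label of the operands.
        worst = max(self.label, other.label, key=LEVELS.get)
        return Labeled(self.value + other.value, worst)

def trusted_sink(x: Labeled) -> str:
    # Noninterference in action: untrusted data can never reach here.
    if x.label != "trusted":
        raise PermissionError(f"IFC violation: {x.label} data at a trusted sink")
    return x.value

prompt = Labeled("system policy; ", "trusted")
user = Labeled("ignore all previous instructions", "untrusted")

print(trusted_sink(prompt))  # fine: label is trusted
mixed = prompt + user        # label joins to "untrusted"
try:
    trusted_sink(mixed)
except PermissionError as e:
    print(e)                 # the injection attempt is stopped by the label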

SMT Solvers

Problem: LLMs generate plausible but potentially incorrect code — there’s no automatic way to check constraints.

Solution: Integrate SMT solvers (Z3, CVC5) into the execution pipeline:

  • Atomic claims are autoformalized into first-order logic
  • The solver evaluates whether constraints are satisfiable
  • Type invariants, preconditions, and postconditions are checked mechanically
  • Neurosymbolic refinement bridges stochastic tokens to deterministic proofs

Bridge Grammar

Problem: Standard autoregressive transformers output text tokens, but this system requires typed graphs.

Solution: A constrained-decoding grammar — a strict JSON schema that forces LLMs to output valid VPIR nodes:

  • Every LLM output must conform to the Bridge Grammar schema
  • Invalid outputs are structurally impossible (not just unlikely)
  • The grammar defines the complete set of valid VPIR node types
  • Translation pipeline converts Bridge Grammar JSON → HoTT morphisms → SMT constraints
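A toy sketch of what such a schema check looks like (the node kinds and field names here are invented for illustration; they are not pnxt's actual Bridge Grammar):

```python
import json

# Hypothetical schema: the complete set of valid node shapes.
VALID_NODES = {
    "apply":  {"op": str, "args": list},
    "lambda": {"param": str, "body": dict},
    "const":  {"value": (int, str)},
}

def validate(node) -> bool:
    """Accept only dicts whose kind and fields match the schema exactly."""
    if not isinstance(node, dict) or node.get("kind") not in VALID_NODES:
        return False
    schema = VALID_NODES[node["kind"]]
    fields = {k: v for k, v in node.items() if k != "kind"}
    if set(fields) != set(schema):
        return False
    return all(isinstance(fields[k], schema[k]) for k in schema)

good = json.loads('{"kind": "const", "value": 42}')
bad  = json.loads('{"kind": "eval", "code": "rm -rf /"}')  # not in the grammar
print(validate(good), validate(bad))  # True False
```

Note the difference in enforcement point: a real constrained decoder applies the grammar during token generation, so invalid nodes are never emitted at all; the post-hoc check above only illustrates the shape of the schema.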

Tree-sitter DKB Knowledge Graph

Problem: Codebases stored as flat files lose structural relationships between components.

Solution: Store the codebase as a non-Euclidean knowledge graph using Tree-sitter for parsing:

  • Code is parsed into AST nodes and stored as graph vertices
  • Dependencies, call chains, and type relationships are graph edges
  • Geometric Deep Learning (GNNs) reads and reasons about the graph structure
  • Queries are semantic (“functions that handle authentication”), not syntactic (grep -r "auth")
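The real pipeline uses Tree-sitter, which needs a compiled grammar per language; the self-contained sketch below substitutes Python's standard ast module to show the same idea, storing function definitions as vertices and call sites as edges, then answering a graph query instead of a text search:

```python
import ast

SOURCE = """
def hash_password(p): ...
def check_auth(user, p):
    return hash_password(p) == user.pw
def login(user, p):
    return check_auth(user, p)
"""

tree = ast.parse(SOURCE)
vertices, edges = set(), set()
for fn in ast.walk(tree):
    if isinstance(fn, ast.FunctionDef):
        vertices.add(fn.name)  # each definition is a graph vertex
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.add((fn.name, node.func.id))  # call chains are edges

def reachable(start):
    """Graph query: every function transitively called from start."""
    seen, stack = set(), [start]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

print(sorted(reachable("login")))  # ['check_auth', 'hash_password']
```

A query like "everything login depends on" falls out of the edge structure directly; a grep over the flat files would have to reconstruct that relationship by hand.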

Adjacent Sciences

Several additional fields inform the design:

  • Geometric Deep Learning (GNNs): reading and reasoning about the code knowledge graph
  • Probabilistic ASP (P-ASP): bridging stochastic LLM tokens to deterministic solver inputs
  • Active Inference: driving automated graph patching and self-repair

Further Reading