
Trust, Safety & Governance

Overview

The central tension in Agent-Native Programming is between autonomy and control. Agents are most valuable when they can act independently, but unconstrained autonomy carries real risks: bugs, deleted code, exposed secrets, or poor architectural decisions.

This framework manages that tension with a system in which safeguards are proportional to risk and trust is earned incrementally.


Graduated Trust Model

Rather than a binary trusted/untrusted distinction, Agent-Native Programming (ANP) uses a five-level graduated trust model:

Trust Levels

Level | Role | Capabilities
0 | Observer | Read code and docs. Cannot modify anything.
1 | Contributor | Modify files within scope. Changes require review. Read-only tools.
2 | Collaborator | Commit to feature branches. Run tools in sandbox. Use CI/CD.
3 | Trusted Collaborator | Merge approved changes. Create/close issues and PRs. Reduced review for low-risk changes.
4 | Autonomous Agent | Operate independently within boundaries. Human review only for high-impact actions. Can delegate to other agents.

Agents earn higher trust levels through demonstrated reliability, mirroring how human organizations grant authority.
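
The five levels could be represented as an ordered enumeration, so that "has at least this level" becomes a simple comparison. The sketch below is illustrative only: the action names and the MIN_LEVEL mapping are assumptions made for the example and are not defined by the framework.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    """The five trust levels from the table, ordered by autonomy."""
    OBSERVER = 0
    CONTRIBUTOR = 1
    COLLABORATOR = 2
    TRUSTED_COLLABORATOR = 3
    AUTONOMOUS_AGENT = 4


# Hypothetical action names, each mapped to the lowest level that permits it.
MIN_LEVEL = {
    "read_code": TrustLevel.OBSERVER,
    "modify_files": TrustLevel.CONTRIBUTOR,
    "commit_feature_branch": TrustLevel.COLLABORATOR,
    "merge_approved_change": TrustLevel.TRUSTED_COLLABORATOR,
    "delegate_to_agent": TrustLevel.AUTONOMOUS_AGENT,
}


def is_permitted(level: TrustLevel, action: str) -> bool:
    """Return True if `level` meets the minimum for `action`.

    Unknown actions are denied by default.
    """
    required = MIN_LEVEL.get(action)
    return required is not None and level >= required
```

Under these assumptions, is_permitted(TrustLevel.COLLABORATOR, "modify_files") is true, while a Contributor asking for "commit_feature_branch" is refused.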


Capability-Based Permissions

Instead of role-based access control, ANP uses capability tokens:

  • Each capability grants a specific, scoped permission
  • Capabilities can be time-limited, scope-limited, or usage-limited
  • Agents request capabilities; the trust engine grants or denies based on trust level
  • All capability grants are logged for audit
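
One way to picture the request/grant cycle described above is a small token type plus an engine that checks trust level and records each decision. This is a minimal sketch under stated assumptions: the Capability and TrustEngine names, their fields, and the string-prefix scope check are all hypothetical, and a real implementation would need proper path normalization and tamper-resistant logging.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capability:
    """A scoped permission (hypothetical shape, for illustration)."""
    action: str                          # e.g. "write_file"
    scope: str                           # naive string prefix, e.g. "src/"
    expires_at: Optional[float] = None   # time limit (epoch seconds)
    uses_left: Optional[int] = None      # usage limit

    def allows(self, action: str, target: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self.expires_at is not None and now >= self.expires_at:
            return False                 # token has expired
        if self.uses_left is not None and self.uses_left <= 0:
            return False                 # usage budget exhausted
        return action == self.action and target.startswith(self.scope)

    def consume(self) -> None:
        """Spend one use, if the token is usage-limited."""
        if self.uses_left is not None:
            self.uses_left -= 1


class TrustEngine:
    """Grants or denies capabilities by trust level and logs every decision."""

    def __init__(self, min_level: dict):
        self.min_level = dict(min_level)  # action -> minimum trust level
        self.audit_log = []

    def request(self, agent_id, trust_level, action, scope,
                ttl_seconds=None, max_uses=None, now=None):
        now = time.time() if now is None else now
        # Actions with no configured minimum are never granted.
        granted = trust_level >= self.min_level.get(action, float("inf"))
        # This sketch logs denials as well as grants.
        self.audit_log.append({"agent": agent_id, "action": action,
                               "scope": scope, "granted": granted, "at": now})
        if not granted:
            return None
        expires = None if ttl_seconds is None else now + ttl_seconds
        return Capability(action, scope, expires, max_uses)
```

An agent at level 1 could then call engine.request("agent-7", 1, "write_file", "src/", ttl_seconds=60, max_uses=1) and receive a token that stops working after one use or sixty seconds, whichever comes first.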

Sandboxing

Agent actions operate within sandboxes that limit blast radius:

  • File system sandboxing — Agents can only access designated project directories
  • Process sandboxing — Commands execute in isolated environments
  • Network sandboxing — Controlled access to external services
  • Resource sandboxing — CPU, memory, and time limits
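
As an illustration of the file system item only, the check below resolves a requested path and confirms it falls under one of the designated project directories. The function name and the example paths are assumptions for this sketch; process, network, and resource limits would normally be enforced by the operating system or a container runtime rather than by application code like this.

```python
from pathlib import Path
from typing import Iterable


def is_inside_sandbox(candidate: str, allowed_roots: Iterable[str]) -> bool:
    """Return True if `candidate` resolves to a path under an allowed root.

    Resolving first collapses ".." segments and follows symlinks, so a
    path such as "project/../secrets" cannot slip past a prefix check.
    """
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False
```

Comparing resolved path components, rather than raw strings, also avoids treating "/srv/project-other" as if it were inside "/srv/project".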

Human-in-the-Loop Patterns

The framework defines when and how humans are involved:

  • Pre-approval — Agent proposes, human approves before execution
  • Post-review — Agent executes, human reviews outcome
  • Exception-based — Agent operates autonomously, human notified only for anomalies
  • Periodic audit — Scheduled review of agent actions and decisions

The appropriate pattern depends on the agent’s trust level and the action’s risk.
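
A selection rule of this kind might look like the sketch below. The three-step Risk scale and every threshold in select_pattern are assumptions chosen for illustration; the framework states only that the choice depends on trust level and risk. Periodic audit is left out because it can run alongside any of the other three patterns.

```python
from enum import Enum, IntEnum


class Risk(IntEnum):
    """Hypothetical three-step risk scale for an action."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


class ReviewPattern(Enum):
    PRE_APPROVAL = "pre-approval"
    POST_REVIEW = "post-review"
    EXCEPTION_BASED = "exception-based"


def select_pattern(trust_level: int, risk: Risk) -> ReviewPattern:
    """Pick a human-in-the-loop pattern (illustrative thresholds only)."""
    # High-risk actions and low-trust agents always need sign-off first.
    if risk is Risk.HIGH or trust_level <= 1:
        return ReviewPattern.PRE_APPROVAL
    # Level 4 agents act autonomously below high risk.
    if trust_level >= 4:
        return ReviewPattern.EXCEPTION_BASED
    # Level 3 agents get reduced review for low-risk changes.
    if trust_level == 3 and risk is Risk.LOW:
        return ReviewPattern.EXCEPTION_BASED
    # Everything else executes first and is reviewed afterwards.
    return ReviewPattern.POST_REVIEW
```

With these thresholds, a level 2 agent making a low-risk change is reviewed after the fact, while any high-risk action waits for approval regardless of level.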


Audit & Accountability

Every agent action is:

  • Logged with full context (who, what, when, why)
  • Traceable to the decision chain that led to it
  • Attributable to a specific agent identity
  • Reviewable by humans at any time
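
The four properties above suggest a record shape along the following lines. This is a sketch, not a prescribed schema: the AuditRecord fields, the record_action helper, and the choice of JSON lines in an in-memory list are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    """One logged agent action (hypothetical field names)."""
    agent_id: str     # who: the specific agent identity
    action: str       # what was done
    timestamp: str    # when, as an ISO 8601 string in UTC
    reason: str       # why the agent took the action
    decision_chain: List[str] = field(default_factory=list)  # prior decision IDs


def record_action(log: list, agent_id: str, action: str, reason: str,
                  decision_chain: List[str],
                  now: Optional[datetime] = None) -> AuditRecord:
    """Append one JSON-encoded record to `log` and return the record."""
    now = now or datetime.now(timezone.utc)
    rec = AuditRecord(agent_id, action, now.isoformat(), reason,
                      list(decision_chain))
    # One JSON document per entry keeps the log easy for humans to scan.
    log.append(json.dumps(asdict(rec)))
    return rec
```

Storing the decision chain as a list of identifiers lets a reviewer walk back from any action to the steps that led to it.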

Key Contribution

Trust is not binary but a multi-dimensional spectrum. Agents earn autonomy incrementally through demonstrated reliability, mirroring how human organizations grant authority.