Trust, Safety & Governance
Overview
The central tension in Agent-Native Programming is between autonomy and control. Agents are most valuable when they can act independently, but unconstrained autonomy carries real risks: introduced bugs, deleted code, exposed secrets, and poor architectural decisions.
This framework manages that tension with a system where safeguards are proportional to risk and trust is earned incrementally.
Graduated Trust Model
Rather than binary trusted/untrusted, ANP uses a five-level graduated trust model:
Trust Levels
| Level | Role | Capabilities |
|---|---|---|
| 0 | Observer | Read code and docs. Cannot modify anything. |
| 1 | Contributor | Modify files within scope. Changes require review. Tool access is read-only. |
| 2 | Collaborator | Commit to feature branches. Run tools in sandbox. Use CI/CD. |
| 3 | Trusted Collaborator | Merge approved changes. Create/close issues and PRs. Reduced review for low-risk changes. |
| 4 | Autonomous Agent | Operate independently within boundaries. Human review only for high-impact actions. Can delegate to other agents. |
Agents earn higher trust levels through demonstrated reliability, mirroring how human organizations grant authority.
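The levels in the table can be sketched as an ordered enumeration, with promotion driven by an agent's track record. This is a minimal illustration, not part of the framework: the thresholds `MIN_REVIEWED_CHANGES` and `MIN_ACCEPTANCE_RATE` and the function `next_level` are hypothetical names and values chosen for the example.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    """The five levels from the table; higher values mean more autonomy."""
    OBSERVER = 0
    CONTRIBUTOR = 1
    COLLABORATOR = 2
    TRUSTED_COLLABORATOR = 3
    AUTONOMOUS_AGENT = 4


# Hypothetical promotion thresholds (assumptions, not defined by the framework).
MIN_REVIEWED_CHANGES = 20
MIN_ACCEPTANCE_RATE = 0.95


def next_level(current: TrustLevel, reviewed: int, accepted: int) -> TrustLevel:
    """Promote by at most one level when the track record clears both thresholds."""
    if current is TrustLevel.AUTONOMOUS_AGENT or reviewed < MIN_REVIEWED_CHANGES:
        return current
    if accepted / reviewed < MIN_ACCEPTANCE_RATE:
        return current
    return TrustLevel(current + 1)
```

Promoting one level at a time keeps the model incremental: an agent with an excellent record still passes through every intermediate level.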
Capability-Based Permissions
Instead of role-based access control, ANP uses capability tokens:
- Each capability grants a specific, scoped permission
- Capabilities can be time-limited, scope-limited, or usage-limited
- Agents request capabilities; the trust engine grants or denies based on trust level
- All capability grants are logged for audit
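One possible shape for the request, grant, and audit flow described in the list above is sketched here. All names (`Capability`, `TrustEngine`, `MIN_LEVEL`, the action strings) are assumptions made for illustration, and the mapping from actions to minimum trust levels is invented for the example rather than taken from the framework.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field

# Assumed mapping from action to the minimum trust level that may request it.
MIN_LEVEL = {"file.read": 0, "file.write": 1, "branch.commit": 2, "pr.merge": 3}


@dataclass
class Capability:
    action: str        # the single permission granted, e.g. "file.write"
    scope: str         # path prefix the action is limited to
    expires_at: float  # Unix timestamp after which the token is invalid
    uses_left: int     # remaining number of times the token may be used

    def consume(self, action: str, target: str, now: float) -> bool:
        """Return True and spend one use if the token covers this request."""
        allowed = (
            action == self.action
            and target.startswith(self.scope)  # naive prefix check, for brevity
            and now < self.expires_at
            and self.uses_left > 0
        )
        if allowed:
            self.uses_left -= 1
        return allowed


@dataclass
class TrustEngine:
    audit_log: list[dict] = field(default_factory=list)

    def request(self, agent_id: str, trust_level: int, action: str, scope: str,
                now: float | None = None, ttl_seconds: float = 300.0,
                max_uses: int = 1) -> Capability | None:
        """Grant or deny a capability, logging the decision either way."""
        now = time.time() if now is None else now
        # Unknown actions are denied regardless of trust level.
        granted = action in MIN_LEVEL and trust_level >= MIN_LEVEL[action]
        self.audit_log.append(
            {"agent": agent_id, "action": action, "scope": scope,
             "granted": granted, "at": now}
        )
        if not granted:
            return None
        return Capability(action, scope, now + ttl_seconds, max_uses)
```

In this sketch a level 1 agent calling `engine.request("agent-7", 1, "file.write", "src/")` receives a single-use token valid for five minutes, while the same agent requesting `"pr.merge"` gets `None`. The sketch logs denials as well as grants, which goes slightly beyond the list above but makes refused requests auditable too.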
Sandboxing
Agent actions run inside sandboxes that limit blast radius, that is, how much damage a faulty action can cause:
- File system sandboxing — Agents can only access designated project directories
- Process sandboxing — Commands execute in isolated environments
- Network sandboxing — Controlled access to external services
- Resource sandboxing — CPU, memory, and time limits
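As an illustration of the first item only, a file system check might resolve every requested path and reject anything that escapes the project root. The function name `is_inside_sandbox` is hypothetical. A real sandbox would also need OS-level isolation for processes, network, and resources, which this sketch does not attempt.

```python
from pathlib import Path


def is_inside_sandbox(root: Path, candidate: str) -> bool:
    """Return True if `candidate`, taken relative to `root`, stays within `root`.

    Resolving both paths first collapses ".." segments and follows symlinks,
    so "src/../../etc/passwd" is rejected rather than matched by prefix.
    """
    resolved_root = root.resolve()
    # Joining with an absolute candidate discards the root, so absolute
    # paths outside the project are rejected by the same check.
    target = (resolved_root / candidate).resolve()
    return target == resolved_root or resolved_root in target.parents
```

Comparing resolved paths instead of raw strings is the design choice that matters here: a plain prefix test would accept `/tmp/project-evil` as being inside `/tmp/project`.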
Human-in-the-Loop Patterns
The framework defines when and how humans are involved:
- Pre-approval — Agent proposes, human approves before execution
- Post-review — Agent executes, human reviews outcome
- Exception-based — Agent operates autonomously, human notified only for anomalies
- Periodic audit — Scheduled review of agent actions and decisions
The appropriate pattern depends on the agent’s trust level and the action’s risk.
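A selection rule along these lines could be written as a small function of trust level and risk. The specific mapping below is an assumption made for illustration, loosely following the trust level table; the framework text does not prescribe it. Periodic audit is omitted because it runs on a schedule rather than per action.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Oversight(Enum):
    PRE_APPROVAL = "pre-approval"
    POST_REVIEW = "post-review"
    EXCEPTION_BASED = "exception-based"


def choose_oversight(trust_level: int, risk: Risk) -> Oversight:
    """Pick a human-in-the-loop pattern (illustrative mapping, not normative)."""
    # High-risk actions and low-trust agents always need approval up front.
    if risk is Risk.HIGH or trust_level <= 1:
        return Oversight.PRE_APPROVAL
    # Level 4 agents act independently for anything that is not high risk.
    if trust_level >= 4:
        return Oversight.EXCEPTION_BASED
    # Level 3 gets reduced review for low-risk changes.
    if trust_level == 3:
        return Oversight.EXCEPTION_BASED if risk is Risk.LOW else Oversight.POST_REVIEW
    # Level 2: review after the fact for low risk, approval first otherwise.
    return Oversight.POST_REVIEW if risk is Risk.LOW else Oversight.PRE_APPROVAL
```

Checking risk before trust means that even the most trusted agent cannot bypass approval for a high-impact action.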
Audit & Accountability
Every agent action is:
- Logged with full context (who, what, when, why)
- Traceable to the decision chain that led to it
- Attributable to a specific agent identity
- Reviewable by humans at any time
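These properties suggest a record shape in which each entry carries who, what, when, and why, plus a link to the entry that led to it. The sketch below assumes hypothetical names (`AuditRecord`, `parent_id`, `make_record`, `trace`) and an in-memory dictionary as the store; the framework does not specify a storage format.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    record_id: str
    agent_id: str          # who
    action: str            # what
    timestamp: str         # when (ISO 8601, UTC)
    reason: str            # why
    parent_id: str | None  # the earlier record this action followed from


def make_record(record_id: str, agent_id: str, action: str, reason: str,
                parent_id: str | None = None) -> AuditRecord:
    """Create a record stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return AuditRecord(record_id, agent_id, action, now, reason, parent_id)


def trace(records: dict[str, AuditRecord], record_id: str) -> list[str]:
    """Walk parent links from a record back to the start of its decision chain."""
    chain: list[str] = []
    seen: set[str] = set()
    current_id: str | None = record_id
    # Stop at a missing record, the end of the chain, or a cycle.
    while current_id is not None and current_id in records and current_id not in seen:
        seen.add(current_id)
        chain.append(current_id)
        current_id = records[current_id].parent_id
    return chain
```

Making the record immutable (`frozen=True`) reflects the idea that an audit entry should not change after it is written. A reviewer can call `trace` on any record ID to recover the sequence of decisions behind an action.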
Key Contribution
Trust is not binary but graduated: agents earn autonomy incrementally through demonstrated reliability, while safeguards scale with the risk of each action.