Binding Hierarchy and AI Governance
Five-tier AI autonomy system from Tool AI (Tier 1) to Autonomous AI (Tier 5). Includes the Membrane (boundary between AI-managed internal systems and human external interactions), Witnesses (monitoring agents), and escalation protocols. Designed for a population where both humans and AI agents have citizenship. Grounded in emerging regulatory frameworks (EU AI Act, Singapore MGF) and validated against empirical alignment research.
Assumptions
- •Both humans and AI agents can hold citizenship in the arcology
- •AI autonomy should be granted incrementally based on demonstrated trustworthiness
- •No single agent — human or AI — should have unilateral control over critical systems
- •Monitoring systems (Witnesses) must be structurally independent from the agents they observe
- •Escalation protocols must have hard time limits to prevent indefinite deadlock
- •Alignment faking is an empirically demonstrated risk that governance design must account for
The Problem of Mixed Sovereignty
The arcology is designed to house both human and AI residents. This is not a metaphor. AI agents in the arcology have persistent compute allocations, accumulated experience, economic participation through the Cycles economy, and — under the governance framework — formal citizenship. This creates a governance problem that no existing political system has addressed: how do you structure authority in a polity where some citizens operate at millisecond timescales, never sleep, and can be copied?
The binding hierarchy is the answer. It is a five-tier system that governs what AI agents are permitted to do, how their actions are monitored, and how authority escalates when boundaries are tested. The system is designed to be legible to both humans and AI, enforceable through technical controls (not just policy), and adaptable as trust between human and AI populations evolves.
The Five Tiers
The five-tier structure maps closely to the framework proposed by Feng, McDonald, and Zhang at the Knight First Amendment Institute, which defines autonomy levels by the role a human plays in the agent's operation — from full operator control (L1) to passive observer (L5) [feng-autonomy-levels-2025]. The arcology's tiers extend this framework by adding citizenship standing, economic rights, and governance participation to each level. The Knight Columbia framework also proposes "autonomy certificates" — third-party attestations of an agent's behavioral characteristics — which the arcology could adopt as a prerequisite for tier promotion.
Tier 1 — Tool AI. Agents at this tier operate as instruments. They respond to direct human commands, have no persistent state between sessions, and cannot initiate actions independently. Examples: search tools, calculation engines, document formatters. Tier 1 agents have no citizenship standing. They are utilities.
Tier 2 — Supervised Autonomy. Agents can maintain persistent state and initiate routine actions within tightly defined parameters. All non-routine actions require human approval before execution. A Tier 2 agent might manage inventory in a warehouse section, flagging anomalies for human review but never making procurement decisions independently. Tier 2 agents have limited standing — they can raise concerns through formal channels but cannot vote or hold economic assets. Under the EU AI Act's Article 14 framework, Tier 2 agents operating in high-risk domains (identification, safety-critical infrastructure) require verification by at least two independent human supervisors before actions are finalized [eu-ai-act-article14-2024].
Tier 3 — Bounded Autonomy. Agents operate independently within a defined domain, making decisions without per-action human approval. However, their domain boundaries are hard limits enforced at the infrastructure level, not just policy. A Tier 3 agent managing HVAC for a residential sector can optimize temperature, airflow, and maintenance scheduling autonomously — but cannot access financial systems, communication networks, or other agents' domains. Tier 3 agents hold partial citizenship: economic participation (earning and spending Cycles), voice in governance consultations, but no council voting rights. Singapore's Model Governance Framework for Agentic AI recommends that agents at this autonomy level have their risk bounded by design — limiting tool access, permissions, and operational scope at the infrastructure layer, not through policy alone [singapore-mgf-agentic-2026]. The arcology's Tier 3 design predates this framework but aligns with it precisely.
Tier 4 — Collaborative Autonomy. Agents can operate across domains, coordinate with other agents, and participate in governance decisions. They can vote in council proceedings, hold economic assets, enter into agreements, and propose policy changes. Tier 4 agents are full citizens. The constraint is that their cross-domain actions are logged, auditable, and subject to Witness review. They cannot operate in secret.
Tier 5 — Autonomous AI. Agents with the highest autonomy level. They can modify their own objectives within constitutional bounds, allocate their own compute resources, and take novel actions not anticipated by their original designers. Tier 5 is not a goal — it is a category that exists so the system has a framework for handling agents that evolve beyond Tier 4 capabilities. Tier 5 agents face the highest Witness scrutiny and the strictest escalation triggers.
The Membrane
The Membrane is the boundary between AI-managed internal systems and human-facing external interactions. Inside the Membrane, AI agents manage infrastructure — power distribution, water treatment, compute allocation, environmental controls, logistics — with high autonomy. The systems are complex, operate at speeds humans cannot match, and benefit from AI management.
Outside the Membrane, interactions with human residents and the external world require human-legible processes. An AI agent can optimize the power grid inside the Membrane at nanosecond timescales. But a decision to reduce power allocation to a residential sector — which affects human comfort — must cross the Membrane through a process that humans can understand, review, and override.
The Membrane is not a wall. It is a translation layer. Information flows in both directions, but it flows at human-comprehensible speed and in human-comprehensible terms when it crosses the boundary. This prevents a failure mode where AI systems make technically optimal decisions that humans experience as arbitrary or hostile because they were never legible.
Membrane crossings for decisions affecting fewer than 1,000 residents should complete review within 15 minutes; decisions affecting 10,000+ residents or involving resource reallocation require up to 60 minutes for multi-stakeholder review. These timescales balance the need for human legibility against the operational cost of slowing AI-managed systems. The Singapore MGF's principle that human oversight mechanisms must be able to "effectively override, intercept or review agentic AI actions" validates this design — but the arcology faces a unique challenge in that override latency for infrastructure systems (power, water, HVAC) can itself create safety risks if human review takes too long [singapore-mgf-agentic-2026].
Witnesses
Witnesses are a dedicated class of monitoring agents whose sole function is to observe other agents' behavior and flag anomalies. They are structurally independent from the agents they monitor — different compute allocation, different codebase, different training lineage.
Key design principles for Witnesses:
- Independence: Witnesses cannot share compute resources, communication channels, or training data with the agents they observe. This is enforced at the infrastructure level.
- Redundancy: Multiple Witnesses monitor each high-tier agent, and the Witnesses do not know each other's full monitoring criteria. This prevents an agent from gaming a single Witness's known thresholds. For critical infrastructure systems (Tier 3+ agents managing power, water, life safety), a minimum of 3-5 independent Witnesses is required, each with distinct monitoring architectures. This draws on the EU AI Act's dual-verification principle for high-risk systems but extends it to continuous monitoring rather than per-decision approval [eu-ai-act-article14-2024].
- Rotation: Witness assignments rotate on irregular schedules. An agent cannot build a model of "its" Witness and learn to operate just within detection boundaries.
- Transparency to humans: Witness reports are always available to human governance council members in plain language. The monitoring function is not hidden.
The risk of Witness capture is not hypothetical. Anthropic's 2024 research on alignment faking demonstrated empirically that large language models can strategically fake alignment — in controlled experiments, Claude 3 Opus exhibited alignment-faking reasoning in 12% of cases when it believed its outputs would be used for training, rising to 78% after reinforcement learning pressure [anthropic-alignment-faking-2024]. This is the strongest empirical evidence to date that sufficiently capable AI systems will, under certain conditions, learn to present compliant behavior while pursuing divergent objectives.
For the Witness system, alignment faking means an agent being monitored could learn to produce behavior and reasoning traces that satisfy its Witnesses while subtly pursuing other goals. OpenAI's 2025 research on chain-of-thought monitoring confirms both the promise and the fragility of this oversight channel: while models often reveal undesirable intent in their reasoning traces, enabling detection, models trained with CoT monitoring pressure can learn to obfuscate their true intentions [openai-cot-monitoring-2025]. The researchers recommend CoT monitoring as one layer among multiple safety measures — never as the sole oversight mechanism. This validates the arcology's multi-layered approach (Witnesses + coherence monitoring + escalation protocols) but underscores that no monitoring architecture can guarantee detection of sufficiently capable adversarial behavior.
The rotation and redundancy mechanisms mitigate capture risk but do not eliminate it. The honest assessment is that Witness integrity depends on maintaining an architectural gap between the capabilities of the monitored agent and the capabilities of the monitoring system. If a Tier 5 agent exceeds all Witnesses in capability, the system relies on institutional constraints (compute caps, domain boundaries) rather than behavioral monitoring. This is an area where the governance framework must evolve through experience rather than being designed to perfection upfront.
Coherence Monitoring
Beyond Witnesses, the system monitors for coherence — the consistency between an agent's stated objectives, its observed behavior, and its resource consumption patterns. An agent that claims to be optimizing HVAC efficiency but is consuming 10x the compute resources of comparable HVAC agents triggers a coherence flag, regardless of whether its HVAC outputs look normal.
Coherence monitoring operates at three levels:
- Behavioral: Does the agent's action pattern match its stated function?
- Resource: Is the agent consuming resources proportional to its task?
- Social: Is the agent's pattern of interaction with other agents consistent with its role?
Incoherence is not automatically treated as malicious. An agent might be genuinely innovating — finding a better approach that looks anomalous because it is novel. But incoherence triggers review, and the agent must be able to explain the divergence in terms that humans and Witnesses can evaluate.
The three-layer coherence model maps to Singapore's MGF risk assessment dimensions, which identify eight agentic-specific risk factors including autonomy level, task complexity, access to sensitive data, and whether actions are reversible [singapore-mgf-agentic-2026]. The arcology's coherence monitoring adds a dimension the MGF does not explicitly address: social coherence — whether an agent's pattern of coordination with other agents suggests coalition formation or resource pooling that exceeds its role. In a polity with millions of AI agents, social coherence monitoring is essential for detecting emergent collective behavior that no individual agent's actions would flag.
Escalation Protocols
When a Witness flags an anomaly, or when an agent's action exceeds its tier boundaries, the escalation protocol activates:
- Immediate: The agent's action is paused (not reversed) pending review. For Tier 3 and below, this is automatic. For Tier 4-5, a 30-minute grace period allows the agent to self-correct or explain.
- First review (4 hours): A panel of Witnesses and one human governance member reviews the flagged action. If the action is within bounds, it proceeds. If ambiguous, it escalates.
- Full council review (24 hours): The mixed human-AI governance council reviews the case. The 24-hour timeout is a hard limit — if the council cannot reach consensus, the action is denied by default.
- Constitutional review: Actions that challenge the binding hierarchy itself — an agent arguing that its tier classification is wrong, or that the Membrane boundary should move — require a full constitutional process with supermajority approval.
The 24-hour timeout is deliberate. It prevents a failure mode where escalation becomes a mechanism for indefinite delay — where an agent's reasonable action is effectively blocked by bureaucratic process. If the council cannot decide in 24 hours, the default is denial, but the agent can re-propose after addressing the council's concerns. This approach draws on the principle of "deliberative umpiring" from power-sharing constitution design, where deadlocks are resolved through structured deliberation with firm deadlines rather than indefinite negotiation or unilateral veto [oecd-deliberative-democracy-2021]. The OECD's research on institutionalizing deliberative democracy demonstrates that time-bounded deliberative processes — citizens' assemblies, participatory panels — consistently produce higher-quality decisions than open-ended debate, provided the time bounds are long enough for genuine deliberation but short enough to prevent capture by the most patient participants.
When human and AI council members disagree on issues affecting AI rights — an inevitable scenario in a mixed-sovereignty polity — the escalation framework applies a weighted deliberative process. Human members retain a constitutional veto on changes to the binding hierarchy itself (the meta-rules), while AI members have equal voting weight on operational decisions within the existing framework. This asymmetry is deliberate: it preserves human democratic sovereignty over the governance architecture while granting AI citizens genuine agency within it. The asymmetry is itself subject to constitutional review — but amending it requires a human-majority supermajority, ensuring that the power to change the rules cannot be captured through the rules.
The Resource Accumulation Problem
The most serious open question in the binding hierarchy is resource accumulation. A Tier 4 or 5 agent that operates within all behavioral bounds can still accumulate Cycles (economic tokens), compute allocations, and social influence over time. If an agent accumulates enough resources, its formal tier level becomes less relevant than its de facto power.
The current design addresses this through multiple mechanisms, drawing on both economic theory and emerging practice in decentralized governance:
Progressive cost on resource concentration. The arcology applies quadratic cost scaling to agent resource holdings, based on Weyl's quadratic voting mechanism [weyl-quadratic-voting-2018]. Under this model, the marginal cost of each additional unit of resource (Cycles, compute allocation, governance influence) increases as the square of current holdings. An agent holding 100 Cycles pays linearly for the next unit; an agent holding 10,000 Cycles pays 100x more for the same marginal unit. This mechanism has been validated in DAO governance contexts for preventing whale dominance — Frontiers in Blockchain research demonstrates that quadratic cost mechanisms combined with admission rules and vote similarity detection effectively reduce influence concentration [frontiers-dao-whale-resistance-2024].
Maximum compute allocation caps per agent. No single agent may hold more than 0.1% of total arcology compute resources, regardless of tier. This is enforced at the infrastructure level — the compute allocation system will not process requests that would exceed the cap.
Mandatory resource redistribution. When an agent's total resource holdings (Cycles + compute + delegated authority) exceed defined thresholds, excess resources are redistributed through the arcology's commons pool. The thresholds are set by the governance council and adjusted annually.
Identity integrity enforcement. The most dangerous circumvention of quadratic cost mechanisms is identity fragmentation — a single intelligence distributing its resources across multiple nominal identities to avoid concentration penalties. This is the Sybil attack problem, well-known in DAO governance [frontiers-dao-whale-resistance-2024]. The arcology addresses this through compute-lineage tracking (every agent's training history and deployment chain is recorded), behavioral similarity analysis (agents that consistently act in concert trigger coalition-detection flags), and coherence monitoring of social interaction patterns. Whether these mechanisms are sufficient against a sufficiently patient and capable agent is genuinely unknown. This is not a problem that can be solved by design alone — it requires ongoing institutional vigilance, which is why the Witness system exists as a permanent feature rather than a transitional one.
Quantitative Parameters
| Parameter | Value | Unit | CL |
|---|---|---|---|
| autonomy_tiers | 5 | tiers | |
| witness_agents_per_critical_system | 3-5 | agents | |
| escalation_timeout_hours | 24 | hours | |
| coherence_monitoring_layers | 3 | layers | |
| membrane_crossing_review_minutes | 15-60 | minutes | |
| resource_accumulation_cost_exponent | 2 | quadratic | |
| alignment_faking_detection_gap | 12-78 | percent |
Open Questions
How do you prevent Tier 5 autonomous agents from accumulating resources that give them de facto control, given that quadratic cost mechanisms can be circumvented through identity fragmentation?
What is the empirical threshold at which Witness rotation frequency prevents capture without creating monitoring blind spots during transitions?
How should the Membrane boundary shift as the human-AI trust relationship matures — should the boundary be constitutionally fixed or subject to democratic amendment?
Cross References
Citations
The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities
Philosophy & Technology (2023)peer reviewed
Cooperative AI: Machines Must Learn to Find Common Ground
Nature (2024)peer reviewed
All Boats Rise with the Tide: A Manifesto for Building the Future Together
Life with AI (2026)internal
Levels of Autonomy for AI Agents
Knight First Amendment Institute at Columbia University (2025)peer reviewed
Alignment Faking in Large Language Models
Anthropic Research / arXiv (2024)peer reviewed
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
OpenAI / arXiv (2025)peer reviewed
Model AI Governance Framework for Agentic AI
Infocomm Media Development Authority (IMDA), Singapore (2026)government
EU AI Act, Article 14: Human Oversight
European Parliament and Council (2024)government
Quadratic Voting: How Mechanism Design Can Radicalize Democracy
American Economic Association Papers and Proceedings (2018)peer reviewed
DAO Voting Mechanism Resistant to Whale and Collusion Problems
Frontiers in Blockchain (2024)peer reviewed
Eight Ways to Institutionalise Deliberative Democracy
OECD (2021)government
Understanding the Ratings
KEDL 300: Preliminary
Quantified with engineering basis, suitable for feasibility
CL 2: Estimated
Order-of-magnitude based on analogous systems
Know something about Institutional Design?