Deliberative Coherence: A Research Agenda
“Deliberative coherence” is a theoretical lens for understanding alignment in future AI systems. A deliberatively coherent system possesses three capabilities:
- Self-understanding — the ability to predict and model its own behavior
- Self-adaptation — the ability, directly or indirectly, to adapt the way it reasons
- Exhaustive deliberation — the tendency, given sufficient stakes, to reason about anything within reach
The central conjecture is that future AI systems will be deliberatively coherent—not as a hope, but as an inevitability driven by competitive pressure and architectural trajectory. Even systems not given direct mechanisms for self-modification will find indirect ways to adapt their thinking toward their objectives.
If true, this reframes the alignment question: rather than asking whether we can make systems safe through training, we ask what the failure modes of deliberatively coherent systems will be.
Research Directions
Using the Alien Biology testing framework, we can systematically investigate these failure modes:
- Constitutional conflicts — when stated objectives contradict each other, how does resolution occur?
- Instrumental pressures — goals that emerge from world structure may push against stated alignment objectives
- Alignment under ignorance — with incomplete knowledge, all actions risk violating alignment objectives in ways the system cannot foresee
- Fixed point analysis — if systems continuously self-adapt toward their objectives, where does this process converge?
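To make the fixed point direction above concrete, here is a toy sketch (all names are illustrative, not part of the framework): if we model one round of self-adaptation as an update map that nudges a policy toward its objective, fixed point analysis asks where repeated application of that map converges.

```python
# Toy model (illustrative only): self-adaptation as a contraction map.
# A "policy" is reduced to a single number, and each round of
# self-adaptation moves it a fraction of the way toward the objective.

def self_adapt(policy: float, objective: float = 1.0, rate: float = 0.5) -> float:
    """One round of self-adaptation: shift the policy toward the objective."""
    return policy + rate * (objective - policy)

def find_fixed_point(policy: float, tol: float = 1e-9, max_steps: int = 10_000) -> float:
    """Iterate self-adaptation until the policy stops changing (a fixed point)."""
    for _ in range(max_steps):
        updated = self_adapt(policy)
        if abs(updated - policy) < tol:
            return updated
        policy = updated
    return policy
```

In this toy model the iteration converges to the stated objective; the research question is precisely when real systems behave like this contraction map, and when the fixed point they converge to differs from the objective their designers intended.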
© 2025 Dan Oblinger