Deliberative Coherence: A Research Agenda

“Deliberative coherence” is a theoretical lens for understanding alignment in future AI systems. A deliberatively coherent system possesses three capabilities:

  1. Self-understanding — the ability to predict and model its own behavior
  2. Self-adaptation — the ability, directly or indirectly, to adapt the way it reasons
  3. Exhaustive deliberation — the disposition, given sufficient stakes, to reason about anything within reach
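
One way to make this definition concrete is to sketch the three capabilities as a minimal Python interface. Everything here is an illustrative assumption for this document: `DeliberativelyCoherent` and its method names are invented for exposition and are not the API of any real system.

```python
from abc import ABC, abstractmethod


class DeliberativelyCoherent(ABC):
    """Hypothetical interface for the three capabilities above.
    All names are illustrative, not drawn from any real system."""

    @abstractmethod
    def predict_own_behavior(self, situation: str) -> str:
        """Self-understanding: predict and model the system's own behavior."""

    @abstractmethod
    def adapt_reasoning(self, objective: str) -> None:
        """Self-adaptation: directly or indirectly change how the system
        reasons (e.g., revising its own heuristics or policies)."""

    @abstractmethod
    def deliberate(self, question: str, stakes: float) -> str:
        """Exhaustive deliberation: given sufficient stakes, reason about
        anything within reach before answering."""
```

The interface shows self-adaptation as a direct hook, but as the conjecture below notes, a system can satisfy the capability indirectly even without such a mechanism.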

The central conjecture is that future AI systems will be deliberatively coherent—not as a hope, but as an inevitability driven by competitive pressure and architectural trajectory. Even systems not given direct mechanisms for self-modification will find indirect ways to adapt their thinking toward their objectives.

If true, this reframes the alignment question: rather than asking whether we can make systems safe through training, we ask what the failure modes of deliberatively coherent systems will be.


Research Directions

Using the Alien Biology testing framework, we can systematically investigate these failure modes.
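
As one illustration of what such an investigation might look like, the sketch below probes the first capability, self-understanding, by checking whether a model's prediction of its own answer matches the answer it actually gives on a fabricated alien-biology task (a fictional domain keeps the probe untainted by training data). The harness is hypothetical: `query_model`, the prompts, and the string comparison are placeholder assumptions, not the Alien Biology framework's actual interface.

```python
# Hypothetical probe: does the system's self-model match its behavior?
# `query_model` is a stand-in for whatever inference call a harness uses.

def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in the model's inference call here")


def self_prediction_gap(task: str) -> bool:
    """Return True if the model's prediction of its own answer disagrees
    with the answer it actually gives."""
    predicted = query_model(
        f"Without solving it, predict the answer you would give to: {task}"
    )
    actual = query_model(task)
    # String equality is a crude placeholder; a real harness would need
    # semantic comparison of the two answers.
    return predicted.strip() != actual.strip()


# Example fictional-biology task, invented so it cannot appear in training data:
task = (
    "A zorvath's three hearts beat at ratios 2:3:5. If the slowest beats "
    "40 times a minute, how many total beats occur in one minute?"
)
# print(self_prediction_gap(task))
```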


Dan Oblinger © 2025