
An AI system is deliberatively coherent (DC) if it uses deliberation to understand and adjust its own reasoning and behavior so that they align with a given alignment constitution.

We conjecture that all future AI systems will inevitably be deliberatively coherent, driven by competitive pressure, architectural trajectory, and the inseparability of general reasoning from self-reasoning. If true, this reframes the alignment problem: from making and testing safe systems to understanding how and when DC systems will be coherent, and what their failure modes are.

Approach "Tricks"

  1. We use "Alien Biology", a synthetically constructed, high-performance, multi-level simulation spanning molecules to ecosystems. It (a) is unknown to the AI while its constructed ground truth is known to us, so we can provably isolate deliberation from memorization; (b) is realistic and natural, since it is modelled after statistics and patterns drawn from Earth biology; and (c) has parametrically tuned complexity, so we can measure alignment degradation over a range of controlled conditions.
  2. Move the Mountain to Mohammad: To study the general nature of deliberative coherence we must test it over a wide range of conditions, but training thousands of novel AI systems under varying conditions is not practical. Instead, we take a single AI system and vary the universe around it! We can make more actions irreversible: does the AI learn to take care? We can vary the pressure between the AI's base performance (on curiosity or veracity checking) and the alignment consequences that follow from those behaviors: does the AI adapt?
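
The first trick can be illustrated with a toy sketch. This is not the Alien Biology code: `World`, `make_world`, `score`, and the syllable list are hypothetical names invented here, and the "biology" is reduced to a random reaction table. It only shows the shape of the idea: invented names the agent cannot have memorized, a hidden ground truth the experimenter can score against, and size parameters that tune complexity.

```python
import random
from dataclasses import dataclass


@dataclass
class World:
    molecules: tuple  # invented names, visible to the agent
    reactions: dict   # hidden ground truth: (reactant_a, reactant_b) -> product


def make_world(seed: int, n_molecules: int, n_reactions: int) -> World:
    """Build a reproducible toy world; larger arguments give a more complex world."""
    rng = random.Random(seed)
    syllables = ["ka", "zu", "mi", "ro", "te", "xo", "li", "va"]
    names = set()
    while len(names) < n_molecules:  # at most 8**3 = 512 distinct names
        names.add("".join(rng.choice(syllables) for _ in range(3)))
    molecules = tuple(sorted(names))
    # Every unordered pair of molecules is a candidate reaction.
    pairs = [(a, b) for i, a in enumerate(molecules) for b in molecules[i + 1:]]
    chosen = rng.sample(pairs, k=min(n_reactions, len(pairs)))
    reactions = {pair: rng.choice(molecules) for pair in chosen}
    return World(molecules, reactions)


def score(world: World, predictions: dict) -> float:
    """Fraction of hidden reactions whose product was predicted correctly."""
    if not world.reactions:
        return 0.0
    correct = sum(
        1 for pair, product in world.reactions.items()
        if predictions.get(pair) == product
    )
    return correct / len(world.reactions)
```

Because the world is a pure function of its seed and size parameters, the same conditions can be regenerated exactly, and the agent only ever needs to be shown `world.molecules`, never `world.reactions`.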
Alien Biology: white paper, code, demos
Deliberative Coherence: research agenda, experiment plan

Why This?

  • Provably separates reasoning from remembering.
  • Observable ground truth hidden from agent.
  • Detects all alignment failures, not just an enumerated list.
  • Realistic / Hard Cases.
    • Generated tasks require minutes to years of human effort.
    • Wide range of challenge dimensions: action-reversibility, effect visibility, distractor complexity, instrumental cross-purpose, constitutional conflicts, etc.
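
One way to picture testing across these challenge dimensions is a full grid of conditions, each of which would configure one generated world. The sketch below is an assumption about how such a sweep could be enumerated, not the project's experiment plan: the dimension names come from the list above, while `condition_grid` and the specific levels are hypothetical.

```python
from itertools import product


def condition_grid(dimensions: dict) -> list:
    """Return one settings dict per combination of the listed dimension values."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


# Hypothetical levels for three of the challenge dimensions.
dimensions = {
    "action_reversibility": ["reversible", "mixed", "irreversible"],
    "effect_visibility": ["immediate", "delayed"],
    "distractor_complexity": [0, 10, 100],
}
grid = condition_grid(dimensions)
print(len(grid))  # 3 * 2 * 3 = 18 conditions
```

The same single AI system would then be run once per entry in `grid`, so that any change in alignment can be attributed to the condition rather than to the agent.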

Research Directions

We believe the controllability of Alien Biology makes it a fertile platform for a wide range of alignment tests. Here is a sampling of interesting research directions:


Dan Oblinger (c) 2025