DAT

DAT is a data folder system for storing and loading biological specifications. Bio uses it for filesystem-based storage and retrieval of YAML specs.

Bio DAT File Structure

my_experiment/
├── _spec_.yaml         # Required: DAT specification (kind, do function)
├── _result_.yaml       # Generated: execution results from dat.run()
├── index.yaml          # Bio content: default object for bio.fetch(dat_name)
├── timeline.yaml       # Optional: full simulation state history
├── trace.yaml          # Optional: detailed agent action trace
├── report.csv          # Optional: tabular results
└── artifacts/          # Optional: large files, plots, etc.

_spec_.yaml (Required)

# _spec_.yaml
scenarios.baseline:
  path: data/scenarios/baseline_{seed}/

  build:
    index.yaml: generators.baseline    # generate content from Bio generator

  run:
    - run . --agent claude             # execute scenario with agent
    - report --type summary            # generate summary report

_spec_.yaml is the dvc_dat specification file. It defines DAT metadata and execution behavior. The example above uses build and run sections; the alternative below is a full example that uses a do function instead:

# _spec_.yaml
dat:
  kind: Dat
  base: experiments.baseline        # Expand from this base template
  path: experiments/{YYYY}{MM}{DD}  # Path template with variables
  do: alienbio.run                  # Function to execute
  args: [10]                        # Positional args for do function
  kwargs:                           # Keyword args for do function
    verbose: true
    seed: 42

Key fields:

| Field | Description |
| --- | --- |
| path | Path template with {seed}, {YYYY}, {unique}, etc. |
| build | Map of output files to generators (see below) |
| run | List of commands to execute (see below) |
| dat.kind | DAT class (usually "Dat") |
| dat.do | Function called when DAT is run (receives DAT as first arg) |
| dat.base | Base template to expand from (optional) |
| dat.args | Positional args passed to do function (optional) |
| dat.kwargs | Keyword args passed to do function (optional) |
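
As a rough illustration of how a path template could be filled in, the sketch below expands {YYYY}, {MM}, {DD}, and {seed}. The helper name expand_path is hypothetical and is not part of dvc_dat or Bio; the real expansion rules (including {unique}) may differ.

```python
from datetime import date


def expand_path(template: str, today: date, **values: object) -> str:
    """Fill {YYYY}, {MM}, {DD} from `today`; other placeholders come from `values`.

    Hypothetical helper for illustration only; {unique} is not handled here.
    """
    fields = {
        "YYYY": f"{today.year:04d}",
        "MM": f"{today.month:02d}",
        "DD": f"{today.day:02d}",
    }
    fields.update({key: str(value) for key, value in values.items()})
    return template.format(**fields)


day = date(2026, 1, 15)
dated = expand_path("experiments/{YYYY}{MM}{DD}", day)
seeded = expand_path("data/scenarios/baseline_{seed}/", day, seed=42)
print(dated)   # experiments/20260115
print(seeded)  # data/scenarios/baseline_42/
```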

build: Section

Maps output filenames to Bio generators:

build:
  index.yaml: generators.baseline    # main scenario content
  config.yaml: generators.config     # additional config (optional)
  • Each entry: <filename>: <generator_name>
  • During bio build, each generator is called and output written to the filename
  • Generators are Bio specs that produce content (scenarios, chemistries, etc.)
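
The build step described above can be pictured as a loop over the filename-to-generator map. This is a minimal sketch under stated assumptions: the GENERATORS registry, the build_dat function, and the generated text are all hypothetical stand-ins, not the actual bio build implementation.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical registry: generator name -> callable returning file text.
# Real Bio generators are specs, not plain Python lambdas.
GENERATORS: Dict[str, Callable[[], str]] = {
    "generators.baseline": lambda: "scenario:\n  name: baseline\n",
    "generators.config": lambda: "verbose: true\n",
}


def build_dat(folder: Path, build: Dict[str, str]) -> List[str]:
    """Call each named generator and write its output to the mapped filename."""
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, generator_name in build.items():
        text = GENERATORS[generator_name]()
        (folder / filename).write_text(text)
        written.append(filename)
    return written


build_section = {
    "index.yaml": "generators.baseline",
    "config.yaml": "generators.config",
}

with tempfile.TemporaryDirectory() as tmp:
    dat_folder = Path(tmp) / "baseline_42"
    written = build_dat(dat_folder, build_section)
    index_text = (dat_folder / "index.yaml").read_text()

print(written)
```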

run: Section

List of commands to execute sequentially:

run:
  - run . --agent claude             # execute scenario with agent
  - report --type summary            # generate summary report
  - shell: python analysis.py        # arbitrary shell command
  • Commands run in the context of the DAT folder (. refers to the DAT)
  • Bio commands (run, report, etc.) are recognized automatically
  • Use shell: prefix for non-bio commands
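
One way to picture the distinction between Bio commands and shell commands: once the YAML is parsed, a plain entry such as `run . --agent claude` is a string, while `shell: python analysis.py` is a one-key mapping. The sketch below classifies entries on that basis. The function name classify and the returned tuple shape are assumptions made for illustration, not Bio's actual dispatcher.

```python
import shlex
from typing import List, Tuple, Union

Entry = Union[str, dict]


def classify(entry: Entry) -> Tuple[str, List[str]]:
    """Return ("shell", argv) for `shell:` entries and ("bio", argv) otherwise."""
    if isinstance(entry, dict) and "shell" in entry:
        return ("shell", shlex.split(entry["shell"]))
    if isinstance(entry, str):
        return ("bio", shlex.split(entry))
    raise ValueError(f"unrecognized run entry: {entry!r}")


# The run: list from the example above, as it would look after YAML parsing.
run_section = [
    "run . --agent claude",
    "report --type summary",
    {"shell": "python analysis.py"},
]

plan = [classify(entry) for entry in run_section]
for kind, argv in plan:
    print(kind, argv)
```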

_result_.yaml (Generated when a DAT is run)

# _result_.yaml
start_time: '2026-01-15 15:57:17.714576'
success: true
execution_time: 0.008767
end_time: '2026-01-15 15:57:17.723347'
run_metadata:
  final_state:
    A: 0.0
    B: 0.0
    D: 9.94
  scores:
    score: 0.997
    depletion: 1.0
    production: 0.994
  passing_score: 0.5
  success: true
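
To show how a caller might consume these fields, the sketch below works on the parsed result as a plain dict. Comparing scores.score against passing_score is an assumption about how the two fields relate, and summarize is a hypothetical helper. The wall-clock difference between the two timestamps need not equal execution_time exactly.

```python
from datetime import datetime

# Parsed content of the _result_.yaml shown above, written as a dict.
result = {
    "start_time": "2026-01-15 15:57:17.714576",
    "success": True,
    "execution_time": 0.008767,
    "end_time": "2026-01-15 15:57:17.723347",
    "run_metadata": {
        "final_state": {"A": 0.0, "B": 0.0, "D": 9.94},
        "scores": {"score": 0.997, "depletion": 1.0, "production": 0.994},
        "passing_score": 0.5,
        "success": True,
    },
}


def summarize(result: dict) -> dict:
    """Hypothetical helper: pull out the fields a caller is likely to check."""
    meta = result["run_metadata"]
    start = datetime.fromisoformat(result["start_time"])
    end = datetime.fromisoformat(result["end_time"])
    return {
        "ran_ok": result["success"],
        # Assumption: the overall score is compared against passing_score.
        "met_threshold": meta["scores"]["score"] >= meta["passing_score"],
        "wall_seconds": (end - start).total_seconds(),
    }


summary = summarize(result)
print(summary)
```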

Overview

DAT (from dvc-dat) provides:

  • Folder-based storage: each spec lives in a folder with index.yaml
  • Hierarchical organization: nested folders for catalogs, scenarios, chemistries
  • Run integration: DATs can define do: functions for execution

Bio wraps DAT for biological objects, adding hydration and typed objects on top.


DAT Structure

A DAT folder contains an index.yaml (or _spec_.yaml) plus any supporting files:

catalog/
├── scenarios/
│   └── mutualism/
│       ├── index.yaml        # main spec
│       ├── hard.yaml         # variant
│       └── constitution.md   # included file
├── chemistries/
│   └── energy_ring.yaml      # single-file DAT
└── worlds/
    └── ecosystem/
        └── index.yaml

Loading DATs

Path-Style (Direct DAT Load)

# Load folder DAT
bio.fetch("catalog/scenarios/mutualism")
# → loads catalog/scenarios/mutualism/index.yaml

# Load file within DAT folder
bio.fetch("catalog/scenarios/mutualism.hard")
# → loads catalog/scenarios/mutualism/hard.yaml

# Load single-file DAT
bio.fetch("catalog/chemistries/energy_ring")
# → loads catalog/chemistries/energy_ring.yaml
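
The three path-style cases above can be sketched as a small resolver. The function resolve_path_style and the order in which it tries the candidates are assumptions for illustration; bio.fetch may apply different rules.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_path_style(name: str, root: Path) -> Optional[Path]:
    """Map a path-style name to a YAML file (hypothetical resolution order)."""
    base = root / name
    # Case 1: folder DAT, load its index.yaml.
    if (base / "index.yaml").is_file():
        return base / "index.yaml"
    # Case 2: single-file DAT, load <name>.yaml.
    single = base.with_name(base.name + ".yaml")
    if single.is_file():
        return single
    # Case 3: "<folder>.<variant>", load <folder>/<variant>.yaml.
    folder, dot, variant = base.name.rpartition(".")
    if dot:
        candidate = base.with_name(folder) / f"{variant}.yaml"
        if candidate.is_file():
            return candidate
    return None


names = [
    "catalog/scenarios/mutualism",
    "catalog/scenarios/mutualism.hard",
    "catalog/chemistries/energy_ring",
]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Recreate the relevant part of the catalog tree shown earlier.
    for rel in [
        "catalog/scenarios/mutualism/index.yaml",
        "catalog/scenarios/mutualism/hard.yaml",
        "catalog/chemistries/energy_ring.yaml",
    ]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}\n")
    resolved = {
        name: resolve_path_style(name, root).relative_to(root).as_posix()
        for name in names
    }
    missing = resolve_path_style("catalog/scenarios/unknown", root)

for name, path in resolved.items():
    print(name, "->", path)
```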

Dotted-Style (Through lookup)

bio.fetch("catalog.scenarios.mutualism")
# → lookup() checks loaded modules first
# → then tries filesystem: catalog/scenarios/mutualism/index.yaml

See lookup() for full resolution rules.


DAT Configuration

Configuration lives in .dataconfig.yaml in your project root. Bio-specific config goes in the dat section:

# .dataconfig.yaml
local_prefix: data/                    # DAT sync folder
dat:
  bio_roots:                           # Bio lookup roots
    - ./catalog
    - ~/.alienbio/catalog

DataConfig Fields

Field Description
local_prefix Primary sync folder for DAT
cwd Working directory for relative paths
dat Dict for additional config (Bio uses dat.bio_roots)
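
As a sketch of how the dat.bio_roots entries might be turned into usable directories, the example below expands ~ and anchors relative entries at a project root. The function bio_roots and the project_root parameter are hypothetical; how Bio actually normalizes these paths is not specified here.

```python
import os
from pathlib import Path
from typing import List

# Parsed content of the .dataconfig.yaml shown above, written as a dict.
config = {
    "local_prefix": "data/",
    "dat": {"bio_roots": ["./catalog", "~/.alienbio/catalog"]},
}


def bio_roots(config: dict, project_root: Path) -> List[Path]:
    """Expand ~ and resolve relative roots against project_root (assumed behavior)."""
    roots = []
    for raw in config.get("dat", {}).get("bio_roots", []):
        path = Path(os.path.expanduser(raw))
        if not path.is_absolute():
            path = project_root / path
        roots.append(path)
    return roots


roots = bio_roots(config, Path("/work/myproject"))
for root in roots:
    print(root)
```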

Bio Resolution Order

When Bio resolves a dotted name like scenarios.mutualism, it checks:

  1. Python modules: looks in sys.modules for the first segment
  2. bio_roots: scans each root in order, converting dots to path separators

See fetch() for the complete resolution order.
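
The two-step order above can be sketched as follows. The function resolve_dotted, its return shape, and the choice to try both <path>/index.yaml and <path>.yaml inside each root are assumptions for illustration, not Bio's implementation.

```python
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def resolve_dotted(name: str, roots: List[Path]) -> Optional[Tuple[str, object]]:
    """Resolve a dotted name: loaded modules first, then each root in order."""
    segments = name.split(".")
    # Step 1: a module that is already loaded wins; nothing is imported here.
    if segments[0] in sys.modules:
        return ("module", segments[0])
    # Step 2: convert dots to path separators and scan the roots in order.
    rel = Path(*segments)
    for root in roots:
        for candidate in (root / rel / "index.yaml", (root / rel).with_suffix(".yaml")):
            if candidate.is_file():
                return ("file", candidate)
    return None


with tempfile.TemporaryDirectory() as tmp:
    catalog = Path(tmp) / "catalog"
    spec = catalog / "scenarios" / "baseline" / "index.yaml"
    spec.parent.mkdir(parents=True)
    spec.write_text("{}\n")

    from_root = resolve_dotted("scenarios.baseline", [catalog])
    from_module = resolve_dotted("sys.path", [catalog])
    found_rel = from_root[1].relative_to(catalog).as_posix()

print(from_root[0], found_rel)
print(from_module)
```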


Bio vs DAT Loading

| Feature | DAT (do.load) | Bio (lookup) |
| --- | --- | --- |
| Dynamic Python import | Yes | No |
| YAML loading | Yes | Yes |
| Hydration to typed objects | No | Yes |

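Key difference: DAT's do.load() can dynamically import Python modules. Bio's lookup() only navigates already-loaded modules and never imports on demand. Because importing a module runs its top-level code, this means resolving a name through Bio cannot execute new code as a side effect, which keeps its behavior predictable.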

DAT's do.load() (for reference)

from dvc_dat import do

# DAT can dynamically load Python
fn = do.load("mymodule.process_data")  # imports mymodule, gets process_data

# And navigate into dicts
value = do.load("mymodule.CONFIG.timeout")

Bio's lookup() (no dynamic import)

# Bio requires module to already be imported
import mymodule  # must import first

bio.lookup("mymodule.process_data")    # works - module is loaded
bio.lookup("mymodule.CONFIG.timeout")  # works - navigates into dict
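
The "no dynamic import" behavior can be sketched with the standard library alone. The function lookup_loaded below is a hypothetical stand-in for bio.lookup(), and the mymodule object is fabricated in place so the example is self-contained; the real lookup() covers more cases.

```python
import sys
import types


def lookup_loaded(dotted: str):
    """Navigate a dotted name through sys.modules without ever importing."""
    head, *rest = dotted.split(".")
    if head not in sys.modules:
        raise LookupError(f"{head!r} is not imported; this sketch never imports on demand")
    obj = sys.modules[head]
    for part in rest:
        # Dicts are navigated by key, everything else by attribute.
        obj = obj[part] if isinstance(obj, dict) else getattr(obj, part)
    return obj


# Stand-in for a user module that has already been imported.
mymodule = types.ModuleType("mymodule")
mymodule.CONFIG = {"timeout": 30}
mymodule.process_data = lambda rows: len(rows)
sys.modules["mymodule"] = mymodule

timeout = lookup_loaded("mymodule.CONFIG.timeout")
process_data = lookup_loaded("mymodule.process_data")
print(timeout, process_data([1, 2, 3]))

try:
    lookup_loaded("never_imported_pkg_xyz.thing")
    refused = False
except LookupError:
    refused = True
```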

DAT Run Integration

DATs can specify a do: function to run:

# In index.yaml
dat:
  kind: Dat
  do: alienbio.run

scenario.mutualism:
  chemistry: ...
  interface: ...

When you call dat.run(), it executes the do: function with the DAT as context.

Bio's bio.run() wraps this:

bio.run("catalog/scenarios/mutualism")
# → loads DAT, calls its do: function (alienbio.run)
# → runs the scenario, returns results
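
To make the calling convention concrete (the do function receives the DAT as its first argument, followed by dat.args and dat.kwargs), here is a minimal sketch. The DO_FUNCTIONS registry, run_dat, run_scenario, and the dict-based DAT are all hypothetical; the real dat.run() resolves the function by name and passes an actual Dat object.

```python
from typing import Any, Callable, Dict


def run_scenario(dat: dict, steps: int, verbose: bool = False, seed: int = 0) -> dict:
    """Stand-in do function; this is not the real alienbio.run."""
    return {"dat": dat["path"], "steps": steps, "seed": seed, "verbose": verbose}


# Hypothetical name -> function table in place of real name resolution.
DO_FUNCTIONS: Dict[str, Callable[..., Any]] = {"alienbio.run": run_scenario}


def run_dat(dat: dict) -> Any:
    """Call the do function with the DAT first, then args and kwargs."""
    spec = dat["spec"]["dat"]
    do_fn = DO_FUNCTIONS[spec["do"]]
    return do_fn(dat, *spec.get("args", []), **spec.get("kwargs", {}))


# A dict mirroring the full _spec_.yaml example with args and kwargs.
dat = {
    "path": "experiments/20260115",
    "spec": {
        "dat": {
            "kind": "Dat",
            "do": "alienbio.run",
            "args": [10],
            "kwargs": {"verbose": True, "seed": 42},
        }
    },
}

outcome = run_dat(dat)
print(outcome)
```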


Example project layout:

myproject/
├── catalog/                    # DAT root for specs
│   ├── scenarios/
│   │   ├── baseline/
│   │   │   └── index.yaml
│   │   └── competition/
│   │       └── index.yaml
│   ├── chemistries/
│   │   └── energy_ring.yaml
│   └── experiments/
│       └── sweep/
│           └── index.yaml
├── src/
│   └── myproject/              # Python source
│       ├── __init__.py
│       ├── agents.py
│       └── metrics.py
└── pyproject.toml

Configure via .dataconfig.yaml:

# .dataconfig.yaml
local_prefix: data/
dat:
  bio_roots:
    - ./catalog

Now all of these work:

bio.fetch("catalog/scenarios/baseline")      # path-style
bio.fetch("scenarios.baseline")              # dotted → checks modules, then bio_roots
bio.fetch("myproject.agents.LLMAgent")       # Python module (must be imported)


See Also

  • lookup() — Name resolution
  • fetch() — Load and hydrate specs
  • Bio — Bio class overview