DAT¶

DAT is a data-folder system for storing and loading biological specifications. Bio uses it for filesystem-based storage and retrieval of YAML specs.

Bio DAT File Structure¶

my_experiment/
├── _spec_.yaml         # Required: DAT specification (kind, do function)
├── _result_.yaml       # Generated: execution results from dat.run()
├── index.yaml          # Bio content: default object for bio.fetch(dat_name)
├── timeline.yaml       # Optional: full simulation state history
├── trace.yaml          # Optional: detailed agent action trace
├── report.csv          # Optional: tabular results
└── artifacts/          # Optional: large files, plots, etc.

`_spec_.yaml` (Required)¶

# _spec_.yaml
scenarios.baseline:
  path: data/scenarios/baseline_{seed}/

  build:
    index.yaml: generators.baseline    # generate content from Bio generator

  run:
    - run . --agent claude             # execute scenario with agent
    - report --type summary            # generate summary report

`_spec_.yaml` is the dvc_dat specification file. It defines DAT metadata and execution behavior. The example above uses `build` and `run` sections; the alternative below uses a `do` function instead:

# _spec_.yaml
dat:
  kind: Dat
  base: experiments.baseline        # Expand from this base template
  path: experiments/{YYYY}{MM}{DD}  # Path template with variables
  do: alienbio.run                  # Function to execute
  args: [10]                        # Positional args for do function
  kwargs:                           # Keyword args for do function
    verbose: true
    seed: 42

Key fields:

Field	Description
`path`	Path template with `{seed}`, `{YYYY}`, `{unique}`, etc.
`build`	Map of output files to generators (see below)
`run`	List of commands to execute (see below)
`dat.kind`	DAT class (usually "Dat")
`dat.do`	Function called when DAT is run (receives DAT as first arg)
`dat.base`	Base template to expand from (optional)
`dat.args`	Positional args passed to do function (optional)
`dat.kwargs`	Keyword args passed to do function (optional)
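
To make the `path` template concrete, here is a minimal sketch of how variables such as `{seed}` and `{YYYY}{MM}{DD}` could be filled in. The helper `expand_path_template` is hypothetical and is not part of dvc_dat or Bio. The real expansion rules may differ, and `{unique}` is not handled here.

```python
from datetime import date


def expand_path_template(template: str, *, today: date, **variables) -> str:
    """Fill {YYYY}, {MM}, {DD} from `today`, plus any extra variables such as seed.

    Hypothetical helper for illustration only; not the dvc_dat implementation.
    """
    fields = {
        "YYYY": f"{today.year:04d}",
        "MM": f"{today.month:02d}",
        "DD": f"{today.day:02d}",
    }
    fields.update({key: str(value) for key, value in variables.items()})
    return template.format(**fields)


run_day = date(2026, 1, 15)
dated = expand_path_template("experiments/{YYYY}{MM}{DD}", today=run_day)
seeded = expand_path_template("data/scenarios/baseline_{seed}/", today=run_day, seed=42)
print(dated)   # experiments/20260115
print(seeded)  # data/scenarios/baseline_42/
```

A template that names a variable the caller did not supply raises `KeyError` in this sketch, which surfaces typos in the template early.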

`build:` Section¶

Maps output filenames to Bio generators:

build:
  index.yaml: generators.baseline    # main scenario content
  config.yaml: generators.config     # additional config (optional)

- Each entry has the form `<filename>: <generator_name>`
- During `bio build`, each generator is called and its output is written to the filename
- Generators are Bio specs that produce content (scenarios, chemistries, etc.)
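
The build step described above can be sketched as a loop over the `build:` mapping. This is an illustration under stated assumptions: the `GENERATORS` registry, the `build_dat` function, and the generated content are all hypothetical stand-ins, and the way Bio actually resolves and invokes generators is not shown in this document.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical registry: generator name -> callable returning file content.
GENERATORS: Dict[str, Callable[[], str]] = {
    "generators.baseline": lambda: "scenario: baseline\n",
    "generators.config": lambda: "verbose: true\n",
}


def build_dat(dat_dir: Path, build: Dict[str, str]) -> List[str]:
    """Call each named generator and write its output to the mapped filename."""
    written = []
    for filename, generator_name in build.items():
        content = GENERATORS[generator_name]()
        target = dat_dir / filename
        target.write_text(content)
        written.append(target.name)
    return written


# What a YAML parser would yield for the build: section above.
build_section = {
    "index.yaml": "generators.baseline",
    "config.yaml": "generators.config",
}
with tempfile.TemporaryDirectory() as tmp:
    written_files = build_dat(Path(tmp), build_section)
    index_text = (Path(tmp) / "index.yaml").read_text()
```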

`run:` Section¶

List of commands to execute sequentially:

run:
  - run . --agent claude             # execute scenario with agent
  - report --type summary            # generate summary report
  - shell: python analysis.py        # arbitrary shell command

- Commands run in the context of the DAT folder (`.` refers to the DAT)
- Bio commands (`run`, `report`, etc.) are recognized automatically
- Use the `shell:` prefix for non-bio commands
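
The distinction between Bio commands and `shell:` entries can be sketched as follows. A YAML parser yields a plain string for entries like `run . --agent claude` and a one-key mapping for `shell: python analysis.py`. The function `classify_run_entry` is a hypothetical illustration and does not reflect how Bio dispatches commands internally.

```python
import shlex
from typing import List, Tuple, Union

RunEntry = Union[str, dict]


def classify_run_entry(entry: RunEntry) -> Tuple[str, List[str]]:
    """Return ("shell", argv) for `shell:` entries, otherwise ("bio", argv)."""
    if isinstance(entry, dict) and "shell" in entry:
        return "shell", shlex.split(entry["shell"])
    if isinstance(entry, str):
        return "bio", shlex.split(entry)
    raise ValueError(f"unrecognized run entry: {entry!r}")


# What a YAML parser would yield for the run: list above.
run_section = [
    "run . --agent claude",
    "report --type summary",
    {"shell": "python analysis.py"},
]
plan = [classify_run_entry(entry) for entry in run_section]
```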

`_result_.yaml` (Generated when a DAT is run)¶

# _result_.yaml
start_time: '2026-01-15 15:57:17.714576'
success: true
execution_time: 0.008767
end_time: '2026-01-15 15:57:17.723347'
run_metadata:
  final_state:
    A: 0.0
    B: 0.0
    D: 9.94
  scores:
    score: 0.997
    depletion: 1.0
    production: 0.994
  passing_score: 0.5
  success: true
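
As a rough illustration of how a result file like this might be inspected, the sketch below copies the values from the example into a dict. Both helper functions are hypothetical. Comparing `score` against `passing_score` is an assumption about how `success` is derived, which this document does not state.

```python
from datetime import datetime

# Values copied from the _result_.yaml example above.
result = {
    "start_time": "2026-01-15 15:57:17.714576",
    "end_time": "2026-01-15 15:57:17.723347",
    "success": True,
    "run_metadata": {
        "scores": {"score": 0.997, "depletion": 1.0, "production": 0.994},
        "passing_score": 0.5,
        "success": True,
    },
}


def wall_clock_seconds(res: dict) -> float:
    """Elapsed time between the recorded start and end timestamps."""
    start = datetime.fromisoformat(res["start_time"])
    end = datetime.fromisoformat(res["end_time"])
    return (end - start).total_seconds()


def meets_passing_score(res: dict) -> bool:
    """Assumed rule: the overall score must reach passing_score."""
    meta = res["run_metadata"]
    return meta["scores"]["score"] >= meta["passing_score"]


elapsed = wall_clock_seconds(result)
passed = meets_passing_score(result)
```

The timestamp difference is 0.008771 seconds, close to but not identical to the recorded `execution_time` of 0.008767, so the two values are presumably measured separately.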

Overview¶

DAT (from dvc-dat) provides:

- Folder-based storage: each spec lives in a folder with index.yaml
- Hierarchical organization: nested folders for catalogs, scenarios, chemistries
- Run integration: DATs can define do: functions for execution

Bio wraps DAT for biological objects, adding hydration and typed objects on top.

DAT Structure¶

A DAT folder contains an index.yaml (or _spec_.yaml) plus any supporting files:

catalog/
├── scenarios/
│   └── mutualism/
│       ├── index.yaml        # main spec
│       ├── hard.yaml         # variant
│       └── constitution.md   # included file
├── chemistries/
│   └── energy_ring.yaml      # single-file DAT
└── worlds/
    └── ecosystem/
        └── index.yaml

Loading DATs¶

Path-Style (Direct DAT Load)¶

# Load folder DAT
bio.fetch("catalog/scenarios/mutualism")
# → loads catalog/scenarios/mutualism/index.yaml

# Load file within DAT folder
bio.fetch("catalog/scenarios/mutualism.hard")
# → loads catalog/scenarios/mutualism/hard.yaml

# Load single-file DAT
bio.fetch("catalog/chemistries/energy_ring")
# → loads catalog/chemistries/energy_ring.yaml
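
The three path-style cases above can be sketched as a candidate search. The function `resolve_path_style` is hypothetical, and the order in which candidates are tried is an assumption; only the three name-to-file mappings come from the examples above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_path_style(root: Path, name: str) -> Optional[Path]:
    """Map a path-style name to a YAML file under root, or None if nothing matches."""
    base = root / name
    # Folder DAT first, then single-file DAT.
    candidates = [base / "index.yaml", base.with_name(base.name + ".yaml")]
    # "folder.member" form: the last path part names a file inside a DAT folder.
    if "." in base.name:
        folder, member = base.name.split(".", 1)
        candidates.append(base.with_name(folder) / f"{member}.yaml")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


names = [
    "catalog/scenarios/mutualism",
    "catalog/scenarios/mutualism.hard",
    "catalog/chemistries/energy_ring",
]
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Recreate the relevant part of the catalog layout shown earlier.
    (root / "catalog/scenarios/mutualism").mkdir(parents=True)
    (root / "catalog/scenarios/mutualism/index.yaml").write_text("")
    (root / "catalog/scenarios/mutualism/hard.yaml").write_text("")
    (root / "catalog/chemistries").mkdir(parents=True)
    (root / "catalog/chemistries/energy_ring.yaml").write_text("")
    resolved = {
        name: resolve_path_style(root, name).relative_to(root).as_posix()
        for name in names
    }
```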

Dotted-Style (Through lookup)¶

bio.fetch("catalog.scenarios.mutualism")
# → lookup() checks loaded modules first
# → then tries filesystem: catalog/scenarios/mutualism/index.yaml

See lookup() for full resolution rules.

DAT Configuration¶

Configuration lives in .dataconfig.yaml in your project root. Bio-specific config goes in the dat section:

# .dataconfig.yaml
local_prefix: data/                    # DAT sync folder
dat:
  bio_roots:                           # Bio lookup roots
    - ./catalog
    - ~/.alienbio/catalog

DataConfig Fields¶

Field	Description
`local_prefix`	Primary sync folder for DAT
`cwd`	Working directory for relative paths
`dat`	Dict for additional config (Bio uses `dat.bio_roots`)
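
As a small illustration of how `dat.bio_roots` might be turned into usable paths, the sketch below starts from the dict a YAML parser would produce for the config above. The `bio_roots` helper is hypothetical. Expanding `~` and resolving relative roots against `cwd` are assumptions based on the field descriptions, not confirmed behavior.

```python
from pathlib import Path
from typing import List

# What a YAML parser would yield for the .dataconfig.yaml shown above.
config = {
    "local_prefix": "data/",
    "dat": {"bio_roots": ["./catalog", "~/.alienbio/catalog"]},
}


def bio_roots(cfg: dict, cwd: Path) -> List[Path]:
    """Expand ~ and resolve relative roots against cwd (assumed behavior)."""
    roots = []
    for raw in cfg.get("dat", {}).get("bio_roots", []):
        path = Path(raw).expanduser()
        if not path.is_absolute():
            path = cwd / path
        roots.append(path)
    return roots


roots = bio_roots(config, Path("/work/myproject"))
```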

Bio Resolution Order¶

When Bio resolves a dotted name like scenarios.mutualism, it checks:

1. Python modules: sys.modules is checked for the first segment
2. bio_roots: each root is scanned in order, with dots converted to path separators

See fetch() for the complete resolution order.
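
The two resolution steps above can be sketched as follows. The function `resolve_dotted` is a hypothetical illustration rather than Bio's implementation. Trying `index.yaml` before a single-file `.yaml` under each root is an assumption.

```python
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple


def resolve_dotted(name: str, roots: List[Path]) -> Tuple[str, object]:
    """Resolve a dotted name: loaded modules first, then each root in order."""
    parts = name.split(".")
    # Step 1: a loaded Python module matching the first segment wins.
    if parts[0] in sys.modules:
        return "module", sys.modules[parts[0]]
    # Step 2: scan each root in order, turning dots into path separators.
    relative = Path(*parts)
    for root in roots:
        base = root / relative
        for candidate in (base / "index.yaml", base.with_name(parts[-1] + ".yaml")):
            if candidate.is_file():
                return "file", candidate
    raise LookupError(name)


with tempfile.TemporaryDirectory() as tmp:
    catalog = Path(tmp) / "catalog"
    (catalog / "scenarios/mutualism").mkdir(parents=True)
    (catalog / "scenarios/mutualism/index.yaml").write_text("")
    file_kind, found = resolve_dotted("scenarios.mutualism", [catalog])
    found_rel = found.relative_to(catalog).as_posix()

# "sys" is always loaded, so step 1 matches and the roots are never scanned.
module_kind, module_obj = resolve_dotted("sys.path", [])
```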

Bio vs DAT Loading¶

Feature	DAT (`do.load`)	Bio (`lookup`)
Dynamic Python import	Yes	No
YAML loading	Yes	Yes
Hydration to typed objects	No	Yes

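Key difference: DAT's do.load() can dynamically import Python modules. Bio's lookup() only navigates already-loaded modules and never performs a dynamic import. Because a lookup cannot trigger an import, resolving a name never executes new module code, which keeps Bio's behavior predictable and safe.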

DAT's do.load() (for reference)¶

from dvc_dat import do

# DAT can dynamically load Python
fn = do.load("mymodule.process_data")  # imports mymodule, gets process_data

# And navigate into dicts
value = do.load("mymodule.CONFIG.timeout")

Bio's lookup() (no dynamic import)¶

# Bio requires module to already be imported
import mymodule  # must import first

bio.lookup("mymodule.process_data")    # works - module is loaded
bio.lookup("mymodule.CONFIG.timeout")  # works - navigates into dict
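
To show what "no dynamic import" means in practice, here is a minimal sketch of a lookup that consults only `sys.modules`. The function `lookup_loaded` is hypothetical and is not Bio's `lookup()`. The stand-in `mymodule` and its `CONFIG` contents are invented for the example.

```python
import sys
import types


def lookup_loaded(dotted: str):
    """Resolve a dotted name using only modules already in sys.modules."""
    head, *rest = dotted.split(".")
    if head not in sys.modules:
        # No import is attempted; an unloaded module is simply an error.
        raise LookupError(f"module {head!r} is not imported")
    current = sys.modules[head]
    for segment in rest:
        # Dicts are navigated by key, everything else by attribute.
        if isinstance(current, dict):
            current = current[segment]
        else:
            current = getattr(current, segment)
    return current


# Stand-in for an already-imported `mymodule` (hypothetical contents).
mymodule = types.ModuleType("mymodule")
mymodule.CONFIG = {"timeout": 30}
sys.modules["mymodule"] = mymodule

timeout = lookup_loaded("mymodule.CONFIG.timeout")
```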

DAT Run Integration¶

DATs can specify a do: function to run:

# In index.yaml
dat:
  kind: Dat
  do: alienbio.run

scenario.mutualism:
  chemistry: ...
  interface: ...

When you call dat.run(), it executes the do: function, passing the DAT as the first argument along with any args and kwargs from the spec.

Bio's bio.run() wraps this:

bio.run("catalog/scenarios/mutualism")
# → loads DAT, calls its do: function (alienbio.run)
# → runs the scenario, returns results
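
The dispatch that `do:` implies can be sketched as below. This is an assumption-laden illustration: `call_do` and the `demo_runner` module are hypothetical, with `demo_runner.run` standing in for `alienbio.run` so the sketch runs without alienbio installed. The actual dvc_dat mechanism may differ.

```python
import importlib
import sys
import types


def call_do(dat_spec: dict, dat: object):
    """Resolve the dotted `do` name and call it with the DAT as first argument."""
    module_name, _, func_name = dat_spec["do"].rpartition(".")
    func = getattr(importlib.import_module(module_name), func_name)
    return func(dat, *dat_spec.get("args", []), **dat_spec.get("kwargs", {}))


# Hypothetical stand-in module registered so import_module can find it.
demo_runner = types.ModuleType("demo_runner")


def run(dat, steps, verbose=False, seed=0):
    """Echo the arguments so the call shape is visible."""
    return {"dat": dat, "steps": steps, "verbose": verbose, "seed": seed}


demo_runner.run = run
sys.modules["demo_runner"] = demo_runner

# Mirrors the dat: block fields (do, args, kwargs) from the _spec_.yaml example.
spec = {
    "kind": "Dat",
    "do": "demo_runner.run",
    "args": [10],
    "kwargs": {"verbose": True, "seed": 42},
}
outcome = call_do(spec, "my_experiment")
```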

Recommended Project Structure¶

myproject/
├── catalog/                    # DAT root for specs
│   ├── scenarios/
│   │   ├── baseline/
│   │   │   └── index.yaml
│   │   └── competition/
│   │       └── index.yaml
│   ├── chemistries/
│   │   └── energy_ring.yaml
│   └── experiments/
│       └── sweep/
│           └── index.yaml
├── src/
│   └── myproject/              # Python source
│       ├── __init__.py
│       ├── agents.py
│       └── metrics.py
└── pyproject.toml

Configure via .dataconfig.yaml:

# .dataconfig.yaml
local_prefix: data/
dat:
  bio_roots:
    - ./catalog

Now all of these work:

bio.fetch("catalog/scenarios/baseline")      # path-style
bio.fetch("scenarios.baseline")              # dotted → checks modules, then bio_roots
bio.fetch("myproject.agents.LLMAgent")       # Python module (must be imported)