DAT¶
Data folder system for storing and loading biological specifications. Bio uses DAT for filesystem-based storage and retrieval of YAML specs.
Bio DAT File Structure¶
my_experiment/
├── _spec_.yaml # Required: DAT specification (kind, do function)
├── _result_.yaml # Generated: execution results from dat.run()
├── index.yaml # Bio content: default object for bio.fetch(dat_name)
├── timeline.yaml # Optional: full simulation state history
├── trace.yaml # Optional: detailed agent action trace
├── report.csv # Optional: tabular results
└── artifacts/ # Optional: large files, plots, etc.
_spec_.yaml (Required)¶
# _spec_.yaml
scenarios.baseline:
  path: data/scenarios/baseline_{seed}/
  build:
    index.yaml: generators.baseline # generate content from Bio generator
  run:
    - run . --agent claude # execute scenario with agent
    - report --type summary # generate summary report
_spec_.yaml is the dvc_dat specification file. It defines the DAT's metadata and execution behavior. An alternative approach uses a do function; a full example:
# _spec_.yaml
dat:
  kind: Dat
  base: experiments.baseline # Expand from this base template
  path: experiments/{YYYY}{MM}{DD} # Path template with variables
  do: alienbio.run # Function to execute
  args: [10] # Positional args for do function
  kwargs: # Keyword args for do function
    verbose: true
    seed: 42
Key fields:
| Field | Description |
|---|---|
| path | Path template with {seed}, {YYYY}, {unique}, etc. |
| build | Map of output files to generators (see below) |
| run | List of commands to execute (see below) |
| dat.kind | DAT class (usually "Dat") |
| dat.do | Function called when DAT is run (receives DAT as first arg) |
| dat.base | Base template to expand from (optional) |
| dat.args | Positional args passed to do function (optional) |
| dat.kwargs | Keyword args passed to do function (optional) |
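To make the path template concrete, here is a minimal Python sketch of how placeholders such as {seed} and {YYYY} could be filled in. The helper expand_path_template is hypothetical and is not part of dvc_dat or Bio; the real expansion rules (including {unique}) are defined by dvc_dat.

```python
from datetime import date


def expand_path_template(template: str, seed: int, today: date) -> str:
    """Hypothetical helper: fill {seed}, {YYYY}, {MM}, {DD} in a DAT path template."""
    values = {
        "seed": seed,
        "YYYY": f"{today.year:04d}",
        "MM": f"{today.month:02d}",
        "DD": f"{today.day:02d}",
    }
    # str.format raises KeyError for any placeholder not listed above
    return template.format(**values)


print(expand_path_template("data/scenarios/baseline_{seed}/", 7, date(2026, 1, 15)))
print(expand_path_template("experiments/{YYYY}{MM}{DD}", 7, date(2026, 1, 15)))
```

Passing the date in explicitly, rather than reading the clock inside the helper, keeps the expansion reproducible.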
build: Section¶
Maps output filenames to Bio generators:
build:
  index.yaml: generators.baseline # main scenario content
  config.yaml: generators.config # additional config (optional)
- Each entry: <filename>: <generator_name>
- During bio build, each generator is called and its output is written to the filename
- Generators are Bio specs that produce content (scenarios, chemistries, etc.)
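The build step described above can be sketched in Python as a loop over the filename-to-generator map. This is an illustration only: GENERATORS and build_dat are hypothetical stand-ins, and real generators are Bio specs rather than plain functions.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical stand-ins for Bio generators; each returns file content as text.
GENERATORS: Dict[str, Callable[[], str]] = {
    "generators.baseline": lambda: "scenario: baseline\n",
    "generators.config": lambda: "verbose: true\n",
}


def build_dat(dat_dir: Path, build: Dict[str, str]) -> List[str]:
    """Call each named generator and write its output to the mapped filename."""
    written = []
    for filename, generator_name in build.items():
        content = GENERATORS[generator_name]()
        (dat_dir / filename).write_text(content)
        written.append(filename)
    return written


with tempfile.TemporaryDirectory() as tmp:
    build_section = {
        "index.yaml": "generators.baseline",
        "config.yaml": "generators.config",
    }
    print(build_dat(Path(tmp), build_section))
```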
run: Section¶
List of commands to execute sequentially:
run:
  - run . --agent claude # execute scenario with agent
  - report --type summary # generate summary report
  - shell: python analysis.py # arbitrary shell command
- Commands run in the context of the DAT folder (. refers to the DAT)
- Bio commands (run, report, etc.) are recognized automatically
- Use the shell: prefix for non-bio commands
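As a rough sketch of how the entries differ once the YAML is parsed: plain entries arrive as strings, while a shell: entry arrives as a one-key mapping. The function classify_run_entries below is hypothetical and only labels each entry; it does not execute anything, and Bio's actual dispatch may work differently.

```python
from typing import List, Tuple, Union

RunEntry = Union[str, dict]


def classify_run_entries(entries: List[RunEntry]) -> List[Tuple[str, str]]:
    """Label each run entry as a 'bio' or 'shell' command, preserving order."""
    plan = []
    for entry in entries:
        if isinstance(entry, dict) and "shell" in entry:
            # YAML parses `- shell: python analysis.py` as a one-key mapping
            plan.append(("shell", entry["shell"]))
        else:
            plan.append(("bio", str(entry)))
    return plan


# The run: list from the example above, as a YAML parser would return it
run_section = [
    "run . --agent claude",
    "report --type summary",
    {"shell": "python analysis.py"},
]
for kind, command in classify_run_entries(run_section):
    print(kind, command)
```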
_result_.yaml (Generated when a DAT is run)¶
# _result_.yaml
start_time: '2026-01-15 15:57:17.714576'
success: true
execution_time: 0.008767
end_time: '2026-01-15 15:57:17.723347'
run_metadata:
  final_state:
    A: 0.0
    B: 0.0
    D: 9.94
  scores:
    score: 0.997
    depletion: 1.0
    production: 0.994
  passing_score: 0.5
  success: true
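A consumer of this file might compare the score against the passing threshold. The sketch below works on the parsed result as a plain dict. It assumes that scores and passing_score are nested under run_metadata and that passing means score >= passing_score. Neither assumption is confirmed by the example, and scenario_passed is a hypothetical helper.

```python
from typing import Any, Dict

# The _result_.yaml example, written out as the dict a YAML parser would return.
# Assumption: scores and passing_score are nested under run_metadata.
result: Dict[str, Any] = {
    "success": True,
    "execution_time": 0.008767,
    "run_metadata": {
        "final_state": {"A": 0.0, "B": 0.0, "D": 9.94},
        "scores": {"score": 0.997, "depletion": 1.0, "production": 0.994},
        "passing_score": 0.5,
    },
}


def scenario_passed(result: Dict[str, Any]) -> bool:
    """Hypothetical check: the run finished and the score reached the threshold."""
    meta = result["run_metadata"]
    return bool(result["success"]) and meta["scores"]["score"] >= meta["passing_score"]


print(scenario_passed(result))
```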
Overview¶
DAT (from dvc-dat) provides:
- Folder-based storage — each spec lives in a folder with index.yaml
- Hierarchical organization — nested folders for catalogs, scenarios, chemistries
- Run integration — DATs can define do: functions for execution
Bio wraps DAT for biological objects, adding hydration and typed objects on top.
DAT Structure¶
A DAT folder contains an index.yaml (or _spec_.yaml) plus any supporting files:
catalog/
├── scenarios/
│ └── mutualism/
│ ├── index.yaml # main spec
│ ├── hard.yaml # variant
│ └── constitution.md # included file
├── chemistries/
│ └── energy_ring.yaml # single-file DAT
└── worlds/
└── ecosystem/
└── index.yaml
Loading DATs¶
Path-Style (Direct DAT Load)¶
# Load folder DAT
bio.fetch("catalog/scenarios/mutualism")
# → loads catalog/scenarios/mutualism/index.yaml
# Load file within DAT folder
bio.fetch("catalog/scenarios/mutualism.hard")
# → loads catalog/scenarios/mutualism/hard.yaml
# Load single-file DAT
bio.fetch("catalog/chemistries/energy_ring")
# → loads catalog/chemistries/energy_ring.yaml
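The three path-style cases above can be summarized as a list of candidate files for a given name. The following Python sketch is illustrative: candidate_paths is a hypothetical helper, and the order in which bio.fetch actually tries these files is an assumption.

```python
from pathlib import Path
from typing import List


def candidate_paths(name: str) -> List[Path]:
    """Hypothetical sketch: files a path-style name could refer to."""
    base = Path(name)
    candidates = [
        base / "index.yaml",                    # folder DAT
        base.with_name(base.name + ".yaml"),    # single-file DAT
    ]
    if "." in base.name:
        # "mutualism.hard" -> file hard.yaml inside folder mutualism/
        folder, _, variant = base.name.rpartition(".")
        candidates.append(base.parent / folder / f"{variant}.yaml")
    return candidates


for name in ("catalog/scenarios/mutualism",
             "catalog/scenarios/mutualism.hard",
             "catalog/chemistries/energy_ring"):
    print(name, [p.as_posix() for p in candidate_paths(name)])
```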
Dotted-Style (Through lookup)¶
bio.fetch("catalog.scenarios.mutualism")
# → lookup() checks loaded modules first
# → then tries filesystem: catalog/scenarios/mutualism/index.yaml
See lookup() for full resolution rules.
DAT Configuration¶
Configuration lives in .dataconfig.yaml in your project root. Bio-specific config goes in the dat section:
# .dataconfig.yaml
local_prefix: data/ # DAT sync folder
dat:
  bio_roots: # Bio lookup roots
    - ./catalog
    - ~/.alienbio/catalog
DataConfig Fields¶
| Field | Description |
|---|---|
| local_prefix | Primary sync folder for DAT |
| cwd | Working directory for relative paths |
| dat | Dict for additional config (Bio uses dat.bio_roots) |
Bio Resolution Order¶
When Bio resolves a dotted name like scenarios.mutualism, it checks:
- Python modules: sys.modules for the first segment
- bio_roots: scans each root in order, converting dots to path separators
See fetch() for the complete resolution order.
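The two-step order above can be sketched as follows. This is a simplified illustration, not Bio's implementation: resolve_dotted is a hypothetical function, and the choice to try index.yaml before a single-file .yaml within each root is an assumption.

```python
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def resolve_dotted(name: str, bio_roots: List[Path]) -> Optional[Tuple[str, str]]:
    """Hypothetical sketch of the two-step order: loaded modules, then bio_roots."""
    parts = name.split(".")
    # Step 1: only modules that are already imported are considered
    if parts[0] in sys.modules:
        return ("module", parts[0])
    # Step 2: scan each root in order, turning dots into path separators
    for root in bio_roots:
        target = root.joinpath(*parts)
        for candidate in (target / "index.yaml",
                          target.with_name(target.name + ".yaml")):
            if candidate.is_file():
                return ("file", candidate.as_posix())
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "catalog"
    (root / "scenarios" / "mutualism").mkdir(parents=True)
    (root / "scenarios" / "mutualism" / "index.yaml").write_text("name: mutualism\n")
    print(resolve_dotted("scenarios.mutualism", [root]))  # found under the root
    print(resolve_dotted("sys.path", [root]))             # first segment is a loaded module
```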
Bio vs DAT Loading¶
| Feature | DAT (do.load) | Bio (lookup) |
|---|---|---|
| Dynamic Python import | Yes | No |
| YAML loading | Yes | Yes |
| Hydration to typed objects | No | Yes |
Key difference: DAT's do.load() can dynamically import Python modules. Bio's lookup() only navigates already-loaded modules — no dynamic imports. This makes Bio's behavior predictable and safe.
DAT's do.load() (for reference)¶
from dvc_dat import do
# DAT can dynamically load Python
fn = do.load("mymodule.process_data") # imports mymodule, gets process_data
# And navigate into dicts
value = do.load("mymodule.CONFIG.timeout")
Bio's lookup() (no dynamic import)¶
# Bio requires module to already be imported
import mymodule # must import first
bio.lookup("mymodule.process_data") # works - module is loaded
bio.lookup("mymodule.CONFIG.timeout") # works - navigates into dict
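To show what "no dynamic import" means in practice, here is a minimal sketch of a lookup that consults only sys.modules and then walks attributes and dict keys. The function lookup_loaded is hypothetical and does not reproduce bio.lookup(); the stand-in module and its timeout value are invented for the demonstration.

```python
import sys
import types
from typing import Any


def lookup_loaded(dotted: str) -> Any:
    """Hypothetical sketch: resolve a dotted name using already-imported modules only."""
    head, *rest = dotted.split(".")
    if head not in sys.modules:
        # Deliberately no importlib call here: unloaded modules are an error
        raise LookupError(f"module {head!r} is not imported")
    obj = sys.modules[head]
    for part in rest:
        # Dicts are navigated by key, everything else by attribute
        obj = obj[part] if isinstance(obj, dict) else getattr(obj, part)
    return obj


# Register a stand-in module so the sketch is self-contained
mymodule = types.ModuleType("mymodule")
mymodule.CONFIG = {"timeout": 30}
sys.modules["mymodule"] = mymodule

print(lookup_loaded("mymodule.CONFIG.timeout"))
```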
DAT Run Integration¶
DATs can specify a do: function to run, as in the dat.do field of the _spec_.yaml example above.
When you call dat.run(), it executes the do: function with the DAT as context.
Bio's bio.run() wraps this:
bio.run("catalog/scenarios/mutualism")
# → loads DAT, calls its do: function (alienbio.run)
# → runs the scenario, returns results
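The call convention from the key-fields table (the do function receives the DAT first, followed by dat.args and dat.kwargs) can be sketched as below. This is an assumption-laden illustration: run_dat and fake_run are hypothetical, and a plain dict registry stands in for however dvc_dat actually resolves the do: name.

```python
from typing import Any, Callable, Dict


def run_dat(dat: Dict[str, Any], functions: Dict[str, Callable[..., Any]]) -> Any:
    """Hypothetical sketch: call the do function with the DAT, then args and kwargs."""
    spec = dat["dat"]
    do_fn = functions[spec["do"]]
    return do_fn(dat, *spec.get("args", []), **spec.get("kwargs", {}))


def fake_run(dat: Dict[str, Any], steps: int, verbose: bool = False, seed: int = 0) -> Dict[str, Any]:
    """Stand-in for alienbio.run; it only echoes what it was given."""
    return {"steps": steps, "seed": seed, "verbose": verbose}


# Mirrors the dat: section of the _spec_.yaml example
dat = {
    "dat": {
        "kind": "Dat",
        "do": "alienbio.run",
        "args": [10],
        "kwargs": {"verbose": True, "seed": 42},
    }
}
print(run_dat(dat, {"alienbio.run": fake_run}))
```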
Recommended Project Structure¶
myproject/
├── catalog/ # DAT root for specs
│ ├── scenarios/
│ │ ├── baseline/
│ │ │ └── index.yaml
│ │ └── competition/
│ │ └── index.yaml
│ ├── chemistries/
│ │ └── energy_ring.yaml
│ └── experiments/
│ └── sweep/
│ └── index.yaml
├── src/
│ └── myproject/ # Python source
│ ├── __init__.py
│ ├── agents.py
│ └── metrics.py
└── pyproject.toml
Configure via .dataconfig.yaml, listing ./catalog under dat.bio_roots as in the DAT Configuration section.
Now all of these work:
bio.fetch("catalog/scenarios/baseline") # path-style
bio.fetch("scenarios.baseline") # dotted → checks modules, then bio_roots
bio.fetch("myproject.agents.LLMAgent") # Python module (must be imported)