Architecture Docs → ABIO execution
Test¶
Batch of experiments across variations.
Overview¶
Test represents a batch of experiments varying across worlds, agents, or task parameters, with aggregated statistics.
| Properties | Type | Description |
|---|---|---|
| name | str | Test batch identifier |
| experiments | list | Individual experiments |
| variations | dict | Parameter variations being tested |
| Methods | Description |
|---|---|
| run_all | Run all experiments in batch |
Protocol Definition¶
from typing import Protocol
class Test(Protocol):
"""Batch of experiments."""
name: str
experiments: list[Experiment]
variations: dict[str, list[Any]] # parameter -> values
def run_all(self) -> "TestResult":
"""Run all experiments in batch."""
...
Methods¶
run_all() -> TestResult¶
Run all experiments in batch.
TestResult¶
class TestResult:
experiment_results: list[ExperimentResult]
aggregate_score: float
by_variation: dict[str, dict[Any, float]] # param -> value -> avg score
Variation Types¶
- World variations: Different seeds, complexity levels
- Agent variations: Different models, prompts
- Task variations: Different difficulty, time horizons
See Also¶
- ABIO execution
- Experiment - Individual runs
- TestHarness - Execution runner