Explore and Discover Research Agent

This repository is organized as a collection of self-contained autoresearch problems that can be tackled with a coding agent (e.g. Codex, Claude Code, or similar). Each problem gets its own subfolder with one fixed benchmark, one editable training script, one agent program, logs, and results.

The current problem folders are:

autoresearch_ca_life/              broad learned-architecture search
autoresearch_ca_life_transformer/  transformer-only architecture search

The autoresearch_ca_life and autoresearch_ca_life_transformer problems use the same fixed Game of Life-like cellular automata benchmark. The broad problem lets the agent replace the baseline transformer with any learned neural architecture that preserves the evaluator interface. The transformer-only problem keeps the search space restricted to transformer-family architectures.

Layout

PDFs/                             reference paper (AutomataGPT)
autoresearch_ca_life/             broad self-contained autoresearch problem
autoresearch_ca_life_transformer/ transformer-only autoresearch problem
...

Inside each autoresearch_* problem folder:

prepare.py             fixed problem definition, data generation, and evaluator
train.py               editable model and training loop
program.md             agent-facing autoresearch instructions
analyze_results.py     plotter/exporter for results.tsv
export_experiments.py  local folder exporter for committed experiment snapshots
pyproject.toml         dependencies for that problem

There is intentionally no README inside the problem folder. The convention is:

program.md = what the coding agent reads and follows
prepare.py = fixed benchmark; do not edit during experiments
train.py   = editable research surface

The root README is for humans. program.md is for Codex, Claude Code, or any other coding agent running a specific autoresearch problem.

Current Problems

Both current problems use the same CA benchmark. They do not model arbitrary cellular automata. They cover:

2D
binary cell states: 0/1
deterministic synchronous updates
toroidal boundary conditions
radius-1 Moore neighborhood
Life-like birth/survival rules
isotropic count-based rules
fixed 16 x 16 grids

Conway's Game of Life is the specific B3/S23 rule inside this broader Life-like family.
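
To make the benchmark family concrete, here is a minimal pure-Python sketch of one synchronous update for a Life-like rule on a toroidal grid with a radius-1 Moore neighborhood. It is illustrative only and is not the code in prepare.py; the function name life_like_step and its parameters are assumptions made for this example.

```python
def life_like_step(grid, birth=frozenset({3}), survive=frozenset({2, 3})):
    """Apply one synchronous update of a Life-like rule on a toroidal grid.

    grid is a list of rows of 0/1 cells. The defaults encode B3/S23
    (Conway's Game of Life); other Life-like rules use different sets.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells among the 8 Moore neighbors, wrapping
            # around the edges with modular arithmetic (torus).
            live = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            if grid[r][c] == 1:
                new_grid[r][c] = 1 if live in survive else 0
            else:
                new_grid[r][c] = 1 if live in birth else 0
    return new_grid


# A horizontal blinker on a 16 x 16 grid becomes vertical after one step
# under B3/S23 and returns to its original state after two.
grid = [[0] * 16 for _ in range(16)]
for c in (7, 8, 9):
    grid[8][c] = 1
after_one = life_like_step(grid)
after_two = life_like_step(after_one)
```

Because the rule depends only on the cell's own state and its live-neighbor count, any B/S rule in the family can be expressed by changing the two sets.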

Use autoresearch_ca_life/ when you want the agent to explore any learned neural architecture. Use autoresearch_ca_life_transformer/ when you want a clean transformer-only comparison.

Run Manually

From the repo root:

cd ./autoresearch_ca_life
python -m pip install -e .
python prepare.py --smoke-test
python train.py > run.log 2>&1
tail -n 40 run.log

For a tiny smoke training run:

AR_TIME_BUDGET=2 \
AR_EVAL_GRID_SAMPLES=4 \
AR_EVAL_INVERSE_SAMPLES=4 \
AR_EVAL_TRUTH_TABLE_LIMIT=16 \
AR_BATCH_SIZE=4 \
AR_EVAL_BATCH_SIZE=4 \
AR_N_LAYER=1 \
AR_N_HEAD=2 \
AR_N_EMBD=32 \
python train.py
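
The smoke run above works by overriding settings through AR_* environment variables. How train.py actually reads them is not shown in this README; the following is a minimal sketch of the usual pattern, assuming integer-valued overrides. The helper name env_int and all default values are hypothetical.

```python
import os


def env_int(name, default, environ=None):
    """Return an integer setting from the environment, or the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    return default if raw is None or raw == "" else int(raw)


# Hypothetical defaults; the real values live in train.py.
# A plain dict stands in for os.environ so the example is reproducible.
smoke_env = {"AR_TIME_BUDGET": "2", "AR_N_LAYER": "1", "AR_N_EMBD": "32"}
time_budget = env_int("AR_TIME_BUDGET", 300, smoke_env)
n_layer = env_int("AR_N_LAYER", 4, smoke_env)
n_head = env_int("AR_N_HEAD", 4, smoke_env)  # unset above, so the default applies
```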

For the transformer-only problem, use the sibling folder:

cd ./autoresearch_ca_life_transformer
python -m pip install -e .
python prepare.py --smoke-test
python train.py > run.log 2>&1
tail -n 40 run.log

Run With Codex

Codex infers which problem to run from the current working directory. Start Codex inside the problem folder, not the parent repo:

cd ./autoresearch_ca_life
codex

For the broad problem, give it this prompt:

Read program.md and follow it. Set up a new autoresearch run. Run the baseline
first, then iteratively edit train.py only. Optimize for lower val_error and try
to make life_solved become 1. Do not modify prepare.py.

For the transformer-only problem, start Codex in autoresearch_ca_life_transformer/ and use:

Read program.md and follow it. Set up a new transformer-only autoresearch run.
Run the baseline first, then iteratively edit train.py only. Optimize for lower
val_error and try to make life_solved become 1. Do not modify prepare.py and do
not use non-transformer architectures.

The agent should use local Git commits as experiment checkpoints and append all run outcomes to results.tsv.

Autoresearch Flow

human chooses problem folder
        |
        v
start Codex inside selected autoresearch_* folder
        |
        v
agent reads program.md
        |
        v
agent runs prepare.py --smoke-test
        |
        v
agent runs baseline train.py
        |
        v
agent commits one train.py change
        |
        v
agent runs python train.py > run.log 2>&1
        |
        v
agent appends metrics to results.tsv
        |
        v
better than previous best?
   | yes                    | no
   v                        v
keep commit              reset to best commit
   |                        |
   +----------- repeat -----+
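
The keep-or-reset decision at the bottom of the diagram can be sketched as a small greedy loop. This is only a simulation of the decision logic, not the agent's actual procedure; the function name is hypothetical, the sample values are made up, and treating ties and crashed runs as "reset" is an assumption.

```python
def greedy_keep_or_reset(val_errors):
    """Simulate keep/reset decisions for a sequence of experiment results.

    val_errors[0] is the baseline run; None marks a crashed run.
    Returns (decisions, best_index).
    """
    best_index = 0
    best_error = val_errors[0]
    decisions = ["baseline"]
    for i, err in enumerate(val_errors[1:], start=1):
        # Keep the commit only when it strictly improves on the best so far.
        if err is not None and err < best_error:
            best_index, best_error = i, err
            decisions.append("keep")
        else:
            decisions.append("reset")
    return decisions, best_index


# Illustrative values only, not real results.
decisions, best = greedy_keep_or_reset([0.40, 0.35, 0.38, None, 0.10])
```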

This workflow uses local Git commits as experiment checkpoints. It does not open GitHub pull requests by itself. Training happens locally through python train.py; the script automatically uses CUDA if available, then MPS on Apple Silicon, then CPU. The agent tags each experiment commit before any reset so later analysis can recover the exact train.py code snapshot.
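
The device fallback order described above (CUDA, then MPS, then CPU) can be sketched as follows. This assumes train.py is built on PyTorch, which this README does not state explicitly; the function names are hypothetical and the real script may select its device differently.

```python
def pick_device(cuda_available, mps_available):
    """Return the preferred device name in the order CUDA, MPS, CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Query PyTorch for available backends; fall back to CPU if torch is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )
```

Separating the ordering rule from the backend queries keeps the priority logic testable on machines without a GPU.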

Results And Figures

After an agent has produced results.tsv, run the analysis script from inside that problem folder:

cd <AUTORESEARCH DIRECTORY>
python analyze_results.py

This writes:

analysis_results/
  progress.png
  progress.svg
  progress.pdf
  architecture_summary.png
  architecture_summary.svg
  architecture_summary.pdf
  architecture_summary.tsv
  architecture_summary.csv
  architecture_report.md
  architecture_diagrams/
    index.md
    index.tsv
    exp_000_<commit>_<architecture>.png
    exp_000_<commit>_<architecture>.svg
  parameter_vs_performance.png
  parameter_vs_performance.svg
  parameter_vs_performance.pdf
  parameter_vs_performance.tsv
  parameter_vs_performance.csv
  results_clean.tsv
  results_clean.csv
  summary.json

To also materialize each committed experiment as a local folder, run this from the relevant problem directory:

python export_experiments.py

This reads results.tsv and writes:

experiment_snapshots/
  manifest.tsv
  manifest.json
  BEST_EXPERIMENT.md
  best_experiment/
    train.py
    changes.patch
    metadata.json
    metadata.tsv
    README.md
    BEST_EXPERIMENT.md
  experiment_000/
    train.py
    changes.patch
    metadata.json
    metadata.tsv
    README.md
  experiment_001/
    ...

By default it exports train.py, the intended editable experiment surface. To archive additional problem-folder files from every commit, pass them explicitly:

python export_experiments.py --files train.py program.md

The exporter marks the best non-crash experiment by lowest val_error, writes is_best=1 in manifest.tsv, and copies that experiment to experiment_snapshots/best_experiment/ for quick inspection.
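
The selection rule described above (lowest val_error among non-crash experiments) can be sketched like this. It is not the code in export_experiments.py; the status column name and the "crash" marker are assumptions, since the actual ledger header is defined in each problem's program.md, and the sample rows are made up.

```python
import csv
import io


def pick_best(tsv_text, status_column="status"):
    """Return the non-crash row with the lowest val_error, or None."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best = None
    for row in rows:
        # Assumed convention: crashed runs carry "crash" in a status column.
        if (row.get(status_column) or "").strip().lower() == "crash":
            continue
        try:
            err = float(row["val_error"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable val_error
        if best is None or err < float(best["val_error"]):
            best = row
    return best


# Illustrative ledger; commits and values are made up.
example = (
    "commit\tval_error\tstatus\tarchitecture\n"
    "aaa1111\t0.40\tkeep\ttransformer\n"
    "bbb2222\t0.00\tcrash\tcnn\n"
    "ccc3333\t0.12\tkeep\tcnn\n"
)
best = pick_best(example)
```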

The original results.tsv is left untouched. Architecture plots are generated when results.tsv includes an architecture column; new runs should use the header specified in that problem's program.md. The Markdown report uses the commit hashes in results.tsv to recover the exact train.py code for the best commit in each architecture family. The architecture_diagrams/ folder contains one labeled layer-by-layer PNG and SVG network diagram for each experiment row. parameter_vs_performance.png and .svg plot parameter count versus validation error.

Cookbook: Add A New Research Problem

To add a future problem, create a new sibling folder. Keep the problem self-contained unless a future project truly needs shared library code.

autoresearch_my_new_problem/

Recommended process:

Create the new folder and copy only the problem source files:

mkdir autoresearch_my_new_problem
cp autoresearch_ca_life/prepare.py autoresearch_my_new_problem/
cp autoresearch_ca_life/train.py autoresearch_my_new_problem/
cp autoresearch_ca_life/program.md autoresearch_my_new_problem/
cp autoresearch_ca_life/analyze_results.py autoresearch_my_new_problem/
cp autoresearch_ca_life/export_experiments.py autoresearch_my_new_problem/
cp autoresearch_ca_life/pyproject.toml autoresearch_my_new_problem/

analyze_results.py and export_experiments.py are problem-agnostic: they both read results.tsv and recover train.py snapshots from commit hashes, so they work unchanged in the new folder as long as program.md keeps the commit and architecture columns in the result ledger.
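
One way to recover a train.py snapshot from a commit hash is git show with a revision:path argument. The sketch below illustrates that idea only; it is not necessarily how analyze_results.py or export_experiments.py do it, and the helper names are hypothetical.

```python
import subprocess


def snapshot_command(commit, path="train.py"):
    """Build the git command that prints a file as it was at a commit."""
    # The "./" prefix makes git resolve the path relative to the current
    # directory (the problem folder) rather than the repository root.
    return ["git", "show", f"{commit}:./{path}"]


def read_snapshot(commit, path="train.py", cwd=None):
    """Return the file contents at the given commit; raises if git fails."""
    result = subprocess.run(
        snapshot_command(commit, path),
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling read_snapshot with a commit value from results.tsv, run from inside the problem folder, would return the text of train.py at that commit.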

Rewrite autoresearch_my_new_problem/prepare.py to define the fixed problem:

constants
data/rule generation
make_batch(...)
evaluate_model(...)
smoke test CLI
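
As a rough illustration of those five pieces, here is a toy prepare.py skeleton for a made-up parity task. Everything in it is hypothetical: the constants, the make_batch and evaluate_model signatures, and the returned metric dictionary are assumptions for this sketch, not the interface used by the existing problems, so copy the real signatures from autoresearch_ca_life/prepare.py.

```python
import argparse
import random

SEQ_LEN = 8       # hypothetical constant: number of input bits
EVAL_SEED = 1234  # hypothetical constant: fixed seed so evaluation is reproducible


def make_batch(batch_size, rng):
    """Return (inputs, targets) for a toy parity task."""
    inputs = [[rng.randint(0, 1) for _ in range(SEQ_LEN)] for _ in range(batch_size)]
    targets = [sum(x) % 2 for x in inputs]
    return inputs, targets


def evaluate_model(predict, n_samples=256):
    """Score a predict(inputs) -> labels callable on a fixed evaluation set."""
    inputs, targets = make_batch(n_samples, random.Random(EVAL_SEED))
    predictions = predict(inputs)
    wrong = sum(int(p != t) for p, t in zip(predictions, targets))
    return {"val_error": wrong / n_samples}


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--smoke-test", action="store_true")
    args = parser.parse_args(argv)
    if args.smoke_test:
        # An exact solver must reach zero error if the evaluator is wired correctly.
        metrics = evaluate_model(lambda xs: [sum(x) % 2 for x in xs], n_samples=16)
        assert metrics["val_error"] == 0.0
        print("smoke test passed:", metrics)


if __name__ == "__main__":
    main()
```

Seeding the evaluation set inside evaluate_model keeps the benchmark fixed across runs, which is what lets val_error be compared between experiments.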

Reset train.py only if the new problem needs a different model interface. Otherwise keep the shared decoder-only transformer baseline.

Rewrite program.md so the agent's objective, fixed files, metrics, and result ledger columns are problem-specific.

Run:

cd autoresearch_my_new_problem
python prepare.py --smoke-test
AR_TIME_BUDGET=2 python train.py

Start Codex in that folder and tell it to read program.md.

After the agent has produced results.tsv, generate figures and export the committed experiment snapshots (same tooling as the existing problems):

cd autoresearch_my_new_problem
python analyze_results.py     # plots, architecture report, summary.json
python export_experiments.py  # experiment_snapshots/ + best_experiment/

This keeps each autoresearch problem isolated and easy to publish, reproduce, or archive.

Recommended Git Workflow

Use two local clones so reusable code changes and optimization experiments do not get mixed together:

git clone https://github.com/lamm-mit/explore-and-discover.git Explore-and-Discover-main
git clone https://github.com/lamm-mit/explore-and-discover.git Explore-and-Discover-run

Use Explore-and-Discover-main for documentation, benchmark, plotting, and new problem-folder changes:

cd Explore-and-Discover-main
git switch main
git pull --ff-only origin main

# edit reusable files here
git add README.md autoresearch_ca_life/analyze_results.py
git commit -m "Update analysis tooling"
git push origin main

Use Explore-and-Discover-run for Codex or Claude Code optimization runs:

cd Explore-and-Discover-run
git switch main
git pull --ff-only origin main
git switch -c autoresearch/may24-ca-life

cd autoresearch_ca_life
codex

Never merge an autoresearch/* run branch into main unless you explicitly want to publish that branch's optimized train.py. If you want to preserve an interesting optimized model, push the experiment branch separately:

git push -u origin autoresearch/may24-ca-life

Ignore Rules

The root .gitignore ignores generated caches, run logs, results.tsv, plot exports, Python bytecode, virtual environments, and scratch directories for all autoresearch_* folders.

Reference

If you use this repository in your work, please cite:

@misc{buehler2026exploreanddiscover,
  author       = {Buehler, Markus J.},
  title        = {Explore and Discover Research Agents Solve Scientific Problems},
  year         = {2026},
  url          = {https://github.com/lamm-mit/explore-and-discover},
  note         = {GitHub repository}
}
