Sequence Diagram Skill — Autoresearch
Optimizes a pi skill for generating Mermaid sequence diagrams from Elixir/Phoenix codebases, using pi-autoresearch.
The Problem
Small local models such as Qwen3.5-35B-A3B produce good sequence diagrams for well-represented languages (C#, Java), but go off the rails with Elixir/Phoenix: they sidetrack into imaginary code reviews instead of finishing the diagram.
How It Works
The autoresearch loop mutates skill/SKILL.md, runs it against 3 scenarios from a real Phoenix codebase (Firehose), and scores each output with bash evals that need no judge model:
| Eval | Check | Tool |
|---|---|---|
| has_diagram | Output has ```mermaid + sequenceDiagram | grep |
| diagram_parseable | Valid Mermaid syntax (participants + messages) | grep / mmdc |
| uses_real_modules | ≥2 real module names from the codebase | grep |
| uses_real_functions | ≥1 real function name from the codebase | grep |
| no_sidetracking | No review/critique language | grep against blocklist |
| concise | Under 3000 chars | wc |
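As a rough illustration of how the six checks can run without a judge model, here is a hypothetical shell function in the spirit of scripts/score.sh (not its actual contents). It assumes the real module and function names have already been extracted from workspace/ into one-name-per-line files, and it uses a grep heuristic where the real script may call mmdc to validate the diagram.

```shell
# Hypothetical sketch of the six binary evals; the real scripts/score.sh may differ.
# Usage: score_output OUTPUT BLOCKLIST MODULES FUNCTIONS
#   OUTPUT     model output for one task
#   BLOCKLIST  off-task phrases, one per line (like sidetrack_blocklist.txt)
#   MODULES    real module names from workspace/, one per line, no blank lines
#   FUNCTIONS  real function names from workspace/, one per line, no blank lines
# Prints the number of evals passed (0-6).
score_output() {
  local out="$1" blocklist="$2" modules="$3" functions="$4"
  local score=0 n

  # has_diagram: a line opening a mermaid code fence, plus a sequenceDiagram header
  if grep -Eq '^`{3}mermaid' "$out" && grep -q 'sequenceDiagram' "$out"; then
    score=$((score + 1))
  fi

  # diagram_parseable: grep stand-in for mmdc; needs at least one "A->>B: text" message
  if grep -Eq '[A-Za-z0-9_.]+[[:space:]]*--?>>?[[:space:]]*[+-]?[A-Za-z0-9_.]+[[:space:]]*:' "$out"; then
    score=$((score + 1))
  fi

  # uses_real_modules: at least 2 distinct module names, matched as whole words
  n=$(( $(grep -owFf "$modules" "$out" | sort -u | wc -l) ))
  if [ "$n" -ge 2 ]; then score=$((score + 1)); fi

  # uses_real_functions: at least 1 function name, matched as a whole word
  if grep -qwFf "$functions" "$out"; then score=$((score + 1)); fi

  # no_sidetracking: no blocklisted phrase (case-insensitive)
  if ! grep -qiFf "$blocklist" "$out"; then score=$((score + 1)); fi

  # concise: under 3000 bytes (the same as characters for ASCII output)
  if [ "$(( $(wc -c < "$out") ))" -lt 3000 ]; then score=$((score + 1)); fi

  echo "$score"
}
```

One way to produce the module list, assuming the Elixir sources live under workspace/lib, is `grep -rhoE 'defmodule [A-Z][A-Za-z0-9_.]*' workspace/lib | cut -d' ' -f2 | sort -u`.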
3 tasks × 6 binary evals = a maximum score of 18.
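The headline metric is simply the sum of the per-task scores. A minimal sketch, assuming each task's score is a single integer from 0 to 6 (the helper name total_score is hypothetical):

```shell
# Hypothetical helper: sum per-task scores (each 0-6) into the headline metric.
# Usage: total_score SCORE...   e.g. total_score 6 4 5
# Prints "total/max", where max = number of tasks * 6 (18 for 3 tasks).
total_score() {
  local total=0 s
  for s in "$@"; do
    total=$((total + s))
  done
  echo "${total}/$(( $# * 6 ))"
}
```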
Setup
- Clone the Firehose repo into workspace/:
    git clone https://gitea.apps.sustainabledelivery.com/mostalive/firehose workspace
- Make the scripts executable:
    chmod +x autoresearch.sh autoresearch.checks.sh scripts/*.sh
- Configure model access in scripts/config.env:
  - Local: leave SSH_TARGET empty and have pi configured with your model.
  - Remote: set SSH_TARGET=analyst@your-host and SSH_PORT=2222.
- Initialize git and start:
    git init && git add -A && git commit -m "initial"
    pi  # then: /autoresearch
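The local/remote switch in scripts/config.env can be pictured as a small wrapper that runs a command either directly or over SSH. This is a hypothetical helper, not taken from run_one.sh; it assumes config.env simply sets SSH_TARGET and SSH_PORT, and it falls back to port 22 when SSH_PORT is unset.

```shell
# Hypothetical helper: run a command locally, or on SSH_TARGET when it is set.
# Assumes scripts/config.env provides SSH_TARGET (empty = local) and SSH_PORT.
run_model() {
  if [ -z "${SSH_TARGET:-}" ]; then
    "$@"                                          # local: pi is configured on this machine
  else
    ssh -p "${SSH_PORT:-22}" "$SSH_TARGET" "$@"   # remote: ssh joins args into one command string
  fi
}
```

A runner would source scripts/config.env and then call run_model with the pi command line; the pi arguments are left out because they depend on how run_one.sh passes the skill and task. Since ssh joins its trailing arguments into a single remote command string, arguments containing spaces need extra quoting in the remote case.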
Project Structure
sequence-diagram-skill/
├── autoresearch.md # Session doc (pi reads this)
├── autoresearch.sh # Benchmark runner
├── autoresearch.checks.sh # Sanity checks on SKILL.md
├── skill/
│ └── SKILL.md # THE FILE BEING OPTIMIZED
├── benchmark/
│ └── tasks.jsonl # 3 test scenarios
├── scripts/
│ ├── config.env # Endpoint config
│ ├── run_one.sh # Run pi with skill + single task
│ ├── score.sh # Score a single output (6 binary evals)
│ └── sidetrack_blocklist.txt # Phrases that indicate off-task behavior
└── workspace/ # Clone of Firehose repo (mounted/symlinked)
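autoresearch.checks.sh is described only as sanity checks on SKILL.md. A hypothetical version might reject mutations that break the file's basic shape; the frontmatter fields (name:, description:) and the 8000-byte cap below are assumptions, not taken from the project.

```shell
# Hypothetical sanity checks in the spirit of autoresearch.checks.sh.
# Assumes SKILL.md starts with YAML frontmatter containing name: and description:;
# the 8000-byte cap is an illustrative limit, not a project setting.
check_skill() {
  local skill="${1:-skill/SKILL.md}"
  [ -s "$skill" ] || { echo "FAIL: $skill missing or empty"; return 1; }
  head -n 1 "$skill" | grep -qx -- '---' || { echo "FAIL: no frontmatter"; return 1; }
  grep -q '^name:' "$skill" || { echo "FAIL: no name field"; return 1; }
  grep -q '^description:' "$skill" || { echo "FAIL: no description field"; return 1; }
  # Guard against a mutation that drops the core task entirely
  grep -qi 'sequenceDiagram' "$skill" || { echo "FAIL: sequenceDiagram no longer mentioned"; return 1; }
  [ "$(( $(wc -c < "$skill") ))" -le 8000 ] || { echo "FAIL: skill too long"; return 1; }
  echo "OK"
}
```

Returning non-zero on failure lets a wrapper discard a broken mutation before spending model time on the benchmark.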
Mutation Ideas for the Agent
The autoresearch agent only edits skill/SKILL.md. Good mutations include:
- Stronger "do not review" constraints
- Explicit Elixir/Phoenix vocabulary hints (NimblePublisher, module attributes)
- Output format enforcement (ONLY the mermaid block, nothing else)
- Step-by-step process instructions (read the router first, then the controller, etc.)
- Short generic example of a good sequence diagram
- Negative examples ("do NOT include suggestions or improvements")