# Sequence Diagram Skill: Autoresearch

Optimizes a pi skill for generating Mermaid sequence diagrams from Elixir/Phoenix codebases, using [pi-autoresearch](https://github.com/davebcn87/pi-autoresearch).

## The Problem

Small local models (Qwen3.5-35B-A3B) produce great sequence diagrams for well-represented languages (C#, Java) but go off the rails with Elixir/Phoenix, sidetracking into imaginary code reviews instead of finishing the diagram.

## How It Works

The autoresearch loop mutates `skill/SKILL.md`, runs it against 3 scenarios from a real Phoenix codebase (Firehose), and scores with **zero-judge-model bash evals**:

| Eval | Check | Tool |
|------|-------|------|
| has_diagram | Output has `` ```mermaid `` + `sequenceDiagram` | grep |
| diagram_parseable | Valid mermaid syntax (participants + messages) | grep / mmdc |
| uses_real_modules | ≥2 actual module names from codebase | grep |
| uses_real_functions | ≥1 actual function name | grep |
| no_sidetracking | No review/critique language | grep against blocklist |
| concise | Under 3000 chars | wc |

3 tasks × 6 evals = 18 max score.

## Setup

1. Clone the Firehose repo into `workspace/`:
   ```bash
   git clone https://gitea.apps.sustainabledelivery.com/mostalive/firehose workspace
   ```

2. Make scripts executable:
   ```bash
   chmod +x autoresearch.sh autoresearch.checks.sh scripts/*.sh
   ```

3. Configure model access in `scripts/config.env`:
   - Local: leave `SSH_TARGET` empty, have pi configured with your model
   - Remote: set `SSH_TARGET=analyst@your-host` and `SSH_PORT=2222`

4. Init git and start:
   ```bash
   git init && git add -A && git commit -m "initial"
   pi  # then: /autoresearch
   ```

## Project Structure

```
sequence-diagram-skill/
├── autoresearch.md              # Session doc (pi reads this)
├── autoresearch.sh              # Benchmark runner
├── autoresearch.checks.sh       # Sanity checks on SKILL.md
├── skill/
│   └── SKILL.md                 # THE FILE BEING OPTIMIZED
├── benchmark/
│   └── tasks.jsonl              # 3 test scenarios
├── scripts/
│   ├── config.env               # Endpoint config
│   ├── run_one.sh               # Run pi with skill + single task
│   ├── score.sh                 # Score a single output (6 binary evals)
│   └── sidetrack_blocklist.txt  # Phrases that indicate off-task behavior
└── workspace/                   # Clone of Firehose repo (mounted/symlinked)
```

## Mutation Ideas for the Agent

The autoresearch agent only edits `skill/SKILL.md`. Good mutations include:

- Stronger "do not review" constraints
- Explicit Elixir/Phoenix vocabulary hints (NimblePublisher, module attributes)
- Output format enforcement (ONLY the mermaid block, nothing else)
- Step-by-step process instructions (read router first, then controller, etc.)
- Short generic example of a good sequence diagram
- Negative examples ("do NOT include suggestions or improvements")
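
When weighing mutations like these, it can help to see how a single output might be scored. The following is a minimal sketch of the six binary evals from the table above, not the project's actual `scripts/score.sh`. The helper names (`contains`, `count_hits`, `score_output`), the module and function lists, and the blocklist phrases are all placeholders invented for illustration. The parse check is only the grep approximation, with no `mmdc` involved.

```shell
#!/bin/sh
# Hypothetical sketch of six binary evals; NOT the real scripts/score.sh.
# MODULES, FUNCTIONS and BLOCKLIST are placeholder values, not taken from Firehose.

MODULES="MyAppWeb.Router MyAppWeb.PostController MyApp.Blog"
FUNCTIONS="index list_posts render"
# One phrase per line, the way a blocklist file might be laid out.
BLOCKLIST="code review
i would suggest
could be improved"

# contains TEXT GREP_ARGS...: succeed if grep finds a match in TEXT.
contains() {
  text=$1
  shift
  printf '%s\n' "$text" | grep -q "$@"
}

# count_hits TEXT WORDS: print how many of the space-separated WORDS occur in TEXT.
count_hits() {
  hits=0
  for word in $2; do
    if contains "$1" -F -- "$word"; then hits=$((hits + 1)); fi
  done
  echo "$hits"
}

# score_output TEXT: print a score from 0 to 6, one point per passing eval.
score_output() {
  out=$1
  score=0

  # has_diagram: a mermaid fence (three backticks) plus the sequenceDiagram keyword
  if contains "$out" -E '`{3}mermaid' && contains "$out" -F 'sequenceDiagram'; then
    score=$((score + 1))
  fi

  # diagram_parseable (grep approximation): a participant line and a message arrow
  if contains "$out" -E '^[[:space:]]*(participant|actor)[[:space:]]' &&
     contains "$out" -E '[[:alnum:]_]+[[:space:]]*--?>>?[[:space:]]*[[:alnum:]_]+[[:space:]]*:'; then
    score=$((score + 1))
  fi

  # uses_real_modules: at least 2 known module names appear
  if [ "$(count_hits "$out" "$MODULES")" -ge 2 ]; then score=$((score + 1)); fi

  # uses_real_functions: at least 1 known function name appears
  if [ "$(count_hits "$out" "$FUNCTIONS")" -ge 1 ]; then score=$((score + 1)); fi

  # no_sidetracking: no blocklisted phrase (case-insensitive, fixed strings)
  if ! contains "$out" -i -F -- "$BLOCKLIST"; then score=$((score + 1)); fi

  # concise: under 3000 bytes; left unquoted so padding in wc output is dropped
  if [ $(printf '%s' "$out" | wc -c) -lt 3000 ]; then score=$((score + 1)); fi

  echo "$score"
}
```

Sourcing this file and calling `score_output "$(cat some_output.md)"` (a hypothetical file name) prints a number from 0 to 6. Summing that over the three tasks gives the 18-point maximum. Because every check is a plain `grep` or `wc`, a mutation that suppresses review language but drops real module names shows up as a shift between individual evals rather than as an opaque judge score.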