# Sequence Diagram Skill — Autoresearch
Optimizes a pi skill for generating Mermaid sequence diagrams from
Elixir/Phoenix codebases, using [pi-autoresearch](https://github.com/davebcn87/pi-autoresearch).
## The Problem
Small local models (Qwen3.5-35B-A3B) produce great sequence diagrams for
well-represented languages (C#, Java) but go off the rails with Elixir/Phoenix —
sidetracking into imaginary code reviews instead of finishing the diagram.
## How It Works
The autoresearch loop mutates `skill/SKILL.md`, runs it against 3 scenarios
from a real Phoenix codebase (Firehose), and scores with **zero-judge-model
bash evals**:
| Eval | Check | Tool |
|------|-------|------|
| has_diagram | Output has `` ```mermaid `` + `sequenceDiagram` | grep |
| diagram_parseable | Valid Mermaid syntax (participants + messages) | grep / mmdc |
| uses_real_modules | ≥2 actual module names from codebase | grep |
| uses_real_functions | ≥1 actual function name | grep |
| no_sidetracking | No review/critique language | grep against blocklist |
| concise | Under 3000 chars | wc |
Each eval is binary (pass = 1, fail = 0), so the maximum score is 3 tasks × 6 evals = 18.
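
The six checks in the table can be approximated with plain `grep` and `wc`. The sketch below is an illustration only, not the contents of `scripts/score.sh`: the function names, the argument order, and the two list files of known module and function names are assumptions, and `diagram_parseable` is reduced to a grep heuristic rather than a real `mmdc` parse.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of six binary evals; the real scripts/score.sh may differ.

# Three backticks, built with printf so this script contains no literal fence.
FENCE=$(printf '\140\140\140')

# count_matches FILE PATTERNS: number of fixed-string patterns found in FILE.
count_matches() {
  local file=$1 patterns=$2 n=0 p
  while IFS= read -r p || [ -n "$p" ]; do
    [ -n "$p" ] || continue
    if grep -qF -e "$p" "$file"; then n=$((n + 1)); fi
  done < "$patterns"
  echo "$n"
}

# score_output OUT MODULES FUNCTIONS BLOCKLIST: prints a score from 0 to 6.
score_output() {
  local out=$1 modules=$2 functions=$3 blocklist=$4 score=0

  # has_diagram: a mermaid fence plus the sequenceDiagram keyword
  if grep -q "^${FENCE}mermaid" "$out" && grep -q 'sequenceDiagram' "$out"; then
    score=$((score + 1))
  fi
  # diagram_parseable (heuristic): a participant line and a message arrow
  if grep -Eq '^[[:space:]]*(participant|actor)[[:space:]]' "$out" &&
     grep -Eq -e '--?>>?' "$out"; then
    score=$((score + 1))
  fi
  # uses_real_modules: at least 2 known module names
  if [ "$(count_matches "$out" "$modules")" -ge 2 ]; then score=$((score + 1)); fi
  # uses_real_functions: at least 1 known function name
  if [ "$(count_matches "$out" "$functions")" -ge 1 ]; then score=$((score + 1)); fi
  # no_sidetracking: no blocklisted phrase (case-insensitive, fixed strings)
  if ! grep -qiFf "$blocklist" "$out"; then score=$((score + 1)); fi
  # concise: under 3000 bytes (wc -c counts bytes, a close proxy for chars)
  if [ "$(wc -c < "$out")" -lt 3000 ]; then score=$((score + 1)); fi

  echo "$score"
}
```

Because every check is a shell exit status, no judge model is needed and a run is cheap to repeat. Note that a blocklist file containing an empty line would match every output, so such a file should be kept free of blank lines.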
## Setup
1. Clone the Firehose repo into `workspace/`:
```bash
git clone https://gitea.apps.sustainabledelivery.com/mostalive/firehose workspace
```
2. Make scripts executable:
```bash
chmod +x autoresearch.sh autoresearch.checks.sh scripts/*.sh
```
3. Configure model access in `scripts/config.env`:
- Local: leave `SSH_TARGET` empty and configure pi with your model
- Remote: set `SSH_TARGET=analyst@your-host` and `SSH_PORT=2222`
4. Initialize a git repository and start pi:
```bash
git init && git add -A && git commit -m "initial"
pi
# then: /autoresearch
```
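
The local/remote choice in step 3 comes down to whether `SSH_TARGET` is set. A runner script could branch on it roughly as shown below. This is a sketch under assumptions: the function name `describe_runner` and the fallback port 22 are hypothetical, and the real `scripts/run_one.sh` may handle this differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decide between local and remote execution.
# A real script would first load the settings, e.g.:
#   . scripts/config.env

# describe_runner: prints "local" or the ssh prefix derived from the config.
describe_runner() {
  local target=${SSH_TARGET:-} port=${SSH_PORT:-22}
  if [ -z "$target" ]; then
    # Empty SSH_TARGET: run against the locally configured model
    echo "local"
  else
    # Non-empty SSH_TARGET: commands would be prefixed with this ssh call
    echo "ssh -p $port $target"
  fi
}
```

With `SSH_TARGET=analyst@your-host` and `SSH_PORT=2222`, the function prints `ssh -p 2222 analyst@your-host`.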
## Project Structure
```
sequence-diagram-skill/
├── autoresearch.md # Session doc (pi reads this)
├── autoresearch.sh # Benchmark runner
├── autoresearch.checks.sh # Sanity checks on SKILL.md
├── skill/
│ └── SKILL.md # THE FILE BEING OPTIMIZED
├── benchmark/
│ └── tasks.jsonl # 3 test scenarios
├── scripts/
│ ├── config.env # Endpoint config
│ ├── run_one.sh # Run pi with skill + single task
│ ├── score.sh # Score a single output (6 binary evals)
│ └── sidetrack_blocklist.txt # Phrases that indicate off-task behavior
└── workspace/ # Clone of Firehose repo (mounted/symlinked)
```
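
To show how the runner and scorer in the tree above could combine into one benchmark number, here is a minimal sketch of the aggregation step. The function `run_benchmark` and its calling convention are assumptions: it takes the names of two commands standing in for `scripts/run_one.sh` and `scripts/score.sh`, whose real interfaces are not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the benchmark loop; autoresearch.sh may differ.

# run_benchmark RUN_CMD SCORE_CMD TASK...
#   RUN_CMD TASK   is assumed to print the model output for one task
#   SCORE_CMD FILE is assumed to print an integer score for that output
run_benchmark() {
  local run_cmd=$1 score_cmd=$2 total=0 task out s
  shift 2
  for task in "$@"; do
    out=$(mktemp)
    "$run_cmd" "$task" > "$out"      # capture one task's output
    s=$("$score_cmd" "$out")         # 0..6 under the six binary evals
    total=$((total + s))
    rm -f "$out"
  done
  echo "$total"
}
```

With three tasks and a scorer that returns at most 6 per task, the printed total tops out at 18, matching the maximum score described earlier.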
## Mutation Ideas for the Agent
The autoresearch agent only edits `skill/SKILL.md`. Good mutations include:
- Stronger "do not review" constraints
- Explicit Elixir/Phoenix vocabulary hints (NimblePublisher, module attributes)
- Output format enforcement (ONLY the mermaid block, nothing else)
- Step-by-step process instructions (read router first, then controller, etc.)
- Short generic example of a good sequence diagram
- Negative examples ("do NOT include suggestions or improvements")