nono sandbox
parent 87e6490f85
commit b3cdd93de8
1
nono.sh
@@ -4,6 +4,7 @@ nono run \
   --allow-cwd \
   --allow /Users/willem/.local/share/mise \
   --allow /Users/willem/.pi \
+  --read /Users/willem/.git \
   --allow /Users/willem/Library/Caches/mise \
   --allow-net \
   -- pi --verbose -p 'write a haiku'

@@ -1,80 +0,0 @@
# Sequence Diagram Skill — Autoresearch

Optimizes a pi skill for generating Mermaid sequence diagrams from
Elixir/Phoenix codebases, using [pi-autoresearch](https://github.com/davebcn87/pi-autoresearch).

## The Problem

Small local models (Qwen3.5-35B-A3B) produce great sequence diagrams for
well-represented languages (C#, Java) but go off the rails with Elixir/Phoenix —
sidetracking into imaginary code reviews instead of finishing the diagram.

## How It Works

The autoresearch loop mutates `skill/SKILL.md`, runs it against 3 scenarios
from a real Phoenix codebase (Firehose), and scores with **zero-judge-model
bash evals**:

| Eval | Check | Tool |
|------|-------|------|
| has_diagram | Output has `` ```mermaid `` + `sequenceDiagram` | grep |
| diagram_parseable | Valid mermaid syntax (participants + messages) | grep / mmdc |
| uses_real_modules | ≥2 actual module names from codebase | grep |
| uses_real_functions | ≥1 actual function name | grep |
| no_sidetracking | No review/critique language | grep against blocklist |
| concise | Under 3000 chars | wc |

3 tasks × 6 evals = 18 max score.

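The six checks in the table can be approximated with plain `grep` and `wc`. The following is a minimal sketch under stated assumptions, not the repository's actual `scripts/score.sh`: the module names, function names, and blocklist phrases are hypothetical placeholders, and the `diagram_parseable` check is only a rough grep for a participant line and an arrow rather than a real `mmdc` parse.

```shell
#!/usr/bin/env bash
# Sketch of six binary evals for one model output. The three lists below are
# hypothetical placeholders, not values taken from the Firehose codebase.
MODULES="Demo.Router Demo.PageController Demo.Blog"
FUNCTIONS="index show all_posts"
BLOCKLIST="i would suggest|could be improved|code review"
FENCE=$'\x60\x60\x60mermaid'   # three backticks followed by "mermaid"

# Count how many words from a space-separated list occur in the text.
count_hits() {
  local text="$1" list="$2" hits=0 word
  for word in $list; do
    if grep -qF -e "$word" <<<"$text"; then hits=$((hits + 1)); fi
  done
  echo "$hits"
}

# Print a score from 0 to 6 for a single output.
score_output() {
  local out="$1" score=0 chars
  # has_diagram: mermaid fence plus the sequenceDiagram keyword
  if grep -qF -e "$FENCE" <<<"$out" && grep -q 'sequenceDiagram' <<<"$out"; then
    score=$((score + 1))
  fi
  # diagram_parseable (rough): at least one participant line and one arrow
  if grep -Eq '^[[:space:]]*(participant|actor) ' <<<"$out" \
    && grep -q -e '->' <<<"$out"; then
    score=$((score + 1))
  fi
  # uses_real_modules: at least 2 known module names
  if [ "$(count_hits "$out" "$MODULES")" -ge 2 ]; then score=$((score + 1)); fi
  # uses_real_functions: at least 1 known function name
  if [ "$(count_hits "$out" "$FUNCTIONS")" -ge 1 ]; then score=$((score + 1)); fi
  # no_sidetracking: no blocklisted phrase, case-insensitive
  if ! grep -Eqi -e "$BLOCKLIST" <<<"$out"; then score=$((score + 1)); fi
  # concise: under 3000 characters
  chars=$(printf '%s' "$out" | wc -m)
  if [ "$((chars))" -lt 3000 ]; then score=$((score + 1)); fi
  echo "$score"
}

good=$'\x60\x60\x60mermaid\nsequenceDiagram\n    participant R as Demo.Router\n    participant C as Demo.PageController\n    R->>C: index(conn, params)\n\x60\x60\x60'
bad="I would suggest refactoring this controller. The code could be improved."

echo "good: $(score_output "$good")"   # prints "good: 6"
echo "bad: $(score_output "$bad")"     # prints "bad: 1"
```

Summing `score_output` over the outputs of the 3 tasks would give the total out of 18. Because every check is a binary grep or character count, the score is deterministic and needs no judge model.
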
## Setup

1. Clone the Firehose repo into `workspace/`:
   ```bash
   git clone https://gitea.apps.sustainabledelivery.com/mostalive/firehose workspace
   ```

2. Make scripts executable:
   ```bash
   chmod +x autoresearch.sh autoresearch.checks.sh scripts/*.sh
   ```

3. Configure model access in `scripts/config.env`:
   - Local: leave `SSH_TARGET` empty, have pi configured with your model
   - Remote: set `SSH_TARGET=analyst@your-host` and `SSH_PORT=2222`

4. Init git and start:
   ```bash
   git init && git add -A && git commit -m "initial"
   pi
   # then: /autoresearch
   ```

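The local/remote choice in step 3 could be implemented as a small dispatch on `SSH_TARGET`. This is a hedged sketch only, not the repository's `scripts/run_one.sh`: the function name `build_pi_command` is hypothetical, the default port of 22 is an assumption, and the command is printed rather than executed so the logic can be inspected safely. The `pi -p` invocation follows the form used in `nono.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick a local or remote pi invocation from config values.
SSH_TARGET="${SSH_TARGET:-}"   # empty means "run pi locally"
SSH_PORT="${SSH_PORT:-22}"     # assumed default; the README example uses 2222

# Print the command line that would run pi with the given prompt.
build_pi_command() {
  local prompt="$1"
  if [ -z "$SSH_TARGET" ]; then
    # Local: pi is expected to be configured with the model already.
    printf 'pi -p %q\n' "$prompt"
  else
    # Remote: run pi on the SSH host; %q quotes the prompt for the remote shell.
    printf 'ssh -p %s %s pi -p %q\n' "$SSH_PORT" "$SSH_TARGET" "$prompt"
  fi
}

SSH_TARGET=""
build_pi_command "write a haiku"

SSH_TARGET="analyst@your-host"
SSH_PORT=2222
build_pi_command "write a haiku"
```

Keeping the branch in one function means the rest of a runner script would not need to know where the model lives.
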
## Project Structure

```
sequence-diagram-skill/
├── autoresearch.md              # Session doc (pi reads this)
├── autoresearch.sh              # Benchmark runner
├── autoresearch.checks.sh       # Sanity checks on SKILL.md
├── skill/
│   └── SKILL.md                 # THE FILE BEING OPTIMIZED
├── benchmark/
│   └── tasks.jsonl              # 3 test scenarios
├── scripts/
│   ├── config.env               # Endpoint config
│   ├── run_one.sh               # Run pi with skill + single task
│   ├── score.sh                 # Score a single output (6 binary evals)
│   └── sidetrack_blocklist.txt  # Phrases that indicate off-task behavior
└── workspace/                   # Clone of Firehose repo (mounted/symlinked)
```

## Mutation Ideas for the Agent

The autoresearch agent only edits `skill/SKILL.md`. Good mutations include:

- Stronger "do not review" constraints
- Explicit Elixir/Phoenix vocabulary hints (NimblePublisher, module attributes)
- Output format enforcement (ONLY the mermaid block, nothing else)
- Step-by-step process instructions (read router first, then controller, etc.)
- Short generic example of a good sequence diagram
- Negative examples ("do NOT include suggestions or improvements")