From 51c59e3388ff9eae9cbd8090df3ce5d8f2457c03 Mon Sep 17 00:00:00 2001
From: Firehose Bot
Date: Tue, 24 Mar 2026 12:13:05 +0000
Subject: [PATCH] nono sandbox

---
 nono.sh                          |  1 +
 sequence-diagram-skill/README.md | 80 --------------------------------
 2 files changed, 1 insertion(+), 80 deletions(-)
 delete mode 100644 sequence-diagram-skill/README.md

diff --git a/nono.sh b/nono.sh
index c160bb9..a52d41c 100644
--- a/nono.sh
+++ b/nono.sh
@@ -4,6 +4,7 @@ nono run \
   --allow-cwd \
   --allow /Users/willem/.local/share/mise \
   --allow /Users/willem/.pi \
+  --read /Users/willem/.git \
   --allow /Users/willem/Library/Caches/mise \
   --allow-net \
   -- pi --verbose -p 'write a haiku'
diff --git a/sequence-diagram-skill/README.md b/sequence-diagram-skill/README.md
deleted file mode 100644
index c78b201..0000000
--- a/sequence-diagram-skill/README.md
+++ /dev/null
@@ -1,80 +0,0 @@
-# Sequence Diagram Skill — Autoresearch
-
-Optimizes a pi skill for generating Mermaid sequence diagrams from
-Elixir/Phoenix codebases, using [pi-autoresearch](https://github.com/davebcn87/pi-autoresearch).
-
-## The Problem
-
-Small local models (Qwen3.5-35B-A3B) produce great sequence diagrams for
-well-represented languages (C#, Java) but go off the rails with Elixir/Phoenix —
-sidetracking into imaginary code reviews instead of finishing the diagram.
-
-## How It Works
-
-The autoresearch loop mutates `skill/SKILL.md`, runs it against 3 scenarios
-from a real Phoenix codebase (Firehose), and scores with **zero-judge-model
-bash evals**:
-
-| Eval | Check | Tool |
-|------|-------|------|
-| has_diagram | Output has `` ```mermaid `` + `sequenceDiagram` | grep |
-| diagram_parseable | Valid mermaid syntax (participants + messages) | grep / mmdc |
-| uses_real_modules | ≥2 actual module names from codebase | grep |
-| uses_real_functions | ≥1 actual function name | grep |
-| no_sidetracking | No review/critique language | grep against blocklist |
-| concise | Under 3000 chars | wc |
-
-3 tasks × 6 evals = 18 max score.
-
-## Setup
-
-1. Clone the Firehose repo into `workspace/`:
-   ```bash
-   git clone https://gitea.apps.sustainabledelivery.com/mostalive/firehose workspace
-   ```
-
-2. Make scripts executable:
-   ```bash
-   chmod +x autoresearch.sh autoresearch.checks.sh scripts/*.sh
-   ```
-
-3. Configure model access in `scripts/config.env`:
-   - Local: leave `SSH_TARGET` empty, have pi configured with your model
-   - Remote: set `SSH_TARGET=analyst@your-host` and `SSH_PORT=2222`
-
-4. Init git and start:
-   ```bash
-   git init && git add -A && git commit -m "initial"
-   pi
-   # then: /autoresearch
-   ```
-
-## Project Structure
-
-```
-sequence-diagram-skill/
-├── autoresearch.md            # Session doc (pi reads this)
-├── autoresearch.sh            # Benchmark runner
-├── autoresearch.checks.sh     # Sanity checks on SKILL.md
-├── skill/
-│   └── SKILL.md               # THE FILE BEING OPTIMIZED
-├── benchmark/
-│   └── tasks.jsonl            # 3 test scenarios
-├── scripts/
-│   ├── config.env             # Endpoint config
-│   ├── run_one.sh             # Run pi with skill + single task
-│   ├── score.sh               # Score a single output (6 binary evals)
-│   └── sidetrack_blocklist.txt  # Phrases that indicate off-task behavior
-└── workspace/                 # Clone of Firehose repo (mounted/symlinked)
-```
-
-## Mutation Ideas for the Agent
-
-The autoresearch agent only edits `skill/SKILL.md`. Good mutations include:
-
-- Stronger "do not review" constraints
-- Explicit Elixir/Phoenix vocabulary hints (NimblePublisher, module attributes)
-- Output format enforcement (ONLY the mermaid block, nothing else)
-- Step-by-step process instructions (read router first, then controller, etc.)
-- Short generic example of a good sequence diagram
-- Negative examples ("do NOT include suggestions or improvements")