From 51c59e3388ff9eae9cbd8090df3ce5d8f2457c03 Mon Sep 17 00:00:00 2001
From: Firehose Bot
Date: Tue, 24 Mar 2026 12:13:05 +0000
Subject: [PATCH] nono sandbox

---
 nono.sh                          |  1 +
 sequence-diagram-skill/README.md | 80 --------------------------------
 2 files changed, 1 insertion(+), 80 deletions(-)
 delete mode 100644 sequence-diagram-skill/README.md

diff --git a/nono.sh b/nono.sh
index c160bb9..a52d41c 100644
--- a/nono.sh
+++ b/nono.sh
@@ -4,6 +4,7 @@ nono run \
   --allow-cwd \
   --allow /Users/willem/.local/share/mise \
   --allow /Users/willem/.pi \
+  --read /Users/willem/.git \
   --allow /Users/willem/Library/Caches/mise \
   --allow-net \
   -- pi --verbose -p 'write a haiku'
diff --git a/sequence-diagram-skill/README.md b/sequence-diagram-skill/README.md
deleted file mode 100644
index c78b201..0000000
--- a/sequence-diagram-skill/README.md
+++ /dev/null
@@ -1,80 +0,0 @@
-# Sequence Diagram Skill — Autoresearch
-
-Optimizes a pi skill for generating Mermaid sequence diagrams from
-Elixir/Phoenix codebases, using [pi-autoresearch](https://github.com/davebcn87/pi-autoresearch).
-
-## The Problem
-
-Small local models (Qwen3.5-35B-A3B) produce great sequence diagrams for
-well-represented languages (C#, Java) but go off the rails with Elixir/Phoenix —
-sidetracking into imaginary code reviews instead of finishing the diagram.
-
-## How It Works
-
-The autoresearch loop mutates `skill/SKILL.md`, runs it against 3 scenarios
-from a real Phoenix codebase (Firehose), and scores with **zero-judge-model
-bash evals**:
-
-| Eval | Check | Tool |
-|------|-------|------|
-| has_diagram | Output has `` ```mermaid `` + `sequenceDiagram` | grep |
-| diagram_parseable | Valid mermaid syntax (participants + messages) | grep / mmdc |
-| uses_real_modules | ≥2 actual module names from codebase | grep |
-| uses_real_functions | ≥1 actual function name | grep |
-| no_sidetracking | No review/critique language | grep against blocklist |
-| concise | Under 3000 chars | wc |
-
-3 tasks × 6 evals = 18 max score.
-
-## Setup
-
-1. Clone the Firehose repo into `workspace/`:
-   ```bash
-   git clone https://gitea.apps.sustainabledelivery.com/mostalive/firehose workspace
-   ```
-
-2. Make scripts executable:
-   ```bash
-   chmod +x autoresearch.sh autoresearch.checks.sh scripts/*.sh
-   ```
-
-3. Configure model access in `scripts/config.env`:
-   - Local: leave `SSH_TARGET` empty, have pi configured with your model
-   - Remote: set `SSH_TARGET=analyst@your-host` and `SSH_PORT=2222`
-
-4. Init git and start:
-   ```bash
-   git init && git add -A && git commit -m "initial"
-   pi
-   # then: /autoresearch
-   ```
-
-## Project Structure
-
-```
-sequence-diagram-skill/
-├── autoresearch.md            # Session doc (pi reads this)
-├── autoresearch.sh            # Benchmark runner
-├── autoresearch.checks.sh     # Sanity checks on SKILL.md
-├── skill/
-│   └── SKILL.md               # THE FILE BEING OPTIMIZED
-├── benchmark/
-│   └── tasks.jsonl            # 3 test scenarios
-├── scripts/
-│   ├── config.env             # Endpoint config
-│   ├── run_one.sh             # Run pi with skill + single task
-│   ├── score.sh               # Score a single output (6 binary evals)
-│   └── sidetrack_blocklist.txt  # Phrases that indicate off-task behavior
-└── workspace/                 # Clone of Firehose repo (mounted/symlinked)
-```
-
-## Mutation Ideas for the Agent
-
-The autoresearch agent only edits `skill/SKILL.md`. Good mutations include:
-
-- Stronger "do not review" constraints
-- Explicit Elixir/Phoenix vocabulary hints (NimblePublisher, module attributes)
-- Output format enforcement (ONLY the mermaid block, nothing else)
-- Step-by-step process instructions (read router first, then controller, etc.)
-- Short generic example of a good sequence diagram
-- Negative examples ("do NOT include suggestions or improvements")