nono sandbox

2026-03-24 12:13:05 +00:00
commit 51c59e3388
parent c76853efec
2 changed files with 1 addition and 80 deletions
--- a/nono.sh
+++ b/nono.sh
@@ -4,6 +4,7 @@ nono run \
  --allow-cwd \
  --allow /Users/willem/.local/share/mise \
  --allow /Users/willem/.pi \
+  --read /Users/willem/.git \
  --allow /Users/willem/Library/Caches/mise \
  --allow-net \
  -- pi --verbose -p 'write a haiku'
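The invocation above is effectively an allowlist of sandbox grants, and this commit adds one line to it: `--read /Users/willem/.git`, which presumably grants read-only access to that directory (an assumption based on the flag name). A minimal sketch, assuming bash, that keeps the same flags in an array with one grant per line and prints the fully quoted command instead of running it. The `home` variable, the `nono_args` array, and `print_nono_cmd` are illustrative names, not part of nono.sh, and no nono flags beyond those in the script are used.

```shell
#!/usr/bin/env bash
# Sketch only: the grants from nono.sh kept in a bash array, one per line.
# Every flag is copied from nono.sh; the variable and function names are
# hypothetical and exist only for this illustration.
set -euo pipefail

home=/Users/willem   # hard-coded in the original script

nono_args=(
  run
  --allow-cwd
  --allow "$home/.local/share/mise"
  --allow "$home/.pi"
  --read "$home/.git"                  # the grant added by this commit
  --allow "$home/Library/Caches/mise"
  --allow-net
  --                                   # everything after this is the sandboxed command
  pi --verbose -p 'write a haiku'
)

# Print the fully quoted command line so it can be reviewed before running.
print_nono_cmd() {
  printf '%q ' nono "${nono_args[@]}"
  printf '\n'
}

print_nono_cmd
```

To launch the sandbox instead of printing it, replace the final `print_nono_cmd` call with `exec nono "${nono_args[@]}"`. Keeping one grant per line makes a change like this commit a single-line diff.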
--- a/sequence-diagram-skill/README.md
+++ b/sequence-diagram-skill/README.md
@@ -1,80 +0,0 @@
-# Sequence Diagram Skill — Autoresearch
-
-Optimizes a pi skill for generating Mermaid sequence diagrams from
-Elixir/Phoenix codebases, using [pi-autoresearch](https://github.com/davebcn87/pi-autoresearch).
-
-## The Problem
-
-Small local models (Qwen3.5-35B-A3B) produce great sequence diagrams for
-well-represented languages (C#, Java) but go off the rails with Elixir/Phoenix —
-sidetracking into imaginary code reviews instead of finishing the diagram.
-
-## How It Works
-
-The autoresearch loop mutates `skill/SKILL.md`, runs it against 3 scenarios
-from a real Phoenix codebase (Firehose), and scores with **zero-judge-model
-bash evals**:
-
-| Eval | Check | Tool |
-|------|-------|------|
-| has_diagram | Output has `` ```mermaid `` + `sequenceDiagram` | grep |
-| diagram_parseable | Valid mermaid syntax (participants + messages) | grep / mmdc |
-| uses_real_modules | ≥2 actual module names from codebase | grep |
-| uses_real_functions | ≥1 actual function name | grep |
-| no_sidetracking | No review/critique language | grep against blocklist |
-| concise | Under 3000 chars | wc |
-
-3 tasks × 6 evals = 18 max score.
-
-## Setup
-
-1. Clone the Firehose repo into `workspace/`:
-   ```bash
-   git clone https://gitea.apps.sustainabledelivery.com/mostalive/firehose workspace
-   ```
-
-2. Make scripts executable:
-   ```bash
-   chmod +x autoresearch.sh autoresearch.checks.sh scripts/*.sh
-   ```
-
-3. Configure model access in `scripts/config.env`:
-   - Local: leave `SSH_TARGET` empty, have pi configured with your model
-   - Remote: set `SSH_TARGET=analyst@your-host` and `SSH_PORT=2222`
-
-4. Init git and start:
-   ```bash
-   git init && git add -A && git commit -m "initial"
-   pi
-   # then: /autoresearch
-   ```
-
-## Project Structure
-
-```
-sequence-diagram-skill/
-├── autoresearch.md           # Session doc (pi reads this)
-├── autoresearch.sh           # Benchmark runner
-├── autoresearch.checks.sh    # Sanity checks on SKILL.md
-├── skill/
-│   └── SKILL.md              # THE FILE BEING OPTIMIZED
-├── benchmark/
-│   └── tasks.jsonl           # 3 test scenarios
-├── scripts/
-│   ├── config.env            # Endpoint config
-│   ├── run_one.sh            # Run pi with skill + single task
-│   ├── score.sh              # Score a single output (6 binary evals)
-│   └── sidetrack_blocklist.txt  # Phrases that indicate off-task behavior
-└── workspace/                # Clone of Firehose repo (mounted/symlinked)
-```
-
-## Mutation Ideas for the Agent
-
-The autoresearch agent only edits `skill/SKILL.md`. Good mutations include:
-
-- Stronger "do not review" constraints
-- Explicit Elixir/Phoenix vocabulary hints (NimblePublisher, module attributes)
-- Output format enforcement (ONLY the mermaid block, nothing else)
-- Step-by-step process instructions (read router first, then controller, etc.)
-- Short generic example of a good sequence diagram
-- Negative examples ("do NOT include suggestions or improvements")
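The deleted README describes `scripts/score.sh` only through its six binary evals. A hypothetical sketch of those checks in bash, using grep and wc as the table says. The argument order, the one-name-per-line module and function files, and the grep-only approximation of `diagram_parseable` (a `participant`/`actor` line plus a `->>` or `-->>` arrow, with no `mmdc` call) are all assumptions, not the original script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of scripts/score.sh: the six binary evals from the
# README table, using grep and wc only. Argument order and file formats
# are assumptions, not taken from the original script:
#   $1 model output, $2 sidetrack blocklist, $3 module names, $4 function names
# (the last three hold one phrase or name per line).
set -uo pipefail

# Print how many non-empty lines of file $2 occur as fixed strings in file $1.
# A non-empty third argument makes the match case-insensitive.
count_hits() {
  local out=$1 patterns=$2 ci=${3:-} n=0 p
  while IFS= read -r p || [[ -n "$p" ]]; do
    [[ -z "$p" ]] && continue
    if grep -q ${ci:+-i} -F -- "$p" "$out"; then
      n=$((n + 1))
    fi
  done < "$patterns"
  echo "$n"
}

score_output() {
  local out=$1 blocklist=$2 modules=$3 functions=$4 score=0

  # has_diagram: a ```mermaid fence and a sequenceDiagram header
  if grep -q '^```mermaid' "$out" && grep -q 'sequenceDiagram' "$out"; then
    score=$((score + 1))
  fi
  # diagram_parseable (grep approximation, no mmdc): a participant/actor
  # declaration plus at least one ->> or -->> message arrow
  if grep -Eq '^[[:space:]]*(participant|actor)[[:space:]]' "$out" &&
     grep -Eq -- '-{1,2}>>' "$out"; then
    score=$((score + 1))
  fi
  # uses_real_modules: at least 2 module names from the codebase
  if (( $(count_hits "$out" "$modules") >= 2 )); then
    score=$((score + 1))
  fi
  # uses_real_functions: at least 1 function name
  if (( $(count_hits "$out" "$functions") >= 1 )); then
    score=$((score + 1))
  fi
  # no_sidetracking: no blocklisted phrase, ignoring case
  if (( $(count_hits "$out" "$blocklist" nocase) == 0 )); then
    score=$((score + 1))
  fi
  # concise: under 3000 characters (wc -m counts characters, not bytes)
  if (( $(wc -m < "$out") < 3000 )); then
    score=$((score + 1))
  fi

  echo "$score"
}

# Score only when called with all four arguments.
if (( $# == 4 )); then
  score_output "$@"
fi
```

Called as `score.sh output.md sidetrack_blocklist.txt modules.txt functions.txt`, it prints a score from 0 to 6 per output, so three tasks sum to the README's maximum of 18. Fixed-string matching (`-F`) keeps module names such as `Foo.Bar` from being read as regexes.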