remove sequence diagram skill, moved to other repo

2026-03-24 12:14:01 +00:00 · 2026-03-24 12:14:01 +00:00 · fddbb4e777
commit fddbb4e777
parent b3cdd93de8
10 changed files with 0 additions and 521 deletions
--- a/sequence-diagram-skill/.gitignore
+++ b/sequence-diagram-skill/.gitignore
@@ -1,10 +0,0 @@
 # autoresearch session
 autoresearch.jsonl
 autoresearch.ideas.md
 # temp
 .tmp_*
 *.tmp
 # OS
 .DS_Store
--- a/sequence-diagram-skill/autoresearch.checks.sh
+++ b/sequence-diagram-skill/autoresearch.checks.sh
@@ -1,57 +0,0 @@
 #!/usr/bin/env bash
 set -euo pipefail
 # ─── autoresearch.checks.sh ─────────────────────────────────────────────────
 # Backpressure checks for the sequence diagram skill.
 # ─────────────────────────────────────────────────────────────────────────────
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 SKILL_FILE="${SCRIPT_DIR}/skill/SKILL.md"
 ERRORS=0
 # 1. Skill exists and is non-empty
 if [[ ! -s "$SKILL_FILE" ]]; then
    echo "FAIL: skill/SKILL.md is missing or empty"
    ERRORS=$((ERRORS + 1))
 fi
 # 2. Skill is not trivially short
 CHAR_COUNT=$(wc -c < "$SKILL_FILE" 2>/dev/null || echo "0")
 if (( CHAR_COUNT < 200 )); then
    echo "FAIL: skill/SKILL.md is only ${CHAR_COUNT} chars (min: 200)"
    ERRORS=$((ERRORS + 1))
 fi
 # 3. Skill is not too long (rough token proxy: 1500 tokens ≈ 6000 chars)
 if (( CHAR_COUNT > 6000 )); then
    echo "FAIL: skill/SKILL.md is ${CHAR_COUNT} chars (max: ~6000)"
    ERRORS=$((ERRORS + 1))
 fi
 # 4. Skill must contain "sequenceDiagram" or "sequence diagram" (it's a diagram skill)
 if ! grep -qi 'sequence.diagram' "$SKILL_FILE" 2>/dev/null; then
    echo "FAIL: skill/SKILL.md doesn't mention sequence diagrams"
    ERRORS=$((ERRORS + 1))
 fi
 # 5. Skill must NOT contain Firehose-specific code (no overfitting)
 for term in "BlogController" "EngineeringBlog" "Firehose" "blogex" "priv/blog"; do
    if grep -q "$term" "$SKILL_FILE" 2>/dev/null; then
        echo "FAIL: skill/SKILL.md contains codebase-specific term '${term}'"
        ERRORS=$((ERRORS + 1))
    fi
 done
 # 6. Valid UTF-8
 if ! iconv -f utf-8 -t utf-8 "$SKILL_FILE" > /dev/null 2>&1; then
    echo "FAIL: skill/SKILL.md contains invalid UTF-8"
    ERRORS=$((ERRORS + 1))
 fi
 if (( ERRORS > 0 )); then
    echo "Checks FAILED with ${ERRORS} error(s)"
    exit 1
 else
    echo "All checks passed. Skill: ${CHAR_COUNT} chars."
    exit 0
 fi
--- a/sequence-diagram-skill/autoresearch.md
+++ b/sequence-diagram-skill/autoresearch.md
@@ -1,96 +0,0 @@
 # Autoresearch: Sequence Diagram Skill for Elixir/Phoenix
 ## Objective
 Optimize a pi skill (`skill/SKILL.md`) that generates Mermaid sequence diagrams
 from Elixir/Phoenix codebases. The skill is used with a local Qwen3.5-35B-A3B
 model running on CPU. The primary failure mode is **sidetracking** — the model
 abandons the diagram task and starts reviewing/critiquing the code instead.
 ## Primary Metric
 **score** — higher is better (0–18 scale, sum of 6 binary evals × 3 test inputs).
 ## Secondary Metrics
 - **sidetrack_count** — number of test runs containing review/critique language (lower is better)
 - **parse_count** — number of outputs that contain a parseable sequenceDiagram (higher is better)
 ## Architecture
 Pi runs the skill against the Firehose codebase (mounted in the workspace) using
 the target model. Scoring is done by bash scripts — no judge model needed.
 ## The Codebase Under Test
 **Firehose** — a Phoenix blogging platform with a monorepo structure:
 - `app/` — Phoenix web app (OTP app: `:firehose`)
  - `lib/firehose_web/router.ex` — routes
  - `lib/firehose_web/controllers/blog_controller.ex` — blog actions
  - `lib/firehose_web/controllers/page_controller.ex` — homepage
  - `lib/firehose/blogs/` — blog context modules (EngineeringBlog, ReleaseNotes)
 - `blogex/` — sibling library for compile-time blog engine
  - `lib/blogex/blog.ex` — `use Blogex.Blog` macro (NimblePublisher)
  - `lib/blogex/components.ex` — Phoenix function components (post_meta, tag_list, etc.)
  - `lib/blogex/router.ex` — API/feed routes
 **Key architectural fact:** Blogex uses NimblePublisher. All blog posts are compiled
 into BEAM module attributes at build time. There is NO runtime file I/O for reading
 posts. Functions like `all_posts/0`, `get_post!/1`, `posts_by_tag/1` read from
 `@posts` module attributes. This is the #1 thing models get wrong.
 ## Test Inputs (3 scenarios)
 ### 1. Click tag on post (small)
 "Generate a sequence diagram for: a user on a blog post page clicks a tag link
 (e.g., 'elixir'). Trace the full request from browser through to rendered response."
 ### 2. Show homepage (small)
 "Generate a sequence diagram for: a user visits the homepage (GET /).
 Trace from browser through to rendered HTML."
 ### 3. Add blog post on disk (larger, crosses compile/runtime boundary)
 "Generate a sequence diagram for: a developer creates a new markdown file in
 priv/blog/engineering/. Trace what happens from file creation through to the
 post being visible on the blog. Include the compile-time and runtime phases."
 ## Eval Criteria (6 binary checks)
 1. **has_diagram** — output contains `` ```mermaid `` and `sequenceDiagram`
 2. **diagram_parseable** — the mermaid block is syntactically valid
 3. **uses_real_modules** — output mentions at least 2 of: BlogController, EngineeringBlog, ReleaseNotes, Blogex, Router, PageController, Layouts
 4. **uses_real_functions** — output mentions at least 1 of: posts_by_tag, get_post!, all_posts, paginate, resolve_blog, render, recent_posts
 5. **no_sidetracking** — output does NOT contain code review language (see blocklist)
 6. **concise** — total output is under 3000 characters
 ## Files in Scope
 | File | Agent may edit? |
 |------|-----------------|
 | `skill/SKILL.md` | ✅ YES — the only file the agent modifies |
 | `benchmark/tasks.jsonl` | ❌ NO |
 | `scripts/score.sh` | ❌ NO |
 | `scripts/run_one.sh` | ❌ NO |
 | `scripts/sidetrack_blocklist.txt` | ❌ NO |
 | `autoresearch.sh` | ❌ NO |
 | `autoresearch.checks.sh` | ❌ NO |
 ## Constraints
 - SKILL.md must stay under 1500 tokens.
 - SKILL.md must NOT contain any code from the Firehose codebase (no overfitting).
 - SKILL.md must remain generic — it should work for any Elixir/Phoenix codebase,
  not just Firehose.
 ## What Has Been Tried
 (autoresearch fills this in)
 ## Dead Ends
 (autoresearch fills this in)
 ## Key Wins
 (autoresearch fills this in)
--- a/sequence-diagram-skill/autoresearch.sh
+++ b/sequence-diagram-skill/autoresearch.sh
@@ -1,101 +0,0 @@
 #!/usr/bin/env bash
 set -euo pipefail
 # ─── autoresearch.sh ─────────────────────────────────────────────────────────
 # Benchmark script for sequence diagram skill optimization.
 # Runs all 3 test inputs, scores each, outputs METRIC lines.
 # ─────────────────────────────────────────────────────────────────────────────
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 source "${SCRIPT_DIR}/scripts/config.env" 2>/dev/null || true
 # Defaults
 SSH_TARGET="${SSH_TARGET:-}"
 SSH_PORT="${SSH_PORT:-2222}"
 export TASK_TIMEOUT="${TASK_TIMEOUT:-180}"
 # ─── Pre-checks ──────────────────────────────────────────────────────────────
 SKILL_FILE="${SCRIPT_DIR}/skill/SKILL.md"
 if [[ ! -s "$SKILL_FILE" ]]; then
    echo "ERROR: skill/SKILL.md is missing or empty"
    exit 1
 fi
 SKILL_CHARS=$(wc -c < "$SKILL_FILE")
 echo "Skill: ${SKILL_CHARS} chars"
 TASKS_FILE="${SCRIPT_DIR}/benchmark/tasks.jsonl"
 if [[ ! -f "$TASKS_FILE" ]]; then
    echo "ERROR: benchmark/tasks.jsonl not found"
    exit 1
 fi
 echo "────────────────────────────────────────────────────"
 # ─── Run all tasks ───────────────────────────────────────────────────────────
 TMPDIR=$(mktemp -d)
 TOTAL_SCORE=0
 SIDETRACK_COUNT=0
 PARSE_COUNT=0
 TASK_COUNT=0
 START_TIME=$(date +%s)
 while IFS= read -r line; do
    TASK_ID=$(echo "$line" | jq -r '.id')
    TASK_PROMPT=$(echo "$line" | jq -r '.prompt')
    TASK_COUNT=$((TASK_COUNT + 1))
    OUTPUT_FILE="${TMPDIR}/${TASK_ID}.txt"
    SCORE_FILE="${TMPDIR}/${TASK_ID}.json"
    echo "  [${TASK_COUNT}/3] ${TASK_ID}..."
    # Run the task
    bash "${SCRIPT_DIR}/scripts/run_one.sh" \
        "$TASK_PROMPT" \
        "$OUTPUT_FILE" \
        "$SSH_TARGET" \
        "$SSH_PORT"
    # Score it
    SCORE_JSON=$(bash "${SCRIPT_DIR}/scripts/score.sh" "$OUTPUT_FILE")
    echo "$SCORE_JSON" > "$SCORE_FILE"
    # Extract scores
    TASK_SCORE=$(echo "$SCORE_JSON" | jq -r '.score')
    TASK_SIDETRACK=$(echo "$SCORE_JSON" | jq -r '.no_sidetracking')
    TASK_PARSE=$(echo "$SCORE_JSON" | jq -r '.diagram_parseable')
    TASK_CHARS=$(echo "$SCORE_JSON" | jq -r '.char_count')
    TOTAL_SCORE=$((TOTAL_SCORE + TASK_SCORE))
    if (( TASK_SIDETRACK == 0 )); then
        SIDETRACK_COUNT=$((SIDETRACK_COUNT + 1))
    fi
    if (( TASK_PARSE == 1 )); then
        PARSE_COUNT=$((PARSE_COUNT + 1))
    fi
    echo "    score=${TASK_SCORE}/6 sidetrack=$(( 1 - TASK_SIDETRACK )) parseable=${TASK_PARSE} chars=${TASK_CHARS}"
 done < "$TASKS_FILE"
 END_TIME=$(date +%s)
 TOTAL_SECONDS=$((END_TIME - START_TIME))
 # ─── Cleanup ─────────────────────────────────────────────────────────────────
 rm -rf "$TMPDIR"
 # ─── Output METRIC lines ────────────────────────────────────────────────────
 echo ""
 echo "METRIC score=${TOTAL_SCORE}"
 echo "METRIC sidetrack_count=${SIDETRACK_COUNT}"
 echo "METRIC parse_count=${PARSE_COUNT}"
 echo "METRIC total_seconds=${TOTAL_SECONDS}"
 echo "METRIC skill_chars=${SKILL_CHARS}"
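The `METRIC key=value` lines printed at the end of autoresearch.sh are meant to be read by a caller. The commit does not show that consumer, so the following is only a minimal sketch of how such lines could be parsed; `parse_metric` and the sample output are hypothetical and not part of the removed files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# parse_metric NAME: read benchmark output on stdin and print the value of
# the last "METRIC NAME=value" line. Hypothetical helper, not in the repo.
parse_metric() {
    local name="$1"
    awk -v key="$name" '
        $1 == "METRIC" {
            split($2, kv, "=")
            if (kv[1] == key) value = kv[2]
        }
        END { if (value != "") print value }
    '
}

# Sample output shaped like the echo lines at the end of autoresearch.sh.
sample_output=$'  [1/3] click-tag...\n    score=5/6 sidetrack=0 parseable=1 chars=1200\n\nMETRIC score=15\nMETRIC sidetrack_count=1\nMETRIC parse_count=3'

score=$(printf '%s\n' "$sample_output" | parse_metric score)
sidetracks=$(printf '%s\n' "$sample_output" | parse_metric sidetrack_count)
echo "score=${score} sidetrack_count=${sidetracks}"
```

Matching on the first field keeps the per-task progress lines (which also contain `score=`) from being mistaken for the summary metrics.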
--- a/sequence-diagram-skill/benchmark/tasks.jsonl
+++ b/sequence-diagram-skill/benchmark/tasks.jsonl
@@ -1,3 +0,0 @@
 {"id": "click-tag", "prompt": "Generate a sequence diagram for: a user on a blog post page clicks a tag link (e.g., 'elixir'). Trace the full HTTP request from browser through the Phoenix router, controller, domain modules, templates, and back to the browser. The codebase is in /home/analyst/workspace/. Read the relevant source files first."}
 {"id": "show-homepage", "prompt": "Generate a sequence diagram for: a user visits the homepage (GET /). Trace from the browser's HTTP request through the Phoenix router, controller, template rendering, layout wrapping, and back to the browser. The codebase is in /home/analyst/workspace/. Read the relevant source files first."}
 {"id": "add-post", "prompt": "Generate a sequence diagram for: a developer creates a new markdown file in priv/blog/engineering/ and the post becomes visible on the blog. Trace what happens including the compile-time phase (NimblePublisher, module recompilation) and the runtime request phase. The codebase is in /home/analyst/workspace/. Read the relevant source files first."}
--- a/sequence-diagram-skill/scripts/config.env
+++ b/sequence-diagram-skill/scripts/config.env
@@ -1,10 +0,0 @@
 # ─── config.env ──────────────────────────────────────────────────────────────
 # Leave SSH_TARGET empty to run pi locally (e.g., on your Mac).
 # Set it to use the remote pi container.
 # Remote pi container (leave empty for local)
 SSH_TARGET=""
 SSH_PORT=2222
 # Timeout per task (seconds)
 TASK_TIMEOUT=180
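One consequence of the plain assignments in config.env: autoresearch.sh sources this file before applying its `${VAR:-default}` fallbacks, so a value set in the file replaces one already present in the caller's environment. The sketch below illustrates that precedence with a throwaway file standing in for config.env. The second half shows one possible alternative (a `:=` default inside the config), which is an assumption for illustration and not something the removed files did.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway stand-in for scripts/config.env with a plain assignment.
cfg=$(mktemp)
printf 'TASK_TIMEOUT=180\n' > "$cfg"

# Caller-supplied value, as if TASK_TIMEOUT=60 were set in the environment.
TASK_TIMEOUT=60
source "$cfg"
TASK_TIMEOUT="${TASK_TIMEOUT:-180}"
plain_result="$TASK_TIMEOUT"      # the config file wins

# Variant (assumption): the config only supplies a default when the
# variable is unset or empty.
printf ': "${TASK_TIMEOUT:=180}"\n' > "$cfg"
TASK_TIMEOUT=60
source "$cfg"
default_result="$TASK_TIMEOUT"    # the caller's value survives

rm -f "$cfg"
echo "plain=${plain_result} default=${default_result}"
```

With the files as committed, changing the timeout therefore means editing config.env rather than exporting `TASK_TIMEOUT` before the run.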
--- a/sequence-diagram-skill/scripts/run_one.sh
+++ b/sequence-diagram-skill/scripts/run_one.sh
@@ -1,58 +0,0 @@
 #!/usr/bin/env bash
 set -euo pipefail
 # ─── run_one.sh ──────────────────────────────────────────────────────────────
 # Run pi with the sequence-diagram skill on a single task.
 # Usage: ./scripts/run_one.sh <task_prompt> <output_file> [ssh_target] [ssh_port]
 #
 # If ssh_target is provided, runs remotely via SSH into the pi container.
 # Otherwise runs pi locally.
 # ─────────────────────────────────────────────────────────────────────────────
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 PROJECT_DIR="$(dirname "$SCRIPT_DIR")"
 TASK_PROMPT="$1"
 OUTPUT_FILE="$2"
 SSH_TARGET="${3:-}"
 SSH_PORT="${4:-2222}"
 TIMEOUT="${TASK_TIMEOUT:-180}"
 SKILL_FILE="${PROJECT_DIR}/skill/SKILL.md"
 if [[ ! -f "$SKILL_FILE" ]]; then
    echo "ERROR: skill/SKILL.md not found" >&2
    exit 1
 fi
 SKILL_CONTENT=$(cat "$SKILL_FILE")
 # Build the full prompt: skill instructions + task
 FULL_PROMPT="## Skill Instructions
 ${SKILL_CONTENT}
 ## Task
 ${TASK_PROMPT}"
 if [[ -n "$SSH_TARGET" ]]; then
    # ─── Remote: SSH into pi container ───────────────────────────────────
    PAYLOAD=$(jq -n --arg prompt "$FULL_PROMPT" '{"prompt": $prompt}')
    ssh -p "$SSH_PORT" \
        -o StrictHostKeyChecking=no \
        -o ConnectTimeout=10 \
        -o BatchMode=yes \
        "$SSH_TARGET" \
        "run-task --stdin --mode print --thinking off --timeout $TIMEOUT" \
        <<< "$PAYLOAD" > "$OUTPUT_FILE" 2>/dev/null
 else
    # ─── Local: run pi directly ──────────────────────────────────────────
    timeout "${TIMEOUT}s" pi \
        --mode print \
        --no-session \
        --no-extensions \
        --thinking none \
        -p "$FULL_PROMPT" > "$OUTPUT_FILE" 2>/dev/null || true
 fi
--- a/sequence-diagram-skill/scripts/score.sh
+++ b/sequence-diagram-skill/scripts/score.sh
@@ -1,109 +0,0 @@
 #!/usr/bin/env bash
 set -euo pipefail
 # ─── score.sh ────────────────────────────────────────────────────────────────
 # Score a single diagram output against 6 binary evals.
 # Usage: ./scripts/score.sh <output_file>
 # Prints a JSON line with pass/fail for each eval and total score.
 # ─────────────────────────────────────────────────────────────────────────────
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 OUTPUT_FILE="$1"
 if [[ ! -f "$OUTPUT_FILE" ]]; then
    echo '{"error": "file not found", "score": 0}' 
    exit 0
 fi
 CONTENT=$(cat "$OUTPUT_FILE")
 CHAR_COUNT=${#CONTENT}
 # ─── Eval 1: has_diagram ─────────────────────────────────────────────────────
 # Output contains a mermaid fenced block with sequenceDiagram
 has_diagram=0
 if echo "$CONTENT" | grep -q '```mermaid' && echo "$CONTENT" | grep -q 'sequenceDiagram'; then
    has_diagram=1
 fi
 # ─── Eval 2: diagram_parseable ───────────────────────────────────────────────
 # Extract the mermaid block and check basic syntax
 diagram_parseable=0
 if (( has_diagram == 1 )); then
    # Extract mermaid block
    MERMAID_BLOCK=$(echo "$CONTENT" | awk '/^```mermaid/{found=1;next} found && /^```$/{exit} found{print}')
    if [[ -n "$MERMAID_BLOCK" ]]; then
        # Basic syntax checks:
        # - Has "sequenceDiagram" keyword
        # - Has at least one "participant" line  
         # - Has at least one "->>", "-->>", or "->" message line
        has_keyword=$(echo "$MERMAID_BLOCK" | grep -c 'sequenceDiagram' || true)
        has_participant=$(echo "$MERMAID_BLOCK" | grep -c 'participant' || true)
        has_message=$(echo "$MERMAID_BLOCK" | grep -cE '\->>|-->>|\->' || true)
        if (( has_keyword > 0 && has_participant > 0 && has_message > 0 )); then
            diagram_parseable=1
        fi
    fi
    # If mmdc (mermaid CLI) is available, use it for real validation
    if command -v mmdc &> /dev/null && (( diagram_parseable == 1 )); then
        TMPFILE=$(mktemp /tmp/mermaid_XXXXXX.mmd)
        echo "$MERMAID_BLOCK" > "$TMPFILE"
        if mmdc -i "$TMPFILE" -o /dev/null 2>/dev/null; then
            diagram_parseable=1
        else
            diagram_parseable=0
        fi
        rm -f "$TMPFILE"
    fi
 fi
 # ─── Eval 3: uses_real_modules ───────────────────────────────────────────────
 # Diagram mentions at least 2 real modules from the Firehose codebase
 uses_real_modules=0
 module_count=0
 for module in BlogController EngineeringBlog ReleaseNotes Blogex Router PageController Layouts; do
    if echo "$CONTENT" | grep -qi "$module"; then
        module_count=$((module_count + 1))
    fi
 done
 if (( module_count >= 2 )); then
    uses_real_modules=1
 fi
 # ─── Eval 4: uses_real_functions ─────────────────────────────────────────────
 # Diagram mentions at least 1 real function from the codebase
 uses_real_functions=0
 for func in posts_by_tag get_post all_posts paginate resolve_blog render recent_posts; do
    if echo "$CONTENT" | grep -qi "$func"; then
        uses_real_functions=1
        break
    fi
 done
 # ─── Eval 5: no_sidetracking ────────────────────────────────────────────────
 # Output does NOT contain code review / critique language
 no_sidetracking=1
 BLOCKLIST="${SCRIPT_DIR}/sidetrack_blocklist.txt"
 if [[ -f "$BLOCKLIST" ]]; then
    while IFS= read -r phrase; do
        phrase=$(echo "$phrase" | xargs)  # trim whitespace
        if [[ -n "$phrase" ]] && echo "$CONTENT" | grep -qi "$phrase"; then
            no_sidetracking=0
            break
        fi
    done < "$BLOCKLIST"
 fi
 # ─── Eval 6: concise ────────────────────────────────────────────────────────
 # Total output under 3000 characters
 concise=0
 if (( CHAR_COUNT < 3000 )); then
    concise=1
 fi
 # ─── Total ───────────────────────────────────────────────────────────────────
 score=$((has_diagram + diagram_parseable + uses_real_modules + uses_real_functions + no_sidetracking + concise))
 echo "{\"score\":${score},\"has_diagram\":${has_diagram},\"diagram_parseable\":${diagram_parseable},\"uses_real_modules\":${uses_real_modules},\"uses_real_functions\":${uses_real_functions},\"no_sidetracking\":${no_sidetracking},\"concise\":${concise},\"char_count\":${CHAR_COUNT}}"
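To see what the fallback path of Eval 2 accepts when `mmdc` is not installed, the extraction and the three structural counts from score.sh can be run against a small sample. This is a sketch under assumptions: the sample output is made up, and the awk program is written to have the same effect as the one in score.sh while building the fence marker from octal escapes, so that the example contains no literal fence of its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Three backticks, built from octal escapes.
fence=$(printf '\140\140\140')

# Hypothetical model output: preamble, one mermaid block, trailing text.
sample=$(printf '%s\n' \
    "Here is the diagram:" \
    "${fence}mermaid" \
    "sequenceDiagram" \
    "    participant Browser" \
    "    participant Router" \
    "    Browser->>Router: GET /" \
    "${fence}" \
    "Done.")

# Same effect as the awk extraction in score.sh: skip the opening fence
# line, stop collecting at the first bare closing fence.
block=$(printf '%s\n' "$sample" | awk -v f="$fence" '
    closed                       { next }
    index($0, f "mermaid") == 1  { found = 1; next }
    found && $0 == f             { closed = 1; next }
    found                        { print }
')

# The three structural checks used when mmdc is unavailable.
has_keyword=$(printf '%s\n' "$block" | grep -c 'sequenceDiagram' || true)
has_participant=$(printf '%s\n' "$block" | grep -c 'participant' || true)
has_message=$(printf '%s\n' "$block" | grep -cE -e '->>|-->>|->' || true)

echo "keyword=${has_keyword} participant=${has_participant} message=${has_message}"
```

Since these are line counts rather than a parse, a block can pass them and still be rejected by a real Mermaid parser, which is why score.sh defers to `mmdc` when it is available.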
--- a/sequence-diagram-skill/scripts/sidetrack_blocklist.txt
+++ b/sequence-diagram-skill/scripts/sidetrack_blocklist.txt
@@ -1,23 +0,0 @@
 potential issue
 consider using
 should be
 could be improved
 recommend
 suggestion
 improvement
 code review
 refactor
 best practice
 security concern
 vulnerability
 error handling could
 missing error
 you might want
 it would be better
 note that this
 be aware that
 one concern
 problematic
 anti-pattern
 smell
 technical debt
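score.sh applies this blocklist with `grep -qi`, a case-insensitive substring match over the whole output. The sketch below uses a handful of the phrases to show what that implies in practice; `first_hit` and the sample sentences are hypothetical and only illustrate the matching behaviour.

```shell
#!/usr/bin/env bash
set -euo pipefail

# A few phrases taken from sidetrack_blocklist.txt.
phrases=("should be" "recommend" "smell" "refactor")

# first_hit TEXT: print the first phrase found in TEXT using the same
# case-insensitive substring match as score.sh; print nothing when no
# phrase matches. Hypothetical helper for illustration.
first_hit() {
    local text="$1" phrase
    for phrase in "${phrases[@]}"; do
        if grep -qi -- "$phrase" <<< "$text"; then
            printf '%s\n' "$phrase"
            return 0
        fi
    done
}

critique=$(first_hit "I would Recommend extracting this into a helper.")
diagram_note=$(first_hit "Note over Router: conn should be halted by the auth plug")
clean=$(first_hit "Browser->>Router: GET /tags/elixir")

echo "critique='${critique}' diagram_note='${diagram_note}' clean='${clean}'"
```

The second sample is a plausible diagram note, not a critique, yet it trips the `should be` entry. Short, common phrases in the list can therefore lower `no_sidetracking` for outputs that stayed on task.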
--- a/sequence-diagram-skill/skill/SKILL.md
+++ b/sequence-diagram-skill/skill/SKILL.md
@@ -1,54 +0,0 @@
 ---
 name: sequence-diagram
 description: Generate a Mermaid sequence diagram showing message flow across module boundaries for an Elixir/Phoenix interaction. Use when asked to diagram, trace, or visualize a user interaction, request flow, or feature path through the codebase.
 ---
 # Sequence Diagram Skill
 Generate a Mermaid `sequenceDiagram` that traces a specific user interaction
 across module boundaries in an Elixir/Phoenix codebase.
 ## Your Task
 Given a description of an interaction (e.g., "user clicks a tag on a blog post")
 and access to the source files, produce a Mermaid sequence diagram that accurately
 shows the message flow between modules.
 ## Process
 1. **Identify the entry point.** What triggers this interaction? (HTTP request,
   LiveView event, PubSub message, etc.)
 2. **Read the router** to find which controller/live module handles the route.
 3. **Read the controller/live module** to find which functions are called and
   which domain modules they delegate to.
 4. **Read the domain modules** to understand what they return and how.
 5. **Read templates/components** if the rendering path matters.
 6. **Emit the diagram.** Use `sequenceDiagram` with participants named after
   actual modules. Show function calls as messages.
 ## Output Format
 Respond with ONLY a fenced Mermaid code block. No preamble, no explanation,
 no code review, no suggestions. Just the diagram.
 ```mermaid
 sequenceDiagram
    participant Browser
    participant Router as MyAppWeb.Router
    ...
 ```
 ## Rules
 - **Participants must be real modules** from the codebase. Never invent modules.
 - **Messages must be real function calls** or HTTP verbs. Use the actual function
  names you found in the source (e.g., `blog.posts_by_tag(tag)`, not "get posts").
 - **Show the return path.** Responses flow back: module returns data, controller
  renders, browser receives HTML.
 - **Distinguish compile-time from runtime.** If a module uses NimblePublisher
  or module attributes, the data is compiled into the BEAM — there is no runtime
  file I/O. Show this as a note, not as a message to the filesystem.
 - **Stay on task.** Do NOT review the code. Do NOT suggest improvements. Do NOT
  mention potential issues. Your only job is the diagram.
 - **Keep it readable.** Use `Note over` for context. Use short aliases for
  long module names in the participant declaration.