remove sequence diagram skill, moved to other repo
parent b3cdd93de8
commit fddbb4e777
10
sequence-diagram-skill/.gitignore
vendored
@@ -1,10 +0,0 @@
# autoresearch session
autoresearch.jsonl
autoresearch.ideas.md

# temp
.tmp_*
*.tmp

# OS
.DS_Store
@@ -1,57 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail

# ─── autoresearch.checks.sh ─────────────────────────────────────────────────
# Backpressure checks for the sequence diagram skill.
# ─────────────────────────────────────────────────────────────────────────────

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SKILL_FILE="${SCRIPT_DIR}/skill/SKILL.md"
ERRORS=0

# 1. Skill exists and is non-empty
if [[ ! -s "$SKILL_FILE" ]]; then
    echo "FAIL: skill/SKILL.md is missing or empty"
    ERRORS=$((ERRORS + 1))
fi

# 2. Skill is not trivially short
CHAR_COUNT=$(wc -c < "$SKILL_FILE" 2>/dev/null || echo "0")
if (( CHAR_COUNT < 200 )); then
    echo "FAIL: skill/SKILL.md is only ${CHAR_COUNT} chars (min: 200)"
    ERRORS=$((ERRORS + 1))
fi

# 3. Skill is not too long (rough token proxy: 1500 tokens ≈ 6000 chars)
if (( CHAR_COUNT > 6000 )); then
    echo "FAIL: skill/SKILL.md is ${CHAR_COUNT} chars (max: ~6000)"
    ERRORS=$((ERRORS + 1))
fi

# 4. Skill must contain "sequenceDiagram" or "sequence diagram" (it's a diagram skill)
if ! grep -qiE 'sequence.?diagram' "$SKILL_FILE" 2>/dev/null; then
    echo "FAIL: skill/SKILL.md doesn't mention sequence diagrams"
    ERRORS=$((ERRORS + 1))
fi

# 5. Skill must NOT contain Firehose-specific code (no overfitting)
for term in "BlogController" "EngineeringBlog" "Firehose" "blogex" "priv/blog"; do
    if grep -q "$term" "$SKILL_FILE" 2>/dev/null; then
        echo "FAIL: skill/SKILL.md contains codebase-specific term '${term}'"
        ERRORS=$((ERRORS + 1))
    fi
done

# 6. Valid UTF-8
if ! iconv -f utf-8 -t utf-8 "$SKILL_FILE" > /dev/null 2>&1; then
    echo "FAIL: skill/SKILL.md contains invalid UTF-8"
    ERRORS=$((ERRORS + 1))
fi

if (( ERRORS > 0 )); then
    echo "Checks FAILED with ${ERRORS} error(s)"
    exit 1
else
    echo "All checks passed. Skill: ${CHAR_COUNT} chars."
    exit 0
fi
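
Check 4 is meant to accept either the Mermaid keyword `sequenceDiagram` or the phrase "sequence diagram". A minimal sketch of a pattern that covers both spellings, assuming `grep -E` is available; the function name `mentions_sequence_diagram` is hypothetical and not part of the removed script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the text on stdin mentions sequence
# diagrams as "sequenceDiagram", "sequence diagram", or "sequence-diagram"
# (case-insensitive).
mentions_sequence_diagram() {
  # ".?" allows zero or one separator character between the two words,
  # so the camel-case Mermaid keyword matches as well.
  grep -qiE 'sequence.?diagram'
}

for sample in "sequenceDiagram" "Sequence Diagram Skill" "flowchart TD"; do
  if printf '%s\n' "$sample" | mentions_sequence_diagram; then
    echo "match:    $sample"
  else
    echo "no match: $sample"
  fi
done
```

By contrast, a pattern such as `sequence.diagram` requires exactly one character between the two words, so it would not match `sequenceDiagram` on its own.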
@@ -1,96 +0,0 @@
# Autoresearch: Sequence Diagram Skill for Elixir/Phoenix

## Objective

Optimize a pi skill (`skill/SKILL.md`) that generates Mermaid sequence diagrams
from Elixir/Phoenix codebases. The skill is used with a local Qwen3.5-35B-A3B
model running on CPU. The primary failure mode is **sidetracking** — the model
abandons the diagram task and starts reviewing/critiquing the code instead.

## Primary Metric

**score** — higher is better (0–18 scale, sum of 6 binary evals × 3 test inputs).

## Secondary Metrics

- **sidetrack_count** — number of test runs containing review/critique language (lower is better)
- **parse_count** — number of outputs that contain a parseable sequenceDiagram (higher is better)

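
The primary and secondary metrics are plain sums over the per-task results. A small sketch of the arithmetic, using made-up per-task values (the arrays below are illustrative, not measured results):

```shell
#!/usr/bin/env bash
# Illustrative per-task results for the 3 test inputs (made-up numbers).
task_scores=(6 5 4)        # each task scores 0..6 (six binary evals)
no_sidetracking=(1 1 0)    # 1 = eval passed (no review language found)
diagram_parseable=(1 1 1)  # 1 = a parseable sequenceDiagram was found

score=0; sidetrack_count=0; parse_count=0
for i in "${!task_scores[@]}"; do
  score=$((score + task_scores[i]))
  # A sidetracked run is one where the no_sidetracking eval failed.
  sidetrack_count=$((sidetrack_count + (1 - no_sidetracking[i])))
  parse_count=$((parse_count + diagram_parseable[i]))
done

max_score=$(( 6 * ${#task_scores[@]} ))
echo "score=${score}/${max_score} sidetrack_count=${sidetrack_count} parse_count=${parse_count}"
# prints: score=15/18 sidetrack_count=1 parse_count=3
```
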
## Architecture

Pi runs the skill against the Firehose codebase (mounted in the workspace) using
the target model. Scoring is done by bash scripts — no judge model needed.

## The Codebase Under Test

**Firehose** — a Phoenix blogging platform with a monorepo structure:

- `app/` — Phoenix web app (OTP app: `:firehose`)
  - `lib/firehose_web/router.ex` — routes
  - `lib/firehose_web/controllers/blog_controller.ex` — blog actions
  - `lib/firehose_web/controllers/page_controller.ex` — homepage
  - `lib/firehose/blogs/` — blog context modules (EngineeringBlog, ReleaseNotes)
- `blogex/` — sibling library for compile-time blog engine
  - `lib/blogex/blog.ex` — `use Blogex.Blog` macro (NimblePublisher)
  - `lib/blogex/components.ex` — Phoenix function components (post_meta, tag_list, etc.)
  - `lib/blogex/router.ex` — API/feed routes

**Key architectural fact:** Blogex uses NimblePublisher. All blog posts are compiled
into BEAM module attributes at build time. There is NO runtime file I/O for reading
posts. Functions like `all_posts/0`, `get_post!/1`, `posts_by_tag/1` read from
`@posts` module attributes. This is the #1 thing models get wrong.

## Test Inputs (3 scenarios)

### 1. Click tag on post (small)
"Generate a sequence diagram for: a user on a blog post page clicks a tag link
(e.g., 'elixir'). Trace the full request from browser through to rendered response."

### 2. Show homepage (small)
"Generate a sequence diagram for: a user visits the homepage (GET /).
Trace from browser through to rendered HTML."

### 3. Add blog post on disk (larger, crosses compile/runtime boundary)
"Generate a sequence diagram for: a developer creates a new markdown file in
priv/blog/engineering/. Trace what happens from file creation through to the
post being visible on the blog. Include the compile-time and runtime phases."

## Eval Criteria (6 binary checks)

1. **has_diagram** — output contains `` ```mermaid `` and `sequenceDiagram`
2. **diagram_parseable** — the mermaid block is syntactically valid
3. **uses_real_modules** — diagram mentions at least 2 of: BlogController, EngineeringBlog, Blogex, Router, PageController
4. **uses_real_functions** — diagram mentions at least 1 of: posts_by_tag, get_post!, all_posts, paginate, resolve_blog, render
5. **no_sidetracking** — output does NOT contain code review language (see blocklist)
6. **concise** — total output is under 3000 characters

## Files in Scope

| File | Agent may edit? |
|------|-----------------|
| `skill/SKILL.md` | ✅ YES — the only file the agent modifies |
| `benchmark/tasks.jsonl` | ❌ NO |
| `scripts/score.sh` | ❌ NO |
| `scripts/run_one.sh` | ❌ NO |
| `scripts/sidetrack_blocklist.txt` | ❌ NO |
| `autoresearch.sh` | ❌ NO |
| `autoresearch.checks.sh` | ❌ NO |

## Constraints

- SKILL.md must stay under 1500 tokens.
- SKILL.md must NOT contain any code from the Firehose codebase (no overfitting).
- SKILL.md must remain generic — it should work for any Elixir/Phoenix codebase,
  not just Firehose.

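
The 1500-token limit is enforced by the checks script through a character-count proxy (1500 tokens ≈ 6000 characters, i.e. roughly 4 characters per token). A rough sketch of that estimate; the 4:1 ratio is only a heuristic and the helper name `estimate_tokens` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate tokens from a character count using the
# rough proxy of about 4 characters per token (1500 tokens ~ 6000 chars).
estimate_tokens() {
  local chars="$1"
  # Round up so a file is never reported as smaller than it is.
  echo $(( (chars + 3) / 4 ))
}

estimate_tokens 6000   # prints 1500
estimate_tokens 201    # prints 51
```

Real tokenizers vary by model and by content, so a skill that sits close to the limit may need a proper token count rather than this approximation.
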
## What Has Been Tried

(autoresearch fills this in)

## Dead Ends

(autoresearch fills this in)

## Key Wins

(autoresearch fills this in)
@@ -1,101 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail

# ─── autoresearch.sh ─────────────────────────────────────────────────────────
# Benchmark script for sequence diagram skill optimization.
# Runs all 3 test inputs, scores each, outputs METRIC lines.
# ─────────────────────────────────────────────────────────────────────────────

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/scripts/config.env" 2>/dev/null || true

# Defaults
SSH_TARGET="${SSH_TARGET:-}"
SSH_PORT="${SSH_PORT:-2222}"
export TASK_TIMEOUT="${TASK_TIMEOUT:-180}"

# ─── Pre-checks ──────────────────────────────────────────────────────────────

SKILL_FILE="${SCRIPT_DIR}/skill/SKILL.md"
if [[ ! -s "$SKILL_FILE" ]]; then
    echo "ERROR: skill/SKILL.md is missing or empty"
    exit 1
fi

SKILL_CHARS=$(wc -c < "$SKILL_FILE" | tr -d '[:space:]')
echo "Skill: ${SKILL_CHARS} chars"

TASKS_FILE="${SCRIPT_DIR}/benchmark/tasks.jsonl"
if [[ ! -f "$TASKS_FILE" ]]; then
    echo "ERROR: benchmark/tasks.jsonl not found"
    exit 1
fi

echo "────────────────────────────────────────────────────"

# ─── Run all tasks ───────────────────────────────────────────────────────────

WORK_DIR=$(mktemp -d)  # not named TMPDIR: mktemp and other tools read that variable
TOTAL_SCORE=0
SIDETRACK_COUNT=0
PARSE_COUNT=0
TASK_COUNT=0

START_TIME=$(date +%s)

while IFS= read -r line || [[ -n "$line" ]]; do
    TASK_ID=$(echo "$line" | jq -r '.id')
    TASK_PROMPT=$(echo "$line" | jq -r '.prompt')
    TASK_COUNT=$((TASK_COUNT + 1))

    OUTPUT_FILE="${WORK_DIR}/${TASK_ID}.txt"
    SCORE_FILE="${WORK_DIR}/${TASK_ID}.json"

    echo " [${TASK_COUNT}/3] ${TASK_ID}..."

    # Run the task (stdin from /dev/null so it cannot consume the task list)
    bash "${SCRIPT_DIR}/scripts/run_one.sh" \
        "$TASK_PROMPT" \
        "$OUTPUT_FILE" \
        "$SSH_TARGET" \
        "$SSH_PORT" < /dev/null

    # Score it
    SCORE_JSON=$(bash "${SCRIPT_DIR}/scripts/score.sh" "$OUTPUT_FILE")
    echo "$SCORE_JSON" > "$SCORE_FILE"

    # Extract scores
    TASK_SCORE=$(echo "$SCORE_JSON" | jq -r '.score')
    TASK_SIDETRACK=$(echo "$SCORE_JSON" | jq -r '.no_sidetracking')
    TASK_PARSE=$(echo "$SCORE_JSON" | jq -r '.diagram_parseable')
    TASK_CHARS=$(echo "$SCORE_JSON" | jq -r '.char_count')

    TOTAL_SCORE=$((TOTAL_SCORE + TASK_SCORE))

    if (( TASK_SIDETRACK == 0 )); then
        SIDETRACK_COUNT=$((SIDETRACK_COUNT + 1))
    fi

    if (( TASK_PARSE == 1 )); then
        PARSE_COUNT=$((PARSE_COUNT + 1))
    fi

    echo " score=${TASK_SCORE}/6 sidetrack=$(( 1 - TASK_SIDETRACK )) parseable=${TASK_PARSE} chars=${TASK_CHARS}"

done < "$TASKS_FILE"

END_TIME=$(date +%s)
TOTAL_SECONDS=$((END_TIME - START_TIME))

# ─── Cleanup ─────────────────────────────────────────────────────────────────

rm -rf "$WORK_DIR"

# ─── Output METRIC lines ────────────────────────────────────────────────────

echo ""
echo "METRIC score=${TOTAL_SCORE}"
echo "METRIC sidetrack_count=${SIDETRACK_COUNT}"
echo "METRIC parse_count=${PARSE_COUNT}"
echo "METRIC total_seconds=${TOTAL_SECONDS}"
echo "METRIC skill_chars=${SKILL_CHARS}"
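
The benchmark reports its results as `METRIC name=value` lines on stdout. A sketch of how a caller might collect them, assuming Bash 4+ for associative arrays; `parse_metrics` and the sample output are illustrative and not part of the removed scripts:

```shell
#!/usr/bin/env bash
# Collect "METRIC name=value" lines from stdin into the associative
# array METRICS (requires Bash 4+). All other lines are ignored.
declare -A METRICS=()

parse_metrics() {
  local line key value
  while IFS= read -r line; do
    [[ "$line" == "METRIC "*=* ]] || continue
    line="${line#METRIC }"      # drop the prefix
    key="${line%%=*}"           # text before the first "="
    value="${line#*=}"          # text after the first "="
    METRICS["$key"]="$value"
  done
}

# Illustrative output; real values come from running the benchmark.
sample_output=$' [1/3] click-tag...\n\nMETRIC score=15\nMETRIC sidetrack_count=1\nMETRIC parse_count=3'

parse_metrics <<< "$sample_output"
echo "score=${METRICS[score]} parse_count=${METRICS[parse_count]}"
# prints: score=15 parse_count=3
```

Feeding the function through a here-string rather than a pipe keeps the loop in the current shell, so the array is still populated after the call.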
@@ -1,3 +0,0 @@
{"id": "click-tag", "prompt": "Generate a sequence diagram for: a user on a blog post page clicks a tag link (e.g., 'elixir'). Trace the full HTTP request from browser through the Phoenix router, controller, domain modules, templates, and back to the browser. The codebase is in /home/analyst/workspace/. Read the relevant source files first."}
{"id": "show-homepage", "prompt": "Generate a sequence diagram for: a user visits the homepage (GET /). Trace from the browser's HTTP request through the Phoenix router, controller, template rendering, layout wrapping, and back to the browser. The codebase is in /home/analyst/workspace/. Read the relevant source files first."}
{"id": "add-post", "prompt": "Generate a sequence diagram for: a developer creates a new markdown file in priv/blog/engineering/ and the post becomes visible on the blog. Trace what happens including the compile-time phase (NimblePublisher, module recompilation) and the runtime request phase. The codebase is in /home/analyst/workspace/. Read the relevant source files first."}
@@ -1,10 +0,0 @@
# ─── config.env ──────────────────────────────────────────────────────────────
# Leave SSH_TARGET empty to run pi locally (e.g., on your Mac).
# Set it to use the remote pi container.

# Remote pi container (leave empty for local)
SSH_TARGET=""
SSH_PORT=2222

# Timeout per task (seconds)
TASK_TIMEOUT=180
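
Because `config.env` assigns its values unconditionally and `autoresearch.sh` sources it, any `SSH_PORT` or `TASK_TIMEOUT` already exported by the caller is replaced. If environment overrides are wanted, one option is Bash default-assignment syntax. A sketch under that assumption; this is not how the removed file was written, and `load_config_defaults` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Variant of config.env that only fills in values that are unset or empty,
# so a TASK_TIMEOUT or SSH_PORT provided by the caller wins.
load_config_defaults() {
  : "${SSH_TARGET:=}"        # empty means: run pi locally
  : "${SSH_PORT:=2222}"
  : "${TASK_TIMEOUT:=180}"   # seconds per task
}

TASK_TIMEOUT=60              # simulate a caller-provided override
unset SSH_PORT
load_config_defaults
echo "timeout=${TASK_TIMEOUT} port=${SSH_PORT}"   # prints timeout=60 port=2222
```
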
@@ -1,58 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail

# ─── run_one.sh ──────────────────────────────────────────────────────────────
# Run pi with the sequence-diagram skill on a single task.
# Usage: ./scripts/run_one.sh <task_prompt> <output_file> [ssh_target] [ssh_port]
#
# If ssh_target is provided, runs remotely via SSH into the pi container.
# Otherwise runs pi locally.
# ─────────────────────────────────────────────────────────────────────────────

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(dirname "$SCRIPT_DIR")"

TASK_PROMPT="${1:?usage: run_one.sh <task_prompt> <output_file> [ssh_target] [ssh_port]}"
OUTPUT_FILE="${2:?missing <output_file> argument}"
SSH_TARGET="${3:-}"
SSH_PORT="${4:-2222}"
TIMEOUT="${TASK_TIMEOUT:-180}"

SKILL_FILE="${PROJECT_DIR}/skill/SKILL.md"

if [[ ! -f "$SKILL_FILE" ]]; then
    echo "ERROR: skill/SKILL.md not found" >&2
    exit 1
fi

SKILL_CONTENT=$(cat "$SKILL_FILE")

# Build the full prompt: skill instructions + task
FULL_PROMPT="## Skill Instructions

${SKILL_CONTENT}

## Task

${TASK_PROMPT}"

if [[ -n "$SSH_TARGET" ]]; then
    # ─── Remote: SSH into pi container ───────────────────────────────────
    PAYLOAD=$(jq -n --arg prompt "$FULL_PROMPT" '{"prompt": $prompt}')

    ssh -p "$SSH_PORT" \
        -o StrictHostKeyChecking=no \
        -o ConnectTimeout=10 \
        -o BatchMode=yes \
        "$SSH_TARGET" \
        "run-task --stdin --mode print --thinking off --timeout $TIMEOUT" \
        <<< "$PAYLOAD" > "$OUTPUT_FILE" 2>/dev/null
else
    # ─── Local: run pi directly ──────────────────────────────────────────
    timeout "${TIMEOUT}s" pi \
        --mode print \
        --no-session \
        --no-extensions \
        --thinking none \
        -p "$FULL_PROMPT" > "$OUTPUT_FILE" 2>/dev/null || true
fi
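
The local branch discards the exit status with `|| true`, so a timed-out run and a successful one look the same to the caller. A sketch of how the two could be told apart, assuming GNU coreutils `timeout`, which exits with status 124 when the limit is reached; `run_with_timeout` is a hypothetical helper, and `sleep`, `true` and `false` stand in for the model call:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a time limit and report
# "ok", "timeout", or "failed" instead of discarding the exit status.
run_with_timeout() {
  local seconds="$1"; shift
  local status=0
  timeout "${seconds}s" "$@" > /dev/null 2>&1 || status=$?
  case "$status" in
    0)   echo "ok" ;;
    124) echo "timeout" ;;   # GNU timeout: the limit was reached
    *)   echo "failed" ;;
  esac
}

run_with_timeout 5 true       # prints ok
run_with_timeout 1 sleep 3    # prints timeout
run_with_timeout 5 false      # prints failed
```

Recording the outcome this way would let a benchmark separate slow runs from broken ones while still continuing to the next task.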
@@ -1,109 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail

# ─── score.sh ────────────────────────────────────────────────────────────────
# Score a single diagram output against 6 binary evals.
# Usage: ./scripts/score.sh <output_file>
# Prints a JSON line with pass/fail for each eval and total score.
# ─────────────────────────────────────────────────────────────────────────────

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
OUTPUT_FILE="$1"

if [[ ! -f "$OUTPUT_FILE" ]]; then
    echo '{"error": "file not found", "score": 0}'
    exit 0
fi

CONTENT=$(cat "$OUTPUT_FILE")
CHAR_COUNT=${#CONTENT}

# ─── Eval 1: has_diagram ─────────────────────────────────────────────────────
# Output contains a mermaid fenced block with sequenceDiagram
has_diagram=0
if grep -q '```mermaid' <<< "$CONTENT" && grep -q 'sequenceDiagram' <<< "$CONTENT"; then
    has_diagram=1
fi

# ─── Eval 2: diagram_parseable ───────────────────────────────────────────────
# Extract the mermaid block and check basic syntax
diagram_parseable=0
if (( has_diagram == 1 )); then
    # Extract mermaid block (here-string: an early awk exit cannot trip pipefail)
    MERMAID_BLOCK=$(awk '/^```mermaid/{found=1;next} found && /^```$/{exit} found{print}' <<< "$CONTENT")

    if [[ -n "$MERMAID_BLOCK" ]]; then
        # Basic syntax checks:
        # - Has "sequenceDiagram" keyword
        # - Has at least one "participant" line
        # - Has at least one "->>", "-->>", or "->" message line
        has_keyword=$(echo "$MERMAID_BLOCK" | grep -c 'sequenceDiagram' || true)
        has_participant=$(echo "$MERMAID_BLOCK" | grep -c 'participant' || true)
        has_message=$(echo "$MERMAID_BLOCK" | grep -cE -e '-->>|->>|->' || true)

        if (( has_keyword > 0 && has_participant > 0 && has_message > 0 )); then
            diagram_parseable=1
        fi
    fi

    # If mmdc (mermaid CLI) is available, use it for real validation (it needs an output path with a known extension)
    if command -v mmdc &> /dev/null && (( diagram_parseable == 1 )); then
        TMP_WORK=$(mktemp -d)
        echo "$MERMAID_BLOCK" > "${TMP_WORK}/diagram.mmd"
        if mmdc -i "${TMP_WORK}/diagram.mmd" -o "${TMP_WORK}/diagram.svg" > /dev/null 2>&1; then
            diagram_parseable=1
        else
            diagram_parseable=0
        fi
        rm -rf "$TMP_WORK"
    fi
fi

# ─── Eval 3: uses_real_modules ───────────────────────────────────────────────
# Diagram mentions at least 2 real modules from the Firehose codebase
uses_real_modules=0
module_count=0
for module in BlogController EngineeringBlog ReleaseNotes Blogex Router PageController Layouts; do
    if grep -qi "$module" <<< "$CONTENT"; then
        module_count=$((module_count + 1))
    fi
done
if (( module_count >= 2 )); then
    uses_real_modules=1
fi

# ─── Eval 4: uses_real_functions ─────────────────────────────────────────────
# Diagram mentions at least 1 real function from the codebase
uses_real_functions=0
for func in posts_by_tag get_post all_posts paginate resolve_blog render recent_posts; do
    if grep -qi "$func" <<< "$CONTENT"; then
        uses_real_functions=1
        break
    fi
done

# ─── Eval 5: no_sidetracking ────────────────────────────────────────────────
# Output does NOT contain code review / critique language
no_sidetracking=1
BLOCKLIST="${SCRIPT_DIR}/sidetrack_blocklist.txt"
if [[ -f "$BLOCKLIST" ]]; then
    while IFS= read -r phrase || [[ -n "$phrase" ]]; do
        phrase=$(echo "$phrase" | xargs) # trim whitespace
        if [[ -n "$phrase" ]] && grep -qiF -- "$phrase" <<< "$CONTENT"; then
            no_sidetracking=0
            break
        fi
    done < "$BLOCKLIST"
fi

# ─── Eval 6: concise ────────────────────────────────────────────────────────
# Total output under 3000 characters
concise=0
if (( CHAR_COUNT < 3000 )); then
    concise=1
fi

# ─── Total ───────────────────────────────────────────────────────────────────
score=$((has_diagram + diagram_parseable + uses_real_modules + uses_real_functions + no_sidetracking + concise))

echo "{\"score\":${score},\"has_diagram\":${has_diagram},\"diagram_parseable\":${diagram_parseable},\"uses_real_modules\":${uses_real_modules},\"uses_real_functions\":${uses_real_functions},\"no_sidetracking\":${no_sidetracking},\"concise\":${concise},\"char_count\":${CHAR_COUNT}}"
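
Eval 2 extracts the diagram by printing the lines between the opening `mermaid` fence and the next bare closing fence. A self-contained sketch of the same extraction logic on a made-up model response; `extract_mermaid` is an illustrative name, and the fence string is built from its octal code only so that this example can itself sit inside a fenced block:

```shell
#!/usr/bin/env bash
# Three backticks, written as octal escapes (\140 is the backtick).
FENCE=$(printf '\140\140\140')

# Print the body of the first fenced mermaid block found on stdin.
extract_mermaid() {
  awk -v fence="$FENCE" '
    index($0, fence "mermaid") == 1 { found = 1; next }  # opening fence
    found && $0 == fence            { exit }             # closing fence
    found                           { print }            # diagram body
  '
}

# Made-up response text used only for this demonstration.
response=$(printf '%s\n' \
  "Here is the diagram:" \
  "${FENCE}mermaid" \
  "sequenceDiagram" \
  "    participant Browser" \
  "    Browser->>Router: GET /" \
  "${FENCE}" \
  "Done.")

block=$(printf '%s\n' "$response" | extract_mermaid)
printf '%s\n' "$block"
```

Only the first block is returned, and a closing fence with trailing spaces would not be recognized, which matches the strictness of the pattern in the scoring script.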
@@ -1,23 +0,0 @@
potential issue
consider using
should be
could be improved
recommend
suggestion
improvement
code review
refactor
best practice
security concern
vulnerability
error handling could
missing error
you might want
it would be better
note that this
be aware that
one concern
problematic
anti-pattern
smell
technical debt
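
Each line of this blocklist is matched case-insensitively against the model output. A sketch of the same check done in a single `grep` call with fixed-string matching, assuming the phrases are meant literally rather than as regular expressions; the function name and sample texts are illustrative:

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when the text on stdin contains any blocklist phrase.
# -F: literal strings, -i: ignore case, -q: quiet, -f: patterns from file.
contains_blocked_phrase() {
  local blocklist="$1"
  # Drop blank lines first: an empty pattern would match every input.
  grep -Fiq -f <(grep -v '^[[:space:]]*$' "$blocklist")
}

blocklist=$(mktemp)
printf '%s\n' "potential issue" "consider using" "" "refactor" > "$blocklist"

if printf 'You should REFACTOR this controller.\n' | contains_blocked_phrase "$blocklist"; then
  echo "sidetracked"
fi
if ! printf 'Browser->>Router: GET /\n' | contains_blocked_phrase "$blocklist"; then
  echo "on task"
fi
rm -f "$blocklist"
```

Substring matching of this kind is deliberately blunt: a short phrase such as `should be` or `smell` can also occur in an otherwise on-task note, so the check trades some false positives for simplicity.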
@@ -1,54 +0,0 @@
---
name: sequence-diagram
description: Generate a Mermaid sequence diagram showing message flow across module boundaries for an Elixir/Phoenix interaction. Use when asked to diagram, trace, or visualize a user interaction, request flow, or feature path through the codebase.
---

# Sequence Diagram Skill

Generate a Mermaid `sequenceDiagram` that traces a specific user interaction
across module boundaries in an Elixir/Phoenix codebase.

## Your Task

Given a description of an interaction (e.g., "user clicks a tag on a blog post")
and access to the source files, produce a Mermaid sequence diagram that accurately
shows the message flow between modules.

## Process

1. **Identify the entry point.** What triggers this interaction? (HTTP request,
   LiveView event, PubSub message, etc.)
2. **Read the router** to find which controller/live module handles the route.
3. **Read the controller/live module** to find which functions are called and
   which domain modules they delegate to.
4. **Read the domain modules** to understand what they return and how.
5. **Read templates/components** if the rendering path matters.
6. **Emit the diagram.** Use `sequenceDiagram` with participants named after
   actual modules. Show function calls as messages.

## Output Format

Respond with ONLY a fenced Mermaid code block. No preamble, no explanation,
no code review, no suggestions. Just the diagram.

```mermaid
sequenceDiagram
    participant Browser
    participant Router as MyAppWeb.Router
    ...
```

## Rules

- **Participants must be real modules** from the codebase. Never invent modules.
- **Messages must be real function calls** or HTTP verbs. Use the actual function
  names you found in the source (e.g., `blog.posts_by_tag(tag)`, not "get posts").
- **Show the return path.** Responses flow back: module returns data, controller
  renders, browser receives HTML.
- **Distinguish compile-time from runtime.** If a module uses NimblePublisher
  or module attributes, the data is compiled into the BEAM — there is no runtime
  file I/O. Show this as a note, not as a message to the filesystem.
- **Stay on task.** Do NOT review the code. Do NOT suggest improvements. Do NOT
  mention potential issues. Your only job is the diagram.
- **Keep it readable.** Use `Note over` for context. Use short aliases for
  long module names in the participant declaration.