Compare commits


13 Commits

Author SHA1 Message Date
f6d7416b00 Publish pi-notifications and pi-llm-performance 2026-04-28 13:13:52 +01:00
93a5675f06 chore(pi-notifications): remove remaining console.log 2026-04-28 13:06:44 +01:00
113878e83f chore(pi-notifications): remove debug mode and console.log noise
- Remove PI_NOTIFICATION_DEBUG env var and steer signal logic
- Remove console.log on extension load
- Keep only: session_start notification + agent_end beep
- Clean README without debug references
2026-04-28 13:04:33 +01:00
7faabcb038 chore: update llm-metrics log 2026-04-28 13:03:38 +01:00
d7eabfffbb docs(pi-notifications): update README for audio-based alerts 2026-04-28 13:03:11 +01:00
823af3c486 feat(pi-notifications): switch to afplay audio instead of desktop notifications
- Uses afplay to play an audio file (default: Glass.aiff)
- Configurable via PI_NOTIFICATION_AUDIO env var
- Works from sandboxed context — no osascript needed
- test-notify.ts verifies audio playback standalone
- Synced to auto-discovery extension path
2026-04-28 13:02:18 +01:00
040513e1d6 feat(pi-notifications): use 'tell application' for notifications to suppress Show button
- Tell target app (default: Ghostty) to display notification instead of raw osascript
- This attributes notification to the app, avoiding the 'Show' button that opens Script Editor
- Configurable via PI_NOTIFICATION_APP env var
- test-notify.ts falls back to plain display notification if target app isn't running
- Synced to auto-discovery extension path
2026-04-28 12:56:02 +01:00
383cb46fe7 feat(pi-notifications): add standalone test-notify.ts and fix AppleScript sound bug
- Add packages/pi-notifications/src/test-notify.ts for isolated testing
  - Run with: node --input-type=module -e "import {createJiti} from ..." ./packages/pi-notifications/src/test-notify.ts
  - Decoupled from agent loop — verifies osascript in extension context
- Fix: 'default' is a reserved word in AppleScript, skip sound param when sound='default'
- Synced fix to auto-discovery extension path
2026-04-28 12:10:30 +01:00
45a13fd08c feat(pi-notifications): add PI_NOTIFICATION_DEBUG mode with visible steer signal
- Add PI_NOTIFICATION_DEBUG=true env var
- When enabled, calls ctx.ui.steer() instead of desktop notification
- Lets you verify trigger logic in the agent loop without actual notifications
- Synced to both monorepo and auto-discovery extension paths
2026-04-28 12:09:06 +01:00
ce4d6c5971 ignore llm-metrics log 2026-04-28 10:53:44 +01:00
98e18643c5 pi-performance: Make Time to first token more accurate.
Summary of changes:

 ┌──────┬──────────────────────────────────────────────────────────────────┬──────────┐
 │ Step │ Change                                                           │ Result   │
 ├──────┼──────────────────────────────────────────────────────────────────┼──────────┤
 │ 1    │ Removed duplicate llm-performance-metrics.test.ts                │ 14 tests │
 ├──────┼──────────────────────────────────────────────────────────────────┼──────────┤
 │ 2    │ Added rawTimestamps assertions to toLogEntry test                │ 14 tests │
 ├──────┼──────────────────────────────────────────────────────────────────┼──────────┤
 │ 3    │ Added rawTimestamps assertions to single-turn aggregate test     │ 14 tests │
 ├──────┼──────────────────────────────────────────────────────────────────┼──────────┤
 │ 4    │ Added rawTimestamps assertions to multi-turn aggregate test      │ 14 tests │
 ├──────┼──────────────────────────────────────────────────────────────────┼──────────┤
 │ 5    │ Added negative TTFT filtering test                               │ 15 tests │
 ├──────┼──────────────────────────────────────────────────────────────────┼──────────┤
 │ 6    │ Added "first turn missing TTFT, later turns have it" test        │ 16 tests │
 ├──────┼──────────────────────────────────────────────────────────────────┼──────────┤
 │ 7    │ Added sanity check tests (warn on >500 tok/s, no warn otherwise) │ 18 tests │
 └──────┴──────────────────────────────────────────────────────────────────┴──────────┘

This is what it looks like now when I run `pi`
 📊 Performance: llama.cpp/Qwen3.6-35B-A3B-MXFP4_MOE.gguf
   Prefill: 15,460 tokens @ 20104.0 tok/s
   Generation: 12,179 tokens @ 52.6 tok/s
   Combined: 27,639 tokens @ 118.9 tok/s (3.9m total)
   TTFT: 769ms
   Turns: 36
2026-04-28 10:52:00 +01:00
a38c76c65e move pi-llm-performance to monorepo, update README and add deno.json 2026-04-28 10:06:03 +01:00
0cf13ed54e move pi-llm-performance to this repo 2026-04-28 10:00:45 +01:00
19 changed files with 1660 additions and 1 deletion

.gitignore vendored
View File

@@ -1,3 +1,4 @@
node_modules/
.pnpm-store/
pnpm-lock.yaml
.pi/llm-metrics.log

.nvimlog Normal file
View File

README.md
View File

@@ -6,7 +6,7 @@ Experimental monorepo for [Pi coding agent](https://github.com/mariozechner/pi-c
### `pi-turn-limit`
Limits the number of turns (agent round-trips) in a Pi session. When the limit is reached, the user is prompted to continue or abort.
Limits the number of turns (agent round-trips) in a Pi session. When the limit is reached, the user is prompted to continue or abort. Use when you want to be in-the-loop, or when a model misbehaves and does too many tool calls. It is a good way to control the *Time To Next Interaction*.
- **Default limit:** 25 turns
- **Override:** set `PI_MAX_TURNS` environment variable to a positive integer
@ -15,6 +15,26 @@ Limits the number of turns (agent round-trips) in a Pi session. When the limit i
See [packages/pi-turn-limit/README.md](packages/pi-turn-limit/README.md) for details and the [Allium spec](packages/pi-turn-limit/turn-limit.allium).
### `pi-notifications`
Audio alerts via `afplay` when the agent finishes a turn. Run the agent and step away — you'll hear when input is needed. No multi-tasking required, but it gives you the breathing room to stretch, grab a coffee, or write something down without staring at the screen.
- **Config:** `PI_NOTIFICATIONS_ENABLED`, `PI_NOTIFICATION_AGENT_END`, `PI_NOTIFICATION_AUDIO` (defaults to macOS Glass sound)
- **Platform:** macOS (uses `afplay`)
See [packages/pi-notifications/README.md](packages/pi-notifications/README.md) for details.
### `pi-llm-performance`
Captures and displays LLM inference performance metrics (TTFT, prefill/generation throughput, combined speed) after each prompt. Lets you benchmark shiny new local inference server optimizations at a glance — no need to dig through different server logs.
- **Output:** TUI notification + status bar (`📊 tok/s`) + JSONL log at `.pi/llm-metrics.log`
- **Sanity checks:** Warns when generation speed exceeds 500 tok/s (physically impossible)
See [packages/pi-llm-performance/README.md](packages/pi-llm-performance/README.md) for details.
## Installation
This is an early release — not on npm yet. To install from source:

View File

@@ -1,5 +1,6 @@
[tools]
bun = "latest"
deno = "latest"
elixir = "latest"
erlang = "latest"
node = "24"

packages/pi-llm-performance/README.md Normal file
View File

@@ -0,0 +1,94 @@
# pi-llm-performance
Pi coding agent extension that captures and displays LLM inference performance metrics.
## Why
Understanding model performance helps you:
- **Compare models** — measure throughput differences between providers and model sizes
- **Debug slowdowns** — spot when prefill or generation degrades unexpectedly
- **Validate hardware** — confirm your setup delivers expected token throughput
- **Tune parameters** — evaluate the impact of speculative decoding, context window size, etc.
## What it measures
| Metric | Description |
|--------|-------------|
| **TTFT** | Time to first token (ms) — how long before you see output |
| **Prefill speed** | Input tokens processed per second during the prefill phase |
| **Generation speed** | Output tokens generated per second during the generation phase |
| **Combined speed** | Total tokens (input + output) per second across the full prompt |
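In code terms, the three rates relate like this (a condensed sketch of what `aggregatePromptMetrics` in `llm-metrics-core.ts`, shown later in this diff, computes; the shipped function also guards missing or zero TTFT):
```typescript
// Condensed rate definitions; the real core reports 0 for prefill and
// generation when TTFT is unavailable (combined is always computed).
function rates(inputTokens: number, outputTokens: number, totalDurationMs: number, ttftMs: number) {
  return {
    prefillTokPerSec: inputTokens / (ttftMs / 1000),
    generationTokPerSec: outputTokens / ((totalDurationMs - ttftMs) / 1000),
    combinedTokPerSec: (inputTokens + outputTokens) / (totalDurationMs / 1000),
  };
}
```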
## How it works
The extension hooks into pi's agent lifecycle events:
| Event | Behavior |
|-------|----------|
| `agent_start` | Records provider/model, resets counters |
| `turn_start` | Marks turn boundary |
| `message_update` | Captures TTFT on first token delta |
| `turn_end` | Records token counts and turn duration |
| `agent_end` | Aggregates metrics, displays in TUI, logs to `.pi/llm-metrics.log` |
## Output
### TUI notification
After each prompt completes, a notification shows:
```
📊 Performance: llama.cpp/Qwen3.6-35B-A3B-MXFP4_MOE.gguf
Prefill: 1,240 tokens @ 68.3 tok/s
Generation: 312 tokens @ 89.9 tok/s
Combined: 1,552 tokens @ 78.4 tok/s (19.8s total)
TTFT: 1250ms
```
### Status bar
The footer status shows combined throughput: `📊 78.4 tok/s`
### Log file
Each prompt writes a JSONL entry to `.pi/llm-metrics.log`:
```json
{
"timestamp": "2026-04-28T10:05:00.000Z",
"provider": "llama.cpp",
"model": "Qwen3.6-35B-A3B-MXFP4_MOE.gguf",
"turnCount": 1,
"inputTokens": 1240,
"outputTokens": 312,
"totalTokens": 1552,
"prefillTokensPerSec": 68.3,
"generationTokensPerSec": 89.9,
"combinedTokensPerSec": 78.4,
"totalDurationMs": 19800,
"timeToFirstTokenMs": 1250,
"rawTimestamps": {
"ttftMs": 1250,
"generationDurationMs": 18550,
"turns": [{"turnId": "turn-0", "durationMs": 19800, "ttftMs": 1250}]
}
}
```
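Because each line is standalone JSON, the log is easy to post-process. A minimal sketch (not part of the extension) that assumes the schema shown above:
```typescript
// Summarize .pi/llm-metrics.log; assumes the JSONL schema documented above.
import { readFileSync } from "node:fs";

const entries = readFileSync(".pi/llm-metrics.log", "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line));

const avg =
  entries.reduce((sum, e) => sum + e.combinedTokensPerSec, 0) / (entries.length || 1);
console.log(`${entries.length} prompts, avg combined: ${avg.toFixed(1)} tok/s`);
```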
## Sanity checks
The extension prints a console warning if generation speed exceeds 500 tok/s (implausibly fast for the local llama.cpp setups this extension targets). This helps catch timing bugs early.
## Development
This package lives in the `pi-extensions` monorepo.
```bash
pnpm install # workspace setup
deno test # run tests
```
## License
MIT

packages/pi-llm-performance/deno.json Normal file
View File

@@ -0,0 +1,8 @@
{
"imports": {
"@std/assert": "jsr:@std/assert@^1.0.0"
},
"tasks": {
"test": "deno test src/"
}
}

packages/pi-llm-performance/deno.lock generated Normal file
View File

@@ -0,0 +1,31 @@
{
"version": "5",
"specifiers": {
"jsr:@std/assert@*": "1.0.19",
"jsr:@std/assert@^1.0.19": "1.0.19",
"jsr:@std/internal@^1.0.12": "1.0.12",
"jsr:@std/testing@*": "1.0.18"
},
"jsr": {
"@std/assert@1.0.19": {
"integrity": "eaada96ee120cb980bc47e040f82814d786fe8162ecc53c91d8df60b8755991e",
"dependencies": [
"jsr:@std/internal"
]
},
"@std/internal@1.0.12": {
"integrity": "972a634fd5bc34b242024402972cd5143eac68d8dffaca5eaa4dba30ce17b027"
},
"@std/testing@1.0.18": {
"integrity": "d3152f57b11666bf6358d0e127c7e3488e91178b0c2d8fbf0793e1c53cd13cb1",
"dependencies": [
"jsr:@std/assert@^1.0.19"
]
}
},
"workspace": {
"dependencies": [
"jsr:@std/assert@1"
]
}
}

packages/pi-llm-performance/package.json Normal file
View File

@@ -0,0 +1,17 @@
{
"name": "pi-llm-performance",
"version": "0.1.0",
"description": "LLM performance metrics extension",
"type": "module",
"exports": {
".": "./src/llm-performance-metrics.ts"
},
"keywords": ["pi-package"],
"pi": {
"extensions": ["src/llm-performance-metrics.ts"]
},
"peerDependencies": {
"@mariozechner/pi-coding-agent": "*"
},
"license": "MIT"
}

View File

@@ -0,0 +1,558 @@
import {
calculateTurnMetrics,
aggregatePromptMetrics,
formatMetricsForDisplay,
toLogEntry,
type TurnMetrics,
type PromptMetrics,
} from "./llm-metrics-core.ts";
import { assertEquals, assertGreaterOrEqual, assertLessOrEqual } from "jsr:@std/assert";
Deno.test("calculateTurnMetrics - creates turn metrics object", () => {
const result = calculateTurnMetrics({
turnId: "turn-1",
inputTokens: 100,
outputTokens: 50,
durationMs: 2000,
timeToFirstTokenMs: 500,
});
assertEquals(result.turnId, "turn-1");
assertEquals(result.inputTokens, 100);
assertEquals(result.outputTokens, 50);
assertEquals(result.durationMs, 2000);
assertEquals(result.timeToFirstTokenMs, 500);
});
Deno.test("calculateTurnMetrics - handles missing timeToFirstToken", () => {
const result = calculateTurnMetrics({
turnId: "turn-1",
inputTokens: 100,
outputTokens: 50,
durationMs: 2000,
});
assertEquals(result.timeToFirstTokenMs, undefined);
});
Deno.test("aggregatePromptMetrics - aggregates single turn", () => {
const turnMetrics: TurnMetrics[] = [
{
turnId: "turn-1",
inputTokens: 1000,
outputTokens: 200,
durationMs: 5000,
timeToFirstTokenMs: 800,
},
];
const result = aggregatePromptMetrics({
provider: "anthropic",
model: "claude-sonnet-4",
turnMetrics,
});
assertEquals(result.provider, "anthropic");
assertEquals(result.model, "claude-sonnet-4");
assertEquals(result.turnCount, 1);
assertEquals(result.inputTokens, 1000);
assertEquals(result.outputTokens, 200);
assertEquals(result.totalTokens, 1200);
assertEquals(result.totalDurationMs, 5000);
assertEquals(result.timeToFirstTokenMs, 800);
// Tokens per second calculations
// prefill: 1000 input tokens / 0.8s TTFT = 1250 tok/s
assertEquals(result.prefillTokensPerSec, 1250);
// generation: 200 output tokens / 4.2s (5s - 0.8s) = 47.62 tok/s
assertGreaterOrEqual(result.generationTokensPerSec, 47.6);
assertLessOrEqual(result.generationTokensPerSec, 47.7);
// combined: 1200 total tokens / 5s = 240 tok/s
assertEquals(result.combinedTokensPerSec, 240);
// rawTimestamps
assertEquals(result.rawTimestamps?.ttftMs, 800);
assertEquals(result.rawTimestamps?.allTtftMs, [800]);
assertEquals(result.rawTimestamps?.generationDurationMs, 4200);
});
Deno.test("aggregatePromptMetrics - aggregates multiple turns", () => {
const turnMetrics: TurnMetrics[] = [
{
turnId: "turn-1",
inputTokens: 1000,
outputTokens: 200,
durationMs: 3000,
timeToFirstTokenMs: 800,
},
{
turnId: "turn-2",
inputTokens: 500,
outputTokens: 150,
durationMs: 2000,
},
{
turnId: "turn-3",
inputTokens: 300,
outputTokens: 100,
durationMs: 1500,
},
];
const result = aggregatePromptMetrics({
provider: "openai",
model: "gpt-4o",
turnMetrics,
});
assertEquals(result.turnCount, 3);
assertEquals(result.inputTokens, 1800); // 1000 + 500 + 300
assertEquals(result.outputTokens, 450); // 200 + 150 + 100
assertEquals(result.totalTokens, 2250);
assertEquals(result.totalDurationMs, 6500); // 3000 + 2000 + 1500
assertEquals(result.timeToFirstTokenMs, 800); // From first turn only
// Tokens per second: prefill uses TTFT (0.8s), generation uses (total - TTFT) = 5.7s
// prefill: 1800 / 0.8 = 2250 tok/s
assertEquals(result.prefillTokensPerSec, 2250);
// generation: 450 / 5.7 = 78.95 tok/s
assertGreaterOrEqual(result.generationTokensPerSec, 78.9);
assertLessOrEqual(result.generationTokensPerSec, 79.0);
// combined: 2250 / 6.5 = 346.15 tok/s
assertGreaterOrEqual(result.combinedTokensPerSec, 346.1);
assertLessOrEqual(result.combinedTokensPerSec, 346.2);
// rawTimestamps: only turn-1 has valid TTFT, turns 2+ have none
assertEquals(result.rawTimestamps?.ttftMs, 800);
assertEquals(result.rawTimestamps?.allTtftMs, [800]);
assertEquals(result.rawTimestamps?.generationDurationMs, 5700);
assertEquals(result.rawTimestamps?.turns.length, 3);
assertEquals(result.rawTimestamps?.turns[0].ttftMs, 800);
assertEquals(result.rawTimestamps?.turns[1].ttftMs, undefined);
assertEquals(result.rawTimestamps?.turns[2].ttftMs, undefined);
});
Deno.test("aggregatePromptMetrics - handles empty turn list", () => {
const result = aggregatePromptMetrics({
provider: "anthropic",
model: "claude-sonnet-4",
turnMetrics: [],
});
assertEquals(result.turnCount, 0);
assertEquals(result.inputTokens, 0);
assertEquals(result.outputTokens, 0);
assertEquals(result.totalTokens, 0);
assertEquals(result.prefillTokensPerSec, 0);
assertEquals(result.generationTokensPerSec, 0);
assertEquals(result.combinedTokensPerSec, 0);
assertEquals(result.totalDurationMs, 0);
assertEquals(result.timeToFirstTokenMs, undefined);
});
Deno.test("formatMetricsForDisplay - formats single turn metrics", () => {
const metrics: PromptMetrics = {
provider: "anthropic",
model: "claude-sonnet-4",
turnCount: 1,
inputTokens: 1250,
outputTokens: 342,
totalTokens: 1592,
prefillTokensPerSec: 482.1,
generationTokensPerSec: 18.3,
combinedTokensPerSec: 38.0,
totalDurationMs: 21600,
timeToFirstTokenMs: 850,
turns: [],
};
const display = formatMetricsForDisplay(metrics);
assertEquals(display.includes("anthropic/claude-sonnet-4"), true);
assertEquals(display.includes("1,250 tokens"), true);
assertEquals(display.includes("482.1 tok/s"), true);
assertEquals(display.includes("342 tokens"), true);
assertEquals(display.includes("18.3 tok/s"), true);
assertEquals(display.includes("1,592 tokens"), true);
assertEquals(display.includes("38.0 tok/s"), true);
assertEquals(display.includes("21.6s"), true);
assertEquals(display.includes("TTFT: 850ms"), true);
});
Deno.test("formatMetricsForDisplay - formats duration as minutes when over 60s", () => {
const metrics: PromptMetrics = {
provider: "openai",
model: "gpt-4o",
turnCount: 1,
inputTokens: 5000,
outputTokens: 1000,
totalTokens: 6000,
prefillTokensPerSec: 50,
generationTokensPerSec: 10,
combinedTokensPerSec: 60,
totalDurationMs: 120000, // 2 minutes
timeToFirstTokenMs: 1500,
turns: [],
};
const display = formatMetricsForDisplay(metrics);
assertEquals(display.includes("2.0m"), true);
});
Deno.test("formatMetricsForDisplay - omits turn count when single turn", () => {
const metrics: PromptMetrics = {
provider: "anthropic",
model: "claude-sonnet-4",
turnCount: 1,
inputTokens: 100,
outputTokens: 50,
totalTokens: 150,
prefillTokensPerSec: 20,
generationTokensPerSec: 10,
combinedTokensPerSec: 30,
totalDurationMs: 5000,
timeToFirstTokenMs: 500,
turns: [],
};
const display = formatMetricsForDisplay(metrics);
assertEquals(display.includes("Turns: 1"), false);
});
Deno.test("formatMetricsForDisplay - omits prefill/generation when TTFT is unavailable", () => {
const metrics: PromptMetrics = {
provider: "openai",
model: "gpt-4o",
turnCount: 1,
inputTokens: 1000,
outputTokens: 200,
totalTokens: 1200,
prefillTokensPerSec: 0,
generationTokensPerSec: 0,
combinedTokensPerSec: 240,
totalDurationMs: 5000,
timeToFirstTokenMs: undefined,
turns: [],
};
const display = formatMetricsForDisplay(metrics);
assertEquals(display.includes("Prefill:"), false);
assertEquals(display.includes("Generation:"), false);
assertEquals(display.includes("1,200 tokens"), true);
assertEquals(display.includes("240.0 tok/s"), true);
});
Deno.test("formatMetricsForDisplay - shows turn count when multiple turns", () => {
const metrics: PromptMetrics = {
provider: "anthropic",
model: "claude-sonnet-4",
turnCount: 3,
inputTokens: 100,
outputTokens: 50,
totalTokens: 150,
prefillTokensPerSec: 20,
generationTokensPerSec: 10,
combinedTokensPerSec: 30,
totalDurationMs: 5000,
timeToFirstTokenMs: 500,
turns: [],
};
const display = formatMetricsForDisplay(metrics);
assertEquals(display.includes("Turns: 3"), true);
});
Deno.test("aggregatePromptMetrics - uses first valid TTFT when turn-0 has none", () => {
// Edge case: turn-0 has no TTFT, turn-1 does. Should use turn-1's TTFT.
const turnMetrics: TurnMetrics[] = [
{
turnId: "turn-0",
inputTokens: 1000,
outputTokens: 200,
durationMs: 3000,
// No timeToFirstTokenMs
},
{
turnId: "turn-1",
inputTokens: 500,
outputTokens: 150,
durationMs: 2000,
timeToFirstTokenMs: 600,
},
];
const result = aggregatePromptMetrics({
provider: "llama.cpp",
model: "Qwen3.6-35B",
turnMetrics,
});
// First valid TTFT is from turn-1 (600ms)
assertEquals(result.rawTimestamps?.allTtftMs, [600]);
assertEquals(result.rawTimestamps?.ttftMs, 600);
// Generation duration = totalDuration - firstValidTTFT = 5000 - 600 = 4400
assertEquals(result.rawTimestamps?.generationDurationMs, 4400);
// prefill: 1500 / 0.6 = 2500
assertEquals(result.prefillTokensPerSec, 2500);
// generation: 350 / 4.4 = 79.55
assertGreaterOrEqual(result.generationTokensPerSec, 79.5);
assertLessOrEqual(result.generationTokensPerSec, 79.6);
});
Deno.test("aggregatePromptMetrics - filters out negative TTFT values", () => {
// Simulates the bug where turn-2 got TTFT=-20390 from the old global-firstToken code
const turnMetrics: TurnMetrics[] = [
{
turnId: "turn-0",
inputTokens: 1000,
outputTokens: 200,
durationMs: 3000,
timeToFirstTokenMs: 800,
},
{
turnId: "turn-1",
inputTokens: 500,
outputTokens: 150,
durationMs: 2000,
timeToFirstTokenMs: -5000, // Invalid: negative
},
];
const result = aggregatePromptMetrics({
provider: "llama.cpp",
model: "Qwen3.6-35B",
turnMetrics,
});
// Only turn-0's TTFT (800) should be used; turn-1's negative value is filtered
assertEquals(result.rawTimestamps?.allTtftMs, [800]);
assertEquals(result.rawTimestamps?.ttftMs, 800);
// Generation duration = totalDuration - firstTurnTTFT = 5000 - 800 = 4200
assertEquals(result.rawTimestamps?.generationDurationMs, 4200);
// prefill: 1500 / 0.8 = 1875
assertEquals(result.prefillTokensPerSec, 1875);
// generation: 350 / 4.2 = 83.33
assertGreaterOrEqual(result.generationTokensPerSec, 83.3);
assertLessOrEqual(result.generationTokensPerSec, 83.4);
});
Deno.test("toLogEntry - creates JSON-serializable log entry", () => {
const metrics: PromptMetrics = {
provider: "anthropic",
model: "claude-sonnet-4",
turnCount: 2,
inputTokens: 1250,
outputTokens: 342,
totalTokens: 1592,
prefillTokensPerSec: 482.12345,
generationTokensPerSec: 18.34567,
combinedTokensPerSec: 38.09876,
totalDurationMs: 21600,
timeToFirstTokenMs: 850,
rawTimestamps: {
ttftMs: 850,
allTtftMs: [850],
generationDurationMs: 20750,
turns: [],
},
turns: [],
};
const logEntry = toLogEntry(metrics);
assertEquals(logEntry.provider, "anthropic");
assertEquals(logEntry.model, "claude-sonnet-4");
assertEquals(logEntry.turnCount, 2);
assertEquals(logEntry.inputTokens, 1250);
assertEquals(logEntry.outputTokens, 342);
assertEquals(logEntry.totalTokens, 1592);
// Rounded to 2 decimal places
assertEquals(logEntry.prefillTokensPerSec, 482.12);
assertEquals(logEntry.generationTokensPerSec, 18.35);
assertEquals(logEntry.combinedTokensPerSec, 38.1);
assertEquals(logEntry.totalDurationMs, 21600);
assertEquals(logEntry.timeToFirstTokenMs, 850);
// Should have ISO timestamp
assertEquals(logEntry.timestamp.includes("T"), true);
assertEquals(logEntry.timestamp.includes("Z"), true);
// Should be JSON serializable
const json = JSON.stringify(logEntry);
assertEquals(json.length > 0, true);
const parsed = JSON.parse(json);
assertEquals(parsed.provider, "anthropic");
// rawTimestamps should be included
assertEquals(logEntry.rawTimestamps?.ttftMs, 850);
assertEquals(logEntry.rawTimestamps?.allTtftMs, [850]);
assertEquals(logEntry.rawTimestamps?.generationDurationMs, 20750);
assertEquals(logEntry.rawTimestamps?.turns.length, 0);
});
Deno.test("aggregatePromptMetrics - warns when generation speed is physically impossible", () => {
const originalWarn = console.warn;
let warnCall: string | undefined;
console.warn = (msg: string) => { warnCall = msg; };
try {
const turnMetrics: TurnMetrics[] = [
{
turnId: "turn-0",
inputTokens: 100,
outputTokens: 1000,
durationMs: 1000,
timeToFirstTokenMs: 100,
},
];
aggregatePromptMetrics({
provider: "llama.cpp",
model: "Qwen3.6-35B",
turnMetrics,
});
// generation: 1000 / 0.9 = 1111.11 tok/s > 500
assertGreaterOrEqual(warnCall, "");
assertEquals(warnCall!.includes("Suspicious generation speed"), true);
assertEquals(warnCall!.includes("1111.1 tok/s"), true);
assertEquals(warnCall!.includes("output=1000"), true);
} finally {
console.warn = originalWarn;
}
});
Deno.test("aggregatePromptMetrics - does not warn for normal speeds", () => {
const originalWarn = console.warn;
let warnCall: string | undefined;
console.warn = (msg: string) => { warnCall = msg; };
try {
const turnMetrics: TurnMetrics[] = [
{
turnId: "turn-0",
inputTokens: 1000,
outputTokens: 200,
durationMs: 5000,
timeToFirstTokenMs: 800,
},
];
aggregatePromptMetrics({
provider: "llama.cpp",
model: "Qwen3.6-35B",
turnMetrics,
});
assertEquals(warnCall, undefined);
} finally {
console.warn = originalWarn;
}
});
Deno.test("aggregatePromptMetrics - uses full duration when TTFT is undefined", () => {
const turnMetrics: TurnMetrics[] = [
{
turnId: "turn-1",
inputTokens: 1000,
outputTokens: 200,
durationMs: 5000,
// No timeToFirstTokenMs
},
];
const result = aggregatePromptMetrics({
provider: "openai",
model: "gpt-4o",
turnMetrics,
});
assertEquals(result.turnCount, 1);
assertEquals(result.inputTokens, 1000);
assertEquals(result.outputTokens, 200);
// Without TTFT, prefill and generation rates are 0 (cannot separate phases)
// Only combined rate is meaningful
assertEquals(result.prefillTokensPerSec, 0);
assertEquals(result.generationTokensPerSec, 0);
assertEquals(result.combinedTokensPerSec, 240);
});
Deno.test("toLogEntry - handles missing timeToFirstToken", () => {
const metrics: PromptMetrics = {
provider: "anthropic",
model: "claude-sonnet-4",
turnCount: 1,
inputTokens: 100,
outputTokens: 50,
totalTokens: 150,
prefillTokensPerSec: 20,
generationTokensPerSec: 10,
combinedTokensPerSec: 30,
totalDurationMs: 5000,
timeToFirstTokenMs: undefined,
turns: [],
};
const logEntry = toLogEntry(metrics);
assertEquals(logEntry.timeToFirstTokenMs, undefined);
});
Deno.test("Integration - full flow from turns to log entry", () => {
// Simulate a real scenario with multiple turns
const turn1 = calculateTurnMetrics({
turnId: "turn-1",
inputTokens: 2000,
outputTokens: 500,
durationMs: 8000,
timeToFirstTokenMs: 1200,
});
const turn2 = calculateTurnMetrics({
turnId: "turn-2",
inputTokens: 800,
outputTokens: 200,
durationMs: 3000,
});
const promptMetrics = aggregatePromptMetrics({
provider: "groq",
model: "llama-3.1-70b",
turnMetrics: [turn1, turn2],
});
const display = formatMetricsForDisplay(promptMetrics);
const logEntry = toLogEntry(promptMetrics);
// Verify aggregation
assertEquals(promptMetrics.turnCount, 2);
assertEquals(promptMetrics.inputTokens, 2800);
assertEquals(promptMetrics.outputTokens, 700);
assertEquals(promptMetrics.totalTokens, 3500);
assertEquals(promptMetrics.totalDurationMs, 11000);
assertEquals(promptMetrics.timeToFirstTokenMs, 1200);
// Verify corrected rate calculations
// prefill: 2800 / 1.2 = 2333.33 tok/s
assertGreaterOrEqual(promptMetrics.prefillTokensPerSec, 2333.3);
assertLessOrEqual(promptMetrics.prefillTokensPerSec, 2333.4);
// generation: 700 / 9.8 = 71.43 tok/s
assertGreaterOrEqual(promptMetrics.generationTokensPerSec, 71.4);
assertLessOrEqual(promptMetrics.generationTokensPerSec, 71.5);
// combined: 3500 / 11 = 318.18 tok/s
assertGreaterOrEqual(promptMetrics.combinedTokensPerSec, 318.1);
assertLessOrEqual(promptMetrics.combinedTokensPerSec, 318.2);
// Verify display contains key info
assertEquals(display.includes("groq/llama-3.1-70b"), true);
assertEquals(display.includes("TTFT: 1200ms"), true);
// Verify log entry
assertEquals(logEntry.provider, "groq");
assertEquals(logEntry.model, "llama-3.1-70b");
assertEquals(logEntry.turnCount, 2);
});

packages/pi-llm-performance/src/llm-metrics-core.ts Normal file
View File

@@ -0,0 +1,234 @@
// Functional core for LLM performance metrics calculation
// Extracted warning function so tests can mock it without touching console
export function warn(msg: string): void {
console.warn(msg);
}
export interface TurnMetrics {
turnId: string;
inputTokens: number;
outputTokens: number;
durationMs: number;
timeToFirstTokenMs?: number;
}
export interface PromptMetrics {
provider: string;
model: string;
turnCount: number;
inputTokens: number;
outputTokens: number;
totalTokens: number;
prefillTokensPerSec: number;
generationTokensPerSec: number;
combinedTokensPerSec: number;
totalDurationMs: number;
timeToFirstTokenMs?: number;
rawTimestamps?: {
ttftMs?: number;
allTtftMs?: number[];
generationDurationMs?: number;
turns: Array<{ turnId: string; durationMs: number; ttftMs?: number }>;
};
turns: TurnMetrics[];
}
export interface MetricLogEntry {
timestamp: string;
provider: string;
model: string;
turnCount: number;
inputTokens: number;
outputTokens: number;
totalTokens: number;
prefillTokensPerSec: number;
generationTokensPerSec: number;
combinedTokensPerSec: number;
totalDurationMs: number;
timeToFirstTokenMs?: number;
rawTimestamps?: {
ttftMs?: number;
allTtftMs?: number[];
generationDurationMs?: number;
turns: Array<{ turnId: string; durationMs: number; ttftMs?: number }>;
};
}
/**
* Calculate metrics for a single turn
*/
export function calculateTurnMetrics(params: {
turnId: string;
inputTokens: number;
outputTokens: number;
durationMs: number;
timeToFirstTokenMs?: number;
}): TurnMetrics {
return {
turnId: params.turnId,
inputTokens: params.inputTokens,
outputTokens: params.outputTokens,
durationMs: params.durationMs,
timeToFirstTokenMs: params.timeToFirstTokenMs,
};
}
/**
* Aggregate multiple turn metrics into prompt-level metrics
*/
export function aggregatePromptMetrics(params: {
provider: string;
model: string;
turnMetrics: TurnMetrics[];
}): PromptMetrics {
const { provider, model, turnMetrics } = params;
if (turnMetrics.length === 0) {
return {
provider,
model,
turnCount: 0,
inputTokens: 0,
outputTokens: 0,
totalTokens: 0,
prefillTokensPerSec: 0,
generationTokensPerSec: 0,
combinedTokensPerSec: 0,
totalDurationMs: 0,
rawTimestamps: { turns: [] },
turns: [],
};
}
// Sum tokens across all turns
const inputTokens = turnMetrics.reduce((sum, t) => sum + t.inputTokens, 0);
const outputTokens = turnMetrics.reduce((sum, t) => sum + t.outputTokens, 0);
const totalTokens = inputTokens + outputTokens;
// Sum duration across all turns
const totalDurationMs = turnMetrics.reduce((sum, t) => sum + t.durationMs, 0);
const totalDurationSec = totalDurationMs / 1000;
// Collect per-turn TTFTs; prefill boundary is the first turn's TTFT
const ttftValues = turnMetrics.map(t => t.timeToFirstTokenMs).filter((t): t is number => t !== undefined && t >= 0);
const firstTurnTtftMs = ttftValues.length > 0 ? ttftValues[0] : undefined;
// Calculate tokens per second
// Prefill: input tokens / first-turn TTFT (prefill happens once at the start)
// Generation: output tokens / (totalDuration - firstTurnTTFT) (generation phase)
// Combined: total tokens / total duration
// When first-turn TTFT is unavailable, prefill and generation phases cannot be separated,
// so we set them to 0 and only report combined.
const ttftSec = firstTurnTtftMs !== undefined ? firstTurnTtftMs / 1000 : undefined;
const generationDurationSec = firstTurnTtftMs !== undefined
? (totalDurationMs - firstTurnTtftMs) / 1000
: undefined;
const prefillTokensPerSec = (ttftSec && ttftSec > 0) ? inputTokens / ttftSec : 0;
const generationTokensPerSec = (generationDurationSec !== undefined && generationDurationSec > 0)
? outputTokens / generationDurationSec
: 0;
const combinedTokensPerSec = totalDurationSec > 0 ? totalTokens / totalDurationSec : 0;
// Sanity check: flag physically impossible generation speeds
if (generationTokensPerSec > 500) {
warn(
`[metrics] Suspicious generation speed: ${generationTokensPerSec.toFixed(1)} tok/s (input=${inputTokens}, output=${outputTokens}, totalDuration=${totalDurationMs}ms, TTFT=${firstTurnTtftMs}ms)`
);
}
return {
provider,
model,
turnCount: turnMetrics.length,
inputTokens,
outputTokens,
totalTokens,
prefillTokensPerSec,
generationTokensPerSec,
combinedTokensPerSec,
totalDurationMs,
timeToFirstTokenMs: firstTurnTtftMs,
rawTimestamps: {
ttftMs: firstTurnTtftMs,
allTtftMs: ttftValues,
generationDurationMs: generationDurationSec !== undefined ? generationDurationSec * 1000 : undefined,
turns: turnMetrics.map(t => ({ turnId: t.turnId, durationMs: t.durationMs, ttftMs: t.timeToFirstTokenMs })),
},
turns: turnMetrics,
};
}
/**
* Format metrics for TUI display
*/
export function formatMetricsForDisplay(metrics: PromptMetrics): string {
const lines: string[] = [];
// Header with provider/model
lines.push(`📊 Performance: ${metrics.provider}/${metrics.model}`);
if (metrics.turnCount === 0) {
lines.push(" No turns recorded");
return lines.join("\n");
}
// Format duration display
const durationSec = metrics.totalDurationMs / 1000;
const durationDisplay = durationSec >= 60
? `${(durationSec / 60).toFixed(1)}m`
: `${durationSec.toFixed(1)}s`;
// Prefill metrics (only when TTFT was available)
if (metrics.prefillTokensPerSec > 0) {
lines.push(
` Prefill: ${metrics.inputTokens.toLocaleString()} tokens @ ${metrics.prefillTokensPerSec.toFixed(1)} tok/s`
);
}
// Generation metrics (only when TTFT was available)
if (metrics.generationTokensPerSec > 0) {
lines.push(
` Generation: ${metrics.outputTokens.toLocaleString()} tokens @ ${metrics.generationTokensPerSec.toFixed(1)} tok/s`
);
}
// Combined metrics
lines.push(
` Combined: ${metrics.totalTokens.toLocaleString()} tokens @ ${metrics.combinedTokensPerSec.toFixed(1)} tok/s (${durationDisplay} total)`
);
// Time to first token
if (metrics.timeToFirstTokenMs !== undefined) {
lines.push(` TTFT: ${metrics.timeToFirstTokenMs.toFixed(0)}ms`);
}
// Turn count
if (metrics.turnCount > 1) {
lines.push(` Turns: ${metrics.turnCount}`);
}
return lines.join("\n");
}
/**
* Convert PromptMetrics to JSONL log entry
*/
export function toLogEntry(metrics: PromptMetrics): MetricLogEntry {
return {
timestamp: new Date().toISOString(),
provider: metrics.provider,
model: metrics.model,
turnCount: metrics.turnCount,
inputTokens: metrics.inputTokens,
outputTokens: metrics.outputTokens,
totalTokens: metrics.totalTokens,
prefillTokensPerSec: Math.round(metrics.prefillTokensPerSec * 100) / 100,
generationTokensPerSec: Math.round(metrics.generationTokensPerSec * 100) / 100,
combinedTokensPerSec: Math.round(metrics.combinedTokensPerSec * 100) / 100,
totalDurationMs: metrics.totalDurationMs,
timeToFirstTokenMs: metrics.timeToFirstTokenMs,
rawTimestamps: metrics.rawTimestamps,
};
}

packages/pi-llm-performance/src/llm-performance-metrics.ts Normal file
View File

@@ -0,0 +1,101 @@
// LLM Performance Metrics Extension
// Captures and displays LLM inference performance metrics
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
import { appendFileSync, mkdirSync } from "node:fs";
import { dirname, join } from "node:path";
// Re-export core functions from the shared metrics module
import {
calculateTurnMetrics,
aggregatePromptMetrics,
formatMetricsForDisplay,
toLogEntry,
type TurnMetrics,
type PromptMetrics,
type MetricLogEntry,
} from "./llm-metrics-core.ts";
// ============================================================================
// Extension Event Handlers (imperative shell)
// ============================================================================
// State tracking
let promptStartMs: number | undefined;
let currentTurnStartMs: number | undefined;
let currentTurnId: string | undefined;
let turnMetrics: TurnMetrics[] = [];
let currentTurnFirstTokenMs: number | undefined; // Per-turn TTFT
let provider: string | undefined;
let model: string | undefined;
export default function (pi: ExtensionAPI) {
const logFile = join(process.cwd(), ".pi", "llm-metrics.log");
pi.on("agent_start", async (_event, ctx) => {
if (!ctx.model) return;
promptStartMs = Date.now();
turnMetrics = [];
currentTurnFirstTokenMs = undefined;
provider = ctx.model.provider;
model = ctx.model.id;
});
pi.on("turn_start", async (event, _ctx) => {
currentTurnStartMs = Date.now();
currentTurnId = `turn-${event.turnIndex}`;
currentTurnFirstTokenMs = undefined; // Reset TTFT for this turn
});
pi.on("message_update", async (event, _ctx) => {
// Capture per-turn TTFT on first token
if (currentTurnFirstTokenMs === undefined && event.assistantMessageEvent?.type === "text_delta") {
currentTurnFirstTokenMs = Date.now();
}
});
pi.on("turn_end", async (event, _ctx) => {
if (event.message.role !== "assistant") return;
const inputTokens = event.message.usage?.input ?? 0;
const outputTokens = event.message.usage?.output ?? 0;
const durationMs = currentTurnStartMs ? Date.now() - currentTurnStartMs : 0;
const ttftMs = currentTurnFirstTokenMs && currentTurnStartMs
? currentTurnFirstTokenMs - currentTurnStartMs
: undefined;
const turnMetric = calculateTurnMetrics({
turnId: currentTurnId!,
inputTokens,
outputTokens,
durationMs,
timeToFirstTokenMs: ttftMs,
});
turnMetrics.push(turnMetric);
});
pi.on("agent_end", async (_event, ctx) => {
if (!provider || !model || promptStartMs === undefined) return;
const promptMetrics = aggregatePromptMetrics({
provider,
model,
turnMetrics,
});
// Display in TUI
const display = formatMetricsForDisplay(promptMetrics);
ctx.ui.notify(display, "info");
ctx.ui.setStatus("metrics", `📊 ${promptMetrics.combinedTokensPerSec.toFixed(1)} tok/s`);
// Log to JSONL file
const logEntry = toLogEntry(promptMetrics);
mkdirSync(dirname(logFile), { recursive: true });
appendFileSync(logFile, JSON.stringify(logEntry) + "\n", "utf8");
// Reset state
promptStartMs = undefined;
turnMetrics = [];
currentTurnFirstTokenMs = undefined;
});
}

packages/pi-notifications/README.md Normal file
View File

@@ -0,0 +1,62 @@
# pi-notifications
Audio alerts for pi agent events via `afplay`.
## What it does
Plays a sound when the agent finishes a turn, so you can step away and get alerted when input is needed.
## Configuration
| Env var | Default | Description |
|---------|---------|-------------|
| `PI_NOTIFICATIONS_ENABLED` | `true` | Set to `false` to disable all notifications |
| `PI_NOTIFICATION_AGENT_END` | `true` | Play sound when agent finishes |
| `PI_NOTIFICATION_AUDIO` | `/System/Library/Sounds/Glass.aiff` | Path to audio file (.aiff/.wav/.mp3) |
## Standalone tester
Verify audio playback:
```bash
node --input-type=module -e "import {createJiti} from './node_modules/.pnpm/@mariozechner+jiti@2.6.5/node_modules/@mariozechner/jiti/lib/jiti.mjs'; const jiti = createJiti(); await jiti.import('./packages/pi-notifications/src/test-notify.ts');"
```
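If `jiti` is available, the shorter invocation from the script's header comment, `npx jiti packages/pi-notifications/src/test-notify.ts`, should be equivalent.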
## Available macOS sounds
```
/System/Library/Sounds/Bottle.aiff
/System/Library/Sounds/Cork.aiff
/System/Library/Sounds/Frog.aiff
/System/Library/Sounds/Glass.aiff ← default
/System/Library/Sounds/Hero.aiff
/System/Library/Sounds/Morse.aiff
/System/Library/Sounds/Ping.aiff
/System/Library/Sounds/Pop.aiff
/System/Library/Sounds/Submarine.aiff
/System/Library/Sounds/Sosumi.aiff
/System/Library/Sounds/Tink.aiff
```
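For example, launching with `PI_NOTIFICATION_AUDIO=/System/Library/Sounds/Submarine.aiff pi` switches the alert to the Submarine sound for that session.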
## Usage
Add to `~/.pi/agent/settings.json`:
```json
{
"packages": [
"/path/to/packages/pi-notifications"
]
}
```
Then reload pi:
```bash
/reload
```
## License
MIT

packages/pi-notifications/package.json Normal file
View File

@@ -0,0 +1,17 @@
{
"name": "pi-notifications",
"version": "0.1.0",
"description": "Desktop notifications for pi agent events",
"type": "module",
"exports": {
".": "./src/index.ts"
},
"keywords": ["pi-package"],
"pi": {
"extensions": ["src/index.ts"]
},
"peerDependencies": {
"@mariozechner/pi-coding-agent": "*"
},
"license": "MIT"
}

packages/pi-notifications/src/index.ts Normal file
View File

@@ -0,0 +1,36 @@
// Audio notifications for pi agent events
// Plays an audio file to alert the user
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
import { execSync } from "node:child_process";
import { existsSync } from "node:fs";
// Configuration via environment variables
const enabled = process.env.PI_NOTIFICATIONS_ENABLED !== "false";
const agentEndEnabled = process.env.PI_NOTIFICATION_AGENT_END !== "false";
const audioPath = process.env.PI_NOTIFICATION_AUDIO || "/System/Library/Sounds/Glass.aiff";
function notify(body: string, subtitle?: string): void {
if (!enabled) return;
try {
if (existsSync(audioPath)) {
execSync(`afplay "${audioPath}"`, { stdio: "ignore" });
}
} catch {
// audio playback failed — silently fail
}
}
export default function (pi: ExtensionAPI) {
pi.on("session_start", async (_event, _ctx) => {
if (enabled) {
notify("pi-notifications active", "Listening for agent_end");
}
});
pi.on("agent_end", async (event, _ctx) => {
if (!agentEndEnabled) return;
notify("Agent finished", `${event.messages?.length ?? 0} turns`);
});
}

packages/pi-notifications/src/test-notify.ts Normal file
View File

@@ -0,0 +1,24 @@
// Standalone audio tester — run from bash to verify audio playback works
// Usage: npx jiti packages/pi-notifications/src/test-notify.ts
//
// This is completely decoupled from the agent loop.
// Use it to verify that audio playback works before debugging event handler wiring.
import { execSync } from "node:child_process";
import { existsSync } from "node:fs";
const audioPath = process.env.PI_NOTIFICATION_AUDIO || "/System/Library/Sounds/Glass.aiff";
try {
if (!existsSync(audioPath)) {
console.error("[test-audio] ❌ Audio file not found:", audioPath);
console.error("[test-audio] Set PI_NOTIFICATION_AUDIO to a valid .aiff/.wav/.mp3 path");
process.exit(1);
}
console.log("[test-audio] playing:", audioPath);
execSync(`afplay "${audioPath}"`, { stdio: ["ignore", "pipe", "pipe"] });
console.log("[test-audio] ✅ Audio played");
} catch (e: any) {
console.error("[test-audio] ❌ Failed:", e.message);
process.exit(1);
}

plans/metrics-check.md Normal file
View File

@@ -0,0 +1,73 @@
# Plan: Analyze & Fix `llm-metrics` Extension Timing Bug
## Problem Statement
The extension reports generation speed as ~8,000–2,400 tok/s (physically impossible) while prefill speed is ~70 tok/s. The math is internally consistent but the underlying phase boundaries are inverted or misaligned. Real generation speed is ~53–70 tok/s (confirmed by earlier runs).
## Phase 1: Locate & Map the Extension
1. **Find the source code**
- Search `~/.pi/extensions/`, `~/.pi/tools/`, and the pi-coding-agent package for files matching `llm`, `metric`, `performance`, `benchmark`
- Check `~/.pi/config` or project `.pi/config` for extension/tool registration
- Look for custom tool definitions in `extensions/`, `tools/`, or `skills/` directories
2. **Identify the provider integration**
- The log shows `"provider":"llama.cpp"` — find where the extension hooks into llama.cpp (likely via subprocess, WebSocket, or callback interception)
- Map the data flow: raw llama.cpp output → extension parsing → JSON log writing
## Phase 2: Diagnose the Timing Bug
3. **Trace phase boundary detection**
- Find how the extension defines "prefill" vs "generation" start/end times
- Check if it uses:
- `timeToFirstToken` (TTFT) as the split point
- llama.cpp callback hooks (`completion_token_callback`, `prompt_token_callback`)
- Wall-clock timestamps around token streaming
4. **Verify the calculation**
- Confirm the formula: `generationTok/s = outputTokens / (totalDuration - TTFT)` (a worked check follows this list)
- Check if `totalDuration` includes only generation, or the full call
- Look for race conditions: async callbacks firing out of order, or generation end timestamp captured before all tokens are flushed
5. **Reproduce the anomaly**
- Run the same model with identical prompt/output length
- Compare TTFT, totalDuration, and per-phase timestamps
- Check if the bug appears only with large prompts, speculative decoding, or certain sampling configs
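As a quick check of the step-4 formulas, plugging in the numbers from the `pi` run quoted in commit 98e18643c5 reproduces its reported rates (a standalone sketch; the 3.9m duration is rounded, so generation lands slightly under the reported 52.6 tok/s):
```typescript
// Worked check of the step-4 formulas with the commit-98e18643c5 numbers.
const inputTokens = 15_460;
const outputTokens = 12_179;
const ttftMs = 769;
const totalDurationMs = 3.9 * 60 * 1000; // ≈ 234,000 ms (rounded in the quoted output)

const prefill = inputTokens / (ttftMs / 1000); // ≈ 20,104 tok/s
const generation = outputTokens / ((totalDurationMs - ttftMs) / 1000); // ≈ 52.2 tok/s
console.log(prefill.toFixed(1), generation.toFixed(1));
```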
## Phase 3: Fix the Implementation
6. **Correct phase boundaries**
- If using callbacks: ensure generation start = TTFT timestamp, generation end = last token callback or explicit `done` event
- If using wall-clock: add a small buffer after last token to account for async flush
- Add validation: reject generation speeds > 500 tok/s (sanity check)
7. **Fix label assignment**
- Ensure `prefillTokensPerSec` = `inputTokens / TTFT`
- Ensure `generationTokensPerSec` = `outputTokens / (totalDuration - TTFT)`
- Add explicit phase logging to debug output
8. **Add telemetry**
- Log raw timestamps: `prefill_start`, `prefill_end`, `gen_start`, `gen_end`, `total_start`, `total_end`
- Log per-phase token counts to catch mismatches
- Write to `.pi/llm-metrics.log` with consistent schema
## Phase 4: Verify & Deploy
9. **Test cases**
- Small prompt + short output (baseline)
- Large prompt + long output (original failure case)
- Speculative decoding run (if supported)
- Early termination / stop token edge case
10. **Validate output**
- Generation speed should be 40–100 tok/s for this model/hardware
- Prefill speed should be 50–200 tok/s (parallel compute)
- TTFT should match prefill duration
- No negative phase durations
11. **Update schema & docs**
- Add `rawTimestamps` field to log entries for debugging
- Document phase definitions in extension README
- Add unit tests for metric calculation logic
## Deliverables
- [ ] Extension source located & data flow mapped
- [ ] Root cause identified (callback timing gap, phase boundary misassignment, or async flush race)
- [ ] Fix implemented with sanity checks
- [ ] Test suite covering edge cases
- [ ] Log schema updated with raw timestamps
- [ ] PR or patch ready for review
## Questions to Answer During Analysis
- Does the extension intercept llama.cpp at the C++ level, via CLI, or through a Python wrapper?
- Are callbacks synchronous or async?
- Is there a `done`/`end` event, or does it rely on empty token streams?
- Could speculative decoding be causing the draft model's batched verification to be misclassified as "generation"?

View File

@@ -0,0 +1,154 @@
# Plan: pi-notifications v0 — Desktop Notifications for Agent Events
## Goal
Make the `pi-notifications` extension reliably show macOS Notification Center alerts when the agent finishes a turn, so the user gets alerted without needing to watch the screen.
## Current State
- Extension exists at `packages/pi-notifications/src/index.ts` (monorepo) and `~/.pi/agent/extensions/pi-notifications.ts` (auto-discovery)
- Extension loads correctly (appears in `/reload` extension list)
- `console.log` from extensions is NOT visible in `/reload` output
- `osascript` works when run directly in bash, but notification doesn't appear when called from the extension
- The `session_start` handler fires on reload, `agent_end` fires when prompts complete
## Debugging Strategy (split into two orthogonal problems)
### Problem A: "Does the trigger fire?" — visible debug signal
`console.log` from extensions is invisible in pi's TUI output. To debug the trigger logic in a fast loop, add a **debug mode** (`PI_NOTIFICATION_DEBUG=true`) that emits a visible signal via `ctx.ui.steer()` (or similar) right before calling `notify()`. This surfaces in the chat/TUI so you can verify the handler fires without needing actual desktop notifications.
### Problem B: "Does `osascript` actually deliver?" — isolated tester
Create a standalone script (`test-notify.ts`) that you run from bash independently of the agent loop. This verifies `osascript` works in the extension's import context, decoupled from event handlers.
### 1. Verify `osascript` works in extension context
The extension uses `execSync` from `node:child_process`. Test that it works inside the extension:
```typescript
// In the extension, add this to session_start handler:
try {
const output = execSync('osascript -e "display notification \\"test\\" with title \\"test\\""').toString();
console.log("[pi-notifications] osascript output:", output);
} catch (e: any) {
console.log("[pi-notifications] osascript error:", e.message, e.stderr?.toString());
}
```
If `execSync` fails silently, try:
- Using `{ stdio: ["pipe", "pipe", "pipe"] }` to capture stderr
- Checking if `node:child_process` is available in the extension sandbox
### 2. Check macOS notification settings
Notifications may be delivered but not shown as banners:
- **System Settings → Notifications → Ghostty → Notification Style** — must be "Banners" or "Alerts", not "None" (osascript fires from the Ghostty process, so macOS attributes notifications to Ghostty, not "pi")
- **System Settings → Focus → [active focus] → Apps** — ensure "Ghostty" is not excluded
- **System Settings → Notifications → Show Notifications on Lock Screen** — enable if needed
**Known symptom:** Notifications appear in Notification Center when pulled down, but never pop up as banners. This is a macOS style setting, not a code issue.
### 3. Ghostty suppresses banners when focused
Ghostty intentionally silences banner notifications (no pop-up, no sound) when the Ghostty window is **active/focused**. The notification is still delivered to Notification Center. Banners only appear when Ghostty is **not** the active window.
**Workarounds:**
- **System Settings → Notifications → Ghostty → Alert Style → "Persistent"** — macOS shows these as banners regardless of Ghostty's silencing
- **Switch to another app** (e.g. leave your browser open) when you want to see the banner
### 4. Try alternative notification methods
If `osascript` doesn't work from the extension, try:
- `notify-send` (Linux-only, not relevant for macOS)
- A custom TUI widget that shows a persistent banner
- Using `ctx.ui.notify()` (but this only shows in pi's TUI, not system notification)
### 5. Verify event handlers fire
Add a `session_start` handler that definitely fires:
```typescript
pi.on("session_start", async (_event, ctx) => {
console.log("[pi-notifications] session_start fired");
ctx.ui.notify("pi-notifications active", "info"); // Shows in TUI
});
```
If `ctx.ui.notify()` works but `osascript` doesn't, the issue is macOS notification permissions, not the extension.
## Implementation Plan
### Step 0A: Add debug mode with visible signal (PI_NOTIFICATION_DEBUG)
Add a `PI_NOTIFICATION_DEBUG=true` env var. When enabled, the extension calls `ctx.ui.steer()` (or a visible TUI signal) right before each notification, so you see "notification triggered" in the chat output during the agent loop. This lets you verify trigger logic without needing actual desktop notifications.
- In `agent_end` handler: if `PI_NOTIFICATION_DEBUG=true`, call `ctx.ui.steer("[pi-notifications] notification triggered")` before `notify()`
- In `session_start` handler: same pattern
- This is purely for debugging — no desktop notification shown when debug is on (or both are shown)
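A minimal sketch of the gate, meant to sit inside the extension's existing `export default function (pi) { ... }` body and reuse its `notify()` helper (assumes `ctx.ui.steer()` accepts a plain string):
```typescript
// Step 0A sketch — assumes ctx.ui.steer() takes a plain string message.
const debug = process.env.PI_NOTIFICATION_DEBUG === "true";

pi.on("agent_end", async (_event, ctx) => {
  if (debug) ctx.ui.steer("[pi-notifications] agent_end notification triggered");
  notify("Agent finished");
});
```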
### Step 0B: Create isolated notification tester
Create `packages/pi-notifications/src/test-notify.ts` — a standalone script runnable via `npx jiti` that fires a test notification. Run it from bash to verify `osascript` works in the extension's context, completely separate from the agent loop.
### Step 1: Fix notification delivery (priority)
Once the root cause is identified:
**If `osascript` works but notification is suppressed:**
- Add a `PI_NOTIFICATION_SOUND` env var (already in design)
- Add `PI_NOTIFICATIONS_ENABLED` toggle (already in design)
- Consider adding a "first-run" notification that asks user to enable notifications
**If `osascript` doesn't work from extension:**
- Fall back to `ctx.ui.notify()` which shows in pi's TUI
- Or use a different approach (e.g., write to a file that a separate process monitors)
### Step 2: Add turn-limit notification
In `packages/pi-turn-limit/src/turn-limit.ts`, add notification when the limit is reached:
```typescript
// In the turn-limit extension, when the limit fires:
if (shouldNotify) {
execSync('osascript -e \'display notification "Turn limit reached" with title "pi" subtitle "Turns: ' + turnCount + '/' + maxTurns + '"\'');
}
```
Configuration via env var:
- `PI_NOTIFICATION_TURN_LIMIT` — default `true`, set to `false` to disable
### Step 3: Add sound option
Already designed in the extension:
- `PI_NOTIFICATION_SOUND` env var (default: `default`)
- macOS sounds: `Bottle`, `Ping`, `Pop`, `Submarine`, `Sosumi`, `Tink`
- Set to `""` for silent
### Step 4: Update README
Document the extension with:
- What it does
- Configuration options
- How to enable macOS notifications
- Troubleshooting tips
## Files to Modify
| File | Action |
|------|--------|
| `~/.pi/agent/extensions/pi-notifications.ts` | Debug and fix notification delivery |
| `packages/pi-notifications/src/index.ts` | Sync fixes from auto-discovery version |
| `packages/pi-turn-limit/src/turn-limit.ts` | Add turn-limit notification |
| `packages/pi-notifications/README.md` | Update with notification docs |
## Success Criteria
1. ✅ Extension loads and appears in `/reload` output
2. ✅ macOS Notification Center shows "pi-notifications active" on reload
3. ✅ macOS Notification Center shows "Agent finished — N turns" when agent completes a prompt
4. ✅ Turn-limit notification shows when turn limit is exceeded
5. ✅ `PI_NOTIFICATIONS_ENABLED=false` disables all notifications
6. ✅ README documents all configuration options
7. ✅ `PI_NOTIFICATION_DEBUG=true` shows visible signal in TUI when handlers fire
8. ✅ `test-notify.ts` fires a notification when run standalone

scoped-packages.md Normal file
View File

@@ -0,0 +1,76 @@
# Scoped Packages
## Step 1: Create the npm org
Create the organization in the npm web UI on npmjs.com (the `npm org` CLI only manages members of an existing org; it doesn't create one). Creating the org claims the `@mostalive` scope on npm. You'll need to pay the [org fee](https://docs.npmjs.com/about-organizations) (currently ~$7/month for the basic tier).
Alternatively, if you already have an account, you can use your username directly — scoped packages can use your personal account too:
```bash
# No separate org creation needed if @mostalive is your npm username
```
Check who's in the org:
```bash
npm org ls mostalive
```
## Step 2: Rename the package
In `packages/pi-turn-limit/package.json`:
```json
{
"name": "@mostalive/pi-turn-limit",
"version": "0.1.0",
...
}
```
## Step 3: Publish
Scoped packages require `--access public` on first publish, since npm defaults scoped packages to private:
```bash
cd packages/pi-turn-limit
npm publish --access public
```
## Step 4: Users install
```bash
pi install npm:@mostalive/pi-turn-limit
```
---
## Cheaper Alternative: Git Package (No Scope)
If you don't want to pay for an npm org, you can ship via git without scoping:
```bash
pi install git:github.com/mostalive/pi-turn-limit
```
No npm org needed. Users install directly from your GitHub repo. You'd still need to publish to npm for the `npm:` install path, but the git path is free.
---
## Summary
| Approach | Cost | User installs via |
|----------|------|-------------------|
| `npm org create` + scoped npm | ~$7/mo | `pi install npm:@mostalive/pi-turn-limit` |
| GitHub repo (no scope) | Free | `pi install git:github.com/user/repo` |
| Unscoped npm (`pi-turn-limit`) | Free | `pi install npm:pi-turn-limit` |
If you already have a personal npm account named `mostalive`, the scope is free — scoped packages just use your existing account. The org fee only applies if you create a separate organization entity.

working-with-extensions.md Normal file
View File

@@ -0,0 +1,152 @@
# Working with Pi Extensions
## Installation Options
### Option 1: Publish to npm + `pi install` (Recommended)
The cleanest path that replicates the official pi experience.
**You (publishing):**
```bash
cd packages/pi-turn-limit
npm publish
```
**Users (installing globally):**
```bash
pi install npm:pi-turn-limit
```
This writes to `~/.pi/agent/settings.json` under `packages`. Pi handles the install, runs `npm install`, and auto-discovers the extension from the `pi.extensions` manifest.
### Option 2: npm global install + settings.json
**You (publishing):**
```bash
npm publish
```
**Users:** Two steps — install the npm package globally, then tell pi about it:
```bash
npm install -g pi-turn-limit
```
Then in `~/.pi/agent/settings.json`:
```json
{
"packages": [
"npm:pi-turn-limit"
]
}
```
Or use the same command as Option 1 — `pi install npm:pi-turn-limit` does both steps.
### Option 3: Local directory (for development)
For local testing without publishing:
```bash
pi install /Users/willem/dev/spikes/llm/monotonic-pi-extensions/packages/pi-turn-limit
```
Or in `~/.pi/agent/settings.json`:
```json
{
"packages": [
"/Users/willem/dev/spikes/llm/monotonic-pi-extensions/packages/pi-turn-limit"
]
}
```
Or as a single-file extension in `~/.pi/agent/extensions/`:
```bash
cp packages/pi-turn-limit/src/turn-limit.ts ~/.pi/agent/extensions/turn-limit.ts
```
### Option 4: Per-repo project-local install
Users can install an extension only for a specific project:
```bash
pi install -l npm:pi-turn-limit # -l = project-local
```
This writes to `.pi/settings.json` in the project root. Pi auto-installs missing packages on startup per-project.
---
## Disabling Extensions Per-Repo
Three approaches:
### A. `pi config` (simplest)
```bash
pi config turn-limit:off # Disable by extension name
pi config turn-limit:on # Re-enable
```
Works for both global and project scope. Per-repo:
```bash
pi config -l turn-limit:off
```
### B. Package filtering in project `settings.json`
In `.pi/settings.json` (project-local):
```json
{
"packages": [
{
"source": "npm:pi-turn-limit",
"extensions": [] // Load none
}
]
}
```
Or filter specific files:
```json
{
"packages": [
{
"source": "npm:pi-turn-limit",
"extensions": ["!src/turn-limit.ts"] // Exclude this one
}
]
}
```
### C. Remove from settings entirely
```bash
pi remove npm:pi-turn-limit
```
Or manually edit `~/.pi/agent/settings.json` and remove the package entry.
---
## Summary Table
| Method | Scope | User Command |
|--------|-------|--------------|
| `pi install npm:pkg` | Global | One command, handles everything |
| `npm i -g` + settings.json | Global | Two steps |
| `pi install ./path` | Global (symlink-style) | Local dev |
| `pi install -l npm:pkg` | Project-local | Per-repo |
| `pi config name:off` | Toggle | Enable/disable without uninstalling |
| `pi config -l name:off` | Project-local toggle | Per-repo disable |
**Recommendation:** Publish to npm, then users run `pi install npm:pi-turn-limit`. For disabling per-repo, `pi config -l turn-limit:off` is the simplest approach — a one-liner that doesn't require editing JSON files.