move pi-llm-performance to monorepo, update README and add deno.json
parent 0cf13ed54e
commit a38c76c65e
1
.pi/llm-metrics.log
Normal file
@@ -0,0 +1 @@
{"timestamp":"2026-04-28T08:58:29.989Z","provider":"llama.cpp","model":"Qwen3.6-35B-A3B-MXFP4_MOE.gguf","turnCount":6,"inputTokens":8294,"outputTokens":1356,"totalTokens":9650,"prefillTokensPerSec":1925.26,"generationTokensPerSec":42.55,"combinedTokensPerSec":266.77,"totalDurationMs":36174,"timeToFirstTokenMs":4308}
@@ -1,5 +1,6 @@
[tools]
bun = "latest"
deno = "latest"
elixir = "latest"
erlang = "latest"
node = "24"

@@ -1,22 +1,26 @@
# pi-llm-performance

LLM performance metrics extension
LLM performance metrics extension — captures and displays TTFT, prefill, and generation speeds from pi agent turns.

## How to install
## Development

Add to your global pi settings:
This package lives in the `pi-extensions` monorepo.

```bash
pi install /Users/willem/dev/spikes/llm/custom-coding-agent/packages/pi-llm-performance
pnpm install # workspace setup
deno test # run tests
```

Or add manually to `~/.pi/agent/settings.json`:
## Usage

```
Add to your pi settings (`~/.pi/agent/settings.json`):

```json
{
  "packages": [
    "/Users/willem/dev/spikes/llm/custom-coding-agent/packages/pi-llm-performance",
    ...
    "../dev/spikes/llm/monotonic-pi-extensions/packages/pi-llm-performance"
  ]
}
```

Then reload pi:

8
packages/pi-llm-performance/deno.json
Normal file
@@ -0,0 +1,8 @@
{
  "imports": {
    "@std/assert": "jsr:@std/assert@^1.0.0"
  },
  "tasks": {
    "test": "deno test src/"
  }
}
5
packages/pi-llm-performance/deno.lock
generated
@@ -14,5 +14,10 @@
    "@std/internal@1.0.12": {
      "integrity": "972a634fd5bc34b242024402972cd5143eac68d8dffaca5eaa4dba30ce17b027"
    }
  },
  "workspace": {
    "dependencies": [
      "jsr:@std/assert@1"
    ]
  }
}

73
plans/metrics-check.md
Normal file
@@ -0,0 +1,73 @@
# Plan: Analyze & Fix `llm-metrics` Extension Timing Bug

## Problem Statement
The extension reports generation speed as ~2,400–8,000 tok/s (physically impossible) while prefill speed is only ~70 tok/s. The arithmetic is internally consistent, so the underlying phase boundaries must be inverted or misaligned. Real generation speed is ~53–70 tok/s (confirmed by earlier runs).

## Phase 1: Locate & Map the Extension
1. **Find the source code**
   - Search `~/.pi/extensions/`, `~/.pi/tools/`, and the pi-coding-agent package for files matching `llm`, `metric`, `performance`, `benchmark`
   - Check `~/.pi/config` or project `.pi/config` for extension/tool registration
   - Look for custom tool definitions in `extensions/`, `tools/`, or `skills/` directories
2. **Identify the provider integration**
   - The log shows `"provider":"llama.cpp"`: find where the extension hooks into llama.cpp (likely via subprocess, WebSocket, or callback interception)
   - Map the data flow: raw llama.cpp output → extension parsing → JSON log writing

## Phase 2: Diagnose the Timing Bug
3. **Trace phase boundary detection**
   - Find how the extension defines "prefill" vs "generation" start/end times
   - Check if it uses:
     - `timeToFirstToken` (TTFT) as the split point
     - llama.cpp callback hooks (e.g. `completion_token_callback`, `prompt_token_callback`; verify these names against the actual integration)
     - Wall-clock timestamps around token streaming
4. **Verify the calculation**
   - Confirm the formula: `generationTok/s = outputTokens / (totalDuration - TTFT)`
   - Check whether `totalDuration` covers only generation or the full call
   - Look for race conditions: async callbacks firing out of order, or the generation end timestamp being captured before all tokens are flushed
5. **Reproduce the anomaly**
   - Run the same model with identical prompt/output length
   - Compare TTFT, totalDuration, and per-phase timestamps
   - Check whether the bug appears only with large prompts, speculative decoding, or certain sampling configs

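One way to do the check in step 4 is to recompute each rate from the raw fields of a log entry and compare it with the logged value. The TypeScript sketch below is illustrative: `recomputeRates` and `findMismatches` are hypothetical helpers, the prefill and generation formulas come from steps 4 and 7, and `combined = totalTokens / totalDuration` is an assumption that happens to match the sample entry. It uses the `.pi/llm-metrics.log` entry from this commit, where all three rates agree to within rounding.

```typescript
// Fields of one .pi/llm-metrics.log line that the check reads.
interface MetricsEntry {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  prefillTokensPerSec: number;
  generationTokensPerSec: number;
  combinedTokensPerSec: number;
  totalDurationMs: number;
  timeToFirstTokenMs: number;
}

type Rate = "prefill" | "generation" | "combined";

// Recompute each rate from raw counts and durations.
// prefill and generation follow the plan; combined is an assumed formula.
function recomputeRates(e: MetricsEntry): Record<Rate, number> {
  const ttftSec = e.timeToFirstTokenMs / 1000;
  const totalSec = e.totalDurationMs / 1000;
  return {
    prefill: e.inputTokens / ttftSec,
    generation: e.outputTokens / (totalSec - ttftSec),
    combined: e.totalTokens / totalSec,
  };
}

// Names of logged rates that differ from the recomputed value by more than
// `tolerance` tok/s. The log rounds to 2 decimals, so 0.01 absorbs rounding.
function findMismatches(e: MetricsEntry, tolerance = 0.01): Rate[] {
  const recomputed = recomputeRates(e);
  const logged: Record<Rate, number> = {
    prefill: e.prefillTokensPerSec,
    generation: e.generationTokensPerSec,
    combined: e.combinedTokensPerSec,
  };
  const rates: Rate[] = ["prefill", "generation", "combined"];
  return rates.filter((r) => Math.abs(recomputed[r] - logged[r]) > tolerance);
}

const sampleLine =
  '{"timestamp":"2026-04-28T08:58:29.989Z","provider":"llama.cpp","model":"Qwen3.6-35B-A3B-MXFP4_MOE.gguf","turnCount":6,"inputTokens":8294,"outputTokens":1356,"totalTokens":9650,"prefillTokensPerSec":1925.26,"generationTokensPerSec":42.55,"combinedTokensPerSec":266.77,"totalDurationMs":36174,"timeToFirstTokenMs":4308}';
console.log(findMismatches(JSON.parse(sampleLine) as MetricsEntry)); // []
```

An empty result means the logged numbers follow from the raw fields. Because the problem statement notes the math is already consistent, a clean result on an anomalous entry points the investigation at the timestamps rather than the formulas.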
## Phase 3: Fix the Implementation
6. **Correct phase boundaries**
   - If using callbacks: ensure generation start = TTFT timestamp, generation end = last token callback or explicit `done` event
   - If using wall-clock: add a small buffer after last token to account for async flush
   - Add validation: reject generation speeds > 500 tok/s (sanity check)
7. **Fix label assignment**
   - Ensure `prefillTokensPerSec` = `inputTokens / TTFT`
   - Ensure `generationTokensPerSec` = `outputTokens / (totalDuration - TTFT)`
   - Add explicit phase logging to debug output
8. **Add telemetry**
   - Log raw timestamps: `prefill_start`, `prefill_end`, `gen_start`, `gen_end`, `total_start`, `total_end`
   - Log per-phase token counts to catch mismatches
   - Write to `.pi/llm-metrics.log` with consistent schema

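Steps 6 to 8 can be combined into one function that derives both rates from raw timestamps and applies the sanity checks. This is a minimal TypeScript sketch, not the extension's code. `computeTurnMetrics` is hypothetical. The timestamp names come from step 8, but the millisecond units and the assumption that `prefill_end` equals `gen_start` are choices made for this example.

```typescript
// Raw per-turn timestamps in milliseconds, named after step 8's telemetry list.
// Assumption: prefill ends exactly when generation starts (first output token),
// so prefill_end, total_start and total_end are not needed here.
interface RawTimestamps {
  prefill_start: number; // request sent to the provider
  gen_start: number; // first output token received (TTFT boundary)
  gen_end: number; // last output token received, or the explicit `done` event
}

interface TurnMetrics {
  timeToFirstTokenMs: number;
  prefillTokensPerSec: number;
  generationTokensPerSec: number;
}

// Sanity ceiling from step 6: anything faster is treated as a phase-boundary bug.
const MAX_GENERATION_TPS = 500;

function computeTurnMetrics(
  raw: RawTimestamps,
  inputTokens: number,
  outputTokens: number,
): TurnMetrics {
  const prefillMs = raw.gen_start - raw.prefill_start; // TTFT
  const generationMs = raw.gen_end - raw.gen_start; // totalDuration - TTFT
  // Refuse to divide by zero or by a negative phase duration (step 10).
  if (prefillMs <= 0 || generationMs <= 0) {
    throw new RangeError(
      `non-positive phase duration: prefill=${prefillMs}ms, generation=${generationMs}ms`,
    );
  }
  const prefillTokensPerSec = inputTokens / (prefillMs / 1000);
  const generationTokensPerSec = outputTokens / (generationMs / 1000);
  if (generationTokensPerSec > MAX_GENERATION_TPS) {
    throw new RangeError(`implausible generation speed: ${generationTokensPerSec.toFixed(2)} tok/s`);
  }
  return { timeToFirstTokenMs: prefillMs, prefillTokensPerSec, generationTokensPerSec };
}

// Timings chosen to reproduce the sample log entry (TTFT 4308 ms, total 36174 ms).
const m = computeTurnMetrics({ prefill_start: 1_000, gen_start: 5_308, gen_end: 37_174 }, 8294, 1356);
console.log(m.prefillTokensPerSec.toFixed(2), m.generationTokensPerSec.toFixed(2)); // 1925.26 42.55
```

Throwing a `RangeError` instead of logging a clamped number keeps a bad phase boundary visible. A caller could catch it and write the raw timestamps to the log for debugging instead.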
## Phase 4: Verify & Deploy
9. **Test cases**
   - Small prompt + short output (baseline)
   - Large prompt + long output (original failure case)
   - Speculative decoding run (if supported)
   - Early termination / stop token edge case
10. **Validate output**
    - Generation speed should be 40–100 tok/s for this model/hardware
    - Prefill speed should be well above generation speed, because prompt tokens are processed in parallel batches (the `.pi/llm-metrics.log` entry in this commit records ~1,925 tok/s)
    - TTFT should match prefill duration
    - No negative phase durations
11. **Update schema & docs**
    - Add `rawTimestamps` field to log entries for debugging
    - Document phase definitions in extension README
    - Add unit tests for metric calculation logic

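The checks in step 10 that only need logged fields can be run over an existing log. A minimal TypeScript sketch, assuming the log holds one JSON object per line (as in the `.pi/llm-metrics.log` entry in this commit). It uses the 40–100 tok/s generation range stated for this model/hardware. `validateTurn` and `validateLog` are hypothetical names. Comparing TTFT with prefill duration needs the raw timestamps from step 8, so it is left out.

```typescript
// Fields of a .pi/llm-metrics.log entry that the checks below read.
interface LoggedTurn {
  generationTokensPerSec: number;
  totalDurationMs: number;
  timeToFirstTokenMs: number;
}

// Expected generation range for this model/hardware, from step 10.
const GENERATION_MIN_TPS = 40;
const GENERATION_MAX_TPS = 100;

// Returns human-readable problems for one entry; an empty array means it passed.
function validateTurn(t: LoggedTurn): string[] {
  const problems: string[] = [];
  const gen = t.generationTokensPerSec;
  if (gen < GENERATION_MIN_TPS || gen > GENERATION_MAX_TPS) {
    problems.push(`generation ${gen} tok/s outside ${GENERATION_MIN_TPS}-${GENERATION_MAX_TPS}`);
  }
  if (t.timeToFirstTokenMs < 0) {
    problems.push("negative prefill duration");
  }
  if (t.totalDurationMs - t.timeToFirstTokenMs < 0) {
    problems.push("negative generation duration");
  }
  return problems;
}

// Validates every non-blank line of a JSON-lines log; keys are 1-based line numbers.
function validateLog(text: string): Map<number, string[]> {
  const results = new Map<number, string[]>();
  text.split("\n").forEach((line, i) => {
    if (line.trim() === "") return;
    const problems = validateTurn(JSON.parse(line) as LoggedTurn);
    if (problems.length > 0) results.set(i + 1, problems);
  });
  return results;
}

const sample = '{"generationTokensPerSec":42.55,"totalDurationMs":36174,"timeToFirstTokenMs":4308}';
console.log(validateLog(sample).size); // 0
```

Returning a list of problems instead of throwing lets a single pass over the log report every bad entry at once.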
## Deliverables
- [ ] Extension source located & data flow mapped
- [ ] Root cause identified (callback timing gap, phase boundary misassignment, or async flush race)
- [ ] Fix implemented with sanity checks
- [ ] Test suite covering edge cases
- [ ] Log schema updated with raw timestamps
- [ ] PR or patch ready for review

## Questions to Answer During Analysis
- Does the extension intercept llama.cpp at the C++ level, via CLI, or through a Python wrapper?
- Are callbacks synchronous or async?
- Is there a `done`/`end` event, or does it rely on empty token streams?
- Could speculative decoding cause the batched verification of the draft model's tokens to be misclassified as "generation"?
76
scoped-packages.md
Normal file
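@@ -0,0 +1,76 @@
# Scoped Packages

## Step 1: Create the npm org

Create the `mostalive` org on the npmjs.com website. The npm CLI has no `npm org create` command; `npm org` only manages the members of an org that already exists.

Creating the org reserves the `@mostalive` scope on npm. Orgs that publish only public packages are free; the paid tier (around $7 per user per month at the time of writing) is only needed for private packages. See [about organizations](https://docs.npmjs.com/about-organizations).
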
Alternatively, if `mostalive` is already your npm username, you can publish under that scope directly, because scoped packages can also belong to a personal account:

```bash
# No separate org creation needed if @mostalive is your npm username
```

Check which scopes you can publish to:

```bash
npm whoami             # prints your username, which is also your personal scope
npm org ls mostalive   # lists the members of the mostalive org (the org must already exist)
```

## Step 2: Rename the package

In `packages/pi-turn-limit/package.json`:

```json
{
  "name": "@mostalive/pi-turn-limit",
  "version": "0.1.0",
  ...
}
```

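npm only accepts a restricted set of characters in package names, and the limits apply to the full scoped name. As a quick pre-publish check, a simplified TypeScript sketch of those rules might look like this. It checks for lowercase letters, URL-safe characters, no leading `.` or `_`, an optional `@scope/` prefix, and at most 214 characters including the scope. `parsePackageName` is a hypothetical helper that deliberately omits some of npm's checks, so treat npm's own `validate-npm-package-name` package as the authority.

```typescript
// Conservative subset of npm naming rules: lowercase, URL-safe characters,
// first character not "." or "_", optional "@scope/" prefix.
const NAME_PATTERN = /^(?:@([a-z0-9~-][a-z0-9._~-]*)\/)?([a-z0-9~-][a-z0-9._~-]*)$/;
const MAX_NAME_LENGTH = 214; // includes the scope for scoped packages

interface ParsedName {
  scope: string | null; // scope without the leading "@", or null if unscoped
  name: string;
}

// Returns the scope and bare name, or null if the name fails the simplified rules.
function parsePackageName(full: string): ParsedName | null {
  if (full.length > MAX_NAME_LENGTH) return null;
  const m = NAME_PATTERN.exec(full);
  if (!m) return null;
  return { scope: m[1] ?? null, name: m[2] };
}

console.log(parsePackageName("@mostalive/pi-turn-limit")); // scope "mostalive", name "pi-turn-limit"
console.log(parsePackageName("pi-turn-limit")); // scope null, name "pi-turn-limit"
console.log(parsePackageName("@Mostalive/pi-turn-limit")); // null (uppercase is rejected)
```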
## Step 3: Publish

```bash
cd packages/pi-turn-limit
npm publish
```

Scoped packages require `--access public` on first publish (since npm defaults scoped packages to private):

```bash
npm publish --access public
```

## Step 4: Users install

```bash
pi install npm:@mostalive/pi-turn-limit
```

---

## Free Alternative: Unscoped Git Package

If you don't want to set up an npm org, you can ship via git without a scope:

```bash
pi install git:github.com/mostalive/pi-turn-limit
```

No npm org is needed: users install directly from your GitHub repo. You would still need to publish to npm for the `npm:` install path, but the git path is free.

---

## Summary

| Approach | Cost | User installs via |
|----------|------|-------------------|
| npm org (created on npmjs.com) + scoped npm | Free for public packages; paid tier only for private | `pi install npm:@mostalive/pi-turn-limit` |
| GitHub repo (no scope) | Free | `pi install git:github.com/user/repo` |
| Unscoped npm (`pi-turn-limit`) | Free | `pi install npm:pi-turn-limit` |

If you already have a personal npm account named `mostalive`, the `@mostalive` scope is free: scoped packages can be published from your existing account. You only need a separate organization if the scope should belong to an org rather than to your personal account.
152
working-with-extensions.md
Normal file
@@ -0,0 +1,152 @@
# Working with Pi Extensions

## Installation Options

### Option 1: Publish to npm + `pi install` (Recommended)

The cleanest path that replicates the official pi experience.

**You (publishing):**

```bash
cd packages/pi-turn-limit
npm publish
```

**Users (installing globally):**

```bash
pi install npm:pi-turn-limit
```

This writes to `~/.pi/agent/settings.json` under `packages`. Pi handles the install, runs `npm install`, and auto-discovers the extension from the `pi.extensions` manifest.

### Option 2: npm global install + settings.json

**You (publishing):**

```bash
npm publish
```

**Users:** Two steps. First install the npm package globally, then tell pi about it:

```bash
npm install -g pi-turn-limit
```

Then in `~/.pi/agent/settings.json`:

```json
{
  "packages": [
    "npm:pi-turn-limit"
  ]
}
```

Or use the same command as Option 1: `pi install npm:pi-turn-limit` does both steps.

### Option 3: Local directory (for development)

For local testing without publishing:

```bash
pi install /Users/willem/dev/spikes/llm/monotonic-pi-extensions/packages/pi-turn-limit
```

Or in `~/.pi/agent/settings.json`:

```json
{
  "packages": [
    "/Users/willem/dev/spikes/llm/monotonic-pi-extensions/packages/pi-turn-limit"
  ]
}
```

Or as a single-file extension in `~/.pi/agent/extensions/`:

```bash
cp packages/pi-turn-limit/src/turn-limit.ts ~/.pi/agent/extensions/turn-limit.ts
```

### Option 4: Per-repo project-local install

Users can install an extension only for a specific project:

```bash
pi install -l npm:pi-turn-limit # -l = project-local
```

This writes to `.pi/settings.json` in the project root. Pi auto-installs missing packages on startup per-project.

---

## Disabling Extensions Per-Repo

Three approaches:

### A. `pi config` (simplest)

```bash
pi config turn-limit:off # Disable by extension name
pi config turn-limit:on # Re-enable
```

Works for both global and project scope. Per-repo:

```bash
pi config -l turn-limit:off
```

### B. Package filtering in project `settings.json`

In `.pi/settings.json` (project-local), an empty `extensions` array loads none of the package's extensions:

```json
{
  "packages": [
    {
      "source": "npm:pi-turn-limit",
      "extensions": []
    }
  ]
}
```

Or exclude specific files with a `!` prefix:

```json
{
  "packages": [
    {
      "source": "npm:pi-turn-limit",
      "extensions": ["!src/turn-limit.ts"]
    }
  ]
}
```

### C. Remove from settings entirely

```bash
pi remove npm:pi-turn-limit
```

Or manually edit `~/.pi/agent/settings.json` and remove the package entry.

---

## Summary Table

| Method | Scope | Notes |
|--------|-------|-------|
| `pi install npm:pkg` | Global | One command, handles everything |
| `npm i -g` + settings.json | Global | Two steps |
| `pi install ./path` | Global (symlink-style) | Local dev |
| `pi install -l npm:pkg` | Project-local | Per-repo |
| `pi config name:off` | Toggle | Enable/disable without uninstalling |
| `pi config -l name:off` | Project-local toggle | Per-repo disable |

**Recommendation:** Publish to npm, then users run `pi install npm:pi-turn-limit`. For disabling per-repo, `pi config -l turn-limit:off` is the simplest approach: a one-liner that doesn't require editing JSON files.