%{
  title: "Coding agent generates its own extensions",
  author: "Willem van den Ende",
  tags: ~w(ai loops),
  description: "Handwritten note",
  published: false
}
---

I see a few people writing about
sharing struggles with LLMs.

For me it is easier to do at
the moment of a small success.

The challenge with writing about
my experiments is that it gets meta pretty quickly.

Therefore I am going to leave out
a bunch of things, including this commentary.

The other day I tried to develop an
extension for pi, the shitty coding agent, that would stop a model when it
goes off the rails.

I now have a local model that
is fast and can call tools (edit files, run tests, etc.).

It does, however, frequently show some model-assisted coding quirks:

- replace production code that works with throwing an exception
- write if statements in tests
- add fallbacks for things that can't fail
- find "problems" in code that works
	(passes tests and other checks, works for
		the user, etc.)

Long term, the solution probably is to work
in small steps. But these steps come from
experience.

Catching the model when it happens by
simply matching some words is a starting point for that: scan for keywords in
any edits and prompt the user for permission, or abort when the session is not interactive.
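
A minimal sketch of that idea, kept free of any agent API. The pattern list and the names `findMatches` and `decide` are assumptions for illustration; the note does not say which words were actually matched.

```typescript
// Hypothetical patterns; the original note does not list the real keywords.
const SUSPICIOUS_PATTERNS: RegExp[] = [
  /throw new Error\(/,
  /not implemented/i,
  /fallback/i,
];

type Decision = "allow" | "ask" | "abort";

// Return the source of every pattern that matches the edited text.
function findMatches(
  editText: string,
  patterns: RegExp[] = SUSPICIOUS_PATTERNS,
): string[] {
  return patterns.filter((p) => p.test(editText)).map((p) => p.source);
}

// Clean edits pass. Suspicious edits need permission when a user is
// present, and abort the run when the session is not interactive.
function decide(editText: string, interactive: boolean): Decision {
  if (findMatches(editText).length === 0) return "allow";
  return interactive ? "ask" : "abort";
}
```

Plain matching like this will produce false positives (a legitimate `fallback` variable, for example), which is why the interactive case asks instead of blocking outright.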

It looked simple, so I let a more powerful
but slower local model figure out how to build an extension (pi has a system prompt for that). After one iteration
we had a plan and pi generated a
plausible-looking extension.

I tested it manually, in pi. Nothing happened. Back to the drawing board.

I had quite a few iterations, compared with
sample code, looked into the pi API,
no luck.

Eventually I installed the sample extension. That worked. Then I deleted most of my
extension and added some logging. I could
see something.

I learned quite a bit about pi and its extension mechanism.

It looks like only the last "UI notification" gets shown for any extension point (e.g. a tool call or system startup).

I am not yet sure if this is by design or not.

I did take away that, here too, I want to work test-first for parts that do not interface
with the agent directly. The feedback loop
is just too slow.

This also required experimentation. I did not want to set up a separate project for
an extension that is little more than an idea. But I do want tests.

So I asked a model again, and its suggestion was
to use Deno, because that has testing built
in. Some more fiddling followed:

- get Deno to work in the sandbox
- learn that pi auto-loads anything in
	the extensions folder. If you put a test there,
	pi crashes
- learn that "domain" files also don't work there.

So eventually I ended up with

```
.pi/
	test/
	core/
	extensions/
```

Core contains the functional core, test holds the tests for the core, and extensions is a thin integration with
pi that uses the core.
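
Roughly, the split could look like the sketch below. It is collapsed into one block to stay self-contained; in the layout above the three parts would live in separate files. `AgentHost`, `confirm`, `onEdit` and `reviewEdit` are hypothetical stand-ins, not pi's actual extension API.

```typescript
// --- core: pure logic, no agent dependency, fast to test ---
type Verdict = { ok: true } | { ok: false; reason: string };

function reviewEdit(editText: string, bannedWords: string[]): Verdict {
  const lower = editText.toLowerCase();
  const hit = bannedWords.find((w) => lower.includes(w.toLowerCase()));
  return hit === undefined
    ? { ok: true }
    : { ok: false, reason: `edit contains "${hit}"` };
}

// --- extension: thin shell around the core ---
// AgentHost is a hypothetical stand-in for whatever the agent provides.
interface AgentHost {
  interactive: boolean;
  confirm(message: string): boolean; // ask the user for permission
}

// Returns true when the edit may proceed.
function onEdit(host: AgentHost, editText: string): boolean {
  const verdict = reviewEdit(editText, ["fallback", "not implemented"]);
  if (verdict.ok) return true;
  return host.interactive ? host.confirm(verdict.reason) : false;
}

// --- test: a fake host replaces the agent, so no manual loop is needed ---
const asked: string[] = [];
const fakeHost: AgentHost = {
  interactive: true,
  confirm: (message) => {
    asked.push(message); // record the prompt, then deny permission
    return false;
  },
};
const allowedClean = onEdit(fakeHost, "return a + b;");
const allowedSuspicious = onEdit(fakeHost, "// Fallback for null");
```

Under Deno, the last part would typically sit inside a `Deno.test` callback in the test folder, with assertions on `allowedClean`, `allowedSuspicious` and `asked`.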

This was clear enough that the slow, dense model could build a second extension (performance metrics in chat) with relatively little guidance after iterations on a plan.

I haven't looked at the code yet. Not
out of principle, but because it is late,
and I want to write down my trial and error before I forget.

TODO link pdf
TODO add image from scan (in downloads)