%{
  title: "Coding agent generates its own extensions",
  author: "Willem van den Ende",
  tags: ~w(ai loops),
  description: "Handwritten note",
  published: false
}
---

I see a few people writing about sharing struggles with LLMs. For me it is easier to do at the moment of a small success. The challenge with writing about my experiments is that it gets meta pretty quickly. Therefore I am going to leave out a bunch of things, including this commentary.

The other day I tried to develop an extension for pi, the shitty coding agent, that would stop a model when it goes off the rails. I now have a local model that is fast and can call tools (edit files, run tests, etc.), edit code, and so on. It does, however, frequently exhibit some model-assisted coding quirks:

- replace production code that works with throwing an exception
- write if statements in tests
- add fallbacks for things that can't fail
- find "problems" in code that works (passes tests + other checks, works for the user, etc.)

Long term the solution probably is to work in small steps. But these steps come from experience. Catching the model when it happens by simply matching some words is a starting point for that: scan for keywords in any edits and prompt the user for permission, or abort when the session is not interactive.

Looks simple, so I let a more powerful but slower local model figure out how to build an extension (pi has a system prompt for that). After one iteration we had a plan and pi generated a plausible-looking extension. I tested it manually, in pi. Nothing happened. Back to the drawing board. I had quite a few iterations, compared with sample code, looked into the pi API, no luck.

Eventually I installed the sample extension. That worked. Then I deleted most of my extension and added some logging, and I could see something.

I learned quite a bit about pi and its extension mechanism. It looks like only the last "UI notification" gets shown for any extension point (e.g. a tool call or system startup).
I am not yet sure if this is by design or not.

I did take away that, here too, I want to work test-first for parts that do not interface with the agent directly. The feedback loop is just too slow. This also required experimentation. I did not want to set up a separate project for an extension that is little more than an idea. But I do want tests, so I asked a model again. The suggestion was to use Deno, because that has testing built in. Some more fiddling followed:

- get Deno to work in the sandbox
- learn that pi auto-loads anything in the extensions folder. If you put a test there, pi crashes
- learn that "domain" files also don't work there

So eventually I ended up with

```
- .pi/
    test/
    core/
    extensions/
```

`core` contains the functional core, `test` contains the tests for the core, and `extensions` is a thin integration with pi that uses the core. This was clear enough that the slow, dense model could build a second extension (performance metrics in chat) with relatively little guidance after iterating on a plan.

I haven't looked at the code yet. Not out of principle, but because it is late, and I want to write down my trial and error before I forget.

TODO link pdf

TODO add image from scan (in downloads)
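
To make the core/extension split concrete, here is a minimal sketch of what a functional core for the keyword guard could look like. It is an illustration, not the generated extension: the names `scanEdit`, `decide`, `Decision` and `SUSPICIOUS` are hypothetical, the patterns are assumed examples, and nothing in it touches pi's API. The thin extension would pass in the edited text and whether the session is interactive, then act on the result.

```typescript
// Hypothetical functional core for a keyword guard. Nothing here is pi's API;
// a thin extension would call decide() and act on the returned decision.

type Decision =
  | { action: "allow" }
  | { action: "ask"; matches: string[] }
  | { action: "abort"; matches: string[] };

// Assumed example patterns for two of the quirks listed in this note.
const SUSPICIOUS: { label: string; pattern: RegExp }[] = [
  { label: "throws an exception", pattern: /throw new Error\(/ },
  { label: "adds a fallback", pattern: /\bfallback\b/i },
];

// Return the labels of all rules whose pattern matches the edited text.
function scanEdit(newText: string, rules = SUSPICIOUS): string[] {
  return rules.filter((r) => r.pattern.test(newText)).map((r) => r.label);
}

// Allow clean edits; otherwise ask when interactive, abort when not.
function decide(newText: string, interactive: boolean): Decision {
  const matches = scanEdit(newText);
  if (matches.length === 0) return { action: "allow" };
  return interactive
    ? { action: "ask", matches }
    : { action: "abort", matches };
}
```

Because `decide` is a pure function, it can be tested with Deno's built-in test runner without starting pi at all, which is the point of keeping tests and core outside the extensions folder.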