as it happens - draft, needs editing
parent 6eee29a9f2
commit efc50d4ba8
@ -0,0 +1,104 @@
%{
title: "Coding agent generates its own extensions",
author: "Willem van den Ende",
tags: ~w(ai loops),
description: "Handwritten note",
published: false
}
---

I see a few people writing about sharing struggles with LLMs.

For me it is easier to do at the moment of a small success.

The challenge with writing about my experiments is that it gets meta pretty quickly.

Therefore I am going to leave out a bunch of things, including this commentary.

The other day I tried to develop an extension for pi, the shitty coding agent, that would stop a model when it goes off the rails.

I now have a local model that is fast and can call tools (edit files, run tests, etc.), edit code, and so on.

It does, however, frequently show some model-assisted coding quirks:

- replace production code that works with throwing an exception
- write if statements in tests
- add fallbacks for things that can't fail
- find "problems" in code that works (passes tests + other checks, works for the user, etc.)

Long term, the solution probably is to work in small steps. But these steps come from experience.

Catching the model when that happens by simply matching some words is a starting point for that: scan for keywords in any edits and prompt the user for permission, or abort when the session is not interactive.
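
The keyword check can be sketched as two pure functions, independent of the agent. This is a minimal sketch, not pi's API: `scanEdit`, `decide`, the `Finding` and `Decision` types and the keyword list are all hypothetical names chosen for illustration.

```typescript
// Hypothetical names throughout; nothing here is pi's actual API.
type Finding = { keyword: string; line: number };
type Decision = "allow" | "ask" | "abort";

// Illustrative keywords only; a real list would come from watching the model.
const SUSPICIOUS = ["fallback", "not implemented", "throw new error"];

// Pure function: report every suspicious keyword with its 1-based line number.
function scanEdit(editText: string, keywords: string[] = SUSPICIOUS): Finding[] {
  const findings: Finding[] = [];
  editText.split("\n").forEach((text, index) => {
    const lower = text.toLowerCase();
    for (const keyword of keywords) {
      if (lower.includes(keyword.toLowerCase())) {
        findings.push({ keyword, line: index + 1 });
      }
    }
  });
  return findings;
}

// Clean edits pass; otherwise ask a human if there is one, else abort.
function decide(findings: Finding[], interactive: boolean): Decision {
  if (findings.length === 0) return "allow";
  return interactive ? "ask" : "abort";
}
```

Plain substring matching will produce false positives, which is acceptable here because the worst case in an interactive session is an extra permission prompt.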

Looks simple, so I let a more powerful but slower local model figure out how to build an extension (pi has a system prompt for that). After one iteration we had a plan and pi generated a plausible-looking extension.

I tested it manually, in pi. Nothing happened. Back to the drawing board.

I had quite a few iterations, compared with sample code, looked into the pi API: no luck.

Eventually I installed the sample extension. That worked. Then I deleted most of my extension, added some logging, and I could see something.

I learned quite a bit about pi and its extension mechanism.

It looks like only the last "UI notification" gets shown for any extension point (e.g. a tool call or system startup).

I am not yet sure if this is by design or not.

I did take away that, here too, I want to work test-first for parts that do not interface with the agent directly. The feedback loop is just too slow.

This also required experimentation. I did not want to set up a separate project for an extension that is little more than an idea. But I do want tests.

I asked a model again, and the suggestion was to use Deno, because that has testing built in. Some more fiddling followed:

- get Deno to work in the sandbox
- learn that pi auto-loads anything in the extensions folder. If you put a test there, pi crashes
- learn that "domain" files also don't work there
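
To illustrate what test-first can look like for a part that never touches the agent, here is a small sketch for one of the quirks above (if statements in tests). The function `hasConditional`, the `cases` table and `runCases` are hypothetical names, and the regex is a deliberately crude assumption rather than a real parser.

```typescript
// Hypothetical core function: does a test file contain conditional logic?
// Crude on purpose: only looks for a keyword at the start of a line.
function hasConditional(testSource: string): boolean {
  return /^\s*(if|else|switch)\b/m.test(testSource);
}

// Cases as plain data, so any test runner can loop over them.
const cases: Array<{ name: string; source: string; expected: boolean }> = [
  { name: "plain assertion", source: "assertEquals(add(1, 2), 3);", expected: false },
  {
    name: "if in test body",
    source: "  if (result) {\n    assertEquals(result, 3);\n  }",
    expected: true,
  },
  { name: "identifier starting with if", source: "iffy(result);", expected: false },
];

// Returns the names of failing cases; an empty array means all cases pass.
function runCases(): string[] {
  return cases
    .filter((c) => hasConditional(c.source) !== c.expected)
    .map((c) => c.name);
}
```

Under Deno, each case could be registered with `Deno.test(name, fn)` and run with `deno test`, as long as the test file lives outside the folder that pi auto-loads.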

So eventually I ended up with:

```
.pi/
  test/
  core/
  extensions/
```

Core contains the functional core, test contains the tests for the core, and extensions is a thin integration with pi that uses the core.
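
The split between core and extensions can be sketched as follows. Everything here is an assumption for illustration: `ToolCall`, `Notify`, `reviewToolCall` and `makeHandler` are hypothetical and do not reflect pi's real extension API. The shell sends all warnings in one message, on the assumption that only the last notification per extension point is shown.

```typescript
// Hypothetical shapes; pi's real extension API is not shown here.
type Notify = (message: string) => void;
type ToolCall = { tool: string; input: string };

// core/: pure logic, testable without starting the agent.
function reviewToolCall(call: ToolCall): string[] {
  const warnings: string[] = [];
  if (call.tool !== "edit") return warnings;
  if (/fallback/i.test(call.input)) {
    warnings.push("edit mentions a fallback");
  }
  if (/not implemented/i.test(call.input)) {
    warnings.push("edit adds a 'not implemented' stub");
  }
  return warnings;
}

// extensions/: thin shell that only wires the core to what the host provides.
// Warnings are joined into a single notification instead of one call each.
function makeHandler(notify: Notify): (call: ToolCall) => void {
  return (call) => {
    const warnings = reviewToolCall(call);
    if (warnings.length > 0) notify(warnings.join("; "));
  };
}
```

Because `notify` is passed in, a test can hand the shell an array-backed stub and check the messages without running the agent at all.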

This was clear enough that the slow, dense model could build a second extension (performance metrics in chat) with relatively little guidance after iterations on a plan.

I haven't looked at the code yet. Not out of principle, but because it is late, and I want to write down my trial and error before I forget.

TODO link pdf
TODO add image from scan (in downloads)