diff --git a/app/priv/blog/engineering/2026/04-24-smaller-open-llms-now-work-for-open-agents.md b/app/priv/blog/engineering/2026/04-24-smaller-open-llms-now-work-for-open-agents.md
index 78a3b02..2616af6 100644
--- a/app/priv/blog/engineering/2026/04-24-smaller-open-llms-now-work-for-open-agents.md
+++ b/app/priv/blog/engineering/2026/04-24-smaller-open-llms-now-work-for-open-agents.md
@@ -7,15 +7,18 @@
 }
 ---
-A combination of people working on open weights inference servers implementing research papers that came out at a rapid clip with strong open weight model releases by Qwen (3.5, 3.6, various sizes) and Google (Gemma4) in the last month or two arrived at my desk last week and this week. I said "adjusting" in the description, but maybe reeling is a more accurate description of what is happening.
+I am replacing most, if not all, of my Claude Code workflows with [pi.dev](https://pi.dev), an open source coding agent, and local LLMs running on my laptop. If you don't have the hardware, smaller models are also cheap(er) to run on hosted services like [Open Router](https://openrouter.ai). As prices of frontier models continue to rise and subscription plans are watered down, the capability and speed of open weight and open source models keep increasing. The last month saw a step change, with a couple of releases from last week (Qwen 3.6, and several inference servers shipping performance improvements: more speed, less memory) markedly improving the user experience.
+
+I wrote "adjusting" in the description, but reeling is maybe a more accurate description of what is happening. I wrote this more general note, as I plan to write some how-to's to make decoupling (coding) agents from inference hosting approachable for more people. We're all still figuring this out, but once you are up and running, you can work with your local agent to improve your tools and way of working in small steps.
 I started using open weights models and open source coding tools in 2024, and have on and off kept doing that alongside Claude Code. The second half of 2025 saw some strong open agents (most notably pi.dev and Open Code), and now, paired with strong, affordable models, I can do serious development work with an agent from the comfort of my own laptop.
 
 Large Language Models are making several kinds of knowledge and ways of working more accessible and shortening some feedback loops. I am not keen on depending on frontier labs who have no customer service whatsoever, and are operating on business models that are financially not sustainable. A friend of mine set up an account to use Claude Code; it got shut down without explanation. This seems to be very common at the moment. The customer service consists of a form, from the sound of it, and no response.
-Note that I say "small open LLMs" - you can run these locally, but if you don't want to spend money on hardware, there are services like [Open Router](https://openrouter.ai) that host these as well, for very little. Before I fell into a Claude Code subscription, I used Open Router with a local coding agent (Aider), and spent maybe 25$ in half a year. Admittedly, it was occasional use and smaller bits of work. Open Router serves both open weight and frontier lab models, pricing is transparent. Claude Code was?
----
+Before I fell into a Claude Code subscription, I used Open Router with a local coding agent (Aider), and spent maybe $25 in half a year. Admittedly, it was occasional use and smaller bits of work. Open Router serves both open weight and frontier lab models, and pricing is transparent, so you can experiment. Enterprise AI use is likely to also become more price sensitive - it is easy to burn through a year's worth of AI expense in a quarter, as some are now finding out.
+
 I have been using Claude Code for about a year. I noticed last week I started talking about it in the past tense.
 I hesitated writing a clickbait title: "Claude Code was?". Late November marked a step change in how well Claude Code worked - the release of the Opus 4.5 model combined with Anthropic's long-running agents paper meant that I could brainstorm an idea for a fairly complicated web app iteratively, and then fairly easily and reliably build it, giving me time to do exploratory testing and focus on user experience.
 
 That also gave me some anxiety. What if my account got pulled for whatever reason? It was clear from the start of the monthly subscriptions (to me and many others, though not everyone apparently) that this was not sustainable, and that at some point they would have to raise prices. Anthropic is now putting more and more of their stuff behind per-token pricing. And their models are expensive.
@@ -44,5 +47,5 @@
 Where to start? You can try models in the browser on Open Router and Hugging Face, or install "Google Edge" or another app on your phone, which will run a small model right on the device. That way you can see what they are like. Then you can hook up an API (say Open Router, but it can also be a frontier model from Anthropic, Google or OpenAI) to a local coding agent. OpenCode is easy to set up out of the box, and comes with a free cloud model, so you are good to go. I didn't like that, because it defaults to that free cloud model if you make a typo in the name of the local, private model you want. I use [Pi.dev](https://pi.dev) at the moment, which is more minimal and "fail fast". I'll write some more "how-to" posts, because people ask me what my setup looks like.
-I had to get the "where I'm at" post out first, apparently.
+I had to get the "where I'm at" post out first, apparently. Where are you at?
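
The "hook up an API to a local coding agent" step the post describes mostly boils down to pointing the agent at an OpenAI-compatible chat completions endpoint, which Open Router (and most local inference servers) expose. A minimal sketch of the request shape, as an illustration only - the model slug and key below are placeholders, not values from the post:

```python
import json

# Assumption: an OpenAI-compatible /chat/completions endpoint,
# as exposed by Open Router and most local inference servers.
BASE_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the HTTP pieces a coding agent sends on each turn."""
    return {
        "url": BASE_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",  # your Open Router key
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,  # placeholder slug; pick one from the provider
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_request("qwen/some-model", "Explain this diff.", "sk-placeholder")
print(req["url"])
```

Swapping a hosted model for a local one is then just a matter of changing `BASE_URL` to wherever your inference server listens and picking a model name it serves.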