For example, if I were building real software, I would design everything, from policies to error logging and so on. But for a blog post, it just gets simplified into a short runnable script.
2. The content is lower quality.
The strengths are that the design forces Chain of Thought to act as a memory buffer and keeps the TODO list in an FSM style. I think those are fine. The recovery strategy is also pretty good.
However, the problem is that the business logic does not run as Python code but lives inside the prompt. And it does not support parallel execution. But as a single run script, it is helpful enough for understanding the concept.
Of course, if I were to write the code properly, I would use separate storage instead of keeping everything in memory, and I would verify tool constraints and the actual scope limitations of the tools more carefully. But still, I think this is helpful enough.
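To make the FSM-style TODO list and the "separate storage" point concrete, here is a minimal sketch of what I mean. It is not the article's code: the state names, the `TodoStore` class and the JSON file are all my own assumptions, picked only to show the state transitions running as Python instead of living in the prompt.

```python
import json
from pathlib import Path

# Allowed transitions for one TODO item (hypothetical state names).
TRANSITIONS = {
    "pending": {"in_progress"},
    "in_progress": {"done", "failed"},
    "failed": {"in_progress"},  # recovery: a failed task may be retried
    "done": set(),
}


class TodoStore:
    """TODO list persisted to a JSON file instead of living only in memory."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload earlier state if the file already exists.
        self.items = json.loads(self.path.read_text()) if self.path.exists() else {}

    def add(self, task_id, description):
        self.items[task_id] = {"description": description, "state": "pending"}
        self._save()

    def transition(self, task_id, new_state):
        current = self.items[task_id]["state"]
        # Reject anything the state machine does not allow.
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current} -> {new_state}")
        self.items[task_id]["state"] = new_state
        self._save()

    def _save(self):
        self.path.write_text(json.dumps(self.items, indent=2))
```

With something like this the model can only request a transition, and the code decides whether it is legal, so a confused model cannot mark a pending task as done without starting it.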
I don't think the content is low quality, though.
Please stop posting.
They just took undefined behaviour and called it unsafe. They've not really solved anything. Even their own std lib has security bugs in unsafe code.
And their only ever retort is "there are thousands of these bugs a day in C code"... Let's wait until Rust gets used seriously in the systems and embedded space first; there's no point comparing C to minnows like Rust when it comes to total CVEs.
None of them have been worth it. A year ago the models needed to be reminded. Today they can follow a plan from text alone. This is my experience from working on a project alone - in teams ... I actually think the same lesson holds in the new AI paradigm.
My current scheme is basically this - in order of the task's complexity:
- Tell an agent to do something
- Tell an agent to make a plan then tell it to execute on it.
- Tell an agent to make a plan, write to a file, have a subagent review it, then execute it.
- Do the above, but instead tell the agent it's in a supervisor mode and should have subagents implement as many phases as they can, rolling over with a handoff.md, while it, as the supervisor agent, keeps driving the task to completion.
The latter two I have under a sigil, so they're prepared prompts I can inject with a few keystrokes.
If I feel very fancy I'll tell them to update the plan with a checklist and add checkboxes, but it just doesn't pay enough to have 'init-prompt'-level planning features or tools if you already have file read/write in the same context.
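For what it's worth, the checklist variant needs very little code if you ever want to track it outside the agent. A rough sketch, assuming the plan is a markdown-style file with `- [ ]` checkboxes; the helper names `next_unchecked` and `check_off` are made up for illustration:

```python
import re

# Matches lines like "- [ ] do something" or "* [x] done thing".
CHECKBOX = re.compile(r"^(\s*[-*] \[)([ xX])(\] )(.*)$")


def next_unchecked(plan_text):
    """Return the text of the first unchecked item, or None if all are done."""
    for line in plan_text.splitlines():
        m = CHECKBOX.match(line)
        if m and m.group(2) == " ":
            return m.group(4)
    return None


def check_off(plan_text, item):
    """Mark the first unchecked line whose text equals `item` as done."""
    out, done = [], False
    for line in plan_text.splitlines():
        m = CHECKBOX.match(line)
        if not done and m and m.group(2) == " " and m.group(4) == item:
            line = f"{m.group(1)}x{m.group(3)}{m.group(4)}"
            done = True
        out.append(line)
    return "\n".join(out)
```

A supervisor loop would call `next_unchecked` on the plan, hand that item to a subagent, then write back the result of `check_off`. In practice the agent can do the same edit itself with plain file read/write, which is the point above.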
It's not about enhancing Claude. This article is about creating your own agent, and giving it the ability to create plans and task lists for itself.
The way Claude Code creates plans and task lists for itself.
The article is about creating that in your own harness for things not using Claude Code, like say a custom LLM integration in your own web app.
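In a custom harness that roughly means exposing a plan tool to the model and handling its calls yourself. A minimal sketch under my own assumptions: the tool names `write_plan` and `complete_task`, the schema shape and the `session` dict are all hypothetical, and the exact tool-definition format depends on your LLM provider.

```python
# Hypothetical tool description; adapt the schema to your provider's format.
WRITE_PLAN_TOOL = {
    "name": "write_plan",
    "description": "Replace the current plan with an ordered list of tasks.",
    "parameters": {
        "type": "object",
        "properties": {"tasks": {"type": "array", "items": {"type": "string"}}},
        "required": ["tasks"],
    },
}


def render_plan(session):
    """Text fed back into the model's context so it can see the plan's state."""
    return "\n".join(
        f"[{'x' if item['done'] else ' '}] {i}. {item['task']}"
        for i, item in enumerate(session["plan"])
    )


def handle_tool_call(name, args, session):
    """Apply a tool call from the model to this session's plan state."""
    if name == "write_plan":
        session["plan"] = [{"task": t, "done": False} for t in args["tasks"]]
        return render_plan(session)
    if name == "complete_task":
        session["plan"][args["index"]]["done"] = True
        return render_plan(session)
    return f"unknown tool: {name}"
```

The harness returns the rendered plan as the tool result, so the model always sees the current checklist without you having to restate it in the prompt.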
Why can't you do the planning? Figure out what needs to be done, break it down into small tasks, and then ask the agent to execute those small tasks?
When we executed projects in the past, this is what I would do as a lead: figure out the overall software architecture and delegate the tasks to developers.
This way I always knew how the system worked and could extend it as needed. I am not in a development role anymore, but I am trying to understand why we are delegating planning and software architecture to coding agents.
Maybe I spent 2-4 hours reviewing it, checking things with colleagues etc.
Then I press "go" and maybe an hour later I have a tested system ready for manual review.
Its plans are at least as good as any I've seen. Their weakness is any unstated assumptions I have about how things need to be done, so most of my time now goes into getting those assumptions stated properly and then reviewing.
Why wouldn't I use this? It's the best tool I've used in my 30 years of professional programming.