Filesystem as profile manager is not something I thought I'd be doing, but here we are.
Isn’t that sub agents?
I've been emulating this in claude code by manually @tagging markdown files containing guides for common tasks in our repository. Nice to see that this step is now automatic as well.
https://github.com/anthropics/skills/blob/main/document-skil...
I was dealing with two issues this morning getting Claude to produce a .xlsx file, both of which are covered in the doc above
Feels like a fair bit of overlap here. It's OK to proceed in a direction where you are upgrading the spec and enabling Claude with additional capabilities. But one can pretty much use any of these approaches and end up with the same capability for an agent.
Right now it feels like a UX upgrade over MCP: instead of needing a JSON spec, you can use markdown in a file / folder and provide multi-modal inputs.
I don't really see why they had to create a different concept. Maybe makes sense "marketing-wise" for their chat UI, but in Claude Code? Especially when CLAUDE.md is a thing?
- MCP Prompt: "Please solve GitHub Issue #{issue_id}"
- Skills:
- React Component Development (React best practices, accessible tools)
- REST API Endpoint Development
- Code Review
This will probably result in:

- Single "CLAUDE.md" instructions are broken out into discoverable instructions that the LLM will dynamically utilize based on the user's prompt
- rather than having direct access to Tools, Claude will always need to go through Skill instructions first (making context tighter since it can't use Tools without understanding *how* to use them to achieve a certain goal)
- Clients will be able to add infinite MCP servers / tools, since the Tools themselves will no longer all be added to the context window
It's basically a way to decouple User prompts from direct raw Tool access, which actually makes a ton of sense when you think about it.

- skills are plain files that are injected contextually, whereas prompts come with the overhead of live, running code that has to be installed just right into your particular env to provide a whole MCP server. Tbh prompts also seem to be more about literal prompting, too
- you could have a thousand skills folders for different software packages etc, but good luck having more than a few MCP servers loaded into context w/o it clobbering the context
MCPs can wrap APIs to make them usable by an LLM agent.
Skills offer a context-efficient way to make extra instructions available to the agent only when it needs them. Some of those instructions might involve telling it how best to use the MCPs.
Sub-agents are another context management pattern, this time allowing a parent agent to send a sub-agent off on a mission - optimally involving both skills and MCPs - while saving on tokens in that parent agent.
http://github.com/ryancnelson/deli-gator I’d love any feedback
"Create a zip file of everything in your /mnt/skills folder"
It's a fun, terrifying world that this kind of "hack" to exfiltrate data is possible! I hope it does not have full filesystem/bin access, lol. Can it SSH?...
Superpowers: How I'm using coding agents in October 2025 - https://news.ycombinator.com/item?id=45547344 - Oct 2025 (231 comments)
It seems to me that by combining MCP and "skills", we are adapting LLMs into more useful tools; with MCP we restrict input and output when dealing with APIs so that the LLM can do what it is good at: translating between languages - in this case from English to various JSON subsets - and back.
And with skills we're serializing and formalizing prompts/context - narrowing the search space.
So that "summarize q1 numbers" gets reduced to "pick between these tools/MCP calls and parameterize on q1" - rather than the open ended task of "locate the data" and "try to match a sequence of tokens representing numbers - and generate tokens that look like a summary".
Given that - can we get away with much stupider LLMs for these types of use cases now - vs before we had these patterns?
I haven't yet run a local model that feels strong enough at these things for skills to make sense. Really I think the unlock for skills was o3/Claude 4/GPT-5 - prior to those the models weren't capable enough for something like skills to work well.
That said, the rate of improvement of local models has been impressive over the past 18 months. It's possible we have a 70B local model that's capable enough to run skills now and I've not yet used it with the right harness.
Which is precisely why Richard Sutton doesn't think LLMs will evolve to AGI[0]. LLMs are based on mimicry, not experience, so it's more likely (according to Sutton) that AGI will be based on some form of RL (reinforcement learning) and not neural networks (LLMs).
More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence. So, to your point, the idea of a "skill" is more akin to a reference manual, than it is a skill building exercise that can be applied to developing an instrument, task, solution, etc.
He is right that non-RL'd LLMs are just mimicry, but the field already moved beyond that.
While that might be true, it fundamentally means it's not ever going to replicate humans or provide super intelligence.
Many people would argue that's a good thing
That might be true, but we're talking about the fundamentals of the concept. His argument is that you're never going to reach AGI/super intelligence on an evolution of the current concepts (mimicry), even through fine tuning and adaptations - it'll likely be different (and likely based on some RL technique). At least we have NO history to suggest this will be the case (hence his argument for "the bitter lesson").
At the very end of an extremely long and sophisticated process, the final mapping is softmax transformed and the distribution sampled. That is one operation among hundreds of billions leading up to it.
It's like saying a Jeopardy player is a random word generating machine - they see a question and they generate "what is" followed by a random word, random because there is some uncertainty in their mind even in the final moment. That is technically true, but incomplete, and entirely missing the point.
But this is easier said than done. Current models require vastly more learning events than humans, making direct supervision infeasible. One strategy is to train models on human supervisors, so they can bear the bulk of the supervision. This is tricky, but has proven more effective than direct supervision.
But, in my experience, AIs don't specifically struggle with the "qualitative" side of things per se. In fact, they're great at things like word choice, color theory, etc. Rather, they struggle to understand continuity and consequence and to combine disparate sources of input. They also suck at differentiating fact from fabrication. To speculate wildly, it feels like they're missing the RL of living in the "real world". In order to eat, sleep and breathe, you must operate within the bounds of physics and society and live forever with the consequences of an ever-growing history of choices.
Which eventually forces you to take a step back and start questioning basic assumptions until (hopefully) you get a spark of realization of the flaws in your original plan, and then recalibrate based on that new understanding and tackle it totally differently.
But instead I watch Claude struggling to find a directory it expects to see and running random npm commands until it comes to the conclusion that, somehow, node_modules was corrupted mysteriously and therefore it needs to wipe everything node related and manually rebuild the project config by vague memory.
Because no big deal, if it’s wrong it’s the human's problem to untangle and Anthropic gets paid either way so why not try?
In fairness, I have on many an occasion worked with real-life software developers who really should know better, deciding the problem lies anywhere but in their initial model of how this should work. Quite often that developer has been me, although I like to hope I've learned to be more skeptical when that thought crosses my mind now.
> More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence.
Citation?
And I associate that part to AGI being able to do cutting edge research and explore new ideas like humans can. Where, when that seems to “happen” with LLMs it’s been more debatable. (e.g. there was an existing paper that the LLM was able to tap into)
I guess another example would be to get an AGI doing RL in realtime to get really good at a video game with completely different mechanics in the same way a human could. Today, that wouldn’t really happen unless it was able to pre-train on something similar.
ChatGPT broke open the dam to massive budgets for AI/ML, and LLMs will probably be one puzzle piece of AGI. But otherwise?
I mean, it should be clear that we have so much work to do, like RL (which now happens at massive scale btw, because you thumb up or down every day), thinking, Mixture of Experts, tool calling and, super super critical: architecture.
Compute is a hard upper limit too.
And the math isn't done either. Context-length performance has advanced, and we have also seen other approaches like diffusion-based models.
Whenever you hear the leading experts talking, they mention world models.
We are still in a phase where we have plenty of very obvious ideas people need to try out.
But the quality of Whisper alone, LLMs as an interface, and tool calling can solve problems in robotics and elsewhere that no one was able to solve this easily before.
You may disagree with this take but it's not uninformed. Many LLMs use self-supervised pretraining followed by RL-based fine-tuning, but that's essentially it - it's fine tuning.
Also how do you think the most successful RL models have worked? AlphaGo/AlphaZero both use Neural Networks for their policy and value networks which are the central mechanism of those models.
On the other hand, LLMs have a programmatic context with consistent storage and the ability to have perfect recall; they just don't always generate the expected output in practice, as the cost to go through ALL context is prohibitive in terms of power and time.
Skills.. or really just context insertion is simply a way to prioritize their output generation manually. LLM "thinking mode" is the same, for what it's worth - it really is just reprioritizing context - so not "starting from scratch" per se.
When you start thinking about it that way, it makes sense - and it helps using these tools more effectively too.
I'd been re-teaching Claude to craft REST API calls with curl every morning for months before I realized that skills would let me delegate that to cheaper models, re-use cached-token queries, and save my context window for my actual problem-space CONTEXT.
what the fuck, there is absolutely no way this was cheaper or more productive than just learning to use curl and writing curl calls yourself. Curl isn't even hard! And if you learn to use it, you get WAY better at working with HTTP!
You're kneecapping yourself to expend more effort than it would take to just write the calls, helping to train a bot to do the job you should be doing
You are bad at reading comprehension. My comment meant I can tell Claude “update jira with that test outcome in a comment” and, Claude can eventually figure that out with just a Key and curl, but that’s way too low level.
What I linked to literally explains that, with code and a blog post.
Not really. It's a consequential issue. No matter how big or small the context window is, LLMs simply do not have the concept of goals and consequences. Thus, it's difficult for them to acquire dynamic and evolving "skills" like humans do.
Of course OpenAI and Anthropic want to be able to reuse the same servers/memory for multiple users, otherwise it would be too expensive.
Could we have "personal" single-tenant setups? Where the LLM incorporates every previous conversation?
Not OP, but this is the part that I take issue with. I want to forget what tools are there and have the LLM figure out on its own which tool to use. Having to remember to add special words to encourage it to use specific tools (required a lot of the time, especially with esoteric tools) is annoying. I’m not saying this renders the whole thing “useless” because it’s good to have some idea of what you’re doing to guide the LLM anyway, but I wish it could do better here.
ooh, it does call make when I ask it to compile, and is able to call a couple other popular tools without having to refer to them by name. If I ask it to resize an image, it'll call ImageMagick, or run ffmpeg, and I don't need to refer to ffmpeg by name.
so at the end of the day, it seems they are their training data, so better write a popular blog post about your one-off MCP and the tools it exposes, and maybe the next version of the LLM will have your blog post in the training data and will automatically know how to use it without having to be told
I installed ImageMagick on Windows.
Created a ".claude/skills/Image Files/" folder
Put an empty SKILLS.md file in it
and told Claude Code to fill in the SKILLS.md file itself with the path to the binaries.
and it created all the instructions itself including examples and troubleshooting
and in my project prompted
"@image.png is my base icon file, create all the .ico files for this project using your image skill"
and it all went smoothly
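For reference, the kind of invocation those generated instructions end up driving is roughly the sketch below (it assumes ImageMagick 7's `magick` binary is on the PATH; the resolution list is just an example):

```python
import subprocess

# Rough sketch of the conversion the skill ends up running (assumes
# ImageMagick 7's `magick` binary is on PATH; sizes are only an example).
def png_to_ico(src: str = "image.png", dst: str = "app.ico") -> None:
    subprocess.run(
        [
            "magick", src,
            # Embed several resolutions in a single .ico file.
            "-define", "icon:auto-resize=256,128,64,48,32,16",
            dst,
        ],
        check=True,
    )

if __name__ == "__main__":
    png_to_ico()
```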
You probably mean "starting from square one" but yeah I get you
The description is equivalent to your short term memory.
The skill is like your long term memory which is retrieved if needed.
These should both be considered as part of the AI agent. Not external things.
For folks for whom this seems elusive, it's worth learning how the internals actually work; it helps a great deal in how to structure things in general, and then over time, as the parent comment said, for individual cases.
For the AIs to interface with the rich existing toolset for refactoring code from the pre-AI era.
E.g., if it decides to rename a function, it resorts to grepping and fixing all usages 'manually', instead of invoking traditional static code analysis tools to do the change.
But context engineering is very much not going anywhere as a discipline. Bigger and better models will by no means make it obsolete. In fact, raw model capability is pretty clearly leveling off into the top of an S-curve, and most real-world performance gains over the last year have come precisely from innovations in how to better leverage context.
Of course this is why the model providers keep shipping new ones; without them their product is a commodity.
Plugins include:

* Commands
* MCPs
* Subagents
* Now, Skills
Marketplaces aggregate plugins.
From a technical perspective, it seems like unnecessary complexity in a way. Of course I recognize there are lot of product decisions that seem to layer on 'unnecessary' abstractions but still have utility.
In terms of connecting with customers, it seems sensible, under the assumption that Anthropic is triaging customer feedback well and leading to where they want to go (even if they don't know it yet).
Update: a sibling comment just wrote something quite similar: "All these things are designed to create lock in for companies. They don’t really fundamentally add to the functionality of LLMs." I think I agree.
This stuff is like front end devs building fad add-ons which call into those core elements and falsely market themselves as fundamental advancements.
But this is not the feature they should or could have built, at least for Claude Code. CC already had a feature very similar to this -- subagents (or agents-as-tools).
Like Skills, Subagents have a metadata description that allows the model to choose to use them in the right moment.
Like Skills, Subagents can have their own instruction MD file(s) which can point to other files or scripts.
Like Skills, Subagents can have their own set of tools.
But critically, unlike Skills, Subagents don't pollute the main agent's context with noise from the specialized task.
And this, I think, is a major product design failure on Anthropic's part. Instead of a new concept, why not just expand the Subagent concept with something like "context sharing" or "context merging" or "in-context Subagents", or add the ability for the user to interactively chat with the Subagent via the normal CLI chat?
Now people have to choose between Skill and Subagent for what I think will be very similar or identical use cases, when really the choice of how this extra prompting/scripting should relate to the agent loop should be a secondary configuration choice rather than a fundamental architecture one.
Looking forward to a Skill-Subagent shim that allows for this flexibility. Not thrilled that a hack like that is necessary, but I guess it's nice that CC's use of simple MD files on disk makes it easy enough to accomplish.
A subagent can use a skill.
A skill can encourage the agent to run a subagent.
Let me just say: I'm nitpicking what I think is overall an incredible tool and a great new feature of said tool.
For a moment, pretend Subagents don't exist. And Anthropic just released "In-Thread Skills" (identical to what are now "Skills") and "Out-of-Thread Skills" (identical to Subagents). I feel like the library of Skills that would be published would be useful in more circumstances if this were the reality. Of course some may publish both versions of a thing, and of course you could do a shim of some kind, but it could be _nicer_.
Another similar thing: how are Skills different than the Slash Command Tool [0]? Why not just amend Slash Commands to allow them to include scripts and other supplementary files stored in a directory, and boom, you have Skills. Instead we have a net new primitive.
And the larger unfortunate reality is that because Claude Code is the white-hot center of this white-hot ecosystem, there are likely a dozen other tools in this space that are going to copy the exact same primitive set just to have perceived parity with CC.
I'm veering into "yelling at clouds" territory now, so I'll get off my soapbox. It's just one of those things that feels like it could be slightly more awesome than the awesome that it is, is all.
[0] https://docs.claude.com/en/docs/claude-code/slash-commands#s...
My guess is that, as I understand it, Anthropic’s belief is that subagents are usually not the proper tool for most tasks. In their guides and videos about proper use of subagents, they seem to really try to steer you toward “workflows” rather than subagents.
Maybe it’s time they rethink the overall strategy so that each new concept doesn’t have to be its own distinct feature (skills, plugins, marketplaces, subagents, etc).
https://www.anthropic.com/engineering/building-effective-age...
Across ChatGPT and Claude we now have tools, functions, skills, agents, subagents, commands, and apps, and there's a metastasizing complex of vibe frameworks feeding on this mess.
Yes, it's a mess, and there will be a lot of churn, you're not wrong, but there are foundational concepts underneath it all that you can learn and then it's easy to fit insert-new-feature into your mental model. (Or you can just ignore the new features, and roll your own tools. Some people here do that with a lot of success.)
The foundational mental model to get the hang of is really just:
* An LLM
* ...called in a loop
* ...maintaining a history of stuff it's done in the session (the "context")
* ...with access to tool calls to do things. Like, read files, write files, call bash, etc.
Some people call this "the agentic loop." Call it what you want, you can write it in 100 lines of Python. I encourage every programmer I talk to who is remotely curious about LLMs to try that. It is a lightbulb moment.
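For the curious, here's a stripped-down sketch of that loop, assuming the Anthropic Python SDK, a single `bash` tool, and no permission checks or error handling (illustrative, not production code):

```python
import subprocess
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "bash",
    "description": "Run a shell command and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]

def run_agent(user_prompt: str) -> None:
    history = [{"role": "user", "content": user_prompt}]  # the "context"
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-5",  # assumption: use whichever model you like
            max_tokens=4096,
            tools=TOOLS,
            messages=history,
        )
        history.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            print("".join(b.text for b in response.content if b.type == "text"))
            return
        # Execute each requested tool call and feed the results back in.
        results = []
        for block in response.content:
            if block.type == "tool_use" and block.name == "bash":
                out = subprocess.run(
                    block.input["command"], shell=True,
                    capture_output=True, text=True,
                )
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": out.stdout + out.stderr,
                })
        history.append({"role": "user", "content": results})

run_agent("List the files in the current directory and summarize them.")
```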
Once you've written your own basic agent, if a new tool comes along, you can easily demystify it by thinking about how you'd implement it yourself. For example, Claude Skills are really just:
1) Skills are just a bunch of files with instructions for the LLM in them.
2) Search for the available "skills" on startup and put all the short descriptions into the context so the LLM knows about them.
3) Also tell the LLM how to "use" a skill. Claude just uses the `bash` tool for that.
4) When Claude wants to use a skill, it uses the "call bash" tool to read in the skill files, then does the thing described in them.
and that's more or less it, glossing over a lot of things that are important but not foundational like ensuring granular tool permissions, etc.
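Concretely, steps 2 and 3 can be as naive as the sketch below (the `~/.claude/skills/*/SKILL.md` layout and the `name`/`description` frontmatter fields follow the docs; the parsing is deliberately crude):

```python
from pathlib import Path

def load_skill_index(root: Path = Path.home() / ".claude" / "skills") -> str:
    """Collect the short frontmatter descriptions of every installed skill."""
    lines = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        # Naive YAML-frontmatter parse: grab key: value pairs between the --- markers.
        meta = {}
        for line in skill_md.read_text().split("---")[1].splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        lines.append(f"- {meta.get('name', skill_md.parent.name)}: "
                     f"{meta.get('description', '')} (see {skill_md})")
    return ("Available skills (read the referenced file before using one):\n"
            + "\n".join(lines))

# The resulting blurb gets prepended to the system prompt; the full SKILL.md
# is only read (via the bash/file tools) when the model decides it needs it.
print(load_skill_index())
```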
One great thing about the MCP craze, is it has given vendors a motivation to expose APIs which they didn’t offer before - real example, Notion’s public REST API lacks support for duplicating pages.. yes their web UI can do it, calling their private REST API, but their private APIs are complex, undocumented, and could stop working at any time with no notice. Then they added it to their MCP server - and MCP is just a JSON-RPC API, you aren’t limited to only invoking it from an LLM agent, you can also invoke it from your favourite scripting language with no LLM involved at all
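To make that concrete, here's a rough sketch of driving a stdio MCP server from plain Python with no LLM in the loop. The server command is hypothetical, and while the `initialize`/`tools/list` method names come from the MCP spec, treat the handshake details as approximate:

```python
import json
import subprocess

# Hypothetical local MCP server command; swap in whatever server you actually run.
proc = subprocess.Popen(
    ["npx", "-y", "some-mcp-server"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def rpc(method, params=None, msg_id=None):
    """Send one newline-delimited JSON-RPC message over stdio; read a reply if expected."""
    msg = {"jsonrpc": "2.0", "method": method, "params": params or {}}
    if msg_id is not None:
        msg["id"] = msg_id
    proc.stdin.write(json.dumps(msg) + "\n")
    proc.stdin.flush()
    if msg_id is not None:
        return json.loads(proc.stdout.readline())

# Handshake, then list the tools the server exposes.
rpc("initialize", {
    "protocolVersion": "2025-06-18",
    "capabilities": {},
    "clientInfo": {"name": "plain-script", "version": "0.1"},
}, msg_id=1)
rpc("notifications/initialized")
tools = rpc("tools/list", msg_id=2)
print([t["name"] for t in tools["result"]["tools"]])
```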
At least it's something we all reap the benefits of, even if MCP is really mostly just an api wrapper dressed up as "Advanced AI Technology."
PS. Just want to say, Notion MCP is still very buggy. It can't handle code blocks or large pages very well.
That description sounds a lot like PocketFlow, an AI/LLM development framework based on a loop that's about 100 lines of python:
https://github.com/The-Pocket/PocketFlow
(I'm not at all affiliated with Pocket Flow, I just recall watching a demo of it)
I’ve done this a few times (pre and post MCP) and learned a lot each time.
sounds like prompt is what you send, and caching is important here because what you send is derived from previous responses from llm calls earlier?
sorry to sound dense, I struggle to understand where and how in the mental model the non-determinism of a response is dealt with. is it just that it's all cached?
1) Maintaining the state of the "conversation" history with the LLM. LLMs are stateless, so you have to store the entire series of interactions on the client side in your agent (every user prompt, every LLM response, every tool call, every tool call result). You then send the entire previous conversation history to the LLM every time you call it, so it can "see" what has already happened. In a basic agent, it's essentially just a big list of strings, and you pass it into the LLM api on every LLM call.
2) "Prompt caching", which is a clever optimization in the LLM infrastructure to take advantage of the fact that most LLM interactions involve processing a lot of unchanging past conversation history, plus a little bit of new text at the end. Understanding it requires understanding the internals of LLM transformer architecture, but the essence of it is that you can save a lot of GPU compute time by caching previous result states that then become intermediate states for the next LLM call. You cache on the entire history: the base prompt, the user's messages, the LLM's responses, the LLM's tool calls, everything. As a user of an LLM api, you don't have to worry about how any of it works under the hood, you just have to enable it. The reason to turn it on is it dramatically increases response time and reduces cost.
Hope that clarifies!
I'm personally just curious how far, clever, insightful, any given product is "on top of" the foundation models. I'm not in it deep enough to make claims one way or the other.
So this shines a little more light, thanks!
Do you think a non-programmer could realistically build a full app using vibe coding?
What fundamentals would you say are essential to understand first?
For context, I’m in finance, but about 8 years ago I built a full app with Angular/Ionic (live on Play Store, under review on Apple Store at that time) after doing a Coursera specialization. That was my first startup attempt, I haven’t coded since.
My current idea is to combine ChatGPT prompts with Lovable to get something built, then fine-tune and iterate using Roo Code (VS plugin).
I’d love to try again with vibe coding. Any resources or directions you’d recommend?
For personal or professional use?
If you want to make it public I would say 0% realistic. The bugs, security concerns, performance problems etc you would be unable to fix are impossible to enumerate.
But even if you just had a simple login and kept people's emails and passwords, you can very easily have insecure DBs, insecure protections against simple things like SQL injection, etc.
You would not want to be the face of "vibe coder gives away data of 10k users"
If your app has to do something useful, your app just exploded in complexity and corner cases that you will have to account for and debug. Also, if it does anything interesting that the LLM has not yet seen a hundred thousand times, you will hit the manual button quite quickly.
Claude especially (with all its deserved praise) fantasizes so much crap together while claiming absolute authority in corner cases, it can become annoying.
For now, my MVP is pretty simple: a small app for people to listen to soundscapes for focus and relaxation. Even if no one uses, at least it's going to be useful to me and it will be a fun experiment!
I’m thinking of starting with React + Supabase (through Lovable), that should cover most of what I need early on. Once it’s out of the survival stage, I’ll look into adding more complex functionality.
Curious, in your experience, what’s the best way to keep things reliable when starting simple like this? And are there any good resources you can point to?
If I were going to vibe code, I wouldn't use Lovable but Claude Code. You can run it in your terminal.
And I would ask it to use NextAuth, Next.js and Prisma (or another ORM), and connect it with SQLite or an external managed MariaDB server (for easy development you can start with SQLite; for deployment to Vercel you need an external database).
People here shit on Next.js, but due to its extensive documentation and usage the LLMs are very good at building with it, and since it forces a certain structure it generally produces decently structured code that is workable for a developer.
Also, Vercel is very easy to deploy to: just connect GitHub and you are done.
Make sure to properly use Git and commit per feature, or even better branch per feature, so you can easily revert to old versions if Claude messed up.
Before starting out, spend some time sparring with the GPT-5 thinking model to create a database schema that's future-proof. It might be a challenge to find the right balance between over-engineering and simplicity.
One caveat: be careful about letting Claude run migrations on your production database. It can accidentally destroy it, so only point Claude Code at test databases.
I’m not 100% set on Lovable yet. Right now I’m using Stitch AI to build out the wireframes. The main reason I was leaning toward Lovable is that it seems pretty good at UI design and layout.
How does Claude do on that front? Can it handle good UI structure or does it usually need some help from a design tool?
Also, is it possible to get mobile apps out of a Next.js setup?
My thought was to start with the web version, and later maybe wrap it using Cordova (or Capacitor) like I did years ago with Ionic to get Android/iOS versions. Just wondering if that’s still a sensible path today.
Definitely want to try this out. Any resources / etc. on getting started?
It uses Go, which is more verbose than Python would be, so he takes 300 lines to do it. Also, his edit_file tool could be a lot simpler (I just make my minimal agent "edit" files by overwriting the entire existing file).
I keep meaning to write a similar blog post with Python, as I think it makes it even clearer how simple the stripped-down essence of a coding agent can be. There is magic, but it all lives in the LLM, not the agent software.
Just have your agent do it.
(I am not snobbish about my code. If it works and is solid and maintainable I don't care if I wrote it or not. Some people seem to feel a sense of loss when an LLM writes code for them, because of The Craft or whatever. That's not me; I don't have my identity wrapped up in my code. Maybe I did when I was more junior, but I've been in this game long enough to just let it go.)
https://ravinkumar.com/GenAiGuidebook/language_models/Agents... https://github.com/canyon289/ai_agent_basics/blob/main/noteb...
You have to remember, every system or platform has a total complexity budget that effectively sits at the limit of what a broad spectrum of people can effectively incorporate into their day to day working memory. How it gets spent is absolutely crucial. When a platform vendor adds a new piece of complexity, it comes from the same budget that could have been devoted to things built on the platform. But unlike things built on the platform, it's there whether developers like it and use it or not. It's common these days that providers binge on ecosystem complexity because they think it's building differentiation, when in fact it's building huge barriers to the exact audience they need to attract to scale up their customer base, and subtracting from the value of what can actually be built on their platform.
Here you have a highly overlapping duplicative concept that's taking a solid chunk of new complexity budget but not really adding a lot of new capability in return. I am sure the people who designed it think they are reducing complexity by adding a "simple" new feature that does what people would otherwise have to learn themselves. It's far more likely they are at break even for how many people they deter vs attract from using their platform by doing this.
MCP allows anybody to extend their own LLM application's context and capabilities using pre-built *third party* tools.
Agent Skills allow you to let the LLM enrich and narrow down its own context based on the nature of the task it's doing.
I have been using a home grown version of Agent Skills for months now with Claude in VSCode, using skill files and extra tools in folders for the LLM to use. Once you have enough experience writing code with LLMs, you will realize this is a natural direction to take for engineering the context of LLMs. Very helpful in pruning unnecessary parts from "general instruction files" when working on specific tasks - all orchestrated by the LLM itself. And external tools for specific tasks (such as finding out which cell in a jupyter notebook contains the code that the LLM is trying to edit, for example) make LLMs a lot more accurate and efficient, efficient because they are not burning through precious tokens to do the same and accurate because the tools are not stochastic.
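For a flavour of what I mean by those external tools, the notebook-cell lookup is essentially the sketch below (.ipynb files are plain JSON underneath; the CLI wrapper is just for illustration):

```python
import json
import sys

def find_cells(notebook_path: str, needle: str) -> list[int]:
    """Return the indices of code cells whose source contains `needle`."""
    with open(notebook_path) as f:
        nb = json.load(f)
    hits = []
    for i, cell in enumerate(nb["cells"]):
        if cell["cell_type"] == "code" and needle in "".join(cell["source"]):
            hits.append(i)
    return hits

if __name__ == "__main__":
    # e.g. python find_cells.py analysis.ipynb "def load_data"
    print(find_cells(sys.argv[1], sys.argv[2]))
```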
With Claude Skills now I don't need to maintain my home grown contraption. This is a welcome addition!
This is also why not everyone is an early adopter. There are mental costs involved in staying on top of everything.
Usually, there are relatively few adopters of a new technology.
But with LLMs, it's quite the opposite: there was a huge number of early adopters. Some got extremely excited and run hundreds of agents all the time, some got burned and went back to the good old ways of doing things, whereas the majority is just using LLMs from time to time for various tasks, bigger or smaller.
https://en.wikipedia.org/wiki/Technology_adoption_life_cycle
I was able to try Beads[1] quickly with my framework and decided I like it enough to keep it. If I don't like it, just drop it, they're composable.
[0]: https://github.com/aperoc/toolkami.git [1]: https://github.com/steveyegge/beads
I like the trend where the agent decides what models, tooling and thought process to use. That seems to me far more powerful than asking users to create solutions for each discrete problem space.
Yeah, if you chase buzzword compliance and try to learn all these things outside of a particular use case you're going to burn out and have a bad time. So... don't?
AI will help you solve problems you wouldn't have without AI.
… jk… I’ll bet at least one person was like “ah, damnit, what did I miss…” for a second.
I’m surprised/disappointed that I haven’t seen any papers out of the programming languages community about how to integrate agentic coding with compilers/type system features/etc. They really need to step up, otherwise there’s going to be a lot of unnecessary CO2 produced by tools like this.
I focus on building projects delivering some specific business value and pick the tools that gets me there.
There is zero value in spending cycles by engaging in new tools hype.
You see a text file and understand that it can be anything, but end users can’t/won’t make the jump. They need to see the words Note, Reminder, Email, etc.
For those few who do write competent documentation and have well-organized file systems and the risk tolerance to allow LLMs to run roughshod over data, sure, there’s some potential here. Though if you’re already that far in, you’d likely be better off farming that grunt work to a Junior as a learning exercise than an LLM, especially since you’ll have to cleanup the output anyhow.
With the limited context windows of LLMs, you can never truly get this sort of concept to “stick” like you can with a human, and if you’re training an agent for this specific task anyway, you’re effectively locking yourself to that specific LLM in perpetuity rather than a replaceable or promotable worker.
Just…it makes me giggle, how optimistic they are that stars would align at scale like that in an organization.
If you are good at writing, documenting, planning, etc. - basically all the stuff in the SDLC that isn't writing code - you'll probably be much more effective at using LLMs for coding.
That's ONE of the long games that are currently played, and is arguably their fallback strategy: The equivalent of vendor lock-in but for LLM providers.
If any of these outfits truly cared about making AI accessible and beneficial to everyone, then all of them would be busting hump to distill models better to run on a wider variety of hardware, create specialized niches that collaborate with rather than seek to replace humans, and promote sovereignty over the AI models rather than perpetual licensing and dependency forever.
No, not one of these companies actually gives a shit about improving humanity. They’re all following the YC playbook of try everything, rent but never own, lock-in customers, and hope you get that one lucrative bite that allows for an exit strategy of some sort while promoting the hell out of it and yourself as the panacea to a problem.
OpenAI have gpt-oss-20b and 120b. Google have the Gemma 3 models. At this point the only significant AI lab that doesn't provide a locally executable model is Anthropic!
None of the present AI industry is operating in an ethical or responsible way, full stop. They know it, they admit to it when pressed, and nobody seems to give a shit if it means they can collapse the job market and make money for themselves. It’s “fuck you got mine” taken to a technological extreme.
The team is obviously doing a lot of cool things very rapidly, so I don't want to be too negative, but ... please just ask Claude to review your own docs before you merge a change.
Here are a few recent open ones:

- "Documentation missing for new 'Explore' subagent" - https://github.com/anthropics/claude-code/issues/9595
- "Missing documentation for modifying tool inputs in PreToolUse hooks" - https://github.com/anthropics/claude-code/issues/9185
- "Missing Documentation for Various Claude Code Features (CLI Flags, Slash Commands, & Tools)" - https://github.com/anthropics/claude-code/issues/8584
I noticed the general tendency for overlap also when trying to update claude since 3+ methods conflicted with each other (brew, curl, npm, bun, vscode).
Might this be the handwriting of AI? ;)
From the Anthropic Engineering blog.
I think Skills will be useful in helping regular AI users and non-technical people fall into better patterns.
Many power users of AI were already doing the things it encourages.
You get the best of both worlds if you can select tokens by problem rather than by folder.
The key question is how effective this will be with tool calling.
I've asked Claude, and this is what it answered:
Skills = Instructions + resources for the current Claude instance (shared context)
Subagents = Separate AI instances with isolated contexts that can work in parallel (different context windows)
Skills make Claude better at specific tasks. Subagents are like having multiple specialized Claudes working simultaneously on different aspects of a problem.
I imagine we can probably compose them, e.g. invoke subagents (to keep separate context) which could use some skills and in the end summarize the findings/provide output, without "polluting" the main context window.

Having a sub-agent "execute" a skill makes a lot of sense from a context management perspective, but I think the way to think about it is that a sub-agent is an "execution-level" construct, whereas a skill is a "data-level" construct.
Just use slash commands, they work a lot better.
[1] https://docs.claude.com/en/docs/claude-code/plugin-marketpla...
But couldn't an MCP server expose a "help" tool?
What they're trying to do here is translate MCP servers to something more broadly useable by the population. They cannot differentiate themselves with model training anymore, so they have been focusing more and more on tooling development to grow revenue.
I think the big difference is that now you can include scripts in these skills that can be executed as part of the skill, in a VM on their servers.
Or is there some type of file limit here? Maybe the context windows just aren't there yet, but it would be really awesome if coding agents would stop trying to make up functions.
I love this per-agent approach and the roll calling. I don’t know why they used a file system instead of MCP though. MCP already covered this and could use the same techniques to improve.
So we just narrow the scope of the each thing but all of this prompt organizing feels like we’ve gone from programming with YAML to now Markdown.
Specifically, it looks like skills are a different structure than mcp, but overlap in what they provide? Skills seem to be just markdown file & then scripts (instead of prompts & tool calls defined in MCP?).
Question I have is why would I use one over the other?
This is the skill description:
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a poster, piece of art, design, or other static piece. Create original visual designs, never copying existing artists' work to avoid copyright violations.
What it created was an abstract art museum-esque poster with random shapes and no discernible message. It may have been trying to design a playing card but just failed miserably, which is my experience with most AI image generators.
It certainly spent a lot of time and effort to create the poster. It asked initial questions, developed a plan, did research, created tooling - seems like a waste of "tokens" given how simple and lame the resulting image turned out.
Also after testing I still don't know how to "use" one of these skills in an actual chat.
Building a new one that works well is a project, but then it will scale up as much as you like.
This is bringing some of the advantages of software development to office tasks, but you give up some things like reliable, deterministic results.
https://github.com/RossH3/context-tree - Helps Claude and humans understand complex brownfield codebases through maintained context trees.
Being able to start off with a base skill level is nice though, as humans can't just load things into memory like this
While I like the flexibility of deploying your own skills to claude for use org-wide, this really feels like what MCP should be for that use case, or what built-in analysis sandbox should be.
We haven't even gone mainstream with MCP and there are already 10 stand-ins doing roughly the same thing with a different twist.
I would have honestly preferred they called this embedded MCP instead of 'skills'.
You too can win a jackpot by spinning the wheel just like these other anecdotal winners. Pay no attention to your dwindling credits every time you do though.
This is where waiting for this stuff to stablize/standardize, and then writing a "skill" based on an actual RFC or standard protocol makes more sense, IMO. I've been burned too many times building vendor-locked chatbot extensions.
Not mine! I made a few when they first opened it up to devs, but I was trying to use Azure Logic Apps (something like that?) at the time which was supremely slow and finicky with F#, and an exercise in frustration.
OpenAI ships extensions for ChatGPT that feel more like plugging into the consumer experience. Anthropic ships extensions (made for builders) into Claude Code - they feel more DX.
I'm a little confused.
But I don't understand why it's a thing in Claude Code tho, when we already have CLAUDE.md? You could also just point to any .md file in the prompt as a preamble, but that's not even needed. https://www.anthropic.com/engineering/claude-code-best-pract...
That concept is also already perfectly spec'd in the MCP standard, right? (Although not super used, I think?) https://modelcontextprotocol.io/specification/2025-06-18/ser...
"AI" companies have reached the end of the road when it comes to throwing more data and compute at the problem. The only way now for charts to go up and to the right is to deliver value-added services.
And, to be fair, there's a potentially long and profitable road by doing good engineering work that was needed anyways.
But it should be obvious to anyone within this bubble that this is not the road to "superintelligence" or "AGI". I hope that the hype and false advertising stops soon, so that we can focus on practical applications of this technology, which are numerous.
If you have a large number of skills, you could group them into a smaller number of skills each with subskills. That way not all the (sub)skill descriptions need to be loaded into context.
For example, instead of having a ‘PDF editing’ skill, you can have a ‘file editing’ skill that, when loaded into context, tells the LLM what type of files it can operate on. And then the LLM can ask for the info about how to do stuff with PDF files.
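A sketch of that two-level lookup, with the caveat that this nesting convention is my own suggestion, not an official feature (directory names are hypothetical):

```python
from pathlib import Path

SKILLS_ROOT = Path.home() / ".claude" / "skills"

def top_level_index() -> list[str]:
    """Only the coarse skill pointers go into the default context."""
    return [f"{d.name}: see {d / 'SKILL.md'}"
            for d in sorted(SKILLS_ROOT.iterdir()) if d.is_dir()]

def subskill_index(skill: str) -> list[str]:
    """Loaded on demand, once the model has opened the parent skill."""
    parent = SKILLS_ROOT / skill
    return [f"{d.name}: see {d / 'SKILL.md'}"
            for d in sorted(parent.iterdir())
            if d.is_dir() and (d / "SKILL.md").exists()]

# e.g. 'file-editing' sits in the default context; 'file-editing/pdf' is only
# surfaced after the model reads file-editing/SKILL.md and asks for more detail.
print(top_level_index())
print(subskill_index("file-editing"))
```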
If you can manage to keep structuring slightly intelligent tools so that they compound, seems like AGI is achievable.
That's why the thing everyone is after right now is new ways to make those slight intelligences keep compounding.
Just like repeated multiplication of 1.001 grows indefinitely.
For coding in particular, it would be super-nice if they could just live in a standard location in the repo.
> You can also manually install skills by adding them to ~/.claude/skills.
I'm not interested in any system that requires me to write a document begging an LLM to follow instructions, only to have it randomly ignore those instructions whenever it's convenient.
Putting a list of short blurbs pointing Claude Code at a set of extra, longer sets of CLAUDE.md style information was being used to prevent auto loading that context until it was needed.
Instead of assuming this is just change for the sake of change, it’s actually a nice way to support a usage pattern that many of us found works well already
CLAUDE.md holds about as much weight as the "Classroom Rules" craft posters hanging in most kindergarten classrooms.
Honestly no offense, but for me nothing really changed in the last 12 months. It’s not one particular mistake by a company but everything is just so overhyped with little substance.
Skills to me are basically providing a read-only .md file with guidelines. Which can be useful, but somehow I don't use it, as maintaining my guidelines is more work than just writing a better prompt.
I'm not sure anymore whether all the AI slop and stuff we create is beneficial for us, or whether it's just creating a low-quality problem for the future
The only "reasoning" model was the o1 preview.
We didn't have MCP, but that wasn't a big deal because the models were mostly pretty weak at tool calling anyway.
The DeepSeek moment hadn't happened yet - the best available open weights models were from Mistral and Llama and were nowhere close to the frontier hosted models.
The LLM landscape feels radically different to me now compared to October last year.
Not just Claude Code, but all these tools are just better in generating more slop, which is generating more effort in your codebase in the future. Making it less agile, harder to maintain and harder to extend without breaking.
I still haven’t found a useful usage of MCP for me, if i want tool calling I get a structured response by the AI and then do a normal API call. I don’t need nor want the AI to have access to all these calls it’s just too unreliable.
I'm really just sharing my personal preference, as I also prefer a pedal bin over an electric one: there is a delay in the latter and you have to exchange batteries, whilst the former just always works.
The main issue with AI to me is reliability and all that happens is we give it more and more power. This might work out or stall us.
For me personally I don't feel much improvement and I can't share the hype anymore, whilst I'm still more than grateful for the opportunity to live at this time and have AI teach me decent skills in a wide range of topics and accelerate my learning curve.
> Claude: Here is how you do it with parallel routes in SvelteKit yada yada yada
> Me: Show me the documentation for parallel routes for svelte?
> Claude: You're absolutely right, this is a nextjs feature.
----
> Claude: Does something stupid, not even relevant, completely retarded
> Me: You're retarded, this does not work because of (a), (b), (c)
> Claude: You're absolutely right. Let me fix that. Does same stupid thing, completely ignoring my previous input
...which would be great if the (likely binary) format of that was used internally, but something tells me an architectural screwup will lead to leaking the binaries and we'll have a dependency on a dumb inscrutable binary format to carry forward...
The idea is interesting and something I shall consider for our platform as well.
Yes, this can only end well.
Roo Code just has "modes", and honestly, this is more than enough.
We used to call that a programming language. Here, they are presumably repeatable instructions how to generate stolen code or stolen procedures so users have to think even less or not at all.
"Equipping agents for the real world with Agent Skills" https://www.anthropic.com/engineering/equipping-agents-for-t...
https://simonwillison.net/2025/Oct/16/claude-skills/
But here, it seems more like a diamond shape of information flow: the LLM processes the big task, then prompts are customized (not via LLM) with reference to the Skills, and then the customized prompt is fed yet again to the LLM.
Is that the case?
VSCode recently introduced support for nested AGENTS.md files which, albeit less formal, might overlap:
https://code.visualstudio.com/updates/v1_105#_support-for-ne...
It also means that any tool that knows how to read AGENTS.md could start using skills today.
"if you need to create a PDF file first read the file in skills/pdfs/SKILL.md"
I see skills as something you might use inside of a project. You could have a project called "data analyst" with a bunch of skills for different aspects of that task - how to run a regression, how to export data from MySQL, etc.
They're effectively custom instructions that are unlimited in size and that don't cause performance problems by clogging up the context - since the whole point of skills is they're only read into the context when the LLM needs them.
Currently, if a project is at 5% or less of capacity, it will auto-load all files, so skills also give you a way to avoid that capacity limit. For larger projects, Claude has to search files, which can be unreliable, so skills will again be useful as an explicit "always load this".
no reason not to.