I'm just trying to learn this stuff now, so I don't know the literature. The "trajectory view" through action space is what makes the most sense to me.
Along these lines, another half-baked pattern I see is kind of a time-lagged translation of stuff from modern stat mech to deep learning/"AI". First it was energy-based systems and the complex energy landscape view, a la spin glasses and Boltzmann machines. The "equilibrium" state-space view, concerned with memory and pattern storage/retrieval. Hinton, Amit, Hopfield, MacKay and co.
Now, the trajectory view that started in the 90s with Jarzynski and Crooks and really bloomed in 2010+ with "stochastic thermodynamics" seems to be a useful lens. The agent stuff is very "nonequilibrium"/"active"-system coded, in the thermo sense... With the ability to create, modify, and exploit resources (tools/memory) on the fly, there's deep history and path dependence. I see ideas from recent Wolpert and co. (Susanne Still, Crooks again, etc.) w.r.t. thermodynamics of computation providing a kind of through line, all trajectory based. That's all very vague I know, but I recently read the COALA paper and was very enchanted and have been trying to combine what I actually know with this new foreign agent stuff.
It's also very interesting to me how the Italian stat mech school, the Parisi family, have continuously put out bangers trying to actually explain machine learning and deep learning success.
I'd love to hear if anyone is thinking along similar lines, thinks I'm way off track, or has paper recs. Please let me know! Especially papers on the trajectory view of agents.
For a long time now, SWEs seem to have been bamboozled into thinking the only way you can connect different applications together is "integrations" (tightly coupling your app into the bespoke API of another app). I'm very happy somebody finally remembered what protocols are for: reusable communications abstractions that are application-agnostic.
The point of MCP is to be a common communications language, in the same way HTTP, FTP, SMTP, IMAP, etc. are. This is absolutely necessary since you can (and will) use AI for a million different things, but AI has specific kinds of things it might want to communicate, with specific considerations. If you haven't yet, read the spec: https://modelcontextprotocol.io/specification/2025-11-25
The reason we have MCP is because early agent designs couldn't run arbitrary CLIs. Once you can run commands, MCP becomes silly.
There is a clear problem that you'd like an "automatic" solution for, but it's not "we don't have a standard protocol that captures every possible API shape", it's "we need a good way to simulate what a CLI does for agents that can't run bash".
For any sufficiently complex set of AI tasks, you will eventually need to invent MCP. The article posted here talks about those cases and reasons. However, there are cases when you should not use MCP, and the article points those out too.
When I first used ChatGPT, I thought, "surely someone has written some kind of POP3 or IMAP plugin for ChatGPT so it can just connect to my mail server and download my mail." Nope; you needed to write a ChatGPT-specific integration for mail, which needed to be approved by ChatGPT, etc. Whereas if they supported any remote MCP server, I could just write an MCP server for mail, and have ChatGPT connect to it, ask it to "/search_mail_for_string" or whatever, and poof, You Have Mail(tm).
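For anyone who hasn't seen what that looks like, here is a minimal sketch of such a mail server using the official MCP Python SDK's FastMCP helper. The IMAP host, credentials, and tool name are placeholders I made up, not anything from an existing integration:

```python
# Hypothetical mail MCP server; host, credentials, and tool name are illustrative.
import imaplib
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("mail")

@mcp.tool()
def search_mail_for_string(query: str, mailbox: str = "INBOX") -> list[str]:
    """Return subject lines of messages whose body contains `query`."""
    with imaplib.IMAP4_SSL("imap.example.com") as imap:   # assumption: your IMAP host
        imap.login("me@example.com", "app-password")       # assumption: your credentials
        imap.select(mailbox)
        _, data = imap.search(None, "BODY", f'"{query}"')
        subjects = []
        for num in data[0].split()[:20]:                    # cap results to keep context small
            _, msg = imap.fetch(num.decode(), "(BODY[HEADER.FIELDS (SUBJECT)])")
            subjects.append(msg[0][1].decode(errors="replace").strip())
        return subjects

if __name__ == "__main__":
    mcp.run()  # stdio by default; any MCP-capable client can now call search_mail_for_string
```

Point any MCP-capable client at that and the tool just shows up, no provider approval needed.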
The security is so chief that they had no security at all until several versions later when they hastily bolted on OAuth.
MCP is a vibe-coded protocol that rode one of the many AI hype waves where all "design documents" are post-hoc justifications.
If they had made the security spec without waiting for user information they would most certainly have chosen a suboptimal solution.
Considering many popular MCPs have done auth incorrectly, this made me lol
It tries to standardize the auth, messaging, and feedback loop in a way a plain API can't do alone. A CLI app can certainly do it, but we're talking about a standard. Maybe the way is something like an mcpcli that you can install on your phone, but would you really prefer installing a bunch of applications on your personal device?
Some points where MCP is still not good as of today:
- It does not have a standard way to manage context well. You have to find your own hack; the most accepted one is a search plus add/remove-tool pattern. Another is cataloging the tools.
- lack of client tooling to support elicitation on many clients (it really hurts productivity, but this isn't solved by CLIs either)
- lack of mcp-ui adoption (mcp-ui vs openai mcp app)
I would suggest you keep building whatever helps you and your users. I am not a sponsor of MCP, just sharing my personal opinion. I am also the creator of HasCLI, but I'm somewhat biased toward MCP over CLI in terms of coverage and standardization.
Namely, two very useful features, resources and prompts, have varying levels of support across clients (Codex being one of the worst).
These two are possibly the most powerful, since they allow consistent, org-level remote delivery of context. I would like to see all major clients support them and eventually catch up on the other features like elicitation, progress, tasks, etc.
If it tried to do that, you wouldn't have the pain point list.
It's a vibe-coded protocol that keeps using one-directional protocols for bi-directional communication, invents its own terms for existing stuff (elicitation, lol), didn't even have any auth at the beginning, etc.
It's a slow sad way of calling APIs with JSON-RPC
That is, MCP for coding agents is dumb. For GUI apps installed on non-CLI devices using non-permissive endpoints, it makes a lot of sense.
I agree MCP is bad as a protocol and likely not what solves the problem long term. But clearly the CLI focus is an artifact of coding agents being the tip of the iceberg of LLM agent use cases that we're seeing.
Have you tried to use a random API before? It’s a process of trial and error.
With the MCP tools I use, it works the first time and every time. There is no “figuring out.”
CLI works for both agents and technical people. REST API works for both agents and technical people. MCP works only for agents (unless I can curl to it; there are some HTTP-based ones).
This actually isn't true. I've written bespoke CLI tools for my small business and non-technical people run them without issue. They get intimidated at first but within a day or so they're completely used to it - it's basically just magic incantations on a black box.
I’ve used that approach to get non-technical near-retirees as early adopters of command line tooling (version control and internal apps). A semantic layer to the effect of ‘make-docs, share-docs, get-newest-app, announce-new-app-version’.
The users saw a desktop folder with big buttons to double click. Errors opened up an email to devs/support with full details (minimizing error communication errors and time to fix). A few minutes of training, expanded and refined to meet individual needs, and our accountants & SME’s loved SVN/Git. And the discussion was all about process and needs, not about tooling or associated mental models.
My company is currently trying to rollout shared MCPs and skills throughout the company. The engineers who have been using AI tools for the past 1-2 years have few, if any, issues. The designers, product managers, and others have numerous issues.
Having a single MCP gateway with very clear instructions for connecting to Claude Desktop and authenticating with Google eliminates numerous problems that would arise from installing and authenticating a CLI.
The MCP is also available on mobile devices. I can jot down ideas and interact with real data with Claude iOS and the remote MCP. Can’t do that with a CLI.
Back to the MCP debate: in a world where most web APIs have a schema endpoint, their own authentication and authorization mechanisms, and in many instances easy-to-install clients in the form of CLIs... why do we need a new protocol, a new server, a new whatever? KISS
> OP never mentioned letting the agent run as him or use his secrets
That is implicit with a CLI because it is being invoked in the user session unless the session itself has been sandboxed first. Then for the CLI to access a protected resource, it would of course need API keys or access tokens. Sure, a user could set up a sandbox and could provision agent-specific keys, but everyone could always enable 2FA, pick strong passwords, use authenticators, etc., and every org would have perfect security. That's not reality.
CLI sandboxing is a solved problem compared to whatever MCP is.
If you can tell ahead of time what external connectors you need and you're already sandboxing, then by all means go with CLIs; if you can't, then MCP is literally the only economical and ergonomic solution as it stands today.
> ...people assume the way they are using AI is universal
This is what led me back to MCP. Our team is using Claude CLI, Claude VSCX, Codex, OpenCode, GCHP, and we need to support GH Agents in GH Actions. We wanted telemetry and observability to see how agents are using tools and docs.
There's no sane way to do this as an org without MCP unless we standardize and enforce a specific toolset/harness that we wrap with telemetry. And no one wants that.
That sounds like a hack to get around the lack of MCP. If your goal is to expose your tools through an interface that a coding agent can easily parse and use, what compels you to believe throwing amorphous structured text is a better fit than exposing it through a protocol specially designed to provide context to a model?
> The reason we have MCP is because early agent designs couldn't run arbitrary CLIs. Once you can run commands, MCP becomes silly.
I think you got it backwards. Early agents couldn't handle that, and the problem was solved with the introduction of an interface that models can easily handle. It became a solved problem. Now you're arguing that if today's models work hard enough, they can be willed into doing something with tools without requiring an MCP. That's neat, but a silly way to reinvent the wheel, poorly.
The agents are writing the MCPs, so they can figure out those HTTP and FTP calls. MCP makes it so they don't have to every time they want to do something.
I wouldn't hire a new person to read a manual and then make a bespoke JSON to call an HTTP server, every single time I want to make a call, and that's not a knock on the person's intelligence. It's just a waste of time doing the same work over and over again. I want the results of calling the API, not to spend all my time figuring out how to call the API.
Obviously if the self-modifying, Clawd-native development thing catches on, any old API will work. (Preferably documented but that’s not a hard requirement.)
For now though, Anthropic doesn't host a clawd for you, so there isn't yet a good way for it to persist custom integrations.
Each AI needs context management per conversation; this is something that would be very clunky to replicate on top of HTTP or FTP (as in requiring side-channel information for session and conversation management).
Everyone looks at APIs, and sure, MCP seems redundant there. But look at an agent driving a browser: the get-DOM method depends on all the actions performed since the window opened, and it needs to be per agent, per conversation.
Can you do that as REST? Sure, sneak a session and conversation into a parameter or cookie, but then the protocol is not really just HTTP anymore, is it? It's all this clunky coupling that comes with a side of unknowns, like: when is a conversation finished? Did the client terminate, or are we just between messages? And as you go and solve these for the hundredth time, you'd start itching for standardization.
It makes it part of the protocol so the LLM doesn't have to handle it, which would be brittle.
And look at the parent post I replied to for choice of protocol: I'd like to see a session token over FTP, where you need to track the current folder per conversation.
It makes it harder for the LLM to understand what’s going on, not easier.
It is not a guarantee (as we see with structured output schemas), but it significantly increases compliance.
So why MCP? Are there other protocols that would provide more correctness when trained on? Have we tried? Maybe a protocol that offers more compression of commands would take up less context overall, thus offering better correctness.
MCP seems arbitrary as a protocol, because it kinda is. It doesn't >>cause<< the increase in correctness in and of itself; the fact that it >>is<< a protocol is the reason it may increase correctness. Thus, any other protocol would do the same thing.
With all due respect if you are prompting correctly and following approaches such as TDD / extensive testing then correctness is not out the window. That is a misunderstanding likely caused by older versions of these models.
Correctness can be as complete as any other new code, I've used the AI to port algorithms from Python to Rust which I've then tested against math oracles and published examples. Not only can I check my code mathematically but in several instances I've found and fixed subtle bugs upstream. Even in well reviewed code that has been around for many years and is well used. It is simply a tool.
> So why MCP? ... MCP seems arbitrary as a protocol
You're right, it is an arbitrary protocol, but it's one that is supported by the industry. See the screencaps at the end of the post that show why this protocol. Maybe one day, we will get a better protocol. But that day is not today; today we have MCP.
One simple reason is "determinism". If you ask the AI to "just figure it out", it will do that in different ways and you won't have a reliable experience. The protocol provides AI a way to do this without guessing or working in different ways, because the server does all the work, deterministically.
But the second reason is, all the other reasons. There is a lot in the specification, that the AI literally cannot figure out, because it would require custom integration with every application and system. MCP is also a client/server distributed system, which "calling a tool" is not, so it does stuff that is impossible to do on your existing system, without setting up a whole other system... a system like MCP. And all this applies to both the clients, and the servers.
Here's another way to think of it. The AI is a psychopath in prison. You want the psycho to pick up your laundry. Do you hand the psycho the keys to your car? Or do you hand him a phone, where he can call someone who is in charge of your car? Now the psycho doesn't need to know how to drive a car, and he can't drive it off a bridge. All he can do is talk to your driver and tell him where to go. And your driver will definitely not drive off a bridge or stab anyone. And this works for planes, trains, boats, etc, just by adding a phone in between.
the point is, is it necessary to create a new protocol?
Why did we have to invent an entire new transport protocol for this, when the only stated purpose is documentation?
Even the auth is just OAuth.
It’s JSON-RPC plus OAuth.
(Plus a couple bits around managing a local server lifecycle.)
There's nothing more standard or reusable or application-agnostic about it than using an API over any of the existing protocols.
Let's say you use Claude's chat interface. How can you make Claude connect to, say, the lights in your house?
Without MCP, you would need Anthropic the company to add support to Claude the web interface to connect over a network to your home, use some custom routing software (that you don't have) to communicate over whatever lightbulb-specific IoT protocol your bulbs use, to be able to control them. Claude needs to support your specific lightbulb stack, and some kind of routing software would need to be added in your home to connect the external network to the internal devices.
But with MCP, Claude only has to support MCP. They don't have to know anything about your lightbulbs or have some custom routing thing for your home. You just need to run an MCP server that talks to the lightbulbs... which the lightbulb company should make and publish, so you don't have to do anything but download the lightbulb MCP server and run it. Now Claude can talk to your lightbulbs, and neither you nor Claude had to do any extra work.
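To make that concrete, here is a rough sketch of what the lightbulb vendor's MCP server could boil down to; the bridge URL and endpoint are made up for illustration. The user just runs it and points Claude at it:

```python
# Hypothetical lightbulb MCP server the vendor would ship; bridge URL/endpoint are made up.
import json
import urllib.request
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("lightbulbs")
BRIDGE = "http://192.168.1.50/api"  # assumption: your local bulb bridge

@mcp.tool()
def set_light(room: str, on: bool, brightness: int = 100) -> str:
    """Turn a light on or off and set brightness (0-100)."""
    body = json.dumps({"room": room, "on": on, "brightness": brightness}).encode()
    req = urllib.request.Request(f"{BRIDGE}/lights", data=body, method="PUT")
    urllib.request.urlopen(req)  # this server speaks the bulb protocol; Claude only speaks MCP
    return f"{room}: {'on' if on else 'off'} at {brightness}%"

if __name__ == "__main__":
    mcp.run()
```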
In addition to the communication, there is also asynchronous task control features, AI-specific features, security features, etc that are all necessary for AI work. All this is baked into MCP.
This is the power of standardized communications abstractions. It's why everyone uses HTTP and doesn't have their own custom application-specific tcp-server-language. The world wide web would just be 10 websites.
You can probably let the LLM guess the help flag and try to parse the help message, but the success rate totally depends on the model you are using.
`-h` is also popular, but that shorthand has its own possible issues, hence `--help`.
And Windows mostly uses \? and also \h,
while Java uses a single - for long arguments because it doesn't have short ones.
I doubt it is ever close to reusable.
And even the allowed position of parameters (or even the meaning of arguments, in the case of ffmpeg) is program-dependent.
Some allow them anywhere as long as they start with a dash, some only allow them before the first input.
This is one key difference between experienced and inexperienced devs; if something looks like crud, it probably is crud. Don’t follow or do something because it’s popular at the time.
I got it to build an MCP server into the app that supported sending commands to allow Claude to interact with it as if it was a user, including keypresses and grabbing screenshots, and the difference was immediate and really beneficial.
Visual issues were previously one of the things it would tend to struggle with.
> Claude implement plan.md until all unit and browser tests pass
In my case I started with something somewhat like Playwright, and claude had a habit of interacting with the app more directly than a user would be able to and so not spotting problems because of it. Forcing it to interact by pressing keys rather than delving into the dom or executing random javascript helped. In particular I wanted to be able to chat with it as it tried things interactively. This is more to help with manual tests or exploratory testing rather than classic automated testing.
My current app is a desktop app, so playwright isn't as applicable.
I code in 8 languages, regularly, for several open source and industry projects.
I use AI a lot nowadays, but have never ever interacted with an MCP server.
I have no idea what I'm missing. I am very interested in learning more about what you use it for.
I made a Prolog program that knows the valid words and spelling, along with sentence composition rules.
Via the MCP server, a translated text can be verified. If it's not faultless, the agent enters a feedback loop until it is.
The nice thing is that it's implemented once and I can use it in opencode and claude without having to explain how to run the prolog program, etc.
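Roughly, that server is just a thin wrapper that shells out to the Prolog checker; the swipl invocation and file name below are placeholders for whatever the actual program looks like:

```python
# Sketch of a thin verification wrapper; checker.pl and the goal name are placeholders.
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("translation-checker")

@mcp.tool()
def verify_translation(text: str) -> str:
    """Check spelling, word validity, and sentence composition; return any faults found."""
    result = subprocess.run(
        ["swipl", "-q", "-s", "checker.pl", "-g", "check_stdin", "-t", "halt"],
        input=text, capture_output=True, text=True, timeout=30,
    )
    return result.stdout.strip() or "OK: no faults found"

if __name__ == "__main__":
    mcp.run()  # same server works from opencode, Claude Code, or any other MCP client
```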
> I have no idea what I'm missing.
The questions I'd ask:
- Do you work in a team context of 10+ engineers?
- Do you all use different agent harnesses?
- Do you need to support the same behavior in ephemeral runtimes (GH Agents in Actions)?
- Do you need to share common "canonical" docs across multiple repos?
- Is it your objective to ensure a higher baseline of quality and output across the eng org?
- Would your workload benefit from telemetry and visibility into tool activation?
If none of those apply, then it's not for you. Server-hosted MCP over streamable HTTP benefits orgs and teams and has virtually no benefit for individuals. You are right: MCP tools are in essence OpenAPI specs with some niceties like standardized progress reporting. But MCP is more than tools.
More concretely, you can have installable (and updatable) skills that will teach the agents how to use your API and will come with slash commands.
What you cannot do with an mcp is pipe the output into standard tools (jq, head, etc...) or create scripts around it, etc.
I have been working on a system using a Fjall datastore in Rust. I haven't found any tools that directly integrate with Fjall, so even getting insight into what data is there, being able to remove it, etc., is hard. So I used https://github.com/modelcontextprotocol/rust-sdk to create a thin CRUD MCP. The AI can use this to create fixtures, check if things are working how they should, or debug things, e.g. if a query is returning incorrect results and I tell the AI, it can quickly check whether it is a datastore issue or a query-layer issue.
Another example is I have a simulator that lets me create test entities and exercise my system. The AI with an MCP server is very good at exercising the platform this way. It also lets me interact with it using plain english even when the API surface isn't directly designed for human use: "Create a scenario that lets us exercise the bug we think we have just fixed and prove it is fixed, create other scenarios you think might trigger other bugs or prove our fix is only partial"
One more example is I have an Overmind style task runner that reads a file, starts up every service in a microservice architecture, can restart them, can see their log output, can check if they can communicate with the other services etc. Not dissimilar to how the AI can use Docker but without Docker to get max performance both during compilation and usage.
Last example is using off-the-shelf MCPs for VCS servers like GitHub or GitLab. It can look at issues, update descriptions, comment, code review. This is very useful for your own projects but even more useful for other people's: "Use the MCP tool to see if anyone else is encountering similar bugs to what we just encountered"
the AI gets to do two things:
- expose hidden state
- do interactions with the app, and see before/after/errors
It gives more time where the LLM can verify its own work without you needing to step in. It's also a bit more integration-test-y than unit.
If you were to add one MCP, make it Playwright or some similar browser automation MCP. Very little has more value-add than just being able to control a browser.
A static set of tools is safer and more reliable.
the agent sees tools as allowed or not by the harness/your mcp config.
For the most part, the same company that you're connecting to is providing the MCP, so it's not having your data go to random places, but you can also just write your own. It's a fairly thin wrapper: a bit of code to call the remote service, and a bit of documentation of when/what/why to do so.
The failure mode is turning taste into a religion. If you never touch anything that looks crude on day one, you also miss the occasional weird thing that later becomes boring infra.
This is quite literally the opposite opinion I and many others had when first exploring MCP. It's so _obviously_ simple, which is why it gained traction in the first place.
Do you not expose an MCP endpoint? Literally every VS Code or opencode node gets it for free (a small JSON snippet in their mcp.json config) if you do auth right.
We can plug in MCP almost anywhere with just a small snippet of JSON, and because we're serving it from a server, we get very clear telemetry regardless of tooling and environment.
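For reference, the sort of snippet this means (VS Code's .vscode/mcp.json shown here; the exact keys vary slightly per client, and the URL is a placeholder):

```json
{
  "servers": {
    "org-gateway": {
      "type": "http",
      "url": "https://mcp.example.com/mcp"
    }
  }
}
```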
So what’s the best centralized gateway available today, with telemetry and auth and all the goodness espoused in this blog post?
MCP is effectively "just another HTTP REST API"; OAuth and everything. The key parts of the protocol are the communication shape and sequence with the client, which most SDKs abstract for you.
The SDKs for MCPs make it very straightforward to do so now and I would recommend experimenting with them. It is as easy to deploy as any REST API.
https://docs.aws.amazon.com/whitepapers/latest/overview-depl...
It should be part of your app and coordinated in a way that everyone in the enterprise can find all the available MCPs. Like Backstage or something.
However, MCP is context bloat and not very good compared to CLIs + skills mechanically. With a CLI you get the ability to filter/pipe (regular Unix bash) without having to expand the entire tool call every single time in context.
CLIs also let you use heredoc for complex inputs that are otherwise hard to escape.
CLIs can easily generate skills from the --help output, and add agent-specific instructions on top. That means you can give the agent all the instructions it needs to know how to use the tools, what tools exist, lazy loaded, and without bloating the context window with all the tools upfront (yes, I know tool search in Claude partially solves this).
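As a sketch of that idea (the SKILL.md frontmatter follows the common name/description convention; adjust to whatever your harness expects):

```python
# Rough sketch: turn a CLI's --help output into a lazily-loaded skill file.
import pathlib
import subprocess
import sys

tool = sys.argv[1]  # e.g. "mytool"
help_text = subprocess.run([tool, "--help"], capture_output=True, text=True).stdout

skill = (
    "---\n"
    f"name: {tool}\n"
    f"description: How to drive the {tool} CLI. Load this before invoking it.\n"
    "---\n\n"
    f"# {tool}\n\n"
    "Agent-specific notes go here (auth, destructive flags, when NOT to use it).\n\n"
    "## Help output (verbatim)\n\n"
    + help_text
)
out = pathlib.Path("skills") / tool / "SKILL.md"
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(skill)
print(f"wrote {out}")
```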
CLIs also don’t have to run persistent processes like MCP but can if needed
It's much easier for users to find exactly what a model can do with your app through it, compared to building a skill that would work with it, since clients can display every tool available to the user. There's also no need for the model to set up any environment, since it's essentially just writing out a function call, which saves time because there's no need to set up as many virtual machine instructions.
It obviously isn’t as useful in development environments where a higher level of risk can be accepted since changes can always be rolled back in the repository.
If I recall correctly, there’s even a whole system for MCP being built, so it can actually show responses in a GUI much like Siri and the Google Assistant can.
> If I recall correctly, there’s even a whole system for MCP being built, so it can actually show responses in a GUI much like Siri and the Google Assistant can
That's the MCP progress spec: https://modelcontextprotocol.io/specification/2025-11-25/bas...
1. Documenting the interface without MCP. This problem is best solved by the use of Skills, which can contain instructions for both CLIs and APIs (or any other integration). Agents only load the relevant details when needed. This also makes it easy to customize the docs for the specific cases you are working with and build skills that use a subset of the tools.
2. Regarding all of the centralization benefits attributed to remote MCPs - you can get the same benefits with a traditional centralized proxy as well. MCP doesn't inherently grant you any of those benefits. If I use AWS SSO via the CLI, boom, all of my permissions are tied to my account, benefit from central management, and have all the observability benefits.
In my mind, use Skills to document what to do and benefit from targeted progressive disclosure, and use CLIs and REST APIs for the actual interaction with services.
> This problem is best solved by the use of Skills which can contain instructions for both CLIs and APIs
You've just reversed the context benefits because the content of the skill...goes into context.
> ...you can get the same benefits with a traditional centralized proxy as well. MCP doesn't inherently grant you any of those benefits.
You've just rebuilt MCP...but bespoke, unstructured, and it does not plug into industry tooling. MCP prompts are activated as `/` (slash) commands. MCP resources are activated as `@` (at) references. You can't do this with a proxy. See the three .gifs at the end of the post to see how clients use MCP prompts and resources, and definitely check the specification for these two.
Challenges we are solving with centralised MCP are around brand guardianship, tone of voice, internal jargon and domain context, access to common data sources, and, via the resources methods in MCP, access to "skills" that prescribe patterns and shims for expected paths and ways of connecting/extracting data.
tptacek nailed it - once agents run bash, MCP is overhead. the security argument is weird too, it shipped without auth and now claims security as chief benefit. chroot jails and scoped tokens solved this decades ago.
only place MCP wins is oauth flows for non-technical users who will never open a terminal. for dev tooling? just write better CLIs.
In v0, people can add e.g. Supabase, Neon, or Stripe to their projects with one click. We then auto-connect and auth to the integration’s remote MCP server on behalf of the user.
v0 can then use the tools the integration provider wants users to have, on behalf of the user, with no additional configuration. Query tables, run migrations, whatever. Zero maintenance burden on the team to manage the tools. And if users want to bring their own remote MCPs, that works via the same code path.
We also use various optimizations like a search_tools tool to avoid overfilling context
Email sent from a human's account on behalf of an agent is a different legal and reputational thing than email sent from the agent's own address. If the agent makes a mistake, takes an action, or enters into a relationship — whose name is on it? Right now the answer is almost always "the human's", which means agents can't really be held accountable as entities.
The deeper issue MCP hasn't addressed is that auth was built for users, not agents. OAuth gives agents delegated access. But delegation isn't identity. An agent with delegated Gmail access is acting as a deputy. An agent with its own email address and phone number is acting as a first-class participant.
Some things you want the deputy model (browsing the web, reading your calendar). Some things need a distinct identity — outreach, commitments, anything where attribution matters downstream. Those two cases need different infrastructure.
IMO, by default MCP tools should run in forked context. Only a compacted version of the tool response should be returned to the main context. This costs tokens yes, but doesn't blow out your entire context.
If other information is required post-hoc, the full response can be explored on disk.
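Harnesses don't do this for you today as far as I know, but the idea is easy to sketch: run the tool call off to the side, park the full response on disk, and hand the main context only a compact preview plus the path.

```python
# Sketch only: compact a tool response before it reaches the main context.
import json
import tempfile

def call_tool_compacted(tool, args, max_chars=500):
    full = tool(**args)                         # the real (possibly huge) tool result
    raw = json.dumps(full, indent=2)
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        f.write(raw)                            # full output stays on disk for post-hoc digging
    preview = raw[:max_chars]                   # crude truncation; a forked subagent could summarize instead
    return f"Tool returned {len(raw)} chars; full output saved to {f.name}. Preview:\n{preview}"
```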
And it's also affected by how the model is trained. Gemini specifically likes to read large amounts of text data directly and explodes the context, but Claude tries to use a tool for partial search or write a script to sample from a very large file. Gemini always fills the context way faster than Claude when doing the same job.
But I guess in the case of a badly designed MCP, there is not much the model can do, because the results are injected into the context directly (unless the runtime decides to redirect them somewhere else).
This pattern works well with specialized tool sets in general.
I'd recommend that you take a peek at MCP prompts and resources spec and understand the purpose that these two serve and how they plug into agent harnesses.
> I'd recommend that you take a peek at MCP prompts and resources spec
Don't assume that if somebody does not like something they don't know what it is. MCP makes happy those developers who need the illusion of "hooking" things into the agent, but it does not make LLMs happy.
Then I have a troubleshooting file (also linked from the main SKILL file) which basically lists out all the 'gotchas' that are unique to my platform and thus the LLM may struggle with in complex scenarios.
After a lot of testing, I identified just 5 gotchas and wrote a short section for each one. The title of each section describes the issue and lists out possible causes with a brief explanation of the underlying mechanism and an example solution.
Adding the troubleshooting file was a game changer.
If it runs into a tricky issue, it checks that troubleshooting file. It's highly effective. It made the whole experience seamless and foolproof.
My platform was designed to reduce applications down to HTML tags which stream data to each other so the goal is low token count and no-debugging.
I basically replaced debugging with troubleshooting; the 5 cases I mentioned are literally all that was left. It seems to be able to quickly assemble any app without bugs now.
The 'gotchas' are not exactly bugs but more like "Why doesn't this value update in realtime?" kind of issues. They involve performance/scalability optimizations that the LLM needs to be aware of.
But it's putting a lot of trust in the remote server not to prompt-inject you, perhaps accidentally. Also, what if the remote docs don't suit local conditions? You could make local edits to a skill if needed.
Better to avoid depending on a remote API when a local tool will do.
Most folks are familiar with MCP tools but not so much MCP resources[0] and MCP prompts[1]. I'd make the case that these latter two are way more powerful and significant, because most clients support them (to varying degrees at the moment, to be fair).
For teams/orgs, these are really powerful because they simplify delivery of skills and docs and move them out of the repo (yes, there are benefits to this, especially when the content is applicable across multiple repos), on top of surfacing telemetry that informs usage and efficacy.
Why would you do it? One reason is that now you can index your docs with more powerful tools. Postgres FTS, graph databases to build a knowledge base, extract code snippets and build a best practices snippet repo, automatically link related documents by using search, etc.
[0] https://modelcontextprotocol.io/specification/2025-06-18/ser...
[1] https://modelcontextprotocol.io/specification/2025-06-18/ser...
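For anyone who hasn't looked at them: registering a prompt and a resource is about as small as registering a tool. A minimal sketch with the Python SDK's FastMCP helper (the org-specific names and content are made up):

```python
# Minimal sketch of MCP prompts and resources; names and content are made up.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("org-context")

@mcp.resource("docs://style-guide")
def style_guide() -> str:
    """Org style guide; clients can surface this as an @-reference the user attaches."""
    return "All public APIs use snake_case. Error messages never leak internal IDs. ..."

@mcp.prompt()
def review_pr(diff: str) -> str:
    """Org-standard review prompt; clients can surface this as a slash command."""
    return f"Review this diff against our style guide and security checklist:\n\n{diff}"

if __name__ == "__main__":
    mcp.run()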
Garry Tan and the influencer take is focused on vibe coding whereas orgs need things like auth and telemetry.
If I use a remote MCP or CLI that relies on network calls, and I put it in the hands of my coding assistant, wouldn't it be too easy to inject prompts and exfiltrate data from my machine?
At least MCPs don't have direct access to my machine, but CLIs do.
1. You can make the script very specific to the skill and permission it appropriately.
2. You can have the output of the script make clear to the LLM what to do. Lint fails? "Lint rules have failed. This is important for reasons blah blah and you should do X before proceeding." Otherwise the agent is too focused on smashing out the overall task and might opt to route around the error. Note you can use this for successful cases too (see the sketch after this list).
3. The output and token usage can be very specific what the agent needs. Saves context. My github comments script really just gives the comments + the necessary metadata, not much else.
The downsides of MCP all focus on (3), but the 1+2 can be really important too.
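A sketch of point 2 in practice: a skill-specific wrapper whose output tells the agent exactly what to do next instead of letting it route around the failure (the eslint command is just an example):

```python
#!/usr/bin/env python3
# Example wrapper: run lint and give the agent explicit, compact instructions on failure.
import subprocess
import sys

result = subprocess.run(["npx", "eslint", ".", "--format", "compact"],
                        capture_output=True, text=True)
if result.returncode != 0:
    print("LINT FAILED. Do not continue with the main task until these are fixed;"
          " CI blocks merges on lint. Fix each finding below, then re-run this script:")
    print(result.stdout[-4000:])   # tail only, to keep token usage down (point 3)
    sys.exit(1)
print("Lint passed. Continue with the next step of the task.")
```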
> (I preface that this is primarily relevant for orgs and enterprises; it really has no relevance for individual vibe-coders)
The thing about tools that "democratize" software development, whether it is Visual Studio/Delphi/QT or LLMs, is that you wind up with people in organizations building internal tools on which business processes will depend who do not understand that centralization is key. They will build these tools in ignorance of the necessity of centralization-centric approaches (APIs, MCP, etc.) and create Byzantine architectures revolving around file transfers, with increasing epicycles to try to overcome the pitfalls of such an approach.
Once you have 10-20 people using agents in wildly different ways getting wildly different results, the question of "how do I baseline the capabilities across my team?" becomes very real.
In our team, we want to let every dev use the agent harness that they are comfortable with and that means we need a standard mechanism of delivering standard capabilities, config, and content across the org.
I don't see it as democratization versus corporate fascism so much as "can we get consistent output from developers of varying degrees of skill using these agents in different ways?"
The agent can only perform the operations it has been expressly given tools to perform, and its invocation of those tools can be audited and otherwise governed.
Whether MCP evolves to fulfill this role effectively, time will tell.
More than 200% growth in official MCP servers in past 6 months: https://bloomberry.com/blog/we-analyzed-1400-mcp-servers-her...
A local MCP doesn't come into play because they just couldn't offer the same features in this case.
So when you run it, your coding agent is using AI to run that code (what to call, what parameters to pass, and so on). Via MCP, they don't pay any LLM cost; they just offer the code and the endpoint.
But this is usually messy for the coding agent since it fills up the context. While if you use skill + API, it's easier for the agent since there's no code in the context, just how to call the API and what to pass.
With something like this, you can then have very complex things happening in the endpoint without the agent worrying about context rot or being able to deal with that functionality.
But to have that difficult functionality, you also need to call an LLM inside the endpoint, which is problematic if the person offering the MCP service does not want to cover LLM costs.
So it does matter if it's an endpoint or an MCP because the agent is able to do more complex and robust stuff if it uses skill and HTTP.
I wrote a little bit about this a while ago: https://sibylline.dev/articles/2026-03-01-mcp-changed-my-min...
I created an example repo demonstrating this pattern and how it can be used at https://github.com/sibyllinesoft/smith-gateway
Biggest pain point is reliability: connections drop, tools fail silently, no good way to know if a call actually reached the server.
But the article's "just HTTP with extra steps" framing misses the point. The value is the standardized tool interface. Before MCP, every AI integration was a bespoke wrapper. A shared vocabulary for "here's a tool, here's its schema, call it" is genuinely useful, rough edges and all.
https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to...
Being 4.5 months behind the trend has its advantage. ;-)
Why? Because when you pair output schema with CodeAct agents (agents that reason and act by writing executable code rather than natural language, like smolagents by Hugging Face), you solve some of the most painful problems in agentic tool use:
1. Context window waste: Without output schema, agents have to call a tool, dump the raw output (often massive JSON blobs) into the context window, inspect it, and only then write code to handle it. That "print-and-inspect" pattern burns tokens and attention on data the agent shouldn't need to explore in the first place.
2. Roundtrip overhead: Writing large payloads back into tools has the same problem in reverse. Structured schemas on both input and output let the agent plan a precise, single-step program instead of fumbling through multiple exploratory turns.
There's a blog post on Hugging Face that demonstrates this concretely using smolagents: https://huggingface.co/blog/llchahn/ai-agents-output-schema
And the industry is clearly converging on this pattern. Cloudflare built their "Code Mode" around the same idea (https://blog.cloudflare.com/code-mode/), converting MCP tools into a TypeScript API and having the LLM write code against it rather than calling tools directly. Their core finding: LLMs are better at writing code to call MCP than at calling MCP directly. Anthropic followed with "Programmatic tool calling" (https://www.anthropic.com/engineering/code-execution-with-mc..., https://platform.claude.com/docs/en/agents-and-tools/tool-us...), where Claude writes Python code that calls tools inside a code execution container. Tool results from programmatic calls are not added to Claude's context window, only the final code output is. They report up to 98.7% token savings in some workflows.
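A toy sketch of what that buys you (this is not Cloudflare's or Anthropic's actual API; the Order schema and tool wrapper are invented for illustration). Because the output schema is known up front, the model can write one short program, and only its printed summary re-enters the context:

```python
# Toy illustration of "code mode": the harness exposes an MCP tool as a typed function,
# the model writes a short program against it, and only what gets printed hits the context.
from dataclasses import dataclass

@dataclass
class Order:                         # output schema the model sees ahead of time
    id: str
    total_cents: int
    status: str

def list_orders(customer_id: str) -> list[Order]:
    """Stand-in for a generated MCP tool wrapper; imagine a large JSON payload behind this."""
    return [Order("o-1", 12_900, "shipped"), Order("o-2", 500_000, "pending")]

# --- what the model would write, in a single step, since it already knows the schema ---
orders = list_orders("cust-42")
pending = sum(o.total_cents for o in orders if o.status == "pending") / 100
print(f"{len(orders)} orders, ${pending:.2f} still pending")  # only this line enters context
```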
So the point here is: MCP isn't just valuable for the centralization, auth, and telemetry story the author laid out (which I fully agree with). The protocol itself, specifically its structured schema capabilities, directly enables more efficient and reliable agentic workflows. That's a concrete technical advantage that CLIs simply don't offer, and it's one more reason MCP will stick around.
Long live MCP indeed.
The CLIs are executed by the coding assistants in the project directory, which means that they can get implicit information from there (e.g. git branch and commit).
With an MCP you would need a prepare step to gather that, making things slower.
Great article otherwise. I've been wondering why people are so zealous about MCP vs executable tools, and it looks like it's just tradeoffs between implementation differences to me.
But fundamentally that doesn’t make sense. If an AI needs to be fed instructions or schemas (context) to understand how to use something via MCP, wouldn’t it need the same things via CLI? How could it not? This article points that out, to be clear. But what I’m calling out is how simple it is to determine for yourself that this isn’t an MCP versus CLI battle. However, most people seem to be falling for this narrative just because it’s the new hot thing to claim (“MCP is dead, Long Live CLI”).
As for Google - they previously said they are going to support MCP. And they’ve rolled out that support even recently (example from a quick search: https://cloud.google.com/blog/products/ai-machine-learning/a...). But now with the Google Workspace CLI and the existence of “Gemini CLI Extensions” (https://geminicli.com/extensions/about/), it seems like they may be trying to diminish MCP and push their own CLI-centric extension strategy. The fact that Gemini CLI Extensions can also reference MCP feels a lot like Microsoft’s Embrace, Extend, Extinguish play.
Just follow the widely accepted pattern (all you need is 3 tools in front):
- listTools - List/search tools
- getToolDetails - Get input arguments for the given tool name
- execTool - Execute the given tool name with input arguments
HasMCP, a remote MCP framework, follows/allows this pattern.
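In Python SDK terms, the front looks roughly like this (the registry dict is a stand-in for however HasMCP or your gateway actually stores tool definitions):

```python
# Rough sketch of the three-front-tool pattern; the registry and example tool are made up.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("gateway")

REGISTRY = {  # tool name -> (description, input arguments, callable)
    "create_ticket": (
        "Create a support ticket",
        {"title": "string", "body": "string"},
        lambda args: f"ticket created: {args['title']}",
    ),
}

@mcp.tool()
def list_tools(query: str = "") -> list[str]:
    """List/search available tool names matching the query."""
    return [n for n, (desc, _, _) in REGISTRY.items() if query.lower() in (n + desc).lower()]

@mcp.tool()
def get_tool_details(name: str) -> dict:
    """Get the description and input arguments for the given tool name."""
    desc, schema, _ = REGISTRY[name]
    return {"name": name, "description": desc, "arguments": schema}

@mcp.tool()
def exec_tool(name: str, arguments: dict) -> str:
    """Execute the given tool name with the given input arguments."""
    return REGISTRY[name][2](arguments)

if __name__ == "__main__":
    mcp.run()
```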
Or...just don't slam 100 tools into your agent in the first place.
But I can do them with CLI so that's a negative for MCP?
100 MCP tools will bloat the context whereas 100 CLIs won't. Which part do you disagree with?
2. The part where you think your agent is going to know how to use 100 CLI tools that are not already in its training dataset without using extra turns walking the help content to dump out command names and schemas
3. The part where, without a schema defining the inputs, the LLM wastes iterations trying to correct the input format.
4. The part where, not having the full picture of the tools, your odds of it picking the same tools or the right tools is completely gambling that it outputs the right keywords to trigger the tool to be used.
5. The part where you forgot to mention that for your agent to know that your 100 CLI tools exist, you had to either provide it in context directly, provide it in context in a README.md, or have it output the directory listing and send that off to the LLM to evaluate before picking the tool and then possibly expanding the man pages for several tools and sub commands using several turns.
Don't get me wrong, CLIs are great if its already in the LLMs training set (`git`, for example). Not so great if it's not because it will need to walk the man pages anyways.
I'm not sure how that solves the issue. The shape of each individual tool will be different enough that you will need different schemas - something you will be passing each time with MCP and something you can avoid with a CLI. Also, CLIs can be flexible too.
> The part where you think your agent is going to know how to use 100 CLI tools that are not already in its training dataset without using extra turns walking the help content to dump out command names and schemas
By CLIs we mean SKILLS.md, so it won't require this hop.
> The part where, without a schema defining the inputs, the LLM wastes iterations trying to correct the input format.
What do we lose by one iteration? We lose a lot by passing all the tool shapes on each turn.
> The part where, not having the full picture of the tools, your odds of it picking the same tools or the right tools is completely gambling that it outputs the right keywords to trigger the tool to be used.
we will use skills
> The part where you forgot to mention that for your agent to know that your 100 CLI tools exist, you had to either provide it in context directly, provide it in context in a README.md, or have it output the directory listing and send that off to the LLM to evaluate before picking the tool and then possibly expanding the man pages for several tools and sub commands using several turns.
skills
https://www.anthropic.com/engineering/code-execution-with-mc
I hear everyone talking about skills, but is this something I should use skills for?
This is what the skill file is for.
> Centralizing this behind MCP allows each developer to authenticate via OAuth to the MCP server and sensitive API keys and secrets can be controlled behind the server
This doesn't require MCP. Nothing is stopping you from creating a service to proxy requests from a CLI.
The problem with this article is it doesn't recognize that skills are a more general superset of MCP. Anything done with MCP could have an equivalent done with a skill.
This is one of the first posts I've seen that cuts through the hype against both MCPs and CLIs with nuanced findings.
There were times when it didn't make sense to use MCPs (such as connecting to a database), and it doesn't make sense at all to suddenly generate CLIs for everything. It just seems like the use case was a solution in search of a problem on top of a bad standard.
But no-one could answer "who" was the customer of each of these, which is why the hype was unjustified.
The MCP spec allows MCP servers to send back images to clients (base64-encoded, some json schema). However:
1) codex truncates MCP responses, so it will never receive images at all. This bug has been in existence forever.
2) Claude Code CLI will not pass those resulting images through its multi-modal visual understanding. Indeed, it will create an entirely false hallucination if asked to describe said images.
3) No LLM harness can deal with you bouncing your local MCP server. All require you to restart the harness. None allow reconnection to the MCP server.
I assure you there are many other similar bugs, whose presence makes me think that the LLM companies really don't like MCP, and are bugly-deprecating it.
The fundamental proposal here is that despite being bad MCP is the correct choice for Enterprise because:
> Organizations need architectures and processes that start to move beyond cowboy, vibe-coding culture to organizationally aligned agentic engineering practices. And for that, MCP is the right tool for orgs and enterprises.
…but, you can distill this to: the “cowboys” are off MCP because they've moved to yolo openclaw, where anything goes and there are no rules, no restrictions and no auditing.
…but that's a strawman from the twatter hype train.
Enterprises are not adopting openclaw.
It’s not “MCP or Openclaw”.
That's a false dichotomy.
The correct question is: has MCP delivered the actual enterprise value and actual benefits it promised?
Or, were those empty promises?
Does the truly stupid MCP UI proposal actually work in practice?
Or, like the security and auditing, is it a disaster in practice, which was never really thought through carefully by the original authors?
It seems to me, that vendors are increasingly determining that controlled AI integrations with rbac are the correct way forward, but MCP has failed to deliver that.
That's why MCP is dying off.
…because an open plugin ecosystem gives you broken crap like the Atlassian MCP server, and a bunch of maybe maybe 3rd party hacks.
That's not what enterprises want, for all the reasons in the article.
It provides a unified way to connect tools (whether local via stdio or remote via HTTP), handles bidirectional JSON-RPC communication natively, and forces tools to be explicit about their capabilities, which is exactly what you want for managing LLM context and agentic workflows.
This current anti-MCP hype train feels highly reminiscent of the recent phase where people started badmouthing JSON in favor of the latest niche markup language. It’s just hype driven contrarianism trying to reinvent the wheel.
We're yet to genuinely standardise bloody help texts for basic commands (Does -h set the hostname, or does it print the help text? Or is it -H? Does --help exist?). Writing man-pages seems like a lost art at this point, everyone points to $WEBSITE/docs (which contains, as you guessed, LLM slopdocs).
We're gonna end up seeing the same loops of "Modern standard for AI" -> "Standard for AI" -> "Not even a standard" -> "Thing of the past", because all of it is fundamentally wrong to an extent. LLMs are purely textual in context, while network protocols are more intricate by nature. An LLM will always end up overspeccing a /api/v1/ping endpoint, while ICMP ping can do that within bits. Text-based engineering, while visible (in the sense that a tech-illiterate person will find it easy to interpret), will always end up forming abstractions over the core - you'll end up with a shaky pyramid that collapses the moment your $LLM model changes encodings.