- Markdown as protocol — one stream carrying text, executable code, and data
- Streaming execution — code fences execute statement by statement as they stream in
- A mount() primitive — the agent creates React UIs with full data flow between client, server, and LLM
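To make the streaming execution bullet a bit more concrete, here is a minimal sketch of how statements could be cut out of a token stream and handed to a runner as soon as they are complete. `StatementStream` and its `run` callback are hypothetical names for illustration, not the project's API, and the splitter is deliberately naive: it assumes semicolon-terminated statements and ignores comments, regex literals, and `${}` inside template strings.

```typescript
// Hypothetical sketch: buffers streamed code and emits each top-level
// statement as soon as its terminating semicolon arrives.
class StatementStream {
  private buffer = "";
  private depth = 0; // nesting depth of (), [], {}
  private quote: string | null = null; // currently open string delimiter
  private escaped = false; // previous character was a backslash inside a string
  private run: (statement: string) => void;

  constructor(run: (statement: string) => void) {
    this.run = run;
  }

  // Feed one chunk of streamed code; complete statements run immediately.
  push(chunk: string): void {
    for (const ch of chunk) {
      this.buffer += ch;
      if (this.quote !== null) {
        // Inside a string literal: only look for the closing quote.
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === this.quote) this.quote = null;
        continue;
      }
      if (ch === '"' || ch === "'" || ch === "`") this.quote = ch;
      else if (ch === "(" || ch === "[" || ch === "{") this.depth++;
      else if (ch === ")" || ch === "]" || ch === "}") this.depth--;
      else if (ch === ";" && this.depth === 0) this.flush();
    }
  }

  private flush(): void {
    const statement = this.buffer.trim();
    this.buffer = "";
    if (statement.length > 0) this.run(statement);
  }
}

// Chunks deliberately split in awkward places, as an LLM stream would.
const executed: string[] = [];
const stream = new StatementStream((s) => executed.push(s));
for (const chunk of ["const a = 1", "; const b = fn({ x: ", "';' }); cons", "t c = 3;"]) {
  stream.push(chunk);
}
```

A real implementation would presumably lean on an incremental parser rather than character counting, but the shape is the same: nothing waits for the closing fence.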
Let me know what you think!
Right now this uses React for the web, but I could also see it working in the terminal via Ink.
And I love the "freeze" idea — maybe then you could even share the mini app.
I think the key decision for someone implementing a flexible UI system like this is the required level of expressiveness. To me, the chief problem with having agents build custom HTML pages (as another comment suggested) is that it's far too unconstrained. I've been working with a system of pre-registered blocks and callbacks that are very constrained. I quite like this as a middle ground, though it may still be too dynamic for my use case. Will explore a bit more!
You're right that the level of expressiveness is the key design decision. There's a real spectrum:
- pre-registered blocks (safe, predictable)
- code execution with a component library (middle ground)
- full arbitrary code (maximum flexibility).
My approach can slide along that spectrum: you could constrain the agent to only use a specific set of pre-imported components rather than writing arbitrary JSX. The mount() primitive and data flow patterns still work the same way; you just limit what the LLM is allowed to render.
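One way to picture the constrained end of that spectrum is an allowlist check over whatever UI tree the agent produces. This is only a sketch under assumptions: the `UINode` shape, the `ALLOWED_COMPONENTS` catalog, and `findDisallowed` are all made up for illustration and are not part of the project.

```typescript
// Hypothetical UI description the agent is allowed to emit.
type UINode = {
  component: string;
  props?: Record<string, unknown>;
  children?: UINode[];
};

// Assumed catalog of pre-imported components.
const ALLOWED_COMPONENTS = new Set(["Card", "Button", "Text"]);

// Walks the tree and returns the names of components outside the allowlist.
function findDisallowed(
  node: UINode,
  allowed: Set<string> = ALLOWED_COMPONENTS
): string[] {
  const found: string[] = allowed.has(node.component) ? [] : [node.component];
  for (const child of node.children ?? []) {
    found.push(...findDisallowed(child, allowed));
  }
  return found;
}

const tree: UINode = {
  component: "Card",
  children: [
    { component: "Text", props: { value: "Hello" } },
    { component: "Iframe", props: { src: "https://example.com" } },
  ],
};

// Only render when this list is empty.
const violations = findDisallowed(tree);
```

Widening or narrowing the set is then the knob that moves you along the spectrum.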
Would love to hear what you learn if you explore it!
I have been working on something with a similar goal:
Combined with a slot mechanism, complex UIs build up progressively — a skeleton appears first, then each section fills in as the LLM generates it.
I wrote a deeper dive on how the streaming execution works technically: https://fabian-kuebler.com/posts/streaming-ts-execution/
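For anyone skimming, the slot idea can be sketched in a few lines. `SlotLayout` is a hypothetical name and renders to plain strings rather than React, so treat it as an illustration of the progressive fill pattern rather than the actual mechanism.

```typescript
// Hypothetical slot registry: a skeleton of named placeholders that fill in
// one by one while the rest of the layout stays in place.
class SlotLayout {
  private slots = new Map<string, string | null>();

  constructor(names: string[]) {
    // Every slot starts empty, so the skeleton can render immediately.
    for (const name of names) this.slots.set(name, null);
  }

  fill(name: string, content: string): void {
    if (!this.slots.has(name)) throw new Error(`Unknown slot: ${name}`);
    this.slots.set(name, content);
  }

  // Unfilled slots render as placeholders; Map keeps declaration order.
  render(): string {
    return Array.from(this.slots.entries())
      .map(([name, content]) => content ?? `[${name}: loading...]`)
      .join("\n");
  }
}

const layout = new SlotLayout(["header", "chart"]);
const skeleton = layout.render();

// Later, as the LLM generates the first section:
layout.fill("header", "Sales overview");
const partial = layout.render();
```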
I can see the value in early user verification, and maybe in interrupting the LLM so it doesn't proceed down an invalid path, but I guess this is customer-facing, so that's not as valuable.
"In interactive assistants, that latency makes or breaks the experience." Why? Because the user might just jump off?
Always Show then Ask.
Markdown UI is declarative — you embed predefined widget types in markdown. The LLM picks from a catalog. It's clean and safe, but limited to what the catalog supports.
My approach is code-based — the LLM writes executable TypeScript in markdown code fences, which runs on the server and can render any React UI. It also has server-side state, so the UI can do forms, callbacks, and streaming data — not just display widgets.
````assistant
<Short Summary title>
gemini/3.1-pro - 20260319T050611Z
Response from the assistant
````
with a similar block for tool calling. This can be parsed semantically as part of the conversation, but is also rendered as a regular Markdown code block when needed.
Helps me keep AI chats on the filesystem, as a valid document, but also add some more semantic meaning on top of Markdown.
So many formats, with different tradeoffs around readable/parsable/comments/etc. I wish there was a "universal" converter. With LLMs sometimes used to edit chat traces, I'd like ingestion from md/yaml, not merely a "render from message json".
So .json `[{"role": "user", "content": "Hi"}` <-> .md ` ```json\n[{"role": "user", "content": "Hi"}` <-> above ` ```user\nHi` <-> `# User\nHi` <-> ` ```chatML\n<|user|>\nHi` <-> .html rendered .md, but with elements like <think> and <file> escaped... etc.
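As a rough sketch of one leg of such a converter, here is a round trip between a JSON message list and role-named fences. The function names are made up, and it assumes the simplest variant (role as the fence info string, four backticks, no title or model line); it also breaks if a message body itself contains a bare four-backtick line.

```typescript
type Message = { role: string; content: string };

// Four backticks, built programmatically to keep this snippet easy to embed.
const FENCE = "`".repeat(4);

// Messages -> markdown with one role-named fence per message.
function toRoleFences(messages: Message[]): string {
  return messages
    .map((m) => FENCE + m.role + "\n" + m.content + "\n" + FENCE)
    .join("\n\n");
}

// Markdown with role-named fences -> messages. Text outside fences is ignored.
function fromRoleFences(markdown: string): Message[] {
  const opener = new RegExp("^" + FENCE + "(\\w+)\\s*$");
  const messages: Message[] = [];
  let role: string | null = null;
  let body: string[] = [];
  for (const line of markdown.split("\n")) {
    if (role === null) {
      const match = opener.exec(line);
      if (match) {
        role = match[1];
        body = [];
      }
    } else if (line.trim() === FENCE) {
      // Closing fence: emit the collected message.
      messages.push({ role, content: body.join("\n") });
      role = null;
    } else {
      body.push(line);
    }
  }
  return messages;
}

const original: Message[] = [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello\nthere" },
];
const asMarkdown = toRoleFences(original);
const roundTripped = fromRoleFences(asMarkdown);
```

The other legs (`# User` headings, ChatML, escaped HTML) would each need their own pair like this, which is probably why nobody has written the universal version yet.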
I think we are reinventing HTML from first principles. It's semantic structuring with a meaningful render
As a tradeoff example, yesterday I again tripped on the KISS "CDATA doesn't support HEREDOC-like prefix whitespace removal". So does one indent, compromising payloads where leading ws is significant, or not, confusing humans and llms.
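For reference, the heredoc-style prefix removal in question is roughly the following. This is a generic sketch (the `dedent` name is mine, and it counts tabs and spaces as one column each), not something CDATA or any particular format provides.

```typescript
// Removes the leading whitespace shared by all non-blank lines,
// preserving relative indentation between lines.
function dedent(text: string): string {
  const lines = text.split("\n");
  const indents = lines
    .filter((line) => line.trim().length > 0)
    .map((line) => line.search(/\S/)); // index of first non-whitespace char
  const common = indents.length > 0 ? Math.min(...indents) : 0;
  return lines.map((line) => line.slice(common)).join("\n");
}

const indented = "    payload:\n      keep two\n    end";
const stripped = dedent(indented);
```

The tradeoff is visible right there: relative indentation survives, but a payload whose absolute leading whitespace matters cannot be told apart from one that was merely indented for readability.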
Re reinvention and first principles, aside from engineering tradeoffs, it can be hard to understand design spaces and to be aware of related work. I suspect there's a missing literature to support these, but professional organizations have been AWOL, and research funding dysfunctional. And commercial conflicts of interest. And it's hard. But now coding LLMs are messing with "don't reinvent wheels" payoff tables. Perhaps we'll someday be able to be explicit about design space structure and design choice consequences too. And perhaps we're already getting transformatively more flexible around format extension and interoperation. TFA isn't just a new format - it's a github repo which will help teach LLMs how to do progressive execution of fenced code blocks, making the next format which does this potentially easier to create. "Merge in what X does, but <change request>". Yay?
IIUC, non-meme carcinization is something vaguely like "similar tradeoffs pressure towards similar forms in diverse contexts". LLMs might help us more easily understand tradeoffs, implement forms, and manage diversity?
I’m building an agentic commerce chat that uses MCP-UI and want to start using these new implementations instead of MCP-UI, but I can’t wrap my head around how button onClick handlers and actions work. MCP-UI allows onClick events to work since you’re “hard coding” the UI from the get-go vs relying on AI generating nondeterministic JSON and turning that into UI that might be different on every use.
    const onRefresh = async () => {
      data.loading = true;
      data.messages = await loadMessages();
      data.loading = false;
    };
    mount({
      data,
      callbacks: { onRefresh },
      ui: ({ data, callbacks }) => (
        <Button onClick={callbacks.onRefresh}>Refresh</Button>
      )
    });
When the user clicks the button, it invokes the server-side function. The callback fetches fresh data, updates state via reactive proxies, and the UI reflects it, all without triggering a new LLM turn. So the UI is generated dynamically by the LLM, but the interactions are real server-side code, not just display. Forms work the same way: "await form.result" pauses execution until the user submits.
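If the "reactive proxies" part sounds like magic, the general technique is just a `Proxy` with a `set` trap that reports each write. The sketch below is a generic illustration, not the library's code: the `reactive` helper is hypothetical, the callback is synchronous for brevity, and `pushed` stands in for whatever channel would carry updates to the client.

```typescript
// Hypothetical helper: wraps a state object so every assignment is reported.
function reactive<T extends object>(
  target: T,
  onChange: (key: string, value: unknown) => void
): T {
  return new Proxy(target, {
    set(obj, key, value) {
      const ok = Reflect.set(obj, key, value);
      if (ok) onChange(String(key), value); // notify after the write succeeds
      return ok;
    },
  });
}

// Stand-in for the updates that would be sent to the client.
const pushed: string[] = [];
const data = reactive(
  { loading: false, messages: [] as string[] },
  (key, value) => {
    pushed.push(`${key}=${JSON.stringify(value)}`);
  }
);

// Stand-in for the server-side callback behind the button click.
function onRefresh(): void {
  data.loading = true;
  data.messages = ["hello"]; // a real callback would await a data source here
  data.loading = false;
}

onRefresh();
```

Each plain assignment in the callback becomes one update, which is why no new LLM turn is needed to refresh the UI.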
The article has a full walkthrough of the four data flow patterns (forms, live updates, streaming data, callbacks) with demos.
[1] https://github.com/FabianKuebler/fenced/blob/main/packages/l...
My approach is the opposite bet: full code execution instead of tool calls. The agent can build any React UI from scratch with the full power of code — including client-server data flow, callbacks, streaming data.
MDX is a compile-time format for static content. This is a runtime protocol where the LLM writes code that executes as it streams, and the UIs it creates stay connected to the server.
It embodies the whole idea of having data, code and presentation at the same place.
If you're open to contributions, I already have an idea for a cascading styles system in mind.
Perhaps "WWW SPA document"? Using markdown with highly-progressive fenced blocks?
Hypertext (one word, coined 1960s) is quite a broad category. Subcategory "WWW" could fit, as TFA seems WWW-ish. A markdown document format, and progressive rendering of tags and code, seems HTML-like. Though with greater progressiveness - code blocks with streamed execution rather than merely compilation. The progressive JSON callbacks, React, integrated client and server code execution, and server-side rendering, seem closer to WWW SPA than to HTML. Though SPA files often seem more "source" than "document". And the multiple-page "App"-ness of SPA doesn't fit well. SPA seems a better fit than "full-stack". Perhaps some name analogous to "isomorphic javascript"...?
Maybe one day someone will invent a rounder wheel.
The wheel is what I would call passé.
Soon we'll be optimizing for minimizing the sides of a wheel (triangles are not the final form here...) /s
[given what CSS has incrementally and inevitably become, it's my ever-firmer belief that DSSSL would've been the right choice in the first place]
What are some of the ugly hacks you've seen that were applied?