- Mog is a statically typed, compiled, embedded language (think statically typed Lua) designed to be written by LLMs -- the full spec fits in 3,200 tokens.
- An AI agent writes a Mog program, compiles it, and dynamically loads it as a plugin, script, or hook.
- The host controls exactly which functions a Mog program can call (capability-based permissions), so permissions propagate from agent to agent-written code.
- Compiled to native code for low-latency plugin execution -- no interpreter overhead, no JIT, no process startup cost.
- The compiler is written in safe Rust so the entire toolchain can be audited for security. Even without a full security audit, Mog is already useful for agents extending themselves with their own code.
- MIT licensed, contributions welcome.
Motivations for Mog:
1. Syntax Only an AI Could Love: Mog is written for AIs to write, so the spec fits easily in context (~3200 tokens), and it's intended to minimize foot-guns to lower the error rate when generating Mog code. This is why Mog has no operator precedence: non-associative operations have to use parentheses, e.g. (a + b) * c. It's also why there's no implicit type coercion, which I've found over the decades to be an annoying source of runtime bugs. There's also less support in Mog for generics, and there's absolutely no support for metaprogramming, macros, or syntactic abstraction.
When asking people to write code in a language, these restrictions could be onerous. But LLMs don't care, and the less expressivity you trust them with, the better.
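To make the no-precedence rule concrete, here is a minimal sketch in Python (not Mog, and not the real Mog grammar). It assumes one reading of the rule: inside a single parenthesis level you may chain one associative operator (`+` or `*`), and anything else must be parenthesized. The names `check` and `is_unambiguous` are made up for illustration.

```python
import re

OPERATORS = {"+", "-", "*", "/"}
ASSOCIATIVE = {"+", "*"}  # assumption: only these may be chained without parentheses


def check(src):
    """Raise SyntaxError if `src` would need precedence rules to be parsed."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", src)
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("missing ')'")
            pos += 1
        elif not (tok[0].isalnum() or tok[0] == "_"):
            raise SyntaxError(f"unexpected token {tok!r}")

    def expr():
        nonlocal pos
        operand()
        seen = None  # the single operator used at this parenthesis level
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            pos += 1
            if seen is not None and (op != seen or op not in ASSOCIATIVE):
                raise SyntaxError(f"ambiguous: add parentheses around {op!r}")
            seen = op
            operand()

    expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return True


def is_unambiguous(src):
    """Boolean wrapper around check()."""
    try:
        return check(src)
    except SyntaxError:
        return False
```

Under these assumed rules, `(a + b) * c` and `a + b + c` pass, while `a + b * c` and `a - b - c` are rejected at compile time instead of being silently resolved by a precedence table.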
2. Capabilities-Based Permissions: There's a paradox with existing security models for AI agents. If you give an agent like OpenClaw unfettered access to your data, that's insecure and you'll get pwned. But if you sandbox it, it can't do most of what you want. Worse, if you run scripts the agent wrote, those scripts don't inherit the permissions that constrain the agent's own bash tool calls, which leads to pwnage and other chaos. And that's not even assuming you run one of the many OpenClaw plugins with malware.
Mog tries to solve this by taking inspiration from embedded languages. It compiles all the way to machine code, ahead of time, but the compiler doesn't output any dangerous code (at least it shouldn't -- Mog is quite new, so that could still be buggy). This allows a host program, such as an AI agent, to generate Mog source code, compile it, and load it into itself using dlopen(), while maintaining security guarantees.
The main trick is that a Mog program on its own can't do much. It has no direct access to syscalls, libc, or memory. It can basically call functions, do heap allocations (but only within the arena the host gives it), and return something. If the host wants the Mog program to be able to do I/O, it has to supply the functions that the Mog program will call. A core invariant is that a Mog program should never be able to crash the host program, corrupt its state, or consume more resources than the host allows.
This allows the host to inspect the arguments to any potentially dangerous operation that the Mog program attempts, since it's code that runs in the host. For example, a host agent could give a Mog program a function to run a bash command, then enforce its own session-level permissions on that command, even though the command was dynamically generated by a plugin that was written without prior knowledge of those permission settings.
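As a rough illustration of that pattern, here is a sketch in Python rather than Mog. Python does not actually sandbox the guest function, so this only shows the shape of the idea: the guest receives a table of host functions, and the host checks arguments before acting. `Host`, `CapabilityError`, `guest_plugin`, and the allow-list policy are all hypothetical, not the Mog host API.

```python
import shlex


class CapabilityError(Exception):
    """Raised when the host refuses a guest request."""


class Host:
    def __init__(self, allowed_commands):
        # Session-level policy (toy version): which executables may be run.
        self.allowed_commands = set(allowed_commands)
        self.log = []

    def run_bash(self, command):
        # This code runs in the host, so it sees the fully formed command
        # even though the guest generated it dynamically.
        argv = shlex.split(command)
        if not argv or argv[0] not in self.allowed_commands:
            raise CapabilityError(f"denied: {command!r}")
        self.log.append(argv)
        # The sketch only records the request; it does not execute anything.
        return f"(would run) {' '.join(argv)}"

    def capabilities(self):
        # The only functions the guest is handed.
        return {"run_bash": self.run_bash}


def guest_plugin(caps):
    # Stand-in for guest code: every effect goes through `caps`.
    return caps["run_bash"]("ls -la /tmp")
```

With `Host({"ls"})`, the plugin's `ls` call goes through, while a call such as `caps["run_bash"]("rm -rf /")` raises `CapabilityError`. A real policy would need to look at far more than the first word of the command.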
(There are a couple other tricks that PL people might find interesting. One is that the host can limit the execution time of the guest program. It does this using cooperative interrupt polling, i.e. the compiler inserts runtime checks that check if the host has asked the guest to stop. This causes a roughly 10% drop in performance on extremely tight loops, which are the worst case. It could almost certainly be optimized.)
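A sketch of cooperative interrupt polling, in Python for illustration only. In the description above the compiler inserts the check into generated machine code; here the explicit `flag.poll()` call plays that role. `StopFlag`, `Interrupted`, and `guest_loop` are hypothetical names.

```python
import threading


class Interrupted(Exception):
    """Raised inside the guest when the host has asked it to stop."""


class StopFlag:
    def __init__(self):
        self._event = threading.Event()

    def request_stop(self):
        # Called by the host, possibly from another thread or a timer.
        self._event.set()

    def poll(self):
        # The cheap check that would be inserted at loop back-edges.
        if self._event.is_set():
            raise Interrupted()


def guest_loop(flag, limit):
    total = 0
    for i in range(limit):
        flag.poll()  # one check per iteration: the tight-loop worst case
        total += i
    return total
```

The guest never gets preempted; it notices the stop request at its next poll. That is why the overhead shows up mostly in very tight loops, where the check is a large fraction of each iteration.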
3. Self Modification Without Restart: When I try to modify my OpenClaw from my phone, I have to restart the whole agent. Mog fixes this: an agent can compile and run new plugins without interrupting a session, which makes it dynamically responsive to user feedback (e.g., you tell it to always ask you before deleting a file and without any interruption it compiles and loads the code to... actually do that).
Async support is built into the language, by adapting LLVM's coroutine lowering to our Rust port of the QBE compiler, which is what Mog uses for compilation. The Mog host library can be slotted into an async event loop (tested with Bun), so Mog async calls get scheduled seamlessly by the agent's event loop. Another trick is that the Mog program uses a stack inside the memory arena that the host provides for it to run in, rather than the system stack. The system tracks a guard page between the stack and heap. This design prevents stack overflow without runtime overhead.
Lots of work still needs to be done to make Mog a "batteries-included" experience like Python. Most of that work involves fleshing out a standard library to include things like JSON, CSV, Sqlite, and HTTP. One high-impact addition would be an `llm` library that allows the guest to make LLM calls through the agent, which should support multiple models and token budgeting, so the host could prevent the plugin from burning too many tokens.
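To give a feel for what token budgeting could look like on the host side, here is a hypothetical sketch in Python. No such `llm` library exists yet; `BudgetedLLM`, `BudgetExceeded`, and `fake_backend` are invented for illustration, and the fake backend counts whitespace-separated words as "tokens".

```python
class BudgetExceeded(Exception):
    """Raised when the guest has used up its token allowance."""


class BudgetedLLM:
    def __init__(self, backend, max_tokens):
        # backend(prompt, max_tokens) -> (text, tokens_used); supplied by the host.
        self.backend = backend
        self.remaining = max_tokens

    def complete(self, prompt):
        if self.remaining <= 0:
            raise BudgetExceeded("token budget exhausted")
        # Cap each call at what is left, so a single call cannot overspend.
        text, used = self.backend(prompt, self.remaining)
        self.remaining -= used
        return text


def fake_backend(prompt, max_tokens):
    # Stand-in for a real model call: echoes the prompt, truncated to the cap.
    words = ("echo: " + prompt).split()[:max_tokens]
    return " ".join(words), len(words)
```

The guest would only ever see `complete`; the budget and the choice of model stay with the host.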
I suspect we'll also want to do more work to make the program lifecycle operations more ergonomic. And finally, there should be a more fully featured library for integrating a Mog host into an AI agent like OpenClaw or OpenAI's Codex CLI.
Agents can pretty much iterate on their own.
The most important thing for me, at least for now (and IMO the foreseeable future) is being able to review and read the output code clearly. I am the bottleneck in the agent -> human loop, so optimizing for that by producing clear and readable code is a massive priority. Gleam eliminates a ton of errors automatically so my reviews are focused on mostly business logic (also need to explicitly call out redundant code often enough).
I could see an argument for full on Erlang too, but I like the static typing.
If anthropic makes "claude-script", it'll outmog this language with massive RL-maxing. I hope your cortisol is ready for that.
If you want to try and mog claude with moglang, I think you need to make a corpus of several terabytes of valid useful "mog" programs, and wait for that to get included in the training dataset.
> String Slicing > You can extract a substring using bracket syntax with a range: s[start:end]. Both start and end are byte offsets. The slice includes start and excludes end.
Given that all strings are UTF-8, I note that there's not a great way to iterate over strings by _code point_. Using byte offsets is certainly more performant, but I could see this being a common request if you're expecting a lot of string manipulation to happen in these programs.
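To show the difference in Python (not Mog): the helpers below slice a string by UTF-8 byte offsets and walk it by code point while tracking byte offsets. `byte_slice` and `code_points` are my own names, and what Mog does when a slice lands in the middle of a code point is not something I know; Python's decoder simply raises.

```python
def byte_slice(s, start, end):
    """Slice by UTF-8 byte offsets: includes start, excludes end."""
    # Raises UnicodeDecodeError if an offset splits a multi-byte code point.
    return s.encode("utf-8")[start:end].decode("utf-8")


def code_points(s):
    """Yield (byte_offset, character) for each code point in s."""
    offset = 0
    for ch in s:
        yield offset, ch
        offset += len(ch.encode("utf-8"))


word = "naïve"  # 5 code points, 6 bytes in UTF-8 ('ï' takes 2 bytes)
print(byte_slice(word, 0, 2))   # na
print(byte_slice(word, 0, 4))   # naï
print(list(code_points(word)))  # [(0, 'n'), (1, 'a'), (2, 'ï'), (4, 'v'), (5, 'e')]
```

`byte_slice(word, 0, 3)` fails because offset 3 falls inside the two-byte encoding of `ï`, which is the kind of mistake a code-point iterator would help generated code avoid.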
Other than that, this looks pretty cool. Unlike other commenters, I kinda like the lack of operator precedence. I wouldn't be surprised if it turns out to be not a huge problem, since LLMs generating code with this language would be pattern-matching on existing code, which will always have explicit parentheses.
If you're running the compiled code in-process, how is that not JIT? And isn't that higher-latency than interpreting? Tiered-JIT (a la V8) solves exactly this problem.
Edit: Although the example programs show traditional AOT compile/execute steps, so "no process startup cost" is presumably a lie?
JIT means the code is interpreted until some condition kicks in to trigger compilation. This is obviously common and provides a number of advantages, but it has downsides too: 1) Code might run slowly at first. 2) It can be difficult to predict performance -- when will the JIT kick in? How well will it compile the code?
With Mog, you do have to pay the up-front cost of compiling the program. However, what I said about "no process startup cost" is true: there is no other OS process. The compiler runs in process, and then the compiled machine code is loaded into the process. Trying to do this safely is an unusual goal as far as I can tell. One of the consequences of this security posture is that the compiler and host become part of the trusted computing base. JITs are not the simplest things in the world, and not the easiest things to keep secure either. The Mog compiler is written entirely in safe Rust for this reason.
This up-front compilation cost is paid once, then the compiled code can be reused. If you have a pre-tool-use hook, or some extension to the agent itself, that code runs thousands of times, or more. Ahead-of-time compilation is well-suited for this task.
If this is used for writing a script that the agent runs once, then JIT compilation might turn out to be faster. But those scripts are often short, and in the benchmarking I've done our compiler is already quite fast for them. There are benchmarking scripts in the repo, and it would be interesting to extend them to map out this landscape more.
Also, in my experience, in this scenario, the vast majority of the total latency of waiting for the agent to do what you asked it is due to waiting for an LLM to finish responding, not compiling or executing the script it generated. So I've prioritized the end-to-end performance of Mog code that runs many times.
V8 can also deoptimize, falling back from machine code to bytecode, if it finds that certain optimization assumptions no longer hold.
A limited plugin API is interesting in some ways, but it has "rewrite it in Rust" energy. Maybe it's easier to flesh out a new library ecosystem using a coding agent, though?
Almost all the code LLMs have been trained on uses operator precedence, so no operator precedence seems like a massive foot-gun.
how do you mean? given that spec, ambiguous code just won't compile. that could potentially be inefficient, but not a foot gun.
But LLMs very much do care. They are measurably worse when writing code in languages with non-standard or non-existent operator precedence. This is not surprising given how they learn programming.
The permission model is almost identical to Roc's - https://www.roc-lang.org/platforms - although Roc isn't designed for "Syntax only an AI could love" (among many other differences between the two languages - but still, there are very few languages taking this approach to permissions).
If you're curious, I've talked about details of how Roc's permission model works in other places, most recently in this interview: https://youtu.be/gs7OLhdZJvk?si=wTFI7Ja85qdXJWiW
It feels like a custom defined DSL (domain specific language) problem.
Models are good at generating code in languages that already have a large corpus of examples, documentation, and training data behind them. A brand new language may be well suited for LLMs to write, but it is hard for LLMs to produce it reliably until it becomes widely used. And it is hard for it to become widely used until models can already produce it well.
> There's also less support in Mog for generics, and there's absolutely no support for metaprogramming, macros, or syntactic abstraction.
OK that does immediately make it boring, I give them that much.
A few questions:
- Is there a list of host languages?
- Can it live in the browser? (= is JS one of the host languages?)
It's also designed to be run in an event loop. I've tested this with Bun's event loop that runs TypeScript. I haven't tried it with other async runtimes, but it should be doable.
As for the browser, I haven't tried it, but you might be able to compile it to WASM -- the async stuff would be the hardest part of that, I suspect. Could be cool!
The project is here: https://codeberg.org/ZelphirKaltstahl/web-app-vocabulary-tra... But I left it unfinished and a quick grep does not yield comments or something that explains why at some place I do something to circumvent the SQLite problems. I remember though, that I basically swore to myself, that I would not ever use SQLite in production with Django ORM. And if I am not using it in production, then testing also better should not be using it, because one should test with the same RDBMS as runs in production, or risk unexpected issues suddenly only happening in production. So SQLite is out for anything serious in Django projects for me.
Would have been a blockchain language 10 years ago.
Please think twice before releasing these, if you're going to do it come up with at least one original idea that nobody else has done before.
Why didn't you just call it "bad rust copy"?
I see that Deno requires a subprocess which introduces some overhead, and I might be naive to think so, but that doesn't seem like it would matter much when agent round-trip and inference time is way, way longer than any inefficiency a subprocess would introduce. (edit: I realized in some cases the round-trip time may be negligible if the agent is local, but inference is still very slow)
I admittedly do prefer the syntax here, but I'm more so asking these questions from a point of pragmatism over idealism. I already use Deno because it's convenient, practical, and efficient rather than ideal.
The bigger problem is long-term maintainability. Deno was built by the creator of Node.js and has been maintained for half a decade now; that's hard to compete with. In a way it's much more about social trust than any particular syntax.
It's expensive of course, but if a new language is genuinely better for LLMs to write and understand, that would not be an issue.
I guess it depends on what "would work better" really means, but I don't think it's always a given. I've made my own languages, there is no available training set on exactly those, but AI with a prompt can figure out how to effectively use them as much as any other language, it seems to me. I guess it helps that most languages are more similar to each other than different, but even experimenting with new syntax seems to work out OK for me.
I can tell an llm "write hello world in C", and it will produce a valid program with just that context, without needing the C language spec nor stdlib definition in the context window because they're baked into the model weights.
As such, I can use the context window to for example provide information about my own function signatures, libraries, and objectives.
For a language not well represented in the training dataset, a chunk of my context has to be permanently devoted to the stdlib and syntax, and while coding the model will have to look up stdlib function signatures and such, using up additional context.
Perhaps you're trying to argue that the amount of tokens needed to describe the language, the stdlib, the basic tooling to look up function signatures, commands to compile, etc is not enough tokens to have a meaningful impact on the context window overall?
- GitHub syntax highlighting
- IDE integrations, LSP
- Modules and dependency management
I don't see an agent first language becoming a thing while humans are still ultimately responsible.
There is something to be said about giving AIs a clean foundation on which to build their own language. This allows evolution of such systems to go all the way into the compiler, beyond tooling.
Something purpose built to enable embedding allows it to be used in more contexts. Maybe I want a Mog plugin for my latest video game. Embedding JS is possible, but no fun.
I didn't mean to suggest there's no need for Mog either. I love to see developments like this. Deno is a practical solution for me today, but I see why it isn't a perfect forever-solution too.
Since it's new, Mog will likely not yet beat existing systems at basically anything. Its potential lies in having better performance and a much smaller total system footprint and complexity than the alternatives. WASM is generally interpreted -- you can compile it, but it wasn't really designed for that as far as I know.
More generally, I think new execution environments are good opportunities for new languages that directly address the needs of that environment. The example that comes to mind is JavaScript, which turned webpages into dynamically loaded applications. AI agents have such heavy usage and specific problems that a language designed to be both written and executed by them is worth a shot in my opinion.
WASM is a great system, but quite complex -- the spec for Mog is roughly 100x smaller.