Against Query Based Compilers
80 points by surprisetalk 3 days ago | 48 comments

recursivecaveat 23 hours ago
Mass unqualified imports are a feature that I really dislike in languages. At least please make it less syntactically convenient so people are not drawn to it. Trawling around to find out where symbols come from has wasted a lot more of my time than unqualified imports have ever saved. Using a very short alias gets you 90% of the way there anyway.
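[ed: a minimal Rust sketch of the alias point, with a made-up module name, contrasting a glob import with a short qualified alias:]

```rust
// Hypothetical module, for illustration only.
mod geometry {
    pub fn area(w: f64, h: f64) -> f64 { w * h }
}

// Glob import: `area` would appear with no hint of where it lives.
// use self::geometry::*;

// Short alias: one extra character or two per call site, but every
// use of `g::area` says exactly which module to look in.
use self::geometry as g;

fn main() {
    assert_eq!(g::area(3.0, 4.0), 12.0);
}
```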
reply
AlotOfReading 19 hours ago
I recently designed a build system, and one of the key decisions was banning unqualified target names. The neat side effect was that it also became possible to ask the language itself where something was defined with only a single additional line of code in the runtime.
reply
p1necone 22 hours ago
My plan for this in my current toy language project is to allow things like 'import * from Foo', but save a package.lock-esque file somewhere on first build - after that you need to run some kind of '--update' command to bring in totally new names.

The problem I'm trying to solve is less about general discoverability, though, and more about ensuring that purely additive changes in libraries aren't technically breaking due to the risk of name clashes.
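[ed: a toy Rust sketch of the lock-file idea described above — all names are invented. A glob import resolves only to names pinned at first build, so a library that later adds a new export can't silently introduce a clash:]

```rust
use std::collections::HashSet;

// Resolve `import *` against the set of names recorded in a
// hypothetical lock file. Names the library added since the lock
// was written are excluded until the user runs `--update`.
fn resolve_glob<'a>(
    exported: &'a HashSet<&'a str>, // what the library exports today
    locked: &'a HashSet<&'a str>,   // names pinned at first build
) -> Vec<&'a str> {
    exported.intersection(locked).copied().collect()
}

fn main() {
    // The library grew a `new_name` export after the lock was saved.
    let exported = HashSet::from(["area", "perimeter", "new_name"]);
    let locked = HashSet::from(["area", "perimeter"]);
    let mut imported = resolve_glob(&exported, &locked);
    imported.sort();
    // `new_name` stays out of scope, so it can't clash with a local.
    assert_eq!(imported, vec!["area", "perimeter"]);
}
```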

reply
boxed 18 hours ago
In Swift it's even worse. All symbols in neighbor files are available. No imports.
reply
armchairhacker 18 hours ago
Also in Java, and one of the (few) features they added was “module imports” (https://docs.oracle.com/en/java/javase/24/language/module-im...). I’m curious why, my guess is that it’s for beginners and scripts (same with “unnamed classes and instance main” (https://docs.oracle.com/en/java/javase/21/language/implicitl...)).
reply
conartist6 4 hours ago
o_O I haven't written Swift so I did not know that
reply
jiggawatts 23 hours ago
> Trawling around to find out where symbols come from has wasted a lot more of my time than unqualified imports have ever saved.

Why aren't you using an IDE with "navigate to definition" conveniently bound to something like middle-click or ctrl-click?

I haven't used a language/IDE combination without this feature in decades.

reply
alpinisme 22 hours ago
Sometimes I just want to open a project quickly and take a look without getting all dependencies installed and the lsp working and so on. Sometimes I want to look at another team’s project even though it’s in another language than my team works in, but I don’t want to set up that language in my environment.

This is particularly true with cpp projects, where wrestling with cmake and co just isn’t how i want to spend my time to answer a question about why something is slow or what a serialized object shape should be under various circumstances.

reply
jiggawatts 20 hours ago
> Sometimes I just want to open a project quickly

Every time I've heard someone say a version of this, invariably they've spent more time doing things manually than the properly mechanised method would have taken.

There's a seductive immediacy to certain quick & dirty manual processes that in sum are slower if measured with a stopwatch. It could be as trivial as missing an issue because, without syntax highlighting, you didn't notice that something was commented out or that a string was quoted incorrectly.

Similarly, a few decades back I argued with many developers who insisted that the fancy keyboard bindings of Emacs or Vim made them faster, but when raced against someone with an IDE they lost badly every time, no exceptions.

reply
hombre_fatal 20 hours ago
Huh? Their example could just be reading code on GitHub or reading diffs. You shouldn’t need to pull code into a development environment just so you can GoToDefinition to understand what’s going on.

There’s all sorts of workflows where vim would mog the IDE workflow you’re really excited about, like pressing E in lazy git to make a quick tweak to a diff. Or ctrl-G in claude code.

I wouldn’t be so sure you’ve cracked the code on the best workflow that has no negative trade offs. Everyone thinks that about their workflow until they use it long enough to see where it snags.

reply
jiggawatts 19 hours ago
> You shouldn’t need to

... but you do more often than the quick & dirty approach really allows.

I was just watching the Veritasium episode on the XZ tools hack, which was in part caused by poor tooling.

The attacker purposefully obfuscated his change, making a bunch of "non-changes" such as rearranging whitespace and comments to hide the fact that he didn't actually change the C code to "fix" the bug in the binary blob that contained the malware payload.

You will miss things like this without the proper tooling.

I use IDEs in a large part because they have dramatically better diff tools than CLI tools or even GitHub.

> you’ve cracked the code on the best workflow

I would argue that the ideal tooling doesn't even exist yet, which is why I don't believe that I've got the best possible setup nailed. Not yet.

My main argument is this:

Between each keypress in a "fancy text editor" of any flavour, an ordinary CPU could have processed something like 10 billion instructions. If you spend even a minute staring at the screen, you're "wasting" trillions of possible things the computer could be doing to help you.

Throw a GPU into the mix and the waste becomes absurd.

There's an awful lot the computer could be doing to help developers avoid mistakes, make their code more secure, analyse the consequences of each tiny change, etc...

It's very hard to explain without writing something the length of War & Peace, so let me leave you with a real world example of what I mean from a related field:

There's two kinds of firewall GUIs.

One kind shows you the real-time "hit rate" of each rule, showing packets and bytes matched, or whatever.

The other kind doesn't.

One kind dramatically reduces "oops" errors.

The other kind doesn't. It's the most common type however, because it's much easier to develop as a product. It's the lazy thing. It's the product broken down into independent teams doing their own thing: the "config team" doing their thing and the "metrics" team doing theirs, no overlap. It's Conway's law.

IDEs shouldn't be fancy text editors. They should be constantly analysing the code to death, with AIs, proof assistants, virtual machines, instrumentation, whatever. Bits and pieces of this exist now, scattered, incomplete, and requiring manual setup.

One day we'll have these seamlessly integrated into a cohesive whole, and you'd be nuts to use anything else.

... one day.

reply
akst 19 hours ago
The fact you’re talking to someone with this frustration shows maybe there are people with use cases other than yours?

When IDEs do resolve this, it tends to be because they built some index to look up these identifiers, which is likely taking up a portion of your memory. A language that statically tells you where an identifier comes from will take up fewer resources, and your IDE can collapse the import anyway.

So not sure why you feel so strongly about a language design whose ambiguity necessitates consuming additional resources to show you your little drop-down menu.

reply
jitl 22 hours ago
the difference is "i see this information at a glance" versus "i need to move the cursor to each unknown name to ask my ide a question". i think doing the command-click or even just hovering a bunch of unknown symbols is "trawling around". this goes doubly when doing code reviews; i'd prefer code make sense to my grug brain when i view it outside an ide. (sure, i can pull the code down to do an in-depth review in my ide, but the tax for unqualified names is starting to pile up...)
reply
jiggawatts 20 hours ago
> the difference is "i see this information at a glance" versus "i need to move the cursor to each unknown name to ask my ide a question".

Is it really "at a glance"?

In most languages either by convention or requirement the "import" or "using" statements are collected at the beginning of a file. Once you've scrolled down even a few lines, the context is gone.

Also, determining what exactly is bound where is decidedly non-trivial in many languages due to keywords such as "var" and "let", overloaded function/method definitions, etc...

Sure, a human can do this with 95% or better accuracy, but that 5% can be a killer during a complex troubleshooting session if you guess wrong.

That's why I strongly prefer IDEs and having a purely mechanical process to resolve the dependencies so I can know exactly what things are instead of hoping my intuitions were correct.

reply
jitl 12 hours ago
go gets this very right by strongly preferring `import “x”; x.y` style.
reply
jcgrillo 21 hours ago
similarly, I can step through a program in the debugger and see how it works--or at least that it works--but the whole point of having a programming language in the first place is that (if I use it right) I can understand the program without running it
reply
garretraziel 13 hours ago
What if I want the opposite? I was modernizing library usage in a C# project and very much missed a “show me all classes imported from this package in the current class” feature.
reply
nurettin 14 hours ago
Even better, use an IDE which will generate the import statements, so that you won't have to use these feature-of-the-week language quirks from the 90s.
reply
armchairhacker 3 days ago
> In Zig, every file can be parsed completely in isolation...every name needs to be explicitly declared (there’s no use *)...

> In contrast, you can’t really parse a file in Rust...Expanding macros requires name resolution, which, in Rust, is a crate-wide, rather than a file-wide operation...Similarly, the nature of the trait system is such that impl blocks relevant to a particular method call can be found almost anywhere...

matklad doesn't even mention dynamic languages, where perfect name resolution is undecidable. "Fast static analysis, easy static analysis, language designed for static analysis - pick two".

Rust's IDE integration is fast and deep, and I've heard TypeScript's is too, so "easy static analysis" may not be important today. I believe it will be as coding evolves due to LLMs, albeit I have no evidence and am not quite sure how.

reply
rendaw 22 hours ago
> "Fast static analysis, easy static analysis, language designed for static analysis - pick two"

Where does this come from and what's the explanation?

reply
armchairhacker 18 hours ago
I made it up, although maybe I’m not the first? The third should actually be “language not designed around static analysis”

- You can write a static analysis that is fast and straightforward if you design the language around it (fast + easy + !language)

- Otherwise, you’ll run into features that prevent that. Ex (stolen from an above comment): name resolution within unqualified imports

- For many such features, you can write a brute-force solution. Ex: scan all possible definitions within the unqualified import until you find the target name (!fast + easy + language)

- Or you can optimize this solution, using incremental cached lookup tables (fast + !easy + language)

Of course, there are languages with features that make most static analyses for them neither fast nor easy (ex: for name resolution, eval or macros). But it holds in many cases, ex:

- Parsing: basic syntax (!language), GLR parser or LL parser with big alternatives (!fast), IELR parser or LL parser with smart alternatives (!easy)

- Type inference: many type annotations (!language), brute-force enumerate candidates (!fast), Hindley-Milner or bidirectional, possibly extended for features like trait resolution (!easy)

- Allocation: manual (!language), put everything on the heap (!fast), heuristics, implicit boxing and unboxing (!easy)

There’s a trend (I also don’t know where it originates) that most “data processing” problems can be solved with either an easier, slower algorithm or a trickier, faster one (hence, competitive programming). Static analyses are this class of problem, but for them there is (sometimes) a third option: simplify the problem by putting hints in the language.
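[ed: a toy Rust sketch of the name-resolution trade-off above — module and symbol names are invented. `resolve_brute` is the slow-but-easy scan over glob imports; `build_index` is the fast-but-trickier lookup table a real compiler would also maintain incrementally:]

```rust
use std::collections::HashMap;

// Hypothetical world: module name -> names it exports.
type Modules = HashMap<&'static str, Vec<&'static str>>;

// !fast + easy: scan every glob-imported module until the name is
// found (returns None if a module or the name is missing).
fn resolve_brute(modules: &Modules, globs: &[&'static str], name: &str) -> Option<String> {
    for m in globs {
        if modules.get(m)?.iter().any(|n| *n == name) {
            return Some(format!("{m}::{name}"));
        }
    }
    None
}

// fast + !easy: precompute a name -> defining-module table once,
// then answer each lookup in O(1). First glob wins on conflicts.
fn build_index(modules: &Modules, globs: &[&'static str]) -> HashMap<String, String> {
    let mut index = HashMap::new();
    for m in globs {
        for n in &modules[*m] {
            index.entry(n.to_string()).or_insert_with(|| format!("{m}::{n}"));
        }
    }
    index
}

fn main() {
    let mut modules = Modules::new();
    modules.insert("alpha", vec!["area"]);
    modules.insert("beta", vec!["volume"]);
    let globs = ["alpha", "beta"];

    assert_eq!(resolve_brute(&modules, &globs, "volume").as_deref(), Some("beta::volume"));
    let index = build_index(&modules, &globs);
    assert_eq!(index["area"], "alpha::area");
}
```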

reply
foolswisdom 23 hours ago
I would really appreciate it if rust-analyzer was faster, actually. It feels even worse given that you need to save the file before it updates the type checking (though I assume that's because it would be too slow to feel smooth if it ran while typing?).
reply
Rusky 22 hours ago
The reason rust-analyzer doesn't update diagnostics until you save is historical. Originally, people tried to build IDE support by reusing rustc itself, but this proved too slow and cumbersome at the time.

Rust-analyzer reimplemented the frontend in a more IDE-friendly architecture, but focused more on name resolution than on type checking. So it delegated diagnostics to literally just running `cargo check`.

As parts of rustc get rewritten over time (the trait solver, borrow checker) they have also been made more IDE-friendly and reusable, so rust-analyzer is slowly gaining the ability to surface more type checking diagnostics as you edit, without delegating to `cargo check`.

reply
smj-edison 19 hours ago
It seems like he's not arguing against query-based compilers per se, but rather against the thinking that often comes with them. Namely, that you can amortize an expensive compilation process across incremental changes (which often doesn't work in practice because of the avalanche effect).
reply
Panzerschrek 19 hours ago
I have a language server implementation for my programming language. Since it was developed much later than its compiler, a query-based redesign wasn't viable to implement. So I just adapted the compiler codebase for language-server needs in a tricky way: on each document change I redo parsing, but not the other compilation steps, and use the parsing results to perform fetches from name tables built previously - for some older document state. This state is costly to update, so it's updated only once in a while, and only if the input document is syntactically correct. This approach works surprisingly well (it's fast and completion is mostly correct) and doesn't require throwing the whole compiler codebase away.

Even more, I can't imagine how people write language servers that fully reprocess on each document change (even with a query-based approach). How do they deal with broken syntax? Isn't it generally impossible to recover from syntactic errors in a way that makes the end result as good as expected?
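[ed: a minimal Rust sketch of the arrangement described above — all types and the "grammar" are invented. Parsing runs on every keystroke; name lookups are answered from the last fully analyzed (possibly stale) snapshot:]

```rust
// Last syntactically valid snapshot, produced by full (expensive)
// semantic analysis.
struct Snapshot {
    names: Vec<String>, // name table
}

struct Server {
    last_good: Snapshot,
}

impl Server {
    // Cheap step, runs on every edit. "Parsing" here is just
    // extracting the word being typed.
    fn parse_prefix(text: &str) -> Option<&str> {
        text.split_whitespace().last()
    }

    // Completion = fresh parse + stale name table.
    fn complete(&self, text: &str) -> Vec<String> {
        let Some(prefix) = Self::parse_prefix(text) else { return vec![] };
        self.last_good
            .names
            .iter()
            .filter(|n| n.starts_with(prefix))
            .cloned()
            .collect()
    }

    // Expensive step, run only once in a while on valid input.
    fn reanalyze(&mut self, names: Vec<String>) {
        self.last_good = Snapshot { names };
    }
}

fn main() {
    let mut server = Server {
        last_good: Snapshot { names: vec!["area".into(), "argmax".into()] },
    };
    // Completion works mid-edit, from the stale table.
    assert_eq!(server.complete("let x = ar"), vec!["area", "argmax"]);
    server.reanalyze(vec!["area2".into()]);
    assert_eq!(server.complete("ar"), vec!["area2"]);
}
```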

reply
norman784 15 hours ago
You need a fault-tolerant parser; tree-sitter is a good example of how this works in practice. Your compiler should have the ability to run a few steps (parsing, symbol resolution, type checking, etc.) with an incomplete/invalid AST.

> isn't it generally impossible to recovery syntactic errors

AFAIK Zig was designed in such a way that, in most cases, you can parse each line independently; for example, strings are single-line by default, and you need a special syntax for multiline strings that still works on a per-line basis.
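[ed: one classic form of fault tolerance is panic-mode recovery: on an error, skip ahead to a synchronization token and keep parsing. A toy Rust sketch, with a deliberately silly made-up grammar where every statement is two words ending in `;`:]

```rust
// On a malformed statement, record the error, skip to the next ';'
// (the synchronization token), and continue — so one bad statement
// doesn't hide the rest of the file from later analysis passes.
fn parse(src: &str) -> (Vec<String>, usize) {
    let mut stmts = Vec::new();
    let mut errors = 0;
    for stmt in src.split(';') {
        let stmt = stmt.trim();
        if stmt.is_empty() {
            continue;
        }
        // Made-up "grammar": a statement is exactly two words.
        if stmt.split_whitespace().count() == 2 {
            stmts.push(stmt.to_string());
        } else {
            errors += 1; // recovery = resume at the next ';'
        }
    }
    (stmts, errors)
}

fn main() {
    let (stmts, errors) = parse("let x; @#!garbage@#!; let y;");
    // The garbage statement is reported, but parsing continues.
    assert_eq!(stmts, vec!["let x", "let y"]);
    assert_eq!(errors, 1);
}
```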

reply
Panzerschrek 7 hours ago
I doubt it's possible to perform proper syntax recovery without messing up the result. My approach of using some previous valid program state is more reliable and easier to implement.

> AFAIK Zig was designed in such way that, in most cases, you can parse each line independently

It can't be true. Basic stuff like opening/closing braces introduces global syntax state, so line-by-line parsing isn't possible. Do you mean something like tokenization instead? That is possible in many cases, but for a true LSP it's almost useless.

reply
armchairhacker 18 hours ago
I know Haskell’s LSP years ago just stopped on broken syntax, so the only diagnostic would be the syntax error and you would lose code actions. That was annoying, especially because the LSP was slow.

JetBrains (not an LSP) works up to the syntax error, then recovers and works after. In some cases that are not hard to trigger (e.g. after a closing delimiter), the recovery is wrong and everything after is messed up. But in practice that’s rarely an issue; if I want those later diagnostics and code actions, I fix the delimiters, and the analysis updates quickly.

reply
noelwelsh 16 hours ago
The article doesn't mention the other side of the trade-off, which is that features like Rust's traits or macros make the language more expressive. Given that Rust's LSP server is pretty snappy, these features don't seem to cause problems in practice for incremental compilation.
reply
gizmondo 14 hours ago
The author is the former lead developer of Rust's LSP server, so it seems natural for him to prefer languages that would allow him to do a better job. Slow compilation (including incremental) is one of the most common complaints about Rust, despite a lot of effort spent on optimizations there. I think it would be good if more people were aware of the inherent downsides of maximizing expressiveness.
reply
speed_spread 11 hours ago
Rust compilation is actually very fast if you keep things Java-esque: no macros, no monomorphization (every trait is dyn). Obviously most of the ecosystem isn't built that way, leaning more on the metaprogramming style because the language makes it so convenient.
reply
snthpy 19 hours ago
> Similarly, the nature of the trait system is such that impl blocks relevant to a particular method call can be found almost anywhere. For every trait method call, you get a dependency on the impl block that supplies the implementation, *but you also get a dependency on non-existence of conflicting impls in every other file*!

The last clause is a great point and one I will try to remember.

reply
jitl 21 hours ago
i agree on the whole that query is a good second line of defense versus the primary strategy. however you can do query based strategy and still use a diff/patch approach for some queries, just make the previous state of a query one of its inputs.

query based compilers are equivalent to component UI frameworks / signals / self-adjusting computation / spreadsheets. signia signals library implements this idea of using the previous output + diffs to compute the next output (https://signia.tldraw.dev/docs/incremental#using-diffs-in-co...)
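[ed: a made-up Rust illustration of the "previous output + diff" idea: a query that keeps a running sum and folds each change into the old result, in O(|diff|) rather than O(|items|):]

```rust
// A change to the query's input, expressed as a diff.
enum Diff {
    Insert(i64),
    Remove(i64),
}

struct SumQuery {
    output: i64, // the previous output is an input to the next step
}

impl SumQuery {
    // The non-incremental baseline: recompute from all items.
    fn from_scratch(items: &[i64]) -> Self {
        SumQuery { output: items.iter().sum() }
    }

    // The incremental step: patch the previous output.
    fn apply(&mut self, diff: &Diff) {
        match diff {
            Diff::Insert(x) => self.output += x,
            Diff::Remove(x) => self.output -= x,
        }
    }
}

fn main() {
    let mut q = SumQuery::from_scratch(&[1, 2, 3]);
    assert_eq!(q.output, 6);
    q.apply(&Diff::Insert(10));
    q.apply(&Diff::Remove(2));
    assert_eq!(q.output, 14);
}
```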

reply
zarzavat 20 hours ago
> In contrast, you can’t really parse a file in Rust. Rust macros generate new source code, so parsing can’t be finished until all the macros are expanded.

Rust needs macros because the language is extremely strict and verbose. Macros are the main escape hatch, allowing you to simplify your code while maintaining memory safety without a GC.

Zig doesn't have memory safety as a goal so it seems like an unfair comparison.

reply
Philpax 20 hours ago
I'm a Rust main, but this argument seems... incorrect? You would not need macros for Rust to remain a usable memory-safe language. They certainly make it easier, but they're not necessary. It would be perfectly possible to design a variant of Rust that gets you to 80-90% of Rust's usability, with the same safety, without macros.
reply
kobebrookskC3 9 hours ago
how would you implement https://doc.rust-lang.org/stable/std/pin/macro.pin.html without macros? a macro is used to shadow the original variable so that you can't move it (safely) after you pin it
reply
steveklabnik 8 hours ago
Regular variable definition shadows. Macros expand to regular Rust code, they could always be replaced by the expanded body.
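[ed: to illustrate — this is the shadowing idea `std::pin::pin!` relies on, written by hand, not the macro's literal expansion:]

```rust
use std::pin::Pin;

fn main() {
    let value = String::from("pinned");

    // Hand-written analogue of `let slot = std::pin::pin!(value);`.
    // Move the value into a fresh local, then shadow that binding
    // with a Pin so safe code can no longer move the original.
    let mut slot = value;
    // SAFETY: `slot` is immediately shadowed, so nothing can move it
    // for the rest of this scope — the obligation the macro
    // discharges mechanically is here checked by eye.
    let slot: Pin<&mut String> = unsafe { Pin::new_unchecked(&mut slot) };

    assert_eq!(slot.as_ref().get_ref(), "pinned");
}
```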
reply
kobebrookskC3 2 minutes ago
yes, but the code inside is unsafe. the pin macro is like a safe function.
reply
jibal 4 hours ago
Since macros just expand into code, how could you imagine that a macro is ever necessary?
reply
kobebrookskC3 8 minutes ago
the macro uses unsafe inside, so that's another instance of unsafe you'll need to check, whereas the pin macro is like a safe function
reply
foltik 20 hours ago
Macros in Rust have nothing to do with memory safety. They're typically used for compile-time codegen (derive, println!) and very occasionally for syntactic abstraction (vec!).
reply
armchairhacker 18 hours ago
A lot of the abstraction that Rust’s macros enable can be implemented via generics and traits, especially if they were extended (ex: keyword generics, variadic generics, orphan trait instances). Except traits are the other thing that complicates Rust’s name resolution…
reply
jibal 19 hours ago
This makes no sense at all.

The "comparison" (it isn't one--it's a technical point; "fairness" is not remotely relevant--Rust isn't being attacked and doesn't need defending) is between a language with macros and a language without macros--why the languages have or don't have macros isn't relevant.

Zig is strict and verbose and could benefit from macros but for numerous reasons doesn't have them.

Zig also has no GC and does have memory safety as a goal (but not the primary one, and not provable memory safety)--but none of this is at all relevant to the point the OP made, which is strictly about one language having macros and the other language not having them.

reply
jiggawatts 23 hours ago
> This is provably impossible to make usefully incremental. The encryption can be implemented as a graph of function calls, and you can apply the general incremental recipe to it. It just won’t be very fast.

The specific example is bogus.

Merkle trees and their many variants exist to solve precisely this problem.

For more compiler-specific scenarios, there are vaguely similar solutions, such as splitting up monolithic executables into many small dynamically loaded libraries (only during development), or, taking that to the extreme, hot code reloading at the function level.
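[ed: a toy Merkle-style sketch for concreteness — the hash functions and chunking are invented stand-ins, not a real design. Editing one chunk means re-hashing that chunk plus the combining step, not the whole input:]

```rust
// Per-chunk hash (FNV-1a-style, purely illustrative).
fn chunk_hash(chunk: &str) -> u64 {
    chunk.bytes().fold(0xcbf29ce484222325u64, |h, b| {
        (h ^ b as u64).wrapping_mul(0x100000001b3)
    })
}

// Combine per-chunk hashes into a root; order-sensitive.
fn root_hash(chunk_hashes: &[u64]) -> u64 {
    chunk_hashes.iter().fold(0, |h, c| h.rotate_left(7) ^ c)
}

fn main() {
    let mut chunks = vec!["fn main() {", "    work();", "}"];
    let mut hashes: Vec<u64> = chunks.iter().map(|c| chunk_hash(c)).collect();
    let old_root = root_hash(&hashes);

    // Edit one chunk: only its hash and the root are recomputed,
    // not the hashes of the untouched chunks.
    chunks[1] = "    rest();";
    hashes[1] = chunk_hash(chunks[1]);
    assert_ne!(root_hash(&hashes), old_root);

    // Reverting the edit restores the old root.
    chunks[1] = "    work();";
    hashes[1] = chunk_hash(chunks[1]);
    assert_eq!(root_hash(&hashes), old_root);
}
```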

> ... only because Zig language is carefully designed to allow this.

Is the key point. Rust wasn't designed for incremental compilation, and most legacy languages like C only allow it in a useful way because they were designed in the era of kilobytes of system memory and hence are very frugal in their design.

Other than Zig, the only modern language I'm aware of that was "co-designed" for efficient compilation and incremental reload is Jai.

> Expanding macros requires name resolution, which, in Rust, is a crate-wide, rather than a file-wide operation.

Sure, but macros change infrequently, so that ought to be a highly cacheable pure function for most edits.

> The above scheme works only if the language has a property that changing the body of function foo (not touching its signature) can’t introduce type errors into an unrelated function bar.

AFAIK Rust and most other statically typed languages have this property. Optimisations such as inlining can mess with the cacheability of code-gen in all languages.

reply
skybrian 21 hours ago
Doesn't Go qualify? It has a fast compiler and it was designed that way.
reply
chrismorgan 21 hours ago
> Rust and most other statically typed languages have this property.

Actually, auto traits thwart this in some places for Rust: https://rust-lang.github.io/impl-trait-initiative/explainer/....

reply
Ygg2 15 hours ago
Fair point but auto traits are already leaky.

And they were created to minimize writing down what auto traits your types implemented. You can filter by them, but they are assumed automatic bounds.

And interestingly enough, Zig doesn't have this property either: it leaks constness, alignment, and similar info.

reply
jibal 19 hours ago
> Optimisations such as inlining can mess with the cacheability of code-gen in all languages.

The statement was about introducing type errors, not cacheability of code-gen.

reply