Hacker News

106 points by pjmlp 3 hours ago | 55 comments

I like this article already because it took me to the goals of Rust for 2026. We use the language in our team, but we haven't needed to go very deep to do the stuff we need. Yet, I really enjoy witnessing the development of a language from ground up with so much community feedback.

I somehow miss noticing that in C++ and I have no idea how it is working in other domains.

My only gripe is that a lot of it is feeling a bit kick-starter-y, with each of the goals needing specific funding. Is that the best model we've found so far?

hmry 2 hours ago

Great article! Love these types of deep dives into optimizations. Hope the project goal works out!

I've felt before that compilers often don't put much effort into optimizing the "trivial" cases.

Overly dramatic title for the content, though. I would have clicked "Async Rust Optimizations the Compiler Still Misses" too you know

Animats 2 hours ago

Agree on title. Too dramatic.

The author seems to be obsessing about the overhead for trivial functions. He's bothered by overhead for states for "panicked" and "returned". That's not a big problem. Most useful async blocks are big enough that the overhead for the error cases disappears.

He may have a point about lack of inlining. But what tends to limit capacity for large numbers of activities is the state space required per activity.

molf 2 hours ago

> Most useful async blocks are big enough that the overhead for the error cases disappears.

Is it really though?

In my experience many Rust applications/libraries can be quite heavy on the indirection. One of the points from the article is that contrary to sync Rust, in async Rust each indirection has a runtime cost. Example from the article:

    async fn bar(blah: SomeType) -> OtherType {
       foo(blah).await
    }

I would naively expect the above to be a 'free' indirection, paying only a compile-time cost for the compiler to inline the code. But after reading the article I understand this is not true, and it has a runtime cost as well.

repelsteeltje 9 minutes ago

> [...] That's not a big problem [...]

Depends somewhat on your expectations, I suppose. Compared to Python, Java, sure, but Rust off course strives to offer "zero-cost" high level concepts.

I think the critique is in the same realm of C++'s std::function. Convenience, sure, but far from zero-cost.

swiftcoder 2 hours ago

> Most useful async blocks are big enough that the overhead for the error cases disappears.

Most useful async blocks are deeply nested, so the overhead compounds rapidly. Check the size of futures in a decently large Tokio codebase sometime

algesten 34 minutes ago

He's optimizing for embedded no-std situation. These things do matter in constrained environments.

dathinab 37 minutes ago

> Agree on title. Too dramatic.

not just too dramatic

given that all the things they list are

non essential optimizations,

and some fall under "micro optimizations I wouldn't be sure rust even wants",

and given how far the current async is away from it's old MVP state,

it's more like outright dishonest then overly dramatic

like the kind of click bait which is saying the author does cares neither about respecting the reader nor cares about honest communication, which for someone wanting to do open source contributions is kinda ... not so clever

through in general I agree rust should have more HIR/MIR optimizations, at least in release mode. E.g. its very common that a async function is not pub and in all places directly awaited (or other wise can be proven to only be called once), in that case neither `Returned` nor `Panicked` is needed, as it can't be called again after either. Similar `Unresumed` is not needed either as you can directly call the code up to the first await (and with such a transform their points about "inlining" and "asyncfns without await still having a state machine" would also "just go away"TM, at least in some places.). Similar the whole `.map_or(a,b)` family of functions is IMHO a anti-pattern, introducing more function with unclear operator ordering and removal of the signaling `unwrap_` and no benefits outside of minimal shortening a `.map(b).unwrap_or(a)` and some micro opt. is ... not productive on a already complicated language. Instead guaranteed optimizations for the kind of patterns a `.map(b).unwrap_or(a)` inline to would be much better.

groundzeros2015 2 hours ago

Async seems like an underbaked idea across the board. Regular code was already async. When you need to wait for an async operation, the thread sleeps until ready and the kernel abstracts it away. But We didn’t like structuring code into logical threads, so we added callback systems for events. Then realized callbacks are very hard to reason about and that sequential control is better.

So threads was the right programming model.

Now language runtimes prefer “green threads” for portability and performance but most languages don’t provide that properly. Instead we have awkward coloring of async/non-async and all these problems around scheduling, priority, and no-preemption. It’s a worse scheduling and process model than 1970.

usrnm 57 minutes ago

The problem comes from trying to sit on both chairs: we want async but want to be able to opt out. This is what causes most of the ugliness, including function colouring. Just look at golang, where everything is async with no way to change it, it's great. It's, probably, not well-suited for things like microcontrollers, where every byte matters, but if you can afford the overhead, it's so much better than Rust async. Before async Rust was an interesting and reasonable language, now it's just a hot mess that makes your eyes bleed for no reason.

vanderZwan 10 minutes ago

> It's, probably, not well-suited for things like microcontrollers, where every byte matters, but if you can afford the overhead, it's so much better than Rust async.

There is one hill I'll die on, as far as programming languages go, which is that more people should study Céu's structured synchronous concurrency model. It specifically was designed to run on microcontrollers: it compiles down to a finite state machine with very little memory overhead (a few bytes per event).

It has some limitations in terms of how its "scheduler" scales when there are many trails activated by the same event, but breaking things up into multiple asynchronous modules would likely alleviate that problem.

I'm certain a language that would suppprt the "Globally Asynchronous, Locally Synchronous" (GALS) paradigm could have their cake and eat it too. Meaning something that combines support for a green threading model of choice for async events, with structured local reactivity a la Céu.

F'Santanna, the creator of Céu, actually has been chipping away at a new programming language called Atmos that does support the GALS paradigm. However, it's a research language that compiles to Lua 5.4. So it won't really compete with the low-level programming languages there.

[0] https://ceu-lang.org/

[1] https://github.com/atmos-lang/atmos

nananana9 2 hours ago

> the thread sleeps until ready and the kernel abstracts it away.

Sure, but once you involve the kernel and OS scheduler things get 3 to 4 orders of magnitude slower than what they should be.

The last time I was working on our coroutine/scheduling code creating and joining a thread that exited instantly was ~200us, and creating one of our green threads, scheduling it and waiting for it was ~400ns.

You don't need to wait 10 years for someone else to design yet another absurdly complex async framework, you can roll your own green threads/stackful coroutines in any systems language with 20 lines of ASM.

groundzeros2015 58 minutes ago

1. Why can’t we have better green threads implementations with better scheduling models?

2. Unchecked array operations are a lot faster. Manual memory management is a lot faster. Shared memory is a lot faster.

Usually when you see someone reach for sharp and less expressive tools it’s justified by a hot code path. But here we jump immediately to the perf hack?

3. How many simultaneous async operations does your program have?

vlovich123 51 minutes ago

Well, if you offload heavy compute into an async task, then usually it depends strictly on how many concurrent inputs you are given. But even something as “simple” as a performance editor benefits from this if done well - that’s why JS text editors have reasonably acceptable performance whereas Java IDEs always struggled (historically anyway since even Java has adopted green threads).

ptx 7 minutes ago

Are you sure Java's UI issues are caused by threading and not just Swing being a glitchy pile of junk?

For example, if you don't explicitly call the java.awt.Toolkit.sync() method after updating the UI state (which according to the docs "is useful for animation"), Swing will in my experience introduce seemingly random delays and UI lag because it just doesn't bother sending the UI updates to the window system.

groundzeros2015 38 minutes ago

You think IDEs are written in JS because of the performance benefits of the threading model?

I thought it was because they could copy chromium.

vlovich123 33 minutes ago

Why do you think they don’t struggle with input latency? Because the non blocking nature built into the browser model is so powerful and you cannot get that with threads.

usrnm 16 minutes ago

Are you sure that latency-sensitive parts are written in async JS instead of having a separate UI thread (pool)? I have no idea myself, but without knowing the details it's hard to argue. Note, that browsers themselves, are usually written in languages like C++ or Rust. They run JS, but aren't written in it

groundzeros2015 19 minutes ago

I disagree with the premise. I cannot imagine a better latency experience than blocking loop IDEs like VS6.

Which inputs are getting latency? The keyboard? The files?

> the non blocking nature

https://youtu.be/bzkRVzciAZg?si=BuBXxHTgN0OqsAhI

BlackFly 36 minutes ago

I think that callbacks are actually easier to reason about:

When it comes time to test your concurrent processing, to ensure you handle race conditions properly, that is much easier with callbacks because you can control their scheduling. Since each callback represents a discrete unit, you see which events can be reordered. This enables you to more easily consider all the different orderings.

Instead with threads it is easy to just ignore the orderings and not think about this complexity happening in a different thread and when it can influence the current thread. It isn't simpler, it is simplistic. Moreover, you cannot really change the scheduling and test the concurrent scenarios without introducing artificial barriers to stall the threads or stubbing the I/O so you can pass in a mock that you will then instrument with a callback to control the ordering...

The problem with callbacks is that the call stack when captured isn't the logical callstack unless you are in one of the few libraries/runtimes that put in the work to make the call stacks make sense. Otherwise you need good error definitions.

You can of course mix the paradigms and have the worst of both worlds.

groundzeros2015 34 minutes ago

I agree. I don’t think callbacks are an underbaked language feature.

vlovich123 54 minutes ago

> Regular code was already async. When you need to wait for an async operation, the thread sleeps until ready and the kernel abstracts it away

Not really. I’ve observed async code often is written in such a way that it doesn’t maximize how much concurrency can be expressed (eg instead of writing “here’s N I/O operations to do them all concurrently” it’s “for operation X, await process(x)”). However, in a threaded world this concurrency problem gets worse because you have no way to optimize towards such concurrency - threads are inherently and inescapably too heavy weight to express concurrency in an efficient way.

This is is not a new lesson - work stealing executors have long been known to offer significantly lower latency with more consistent P99 than traditional threads. This has been known since forever - in the early 00s this is why Apple developed GCD. Threads simply don’t provide any richer information it needs in the scheduler to the kernel about the workload and kernel threads are an insanely heavy mechanism for achieving fine grained concurrency and even worse when this concurrency is I/O or a mixed workload instead of pure compute that’s embarrassingly easily to parallelize.

Do all programs need this level of performance? No, probably not. But it is significantly more trivial to achieve a higher performance bar and in practice achieve a latency and throughput level that traditional approaches can’t match with the same level of effort.

You can tell async is directionally kind of correct in that io_uring is the kernel’s approach to high performance I/O and it looks nothing like traditional threading and syscalls and completion looks a lot closer to async concurrency (although granted exploiting it fully is much harder in an async world because async/await is an insufficient number of colors to express how async tasks interrelate)

groundzeros2015 48 minutes ago

I am not saying threads are the model for all programming problems. For example a dependency graph like an excel spreadsheet can be analyzed and parallelized.

But as you observed, async/await fails to express concurrency any better. It’s also a thread, it’s just a worse implementation.

vlovich123 39 minutes ago

That’s incorrect. Even when expressed suboptimally, it still tends to result in overall higher throughput and consistently lower latency (work stealing executors specifically). And when you’re in this world, you can always do an optimization pass to better express the concurrency. If you’ve not written it async to start with, then you’re boned and have no easy escape hatch to optimize with.

groundzeros2015 35 minutes ago

Why can’t you do the same optimization? Are you maxing out you OS system resources on thread overhead?

codedokode 34 minutes ago

As I understand, "green threads" are also expensive, for example you either need to allocate a large stack for each "thread", or hook stack allocation to grow the stack dynamically (like Go does), and if you grow the stack, you might have to move it and cannot have pointers to stack objects.

pkolaczk 2 hours ago

Threads are neither better or worse than async+callbacks. They are different. There are problems which map nicely to threads and there are problems which are much nicer to express with async.

groundzeros2015 2 hours ago

Such as? The entire premise of async is that callbacks were a mistake because they broke sequential reasoning and control.

Every explanation of the feature starts with managing callback hell.

codedokode 12 minutes ago

The callbacks should be just hidden from programmer, that's what async/await are for.

swiftcoder 2 hours ago

> So threads was the right programming model.

For problems that aren't overly concerned with performance/memory, yes. You should probably reach for threads as a default, unless you know a priori that your problem is not in this common bucket.

Unfortunately there is quite a lot of bookkeeping overhead in the kernel for threads, and context switches are fairly expensive, so in a number of high performance scenarios we may not be able to afford kernel threading

groundzeros2015 2 hours ago

In that sentence I’m referring to the abstract idea of a thread of execution as a model of programming, not OS threads. A green thread implementation could do it too.

But what you said about kernel implementation is true. But are we really saying that the primary motivation for async/await is performance? How many programmers would give that answer? How many programs are actually hitting that bottleneck?

Doesn’t that buck the trend of every other language development in the past 20 years, emphasizing correctness and expressively over raw performance?

swiftcoder 2 hours ago

> But are we really saying that the primary motivation for async/await is performance?

The original motivation for not using OS threads was indeed performance. Async/await is mostly syntax sugar to fix some of the ergonomic problems of writing continuation-based code (Rust more or less skipped the intermediate "callback hell" with futures that Javascript/Python et al suffered through).

sureglymop 23 minutes ago

Importantly though, performance might be worse depending on use case and program. Specifically with scheduling in user space it can negatively impact branch prediction as your CPU is already hyper optimized for doing things differently.

It's all nuanced and what to choose requires careful evaluation.

nchie 57 minutes ago

> But are we really saying that the primary motivation for async/await is performance?

Of course - what else would it be? The whole async trend started because moving away from each http request spawning (or being bound to) an OS thread gave quite extreme improvements in requests/second metrics, didn't it?

groundzeros2015 52 minutes ago

I agree. Managing many http requests or responses was a motivating problem.

What I question is whether 1. Most programs resemble that, so that they make it an invasive feature of every general purpose language. 2. Whether programmers are making a conscious choice because they ruled out the perf overhead of the simpler model we have by default.

swiftcoder 39 minutes ago

That is why we have the function colouring problem and a split ecosystem in the first place - if it were obviously better in all cases, we'd make async the default, and get rid of the split altogether (and there are languages, like Erlang, that fall on this side of the fence)

hacker_homie 2 hours ago

I’m just waiting for them to try co-operative multithreading again.

conaclos 50 minutes ago

I recently started working with Rust async. The main issue I am currently facing is code duplication: I have to duplicate every function that I want to support both asynchronous and blocking APIs. This could be great to have a `maybe-async`. I took a look at the available crates to work around this (maybe-async, bisync), but they all have issues or hard limitations.

albertzeyer 47 minutes ago

The classic function coloring problem. https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

small_scombrus 34 minutes ago

It'll depend immensely on what you're actually doing, but if it's simple enough you may be able to make a macro that subs out the types & awaits

ozgrakkurt 28 minutes ago

Does this kind of thing make noticeable difference when applied to more complicated async functions?

Examples in the blog seem too simple make any conclusions

diondokter 21 minutes ago

Hi, author here. I mention in the blog that I've tried to quickly hack two of the simplest optimizations in the compiler and it resulted in 2%-5% binary size savings in real embedded (async) codebases. And a quick and probably deeply flawed synthetic benchmark on the desktop showed a 3% perf increase.

So yes, it does really matter. Keep in mind that optimizations stack. We're preventing LLVM from doing it's thing. So if we make the futures themselves smaller, LLVM will be able to optimize more. So small changes really compound.

hacker_homie 2 hours ago

This is the type of ugly but necessary discussions that have been happening in c++ for a while.

I never really liked the viral nature of async in rust when it was introduced.

I wish rust the best of luck and with more people like this rust could have a brighter future.

DeathArrow 10 minutes ago

I like it more how Zig is approaching async with the new IO. It avoids function coloring.

slopinthebag 2 hours ago

It's so funny that people will do anything to hate on Rust, including nitpicking a few bytes of overhead for a future while they reach for an entire thread or runtime to handle async in their favourite language.

berkes 31 minutes ago

I know the people and the company behind this article. They do anything but "hate on Rust".

You could've deduced that from the fact that someone who puts this amount of energy in a detailed article about intricacies of an area of "foo", quite certainly does not "hate on foo".

slopinthebag 30 minutes ago

Not the article, the comments here man.

The article is fine besides the bait title.

lionkor 32 minutes ago

It's more that I and people I know love Rust, and enjoy it, and want it to be better. I want it to be relentlessly optimized.

rnijveld 60 minutes ago

You realize this article talks about Rust on embedded hardware specifically, where you don’t have threads or big runtimes? There is no hate going on here either, just attempts to make things better. Might I suggest you click through to the homepage and I think you’ll figure out the rest.

gspr 16 minutes ago

I _love_ Rust and use it whenever I can. I still find the comments in here to be quite appropriate. Async Rust leaves me with a (subjective!) feeling that something isn't quite right. Not that I know how it _should_ be, but that feeling is very different from the non-async parts of the language that almost always leaves me with a warm fuzzy feeling of joy.

I don't know enough about the domain to be objectively helpful, so it's all wishy-washy feelings on my part. I keep reaching for orchestrating things with threads in Rust where most people would probably reach for async these days. The only language where I've felt fine embracing the blessed async system is Haskell and its green threads (which I understand come with their own host of problems).

MrBuddyCasino 52 minutes ago

Nobody seriously tries to run Golang or Java on an MCU. But they do run Rust code.

forrestthewoods 49 minutes ago

Love Rust. They simply missed the mark with async. Swing and a miss.

The risk they took was very calculated. Unfortunately they’re bad at math and chose the wrong trade-offs.

Ah well. Shit happens.

lionkor 33 minutes ago

I think Rust has a pretty solid async implementation, compared to other systems languages. I struggle to point out another systems language with a working and actually used async implementation.

swiftcoder 33 minutes ago

> Unfortunately they’re bad at math and chose the wrong trade-offs

They chose the exact same tradeoffs as C++'s async/await (and the same overall model as Python/NodeJS), so I'm not sure what that says about programming as a whole.

nromiun 22 minutes ago

Async in Rust and C++ is nothing like it is in Python or NodeJS. Choose your own runtime is a very different model than having a default one.

Not to mention Tokio (most popular runtime for Rust) is multi-threaded by default. So you have to deal with multithreading bugs as well as normal async ones. That is not the case with most async languages. For example both Python and NodeJS use a single thread to execute async code.