GopherCon 2018: The Scheduler Saga - Kavya Joshi https://www.youtube.com/watch?v=YHRO5WQGh0k
GopherCon 2017: Understanding Channels - Kavya Joshi https://www.youtube.com/watch?v=KBZlN0izeiY
This one notably also explains the design considerations behind Go's M:N:P scheduling in comparison to other schemes, and which specific challenges it tries to address.
Wouldn’t that mean go never uses registers to pass arguments to functions?
If so, that seems in conflict with https://go.dev/src/cmd/compile/abi-internal#function-call-ar..., which says “Because access to registers is generally faster than access to the stack, arguments and results are preferentially passed in registers”
Or does the compiler always use Go’s stable ABI, known as ABI0, in functions where it inserts code to potentially context switch, and only use the (potentially) faster ABI that passes arguments in registers elsewhere?
If you fix N workers and control dispatch order yourself, the scheduler barely gets involved — no stealing, no surprises.
The inter-goroutine handoff is ~50-100ns anyway.
Isn't the real issue using `go f()` per request rather than something in the language itself?
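For anyone who wants to check the handoff figure on their own hardware, here is a minimal sketch. `PingPong` is a name I made up for illustration; it bounces a token between two goroutines over unbuffered channels and reports the mean time per one-way handoff. The result depends heavily on the machine, the Go version and GOMAXPROCS, so treat it as a rough measurement, not a benchmark.

```go
package main

import (
	"fmt"
	"time"
)

// PingPong (hypothetical helper) bounces a token between two goroutines
// over unbuffered channels and returns the mean time per one-way handoff.
func PingPong(rounds int) time.Duration {
	ping, pong := make(chan struct{}), make(chan struct{})

	// Echo goroutine: every token received on ping is sent back on pong.
	go func() {
		for range ping {
			pong <- struct{}{}
		}
	}()

	start := time.Now()
	for i := 0; i < rounds; i++ {
		ping <- struct{}{} // handoff to the echo goroutine
		<-pong             // handoff back to this goroutine
	}
	elapsed := time.Since(start)
	close(ping) // lets the echo goroutine exit

	// Each round consists of two handoffs.
	return elapsed / time.Duration(2*rounds)
}

func main() {
	fmt.Println("mean handoff:", PingPong(1_000_000))
}
```

For anything you intend to quote, a proper `testing.B` benchmark would be the better tool.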
https://github.com/php/frankenphp/pull/2016 if you want to see a “correctly behaving” implementation that becomes 100% cpu usage under contention.
From my pov, the worker pool's job isn't to absorb saturation. it's to make capacity explicit so the layer above can route around it. a bounded queue that returns ErrQueueFull immediately is a signal, not a failure — it tells the load balancer to try another instance.
saturation on a single instance isn't a scheduler problem, it's a provisioning signal. the fix is horizontal, not vertical. once you're running N instances behind something that understands queue depth, the "unfair scheduler under contention" scenario stops being reachable in production — by design, not by luck.
the FrankenPHP case looks like a single-instance stress test pushed to the limit, which is a valid benchmark but not how you'd architect for HA.
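A minimal sketch of what I mean by a fixed pool with a bounded queue that fails fast. `Pool`, `NewPool` and `TrySubmit` are names invented for this example, not from any library; `ErrQueueFull` is just a package-level sentinel error.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrQueueFull is returned when the bounded queue has no free slot.
var ErrQueueFull = errors.New("queue full")

// Pool is a hypothetical fixed-size worker pool with a bounded queue.
type Pool struct {
	jobs chan func()
	wg   sync.WaitGroup
}

// NewPool starts exactly `workers` goroutines draining a queue of size `depth`.
func NewPool(workers, depth int) *Pool {
	p := &Pool{jobs: make(chan func(), depth)}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for job := range p.jobs {
				job()
			}
		}()
	}
	return p
}

// TrySubmit enqueues job without blocking. When the queue is full it
// returns ErrQueueFull immediately so the caller can shed or reroute load.
func (p *Pool) TrySubmit(job func()) error {
	select {
	case p.jobs <- job:
		return nil
	default:
		return ErrQueueFull
	}
}

// Close stops accepting work and waits for queued jobs to finish.
// TrySubmit must not be called after Close.
func (p *Pool) Close() {
	close(p.jobs)
	p.wg.Wait()
}

func main() {
	p := NewPool(1, 1) // one worker, one queue slot
	started := make(chan struct{})
	release := make(chan struct{})

	// The first job occupies the only worker until released.
	_ = p.TrySubmit(func() { close(started); <-release })
	<-started

	fmt.Println(p.TrySubmit(func() {})) // fills the queue slot: <nil>
	fmt.Println(p.TrySubmit(func() {})) // rejected: queue full

	close(release)
	p.Close()
}
```

In an HTTP handler you would typically map `ErrQueueFull` to a 503 so the load balancer can retry elsewhere; that mapping is a design choice on my side, not something the sketch enforces.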
Go's objective was to become a faster Python. Which was something we also desperately needed at the time, and it has well succeeded on that front. Go has largely replaced all the non-data science things people were earlier doing with Python.
It’s a problem that only Go can solve, but solving it means giving up some speed: work that is currently handled immediately, and shouldn’t be, would have to wait. So overall latency will go up and P99 will drop precipitously. Thus, they’ll probably never fix it.
If you have a system that requires predictable latency, Go is not the right language for it.
I'm sorry you had a bad experience with Go. What makes you say this? Have you filed an issue upstream yet? If not, I encourage you to do so. I can't promise it'll be fixed or delved into immediately, but filing detailed feedback like this is really helpful for prioritizing work.
Having a garbage collector already makes this the case; it is a known trade-off.
You can have world pauses that are independent of heap size, and thus predictable latency (of course, trading off some throughput, but that is almost fundamental)
- https://www.ptc.com/en/products/developer-tools/perc
- https://www.aicas.com/products-services/jamaicavm
- https://www.azul.com/products/prime
Not all GCs are born alike.
Having an interface for how it is supposed to behave, a runtime.SetScheduler() or something, would help, but it won't happen.
I have this feeling that in their quest to make Go simple, they added complexity in other areas. Then again, this was built at Google, not Bell Labs so the culture of building absurdly complex things likely influenced this.
I presume that's by design, to trade off against other things google designed it for?
I strongly call BS on that.
Strong claim and evidence seems to be a hallucination in your own head.
There are several writeups of large backends ported from node/python/ruby to Go which resulted in dramatic speedups, including drop in P99 and P99.9 latencies by 10x
That's empirical evidence your claim is BS.
What exactly is so unfair about Go scheduler and what do you compare it to?
Node's lack of multi-threading?
Python's and Ruby's GIL?
Just leaving this to OS thread scheduler which, unlike Go, has no idea about i/o and therefore cannot optimize for it?
Apparently the source of your claim is https://github.com/php/frankenphp/pull/2016
Which is optimizing for a very specific micro-benchmark of hammering the std-lib HTTP server with concurrent requests. Which is not what 99% of Go servers need to handle. And is exercising way more than a scheduler. And is not benchmarking against any other language, so the sweeping statement about "higher than any other language" is literally baseless.
And you were able to make a change that trades throughput for P99 latency without changing the scheduler, which kind of shows it wasn't the scheduler but an interaction between a specific implementation of HTTP server and Go scheduler.
And there are other HTTP servers in Go that focus on speed. It's just that 99.9% of Go servers don't need any of that because the baseline is 10x faster than python/ruby/javascript and on par with Java or C#.
But that's not comparing apples to apples. When you get a dramatic speedup, you will also see big drops in the P99 and P99.9 latencies because what stressed out the scripting language is a yawn to a compiled language. Just going from stressed->yawning will do wonders for all your latencies, tail latencies included.
That doesn't say anything about what will happen when the load increases enough to start stressing the compiled language.
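For readers less used to the P99 jargon, here is a small sketch of how a tail percentile is computed from latency samples, using the nearest-rank definition. `Percentile` is a helper written for this example and the sample values are made up. The point it illustrates: a handful of slow requests barely move the median but fully determine the P99.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Percentile (hypothetical helper) returns the nearest-rank p-th percentile
// of samples, for 0 < p <= 100. It returns 0 for an empty input.
func Percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	// Nearest rank: the smallest rank covering p percent of the samples.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Made-up latencies in milliseconds: 98 fast requests, 2 slow ones.
	var latencies []float64
	for i := 0; i < 98; i++ {
		latencies = append(latencies, 1)
	}
	latencies = append(latencies, 50, 50)

	fmt.Printf("p50=%.0fms p99=%.0fms\n",
		Percentile(latencies, 50), Percentile(latencies, 99))
	// prints: p50=1ms p99=50ms
}
```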
Of course, expecting you to provide the link would be incredibly onerous. We can look it up ourselves just as easily as you can. Well, in theory we can. The only trouble is that I cannot find the issue you are talking about. I cannot find any issues in the Go issue tracker from your account.
So, in the interest of good faith, perhaps you can help us out this one time and point us in the right direction?
That being said, I love studying go and learning how to use it to the best of my ability because I work on sub-µs networking in go.
When I get home, I’ll dig it up. But if you think it’s a fair scheduler, I invite you to just think about it on a whiteboard for a few minutes. It’s nowhere near fair and should be self-evident from first principles alone.
There are also multiple issues about this on GitHub.
And an open issue that has basically been ignored: golang/go#51071
Like I said. Go won’t fix this because they’ve optimized for throughput at the expense of everything else, which means higher tail latencies. They’d have to give up throughput for lower latency.
It doesn't look ignored to me. It explains that the test coverage is currently poor, so they are in a terrible position of not being able to make changes until that is rectified.
The first step is to improve the test coverage. Are you volunteering? AI isn't at a point where it is going to magically do it on its own, so it is going to take a willing human hand. You do certainly appear to be the perfect candidate, both having the technical understanding and the need for it.
There is unlikely to be anyone on the Go team with more political clout in this particular area than the one who has already reached out to you. You obviously didn't respond to him publicly, but did he reject your offer in private? Or are you just imagining some kind of hypothetical scenario where they are refusing to talk to you, despite evidence to the contrary?
FTFY