No, they do bypass it. I don't know what "Technical Program Managers at Google" do, but they don't seem to be using a lot of Erlang ;-). ETS tables can be modeled as a process which stores data and replies to message queries. Every update and read is equivalent to sending a message. The terms are still copied (see note * below). You're not going to read half a tuple and have it mutate underneath you as another process updates it. Traversing an ETS table is logically equivalent to asking a process for individual key-values using regular message passing.
What is different is what these are optimized for. ETS tables are great for querying and looking up data. They even have a mini query language for it (https://www.erlang.org/doc/apps/stdlib/qlc.html). Persistent terms are great for configuration values. None of them break the isolated heap and immutable data paradigm, they just optimize for certain access patterns.
Even the process dictionary they mention: when a process reads another process's dictionary, it's still a signal being sent to that process and a reply needing to be received.
* Immutable binary blocks >64B can be shared by reference, but they are shared by reference the same way when sending data in explicit messages between processes anyway.
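A minimal sketch of the copy-on-read semantics described above (table and key names are made up):

```erlang
%% Sketch: a term read from ETS is a fresh copy on the reader's heap,
%% just like a received message. Table and key names are made up.
T = ets:new(config_demo, [set, public]),
true = ets:insert(T, {timeout, 5000}),
[Copy] = ets:lookup(T, timeout),        % Copy now lives on our heap
true = ets:insert(T, {timeout, 9999}),  % table updated behind our back...
{timeout, 5000} = Copy,                 % ...our copy is untouched
true = ets:delete(T).
```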
ETS is not a process that responds to messages; you have to wrap it in a process and do the messaging part yourself.
Process dictionary: I'm pretty sure that's a process_info BIF that directly queries the VM's internal database, not a secret message that can be trapped or that even uses the normal message-passing system.
I didn't say it's implemented as a process, but that it works as if it were, logically. Most terms (except literals and binary references) are still copied, just like when you send a message. You could replace it behind the scenes with a process and it would act the same. Performance-wise it won't be the same, and that's why they are implemented differently, but it doesn't allow sharing a process heap, and you don't have to use locks and mutexes to protect access to this "shared" data.
> I'm pretty sure that's a process_info BIF that directly queries the VM's internal database, not a secret message that can be trapped or that even uses the normal message-passing system.
I specifically meant querying the dictionary of another process. Since it's in the context of "erlang is violating the shared nothing" comment. In that case if we look at https://www.erlang.org/doc/system/ref_man_processes.html#rec... we see that process_info_request is a signal. A process is sent a signal, and then it gets its dictionary entries and replies (note the difference between messages and signals there).
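For example, reading another process's dictionary from the outside goes through that request/reply signal pair under the hood (all names here are made up):

```erlang
%% Sketch: querying the dictionary of *another* process. The VM sends a
%% process_info request signal and waits for the reply signal.
Self = self(),
Pid = spawn(fun() ->
    put(role, worker),   % entry in the other process's dictionary
    Self ! ready,        % sync so the entry exists before we look
    receive stop -> ok end
end),
receive ready -> ok end,
{dictionary, Dict} = process_info(Pid, dictionary),
{role, worker} = lists:keyfind(role, 1, Dict),
Pid ! stop.
```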
Nobody argues that any of these approaches is a silver bullet for all concurrency problems. Indeed most of the problems of concurrency have direct equivalents in the world of single-threaded programming that are typically hard and only partially solved: deadlocks and livelocks are just infinite loops that occur across a thread boundary, protocol violations are just type errors that occur across a thread boundary, et cetera. But being able to rule out some of these problems in the happy case, even if you have to deal with them occasionally when writing more fiddly code, is still a big win.
If you have an actor Mem that is shared between two other actors A and B then Mem functions exactly as shared memory does between colocated threads in a multithreaded system: after all, RAM on a computer is implemented by sending messages down a bus! The difference is just that in the hardware case the messages you can pass to/from the actor (i.e. the atomicity boundaries) are fixed by the hardware, e.g. to reads/writes on particular fixed-sized ranges of memory, while with a shared actor Mem is free to present its own set of software-defined operations, with awareness of the program's semantics. Memory fences are a limited way to bring that programmability to hardware memory, but the programmer still has the onerous and error-prone task of mapping domain operations to fences.
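A sketch of that idea in Erlang: a "memory" actor whose atomicity boundaries are software-defined — here compare-and-swap offered as a single atomic message, something raw hardware reads/writes can't give you without fences or CAS instructions. All names are illustrative:

```erlang
%% Sketch: an actor Mem presenting program-defined atomic operations
%% instead of fixed hardware reads/writes.
Mem = fun Loop(Value) ->
    receive
        {read, From} ->
            From ! {value, Value}, Loop(Value);
        {cas, Expected, New, From} when Value =:= Expected ->
            From ! {cas, ok}, Loop(New);          % swap in one atomic step
        {cas, _Expected, _New, From} ->
            From ! {cas, failed}, Loop(Value)
    end
end,
Pid = spawn(fun() -> Mem(0) end),
Pid ! {cas, 0, 1, self()},
ok = receive {cas, R} -> R end,       % swap succeeded
Pid ! {read, self()},
1 = receive {value, V} -> V end.      % the cell now holds 1
```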
> The big win for the actor model is (just) that it linearizes all operations on a particular substate of the program while allowing other actors' states to be operated on concurrently.
Came here to say exactly those two things. Your comment is as clear as it could be.
The default timeout is 5 seconds. You need to set an explicit infinity timeout to not have one.
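A sketch of what that timeout looks like in practice. gen_server:call/2 uses 5000 ms; the third argument overrides it (shortened to 100 ms here so the example is quick). A call to a process that never replies exits with a timeout:

```erlang
%% Sketch: calling a process that never replies. With the default
%% 5000 ms (or an explicit timeout) the caller exits with {timeout, _};
%% with infinity it would block forever.
Pid = spawn(fun() -> receive never -> ok end end),
got_timeout =
    try gen_server:call(Pid, ping, 100) of
        _Reply -> unexpected
    catch
        exit:{timeout, _} -> got_timeout
    end.
```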
Sometimes I think there should be a list of sane and tested production configs: default rpc timeout, default backoff exponent, default initial backoff, default max backoff, health check frequency, health check timeout, process restart delay, process restart backoff, etc…
I thought it was obviously wrong. Server A calls Server B, and Server B calls Server A. When I read the code, my first thought was that it's circular. Is it really not obvious? Am I losing my mind?
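A sketch of that circular call, using plain message passing with a hand-rolled call helper and a short timeout so the example terminates (synchronous calls with an infinite timeout would hang both sides forever):

```erlang
%% Sketch: A and B each make a blocking call to the other. Neither is in
%% a receive loop serving calls — each only waits for a reply that can
%% never come, so both time out.
Call = fun(Pid, Req) ->
    Pid ! {call, self(), Req},
    receive {reply, Pid, R} -> R after 200 -> timeout end
end,
Parent = self(),
A = spawn(fun() -> receive {peer, P} -> Parent ! {a, Call(P, ping)} end end),
B = spawn(fun() -> receive {peer, P} -> Parent ! {b, Call(P, ping)} end end),
A ! {peer, B},
B ! {peer, A},
timeout = receive {a, RA} -> RA end,  % A's call to B timed out
timeout = receive {b, RB} -> RB end.  % and so did B's call to A
```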
The mention of `persistent_term` is cool.
They do say that race conditions are mitigated purely by discipline at design time, but then mention race conditions found via static analysis:
> Maria Christakis and Konstantinos Sagonas built a static race detector for Erlang and integrated it into Dialyzer, Erlang’s standard static analysis tool. They ran it against OTP’s own libraries, which are heavily tested and widely deployed.
> They found previously unknown race conditions. Not in obscure corners of the codebase. Not in exotic edge cases. In the kind of code that every Erlang application depends on, code that had been running in production for years.
I imagine the 4th issue, protocol violation, could possibly be mitigated by a type-safe language like Gleam (or Elixir once types are fully implemented).
If these race conditions are in code that has been in production for years and yet the race conditions are "previously unknown", that does suggest to me that it is in practice quite hard to trigger these race conditions. Bugs that happen regularly in prod (and maybe I'm biased, but especially bugs that happen to erlang systems in prod) tend to get fixed.
> Bugs that happen regularly in prod
It depends on how regular and reproducible they are. Timing bugs are notoriously difficult to pin down. Pair that with let-it-crash philosophy, and it's maybe not worth tracking down. OTOH, Erlang has been used for critical systems for a very long time – plenty long enough for such bugs to be tracked down if they posed real problems in practice.
Now, if it crashes every 10 years, that is regular, but I think the meaning is that it happens often. Back when I operated a large dist cluster, yes, some rare crashes happened that never got noticed, or the triage was 'wait and see if it happens again' and it didn't happen. But let-it-crash-and-restart-from-a-known-good-state is a philosophy about structuring error checking more than an operational philosophy: always check for success, and if you don't know how to handle an error, fail loudly and return to a good state to continue.
Operationally, you are expected to monitor for crashes and figure out how to prevent them in the future. And, IMHO, be prepared to hot load fixes in response... although a lot of organizations don't hot load.
The content is good and interesting though. Just hard to wade through with all the thorny LLM bushes getting in the way.
Looks like the author had a draft with the core content and ideas and asked an LLM to embellish it. Maybe because author wasn’t confident in their writing skills? Whatever the reason, I’d honestly prefer something human-written.
"Erlang is the strongest form of the isolation argument, and it deserves to be taken seriously, which is why what happens next matters."
It doesn't add much, and it has this condescending and pretentious LLM tone. For me as a reader, it distracts from an otherwise interesting article.
For the rest - pure untyped actors come with a lot of downsides and provoke engineers to make systems unnecessarily distributed (with all the consistency and timeout issues). There aren't that many problems which can be mapped well directly to actors. I personally find async runtimes with typed front-ends (e.g. Cats/ZIO in Scala, async in Rust, etc) much more robust and much less error-prone.
Livelock is something like: you've got 1000 nodes that all want to do X, which requires an exclusive lock, and the method to get an exclusive lock is:
Broadcast request to cluster
If you got the lock on all nodes, proceed
If you did not get the lock on all nodes, release and try again after a timeout
This procedure works in practice, when there is low contention. If the cluster is large and many processes contend for the lock, progress is rare. It's not impossible to progress, so the system is not deadlocked; but it takes an inordinate amount of time, mostly waiting for locks: the system is livelocked. In this case, whenever progress happens, future progress is easier.
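A single-node simulation of that protocol, with one lock-holder process per "node" (all names are made up). The release-and-retry branch is where the livelock appears once many contenders collide:

```erlang
%% Sketch: one lock-holder process per "node"; a contender must win the
%% lock on every node or release everything and back off.
LockNode = fun Loop(Owner) ->
    receive
        {try_lock, From} when Owner =:= none ->
            From ! {lock, self(), ok}, Loop(From);
        {try_lock, From} ->
            From ! {lock, self(), busy}, Loop(Owner);
        {unlock, Owner} -> Loop(none);   % only the owner releases
        {unlock, _}     -> Loop(Owner)
    end
end,
Nodes = [spawn(fun() -> LockNode(none) end) || _ <- [1, 2, 3]],
TryAll = fun(Me) ->
    [N ! {try_lock, Me} || N <- Nodes],
    Results = [receive {lock, N, R} -> R end || N <- Nodes],
    case lists:all(fun(R) -> R =:= ok end, Results) of
        true  -> ok;                                     % won everywhere
        false -> [N ! {unlock, Me} || N <- Nodes], busy  % back off, retry
    end
end,
ok = TryAll(self()).  % uncontended, so we win on all three nodes
```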
This is a rough description of an actual incident with nodes joining pg2, I think around 2018... the new pg module avoids that lock (and IMHO, the lock was not needed anyway; it was there to provide consistent order in member lists across nodes, but member lists would no longer be consistent when dist disconnects happened and resolved, so why add locks to be consistent sometimes). As an Erlang user with, I think, the largest clusters anywhere, we ran into a good number of these kinds of things in OTP. Ericsson built dist for telecom switches with two nodes in a single enclosure in a rack. It works over TCP and they didn't put in explicit limits, so you can run a dist cluster with thousands of nodes in locations across the globe and it mostly works, but there will be some things to debug from time to time. Erlang is fairly easy to debug... All the new nodes have a process waiting to join pg2; what's the pg2 process doing; why does that lock not have the consensus-building algorithm; can we add it? In the meantime, let's kill some nodes so others can progress and then we'll run a sequenced start of the rest.
Sure, if you design a shit system that depends on ETS for shared state there are dangers, so maybe don't do that?
I’d still rather be writing this system in erlang than in another language, where the footguns are bigger.
Hmm....
> Erlang is the strongest form of the isolation argument, and it deserves to be taken seriously, which is why what happens next matters.
OK I think I know who wrote this.
> The problem isn’t that developers write circular calls by accident. It’s that deadlock-freedom doesn’t compose.
Is there a need to regurgitate it in this format? "two protocols that are individually deadlock-free can still combine to deadlock in an actor system." This is the actually meaningful part.
> Forget to set a timeout on a gen_server:call?
People have pointed out it's factually wrong in the thread. Eh
> This is the discipline tax. It works when the team is experienced, the codebase is well-maintained, and the conventions are followed consistently. It erodes when any of those conditions weaken, and given enough time and enough turnover they do.
I know this is an LLM tell, but I can't point out why. It makes me uneasy to read this. Maybe the rule of three? Maybe the regurgitation of an elementary SE concept in between a technical description? Maybe because it's trying hard to sound smart? All three, I guess.
I could go on, but sigh, man don't use these clankers to write prose. They're like negative level gzip compression.
Isn't it more like, message passing is a way of constraining shared memory to the point where it's possible for humans to reason about most of the time?
Sort of like rust and c. Yes, you can write code with 'unsafe' in rust that makes any mistake c can make. But the rules outside unsafe blocks, combined with the rules at module boundaries, greatly reduce the m * n polynomial complexity of a given size of codebase, letting us reason better about larger codebases.
And with the tiny team working on it, it has remarkable performance.
https://www.dragonflybsd.org/performance/
That's a good way to look at it. A process's mailbox is shared mutable state, but restrictions and conventions make a lot of things simpler when a given process owns its state and responds to requests than when the requesters can access the state in shared memory. But when the requests aren't well thought out, you can build all the same kinds of issues.
Let's say you have a process that holds an account balance. If the requests are deposit X or withdraw Y, no problem (other than two generals). If instead requesters get the balance, adjust it, and then send a set-balance request, you have a classic race condition.
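A sketch of the safe version, where deposit/withdraw are single atomic messages that the balance actor serializes (the unsafe get-then-set variant would reintroduce the read-modify-write race across two messages):

```erlang
%% Sketch: a balance actor that serializes atomic deposit/withdraw
%% requests. All names are made up.
Acct = fun Loop(Balance) ->
    receive
        {deposit, X}  -> Loop(Balance + X);
        {withdraw, Y} -> Loop(Balance - Y);
        {get, From}   -> From ! {balance, Balance}, Loop(Balance)
    end
end,
Pid = spawn(fun() -> Acct(100) end),
Pid ! {deposit, 50},
Pid ! {withdraw, 30},
Pid ! {get, self()},
receive {balance, B} -> B end.  % B =:= 120
```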
ETS can be mentally modeled as a process that owns the table (even though the implementation is not), and the same thing applies... if the mutations you want to do aren't available as atomic requests or you don't use those facilities, the mutation isn't atomic and you get all the consequences that come with that.
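For ETS, the atomic facilities look like ets:update_counter/3 — one atomic read-modify-write, where a separate lookup followed by an insert would race between the two steps (table and key names are made up):

```erlang
%% Sketch: atomic increment via ets:update_counter/3, vs. the racy
%% lookup-then-insert alternative.
T = ets:new(counters_demo, [set, public]),
true = ets:insert(T, {hits, 0}),
1 = ets:update_counter(T, hits, 1),  % atomic; returns the new value
2 = ets:update_counter(T, hits, 1),
true = ets:delete(T).
```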
Circular message passing can be an easy mistake to make in some applications, too.
The API models it that way, so I'd say it's a bit more than just a mental model.
The main purpose of synchronization is creating happens-before (memory/cache coherence) relationships between lines of code that aren't in the same program order. Go channels are just syntactic sugar for creating these happens-before relationships. Problems such as deadlocks and races (at least in the way that TFA calls them out) are irreducible complexity if you're executing two sequences of logical instructions in parallel. If you're passing data in whatever way, there is no isolation between those two sequences. All you can enforce is degrees of discipline.
It's typical AI slop. I'd recommend for the author (or anyone else) to watch Jenkov's course[1] first if they have an honest interest in the topic.
[1] https://www.youtube.com/playlist?list=PLL8woMHwr36EDxjUoCzbo...