I’ve also written a more detailed writeup here: https://imtomt.github.io/ymawky/
Maybe it's finally time to move on from being a career programmer.
Congratulations to the OP for the accomplishment.
Comparing this to a similar fork-per-connection server written in C, I'd imagine the throughput would be about the same, because the bottleneck in this model is fork() itself rather than the actual code. Writing it in assembly probably matters more for binary size and startup time than for requests/sec. Would be fun to actually benchmark, though.
Humbling.
In general, stable syscall numbers are just a Linux thing. Everyone else expects you to go through blessed system libraries.
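For example, this tiny Linux/arm64 program (a sketch of my own, not from the project) can hardcode its syscall numbers because they're a stable kernel ABI; the macOS numbers happen to work today, but the only supported interface there is libSystem:

    .global _start
    _start:
        mov  x0, #1           // fd: stdout
        adr  x1, msg          // buffer (nearby label, so adr reaches it)
        mov  x2, #7           // byte count
        mov  x8, #64          // __NR_write: 64 on Linux/arm64, fixed forever
        svc  #0
        mov  x0, #0           // exit status
        mov  x8, #93          // __NR_exit: 93 on Linux/arm64, also fixed
        svc  #0
    msg:
        .ascii "stable\n"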
Really cool project though!
Modularizing it into multiple files was easier than I expected. You basically put functions/labels in other files and mark them as .global at the top. The Makefile compiles each file into its own .o, and you link them all together. You can "b" or "bl" to any label from any other file, as long as it's global and linked in. Same with data in .bss or .data: mark it .global and it can be accessed from elsewhere.
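A minimal two-file sketch of that layout (using the Linux/arm64 syscall convention just to keep it self-contained; the names are illustrative):

    // util.s — assembled to util.o
    .global print_greeting         // label visible to other objects at link time
    .global greeting               // exported data works the same way

    .text
    print_greeting:
        mov  x0, #1                // fd: stdout
        adrp x1, greeting          // address the .data symbol
        add  x1, x1, :lo12:greeting
        mov  x2, #6                // length of "hello\n"
        mov  x8, #64               // write(2) on Linux/arm64
        svc  #0
        ret

    .data
    greeting: .ascii "hello\n"

    // main.s — assembled to main.o, linked against util.o
    .global _start

    .text
    _start:
        bl   print_greeting        // bl straight to the .global label in util.s
        mov  x0, #0
        mov  x8, #93               // exit(2)
        svc  #0

Build it with something like: as -o util.o util.s && as -o main.o main.s && ld -o demo main.o util.o.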
I had a very hard time just using C++ or Java, let alone using them well.
C and especially Turbo Pascal were easier, because the compiled code closely resembled hand-written code.
As the author described, you can do in 4,000 lines what others can do, with far less pain, in 100.
So you write macros, come up with your own library, and in the end you've essentially built a meta-language on top of assembly, because some sequences are so hard to grasp that you delegate working code to a library for reuse.
It is funny how much we take conventions for numbers for granted. If you know assembly and its intricacies, you immediately learn to work with the sign bit that marks negative numbers. But how would you know? Nothing stops you from using the whole representable range for positive numbers only.
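A minimal Linux/arm64 sketch of that ambiguity (my own toy example): the bits carry no sign of their own; the condition code you branch on decides how they're read.

    .global _start
    _start:
        mov  x0, #-1          // all 64 bits set: is this -1 or 2^64 - 1?
        cmp  x0, #0
        b.lt signed_view      // signed "less than" is taken: the bits read as -1
        mov  x0, #0           // an unsigned b.lo here would never be taken
        b    done
    signed_view:
        mov  x0, #1           // exit status 1: we honored the sign bit
    done:
        mov  x8, #93          // exit(2) on Linux/arm64
        svc  #0

Run it and "echo $?" prints 1; swap b.lt for b.lo and it prints 0, with the same bits throughout.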
Small things that make a huge difference.
Nice article, I enjoyed your adventures and would do the same.
Higher-level languages are more convenient for 99% of things, but the directness of assembly gives me a rush unlike any other. I didn't live through the C64/Amiga era, but I was obsessed with old C64/ZX emulators growing up.
The last time I did anything in assembler was x86 under DOS. Your code makes ARM64 with a modern OS less scary than I thought it would be.
jk. Metal as fuck. Love it.
Nothing beats Go.
When you use HTMX (goat) + sqlc (goat) + pgx (another goat) + Chi (yet another goat) and SQLite (goat).
Most apps won't need anything more than SQLite; I've got several SQLite apps doing a couple of million visits per day.
Compiles to a single binary blazingly fast.
Deploy it as a systemd service, capture logs with an Alloy/Loki/Grafana setup, set up alerts and monitoring, and go home.
And you can serve millions of requests on a server with 512MB RAM.
I don't think you'd ever need more speed than this.
Everything else is bloated, slow and doesn't give you enough room for optimization.
Here's the latency of one of my hobby projects (network latency not included): https://i.ibb.co/hJ6FQtyw/d3d6c9d15765.png
Request rate: https://i.ibb.co/Fq80nfJ4/67fcdbdb7491.png
It's running in the US and the EU (helps avoid the Atlantic roundtrip tax). In this one I'm doing some hundreds of checks, not simple CRUD work. With Go you can optimize a lot without the complexity of Rust.
I didn't need to implement an Intel RDRAND streamer in C and assembler, but it was a ton of fun: https://github.com/ehbar/rdrand-stream
OP, I really liked this project. Kudos for publishing it!
I'm sure I'm not the only one who has fantasized about doing something like this as a self-soothing enterprise. Kudos to you for actually doing it!
As for why it wouldn't run on Linux, there are some pretty big differences in the actual assembly.

One pretty superficial difference is calling conventions: macOS uses the x16 register for syscall numbers, Linux uses x8, and calling the kernel is "svc #0x80" on macOS versus "svc #0" on Linux (sketched below). That's ~120 lines that need to be replaced, but easy enough to just use sed.

Syscall numbers are all different, as are the struct layouts: sigaction() on macOS has an "sa_tramp" field that Linux doesn't have. Enforcing max processes is done here using the macOS-specific proc_info() syscall, which can report the number of children any given process has; Linux doesn't have an equivalent, so process tracking would need to be done differently. Finally, Linux has the getdents64() syscall rather than getdirentries64(), which uses a different struct and is called differently.
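A side-by-side sketch of that calling-convention difference, using write(2) on both kernels (fragments, not a full program; assume x0–x2 already hold the fd, buffer, and length):

    // macOS/arm64: syscall number goes in x16, trap is svc #0x80
    mov  x16, #4            // write(2) is syscall 4 on XNU
    svc  #0x80

    // Linux/arm64: syscall number goes in x8, trap is svc #0
    mov  x8, #64            // write(2) is __NR_write == 64 on Linux/arm64
    svc  #0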
I'm sure an LLM could make all those changes, but it's a pretty large codebase, so it would probably make some mistakes or miss things.
Today, I just think, "how long would LLMs have taken to write this?"
I mourn the death of a human artform.