Sunsetting the TechEmpower Framework Benchmarks
69 points by nbrady 20 hours ago | 20 comments

idoubtit 12 hours ago
I've contributed a few optimisations to some implementations in these benchmarks, but as I read the code of many other implementations (and some frameworks) I lost most of the trust I had in these benchmarks.

I knew that once a benchmark is famous, people start optimising for it or even gaming it, but I didn't realise how much it made the benchmarks meaningless. Some frameworks were just not production ready, or had shortcuts made just for a benchmark case. Some implementations were supposed to use a framework, but the code was skewed in an unrealistic way. And sometimes the algorithm was different (IIRC, some implementation converted the "multiple sql updates" requirements into a single complex update using CASE).
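A minimal sketch of the difference, using Python's built-in sqlite3 so it runs anywhere. The `world` table and `randomnumber` column are assumed names for illustration, and this is not the code of any particular benchmark entry. The first function sends one UPDATE per row; the second folds every change into a single statement with CASE.

```python
import sqlite3

def update_one_by_one(conn, changes):
    """One UPDATE per row: N statements for N changed rows."""
    for row_id, value in changes:
        conn.execute("UPDATE world SET randomnumber = ? WHERE id = ?", (value, row_id))
    return len(changes)  # statements sent

def update_with_case(conn, changes):
    """All changes folded into a single UPDATE ... CASE statement."""
    if not changes:
        return 0
    whens = " ".join("WHEN ? THEN ?" for _ in changes)
    id_list = ", ".join("?" for _ in changes)
    sql = (
        f"UPDATE world SET randomnumber = CASE id {whens} END "
        f"WHERE id IN ({id_list})"
    )
    # Parameters: (id, value) pairs for the CASE arms, then the ids for the IN list.
    params = [p for pair in changes for p in pair] + [row_id for row_id, _ in changes]
    conn.execute(sql, params)
    return 1  # statements sent

def demo(update_fn):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE world (id INTEGER PRIMARY KEY, randomnumber INTEGER)")
    conn.executemany("INSERT INTO world VALUES (?, ?)", [(i, 0) for i in range(1, 6)])
    update_fn(conn, [(1, 11), (3, 33), (5, 55)])
    return conn.execute("SELECT id, randomnumber FROM world ORDER BY id").fetchall()
```

Both functions leave the table in the same state, which is why the substitution is easy to miss in a results table even though it no longer measures the cost of issuing many separate updates.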

I would ignore the results for most cases, especially the emerging software, but at least the benchmarks suggested orders of magnitude in a few cases, e.g. the speed of JSON serialization in different languages, or that PHP Laravel was more or less twice as slow as PHP Symfony, which could be twice as slow as Rails.

reply
brightball 10 hours ago
This was also my experience.
reply
ezekiel68 5 hours ago
I found a lot of value in these benchmarks and evangelized about them at my various employers over the years. Almost any enterprise is interested in lowering their cloud compute costs. Riddle me this: other than rotating out stale logs in cloud object storage or blocking malicious bandwidth drains from cloud CDN, what intervention lowers non-AI cloud costs more effectively than using a web service stack that requires dramatically fewer CPU and RAM resources while maintaining a high, error-free request rate?

There is a lot of handwaving about hAx in the benchmarks, but many of these claims are from people who got their information secondhand (or worse). Actually reading code from the top submissions in the techempower/FrameworkBenchmarks repo (organized neatly under the frameworks/ directory hive) yielded valuable insights for me:

* Pipelining SQL requests has a massive effect on RPS for web services that will access SQL databases

* A well-maintained HTTP2/HTTP3 web server written in C named h2o is relevant in 2026, even if it is used as a proxy that delegates business logic to simpler web service workers written in Rails or in Python 3 (via Gunicorn)

* For web services that write to a SQL database, the Axum Rust stack, now with a healthy ecosystem of middleware modules, may provide up to twice the RPS of the Spring (Java) stack (externally discovered: at lower CPU and much lower RAM usage)

* Even frameworks written in JS (hyperexpress, just-js) or Python (aiohttp) can vault into the realm of top-10 performers if they leverage OS-level asynchronous IO and SQL pipelining.
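A back-of-the-envelope model of why pipelining matters, under simplifying assumptions: every query costs one network round trip plus a fixed execution time, and a pipelined batch pays the round trip only once. The function names and the numbers in the usage note are hypothetical, not measurements from the benchmarks.

```python
def sequential_time_ms(n_queries, rtt_ms, exec_ms):
    """Each query waits for the previous reply: N round trips."""
    return n_queries * (rtt_ms + exec_ms)

def pipelined_time_ms(n_queries, rtt_ms, exec_ms):
    """All queries are sent before any reply is awaited: one round trip
    of waiting, plus the server-side execution time of every query."""
    return rtt_ms + n_queries * exec_ms

def speedup(n_queries, rtt_ms, exec_ms):
    """How many times faster the pipelined batch is in this model."""
    return sequential_time_ms(n_queries, rtt_ms, exec_ms) / pipelined_time_ms(
        n_queries, rtt_ms, exec_ms
    )
```

With a hypothetical 20 queries per request, a 0.2 ms round trip and 0.05 ms of execution per query, the model gives 5.0 ms sequentially and 1.2 ms pipelined. Real drivers and servers behave less cleanly than this, but the round-trip term is the part pipelining removes.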

reply
WatchDog 17 hours ago
I really liked these benchmarks, and would check in with them from time to time.

No benchmark is perfect, but these ones cover such a wide variety of different languages and frameworks, it's a good resource for getting a rough idea of the kind of performance that a given stack is capable of.

I don't know much about TechEmpower the company. It seems to be a small consultancy, and maintaining this project probably takes not insignificant resources from them.

The end of the project seems kind of unceremonious, but they don't owe anything to anyone.

Hopefully an active fork emerges.

reply
silisili 16 hours ago
It's cool in a 'how much can you tune it' kind of way, but has little practical value. Most sites would be tickled with a 4 digit requests per second number, so does it matter if your chosen framework does 50k/sec or 3 million/sec? Not really.

I think the biggest problem was it just had too many entries, most of which seem tuned to cheating benchmarks. Would probably be more valuable just choosing the top 3 by popularity from the top 15 languages or so.

reply
fredrikholm 12 hours ago
> too many entries, most of which seem tuned to cheating benchmarks

Even for entries that didn't cheat, the code was sometimes unidiomatic in the sense that "real programmers can write Fortran in any language".

This[0] article articulates the issue by highlighting an ASP.NET implementation that was faster than more 'honest' Java/Go implementations primarily by not using ASP.NET features, skirting some philosophical line of what it means to use something.

For me, the more interesting discussion of whether a language/library is faster/leaner than another exists in actual idiomatic use. In some languages you are actively sweating over individual allocations; in some you're encouraged to allocate collections and immediately throw them away. Being highly concerned with memory and performance in the latter type of language happens, but is seldom the dominant approach in the larger ecosystem.

[0] https://dusted.codes/how-fast-is-really-aspnet-core

reply
c0wb0yc0d3r 9 hours ago
For anyone wondering, the ASP.NET Core benchmark applications appear to be largely the same.

However, it also appears that as of the last benchmark (round 23), “aspnetcore” has fallen to 35th on the fortunes leaderboard. The code for that result really just uses Kestrel. It doesn’t even import any of the usual ASP.NET Core NuGet packages, just what’s provided by the web SDK. [0]

[0]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/57d9...

reply
re-thc 11 hours ago
> most of which seem tuned to cheating benchmarks

The fix would have been requiring tests to catch the cheating. There were suggestions but it didn't happen.

It was definitely possible to catch not having sent date headers (or caching them) etc.

reply
xnorswap 10 hours ago
I always liked these benchmarks, I've been following them since the earliest rounds.

One thing to note is how much things have improved over that time. Numbers that used to top the benchmarks would now be seen as "slow" compared to the top performers.

The other useful thing about these benchmarks is being able to easily justify the use of out of the box ASP.NET Core.

For many languages, the best performers are custom frameworks and presumably have trade-offs versus better known frameworks.

For C# the best performing framework (at least for "fortunes") is aspnet-core.

That side-steps a lot of conversations that might otherwise drag us into "Should we use framework X or Y" and waste time evaluating things.

Are the benchmarks gamed? Yes, of course. The code might not even be recognisable as ASP.NET Core to me, but that doesn't really matter if I can use it as an authoritative source to fend off the "rewrite in Go" crowd, and it doesn't matter that it is gamed, because the real-world load is many orders of magnitude less than these benchmarks demonstrate is possible.

reply
nchmy 3 hours ago
My first thought is "good riddance". Not only were the benchmarks surely gamed by many frameworks, but it was my impression that the benchmarks didn't even really reflect any real-world applications, which have plenty of I/O and compute. Moreover, ain't nobody receiving 1000 (let alone 100k) rps.
reply
mseepgood 18 hours ago
This text lacks information about why it is being sunset.
reply
bob1029 12 hours ago
Maintaining something like this is probably a little bit stressful.

We all know some of us take our language and framework choices as seriously as religion. I wouldn't be surprised if there was a lawsuit involved.

reply
cies 9 hours ago
Indeed. It's weird they write so much without addressing the elephant.

So let's discuss it...

From the start I thought that the TechEmpower Benchmarks were testing all the metrics the JVM is good at, and none the JVM is bad at (mainly: memory usage, start-up time, container size). I got the idea back then that they were a JVM shop (could not confirm this on their current website).

Lately the JVM contenders are no longer at the top. And the benchmark contains many contenders with highly optimized implementations that do not reflect real-life use.

reply
dom96 10 hours ago
Sad to see this. I had so much fun implementing an HTTP server (called httpbeast) from scratch to get as far up these benchmarks as possible.

I do agree with others here that it was possible to game them, but it still gave a good indication of the performance bracket a language was in (and you could check if interpreted languages were cheating via FFI pretty easily).

Feels like the end of an era.

reply
dzonga 10 hours ago
Well done to the TechEmpower team for the work done.

Though the benchmarks were not exactly 100% accurate, they gave good estimates of how different frameworks perform in handling web tasks.

They also helped people move to simpler / lighter web frameworks that are more performant, and kind of helped usher in the typical 'Sinatra/express' handlers for most web frameworks, e.g. .NET Core.

They also showed the performance hit of ORMs vs. raw SQL. So yeah, well done.

reply
narrator 16 hours ago
Engineering has kind of moved on in a weird way from web frameworks. Now AI just writes document.getElementById('longVariableName') javascript and straight SQL without complaining at all. The abstraction isn't as important as it used to be because AI doesn't mind typing.
reply
re-thc 11 hours ago
> Now AI just writes document.getElementById('longVariableName') javascript and straight SQL without complaining at all

I got a newer model that bypasses all that. It takes out Wireshark and sends bytes straight.

reply
GrooveSAN 15 hours ago
Would you know any alternative?
reply
jerf 7 hours ago
The primary alternatives are:

One, you don't need this. The vast majority of people working on the web are now so thoroughly overserved by their frameworks, especially the way that benchmarks like this measured only the minimal overhead the frameworks could impose, that measuring your framework on how many nanoseconds per request it consumes (I think time per request is a more sensible measure than request per time) is quintessential premature optimization. All consulting a table like this does for the vast majority of people is pessimize their framework choices by slanting them in the direction of taking speed over features when in fact they are better served by taking features over speed.

Two, you are performance bound, in which case, these benchmarks still don't help very much, because you really just have to stub out your performance and run benchmarks yourself, because you need to holistically analyze the performance of your framework, with your database, with any other APIs or libraries you use, to know what is going to be the globally best solution. Granted, not starting with a framework that struggles to attain 100 requests per second can help, but if you're in this position and you can't identify that sort of thing within minutes of scanning their documentation you're boned anyhow. They're not really that common anymore.

This sort of benchmark ranges from "just barely positive" value to a significant hazard of being substantially negative if you aren't very, very careful how you use the information.

Framework qua framework choice doesn't matter much anymore. It's dominated by so, so many other considerations, as long as you don't take the real stinkers.
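To put the time-per-request framing in numbers, here is a small sketch that inverts throughput figures like the 50k/sec and 3 million/sec mentioned elsewhere in the thread. The result is amortized machine time per request, not the latency of any single request, and the 5 ms of "other work" in the usage note is a hypothetical figure for a database call or similar.

```python
def ns_per_request(requests_per_second):
    """Amortized time per request in nanoseconds (the inverse of throughput)."""
    return 1e9 / requests_per_second

def framework_share(requests_per_second, other_work_ms):
    """Fraction of total per-request time spent in the framework, given a
    hypothetical amount of application work (in ms) per request."""
    framework_ms = 1000.0 / requests_per_second
    return framework_ms / (framework_ms + other_work_ms)

slow_ns = ns_per_request(50_000)      # 20,000 ns, i.e. 20 microseconds
fast_ns = ns_per_request(3_000_000)   # roughly 333 ns
saved_ns = slow_ns - fast_ns          # a little under 20 microseconds
```

Under these assumptions, `framework_share(50_000, 5.0)` comes out below half a percent: even the "slow" framework's overhead is dwarfed by 5 ms of application work, which is the sense in which the framework choice rarely dominates.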

reply
andymors 7 hours ago
there are some, I've seen this one which is new https://mda2av.github.io/HttpArena/
reply