In a lot of ways it’s like investigations into airplane crashes.
You are able to be paranoid on your own just fine.
About 4 hours before the grid collapse on the 28th of April 2025, the largest purchase of Monero in the past 3 years was recorded (a reminder: Monero is the coin of choice for special operations), making it surge +40% in 24 hours. The initial Spanish reports mentioned conflicting power information from dozens of locations at the same time, which is consistent with a sequential attack using the blinkencity method so that the grid itself is forced to shut down.
How to explain that to non-engineers is another problem.
As the years went on, the bridge's weight capacity was slowly eroded by subsequent construction projects like adding thicker concrete deck overlays, concrete median barriers and additional guard rail and other safety improvements. This was the second issue, lining up with the first issue of thinner gusset plates.
The third issue that lined up with the other two came on the day of the bridge's failure. There were approximately 300 tons of construction materials and heavy machinery parked on two adjacent closed lanes. Add in the additional weight of cars during rush hour, when traffic moved the slowest and the bridge was part of a bottleneck coming out of the city. That was the last straw: the gusset plates finally gave way, creating a near-instantaneous collapse.
https://devblogs.microsoft.com/oldnewthing/20080416-00/?p=22...
"You’ve all experienced the Fundamental Failure-Mode Theorem: You’re investigating a problem and along the way you find some function that never worked. A cache has a bug that results in cache misses when there should be hits. A request for an object that should be there somehow always fails. And yet the system still worked in spite of these errors. Eventually you trace the problem to a recent change that exposed all of the other bugs. Those bugs were always there, but the system kept on working because there was enough redundancy that one component was able to compensate for the failure of another component. Sometimes this chain of errors and compensation continues for several cycles, until finally the last protective layer fails and the underlying errors are exposed."
I had one of those, customer is adamant latest version broke some function, I check related code and it hasn't been touched for 7 years, and as written couldn't possibly work. I try and indeed, doesn't work. Yet customer persisted.
Long story short, an unrelated bug in a different module caused the old, non-functioning code to do something entirely different if you had that other module open as well, and the user had discovered this and started relying on this emergent functionality.
I had made a change to that other module in the new release and in the process returned the first module to its non-functioning state.
The reason they interacted was of course some global variables. Good times...
That wasn't really a result of an alignment of small weaknesses though. One of the reasons that whole thing was of particular interest was Feynman's withering appendix to the report where he pointed out that the management team wasn't listening to the engineering assessments of the safety of the venture and were making judgement calls like claiming that a component that had failed in testing was safe.
If a situation is being managed by people who can't assess technical risk, the failures aren't the result of many small weaknesses aligning. It wasn't an alignment of small failures as much as that a component that was well understood to be a likely point of failure had probably failed. Driven by poor management.
> Fukushima
This one too. Wasn't the reactor hit by a wave that was outside design tolerance? My memory was that they were hit by an earthquake that was outside design spec, then a tsunami that was outside design spec. That isn't a number of small weaknesses coming together. If you hit something with forces outside design spec then it might break. Not much of a mystery there. From a similar perspective if you design something for a 1:500 year storm then 1/500th of them might easily fail every year to storms. No small alignment of circumstances needed.
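To put numbers on the 1:500 year point, here is a minimal sketch. It assumes each year is independent with a 1/500 chance of an event beyond design spec, which is a simplification and not something the reports claim. The function names are made up for illustration.

```python
def prob_at_least_one_exceedance(return_period_years: float, lifetime_years: int) -> float:
    """Chance of at least one beyond-design event over a lifetime.

    Assumes independent years, each with probability 1/return_period.
    """
    annual = 1.0 / return_period_years
    return 1.0 - (1.0 - annual) ** lifetime_years


def expected_exceedances_per_year(n_structures: int, return_period_years: float) -> float:
    """Expected number of structures seeing a beyond-design event in one year."""
    return n_structures / return_period_years


if __name__ == "__main__":
    for years in (1, 40, 100):
        p = prob_at_least_one_exceedance(500, years)
        print(f"{years:>3} years of operation: {p:.1%} chance of a beyond-design event")
    print(expected_exceedances_per_year(500, 500), "of 500 structures expected per year")
```

The point is only that "outside design spec" events stop being rare once you multiply by operating years and by the number of structures built to the same spec.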
> [Fukushima] No small alignment of circumstances needed.
The tsunami is what initiated the accident, but the consequences were so severe precisely because of decades of bad decisions, many of which would have been assumed to be minor decisions at the time they were made. E.g.
- The design earthquake and tsunami threat
- Not reassessing the design earthquake and tsunami threat in light of experience
- At a national level, not identifying that different plants were being built to different design tsunami threats (an otherwise similar plant avoided damage by virtue of its taller seawall)
- At a national level, having too much trust in nuclear power industry companies, and not reconsidering that confidence after a number of serious incidents
- Design locations of emergency equipment in the plant complex (e.g. putting pumps and generators needed for emergency cooling in areas that would flood)
- Not reassessing the locations and types of emergency equipment in the plant (i.e. identifying that a flood of the complex could disable emergency cooling systems)
- At a company and national level, not having emergency plans to provide backup power and cooling flow to a damaged power plant
- At a company and national level, not having a clear hierarchy of control and objective during serious emergencies (e.g. not making/being able to make the prompt decision to start emergency cooling with sea water)
Many or all of these failures were necessary in combination for the accident to become the disaster it was. Remove just a few of those failures and the accident is prevented entirely (e.g. a taller seawall is built or retrofitted) or greatly reduced (e.g. the plant is still rendered inoperable but without multiple meltdowns and with minimal radioactive release).
Wind and solar are very far from dead, but they do need some adjustments - as the report makes clear.
When it's everybody's fault it's nobody's fault.
I don't know what will come of this report in the next months/years, I will keep an eye on it though, since I live in Spain :)
In my system's case, switching to this grid profile was just a software toggle.
Tesla’s Megapack system at the Hornsdale Power Reserve in Australia was the first example of this being proven out at scale in prod. Batteries everywhere, as quickly as possible.
The problem is that the line itself is a giant capacitor. It's charged to the maximum voltage on each cycle. Normally the grid loads immediately pull that voltage down, and rotating loads are especially useful because they "resist" the rising (or falling) voltage.
So when the rotating loads went away, nothing was preventing the voltage from rising. And it looks like the sections of the grid started working as good old boost converters on a very large scale.
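For a rough sense of scale on the "line is a giant capacitor" part, here is a back-of-envelope sketch using the standard three-phase charging formula Q = V_LL^2 * omega * C. The 12 nF/km figure is my assumption for a typical overhead line, not a value from the report, and this does not model the boost-converter behavior at all, only how much reactive power an unloaded line generates.

```python
import math


def line_charging_mvar(v_ll_kv: float, length_km: float,
                       c_nf_per_km: float = 12.0, freq_hz: float = 50.0) -> float:
    """Reactive power generated by a line's shunt capacitance, in Mvar.

    v_ll_kv      line-to-line voltage in kV
    c_nf_per_km  per-phase capacitance in nF/km (assumed, illustrative)
    """
    omega = 2.0 * math.pi * freq_hz
    c_total_farad = c_nf_per_km * 1e-9 * length_km
    v_volts = v_ll_kv * 1e3
    return v_volts ** 2 * omega * c_total_farad / 1e6


if __name__ == "__main__":
    q = line_charging_mvar(400.0, 100.0)
    print(f"100 km of 400 kV line: about {q:.0f} Mvar of charging")
```

If nothing absorbs that reactive power, the voltage goes up, which is the mechanism described above.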
Anything that's not a spinning slug of steel produces AC through an inverter: electronics that take some DC, pass it through MOSFETs and coils, and spit out a mathematically pure sine wave on the output. They are perfectly controllable, and have no inertia: tell them to output a set power and they happily will.
However, this has a few specific issues:
- infinite ramps produce sudden influx of energy or sudden drops in energy, which can trigger oscillations and trip safety of other plants
- the sine wave being electronically generated, physics won't help you to keep it in phase with the network, and more crucially, keep it lagging/ahead of the network
The last point is the most important one, and one that is actually discussed in the report. AC works well because physics is on our side, so spinning slugs of steel will self-correct depending on the power requirements of the grid, and this includes their phase compared to the grid. How far out of phase your current is from the voltage is what the power factor expresses. Spinning slugs have a natural power factor, but inverters don't: you can make any power factor you want.
Here in the Spanish blackout, there was an excess of reactive power (that is, a phase shift happening). Spinning slugs will fight this shift of phase to realign with the correct phase. An inverter will happily follow the sine wave measured and contribute to the excess of reactive power. The report outlines this: there was no "market incentive" for inverters to actively correct the grid's power factor (translation: there are no fines).
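To make the phase shift / reactive power link concrete, a minimal sketch with the textbook relations P = S*cos(phi), Q = S*sin(phi), power factor = cos(phi). The 100 MVA and the angles are made-up illustration values, and the function name is hypothetical.

```python
import math


def power_components(apparent_power_mva: float, phase_angle_deg: float):
    """Split apparent power into active (MW) and reactive (Mvar) parts.

    phase_angle_deg is the angle between voltage and current.
    Returns (active_mw, reactive_mvar, power_factor).
    """
    phi = math.radians(phase_angle_deg)
    active_mw = apparent_power_mva * math.cos(phi)
    reactive_mvar = apparent_power_mva * math.sin(phi)
    return active_mw, reactive_mvar, math.cos(phi)


if __name__ == "__main__":
    for angle in (0.0, 10.0, 30.0):
        p, q, pf = power_components(100.0, angle)
        print(f"phi={angle:>4.0f} deg  P={p:6.1f} MW  Q={q:6.1f} Mvar  pf={pf:.3f}")
```

An inverter can be told to sit at any of those angles. A synchronous machine gets pushed around by the grid instead.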
So really, more storage would not have helped. They would have tripped just like the other generators, and being inverter-based, they would have contributed to the issue. Not because "muh renewable" or "muh battery", but because of an inherent characteristic of how they're connected to the grid.
Can this be fixed? Of course. We've had the technology for years for inverters to better mimic spinning slugs of steel. Will it be? Of course. Spain's TSO will make it a requirement to fix this and energy producers will comply.
A few closing notes:
- this is not an anti-renewables writeup, but an explanation of the tech, and the fact that renewables are part of the issue is a coincidence of the underlying technical details
- inverters are not the reason the grid failed, but they're part of why it had runaway behavior
- yes, wind also runs on inverters despite being spinning things. with the wind being so variable, it's much more efficient to have all turbines be not synchronized, convert their AC to DC, aggregate the DC, and convert back to AC when injecting into the grid
The storage gives you operational and resiliency strength you cannot obtain with generators alone, because of how nimble storage is (advanced power controls), both for energy and grid services.
> Can this be fixed? Of course. We've had the technology for years for inverters to better mimic spinning slugs of steel. Will it be? Of course. Spain's TSO will make it a requirement to fix this and energy producers will comply.
This is synthetic inertia, and is a software capability on the latest battery storage systems. "There was no market mechanism to encourage the deployment of this technology in concert with Spain’s rapid deployment of solar and wind." from my top comment. This should be a hard requirement for all future battery storage systems imho.
Potential analysis of current battery storage systems for providing fast grid services like synthetic inertia – Case study on a 6 MW system - https://www.sciencedirect.com/science/article/abs/pii/S23521... | https://doi.org/10.1016/j.est.2022.106190 - Journal of Energy Storage Volume 57, January 2023, 106190
> Large-scale battery energy storage systems (BESS) already play a major role in ancillary service markets worldwide. Batteries are especially suitable for fast response times and thus focus on applications with relatively short reaction times. While existing markets mostly require reaction times of a couple of seconds, this will most likely change in the future. During the energy transition, many conventional power plants will fade out of the energy system. Thereby, the amount of rotating masses connected to the power grid will decrease, which means removing a component with quasi-instantaneous power supply to balance out frequency deviations the millisecond they occur. In general, batteries are capable of providing power just as fast but the real-world overall system response time of current BESS for future grid services has only little been studied so far. Thus, the response time of individual components such as the inverter and the interaction of the inverter and control components in the context of a BESS are not yet known. We address this issue by measurements of a 6 MW BESS's inverters for mode changes, inverter power gradients and measurements of the runtime of signals of the control system. The measurements have shown that in the analyzed BESS response times of 175 ms to 325 ms without the measurement feedback loop and 450 ms to 715 ms for the round trip with feedback measurements are possible with hardware that is about five years old. The results prove that even this older components can exceed the requirements from current standards. For even faster future grid services like synthetic inertia, hardware upgrades at the measurement device and the inverters may be necessary.
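To see why those response times matter for synthetic inertia, here is a rough sketch using the standard swing-equation approximation RoCoF = -dP * f0 / (2 * H * S). The 2000 MW deficit, 30000 MVA system size and the inertia constants are illustrative assumptions of mine, not figures from the paper or the report. Only the 175 to 715 ms delays come from the abstract quoted above.

```python
def rocof_hz_per_s(power_deficit_mw: float, system_mva: float,
                   inertia_h_s: float, f0_hz: float = 50.0) -> float:
    """Initial rate of change of frequency after a sudden generation loss.

    inertia_h_s is the system inertia constant H in seconds (assumed values).
    """
    return -power_deficit_mw * f0_hz / (2.0 * inertia_h_s * system_mva)


def drift_during_delay_hz(rocof: float, delay_ms: float) -> float:
    """Frequency drift before the battery reacts, assuming constant RoCoF."""
    return rocof * delay_ms / 1000.0


if __name__ == "__main__":
    for h in (5.0, 2.0):  # high-inertia vs low-inertia system, both assumed
        r = rocof_hz_per_s(2000.0, 30000.0, h)
        for delay_ms in (175, 325, 450, 715):  # response times from the abstract
            d = drift_during_delay_hz(r, delay_ms)
            print(f"H={h:.0f}s  RoCoF={r:+.2f} Hz/s  delay={delay_ms} ms  drift={d:+.3f} Hz")
```

The less rotating mass on the grid, the more frequency moves before a battery with a few hundred milliseconds of latency even starts to respond, which is why the abstract says faster services may need hardware upgrades.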
page 11 contains "Full root cause tree" - one image with all the high level info
We did have many many problems previously. The state of South Australia went out for a couple of weeks at one point in similar cascading failures. This doesn’t happen anymore. In fact the price of electricity is falling and the grid is more stable now https://www.theguardian.com/australia-news/2026/mar/19/power...
This price drop is in line with the lowered usage of gas turbine peaker plants (isn’t that helpful right now? No need for blockaded gas for electricity).
A lot of people say it can’t be done. That you can’t have free power during the day (power is free on certain plans during daylight due to solar power inputs dropping wholesale prices to negative) and that you can’t build enough storage (still not there but the dent in gas turbine usage is clear).
It’s one of these cases where you’ve been lied to. Australia elected a government that listened to reports battery+solar is great for grid reliability and nuclear was always going to be more expensive.
But they're not really complementary in that one can't fill in for the gaps in the other. So the case for new nuclear gets more and more uneconomic the more cheap renewables we deploy.
Or at least nuclear would if it was cheap, but since its costs haven't fallen the same way that the costs of other energy did... well new nuclear buildout really doesn't have a good role at all right now, it's just throwing away money.
Solar and nuclear complement each other fine - because their shortfalls (darkness for solar, high demand for nuclear) are mostly uncorrelated... a mix of non-dispatchable power with uncorrelated shortfalls helps minimize the amount of dispatchable power you need... but batteries have made it cheap enough to transform non-dispatchable power to dispatchable power that nuclear's high costs really aren't justifiable.
This is the moment when you read in the news "There's a drought because it isn't raining" and similar excuses, when in reality your five years' worth of reservoir water was reduced to half, or a third, because they prioritized electricity production over the population's real water demand.
I mean, hydroelectric needs at least two levels of reservoirs: one to generate electricity (or even dedicated two-level reservoirs with water pumps for this), and a second one, absolutely untouchable by the electric companies, reserved as water storage for the population/agriculture, the classic reservoir holding more than five years of supply, for real.
The report you mean (CSIRO) was wildly biased though. They based their nuclear power cost estimate on a nuclear reactor that was never deployed anywhere (NuScale) instead of "normal" nuclear power plants that have been deployed for decades.
Large scale nuclear $155-$252/MWh.
Solar PV and wind with storage $100-150/MWh.
https://www.abc.net.au/news/2024-05-22/nuclear-power-double-...
I find it funny when people get outraged because all CSIRO does is use real world construction costs easily proving how unfathomably expensive new built nuclear power is.
I immediately rebooked the same hotel, but when we got back there the receptionist had left so you had to check in over the phone instead. Except WhatsApp wasn't working. Then mobile data went down. And before long we were walking through the old town going hostel to hostel looking for a place to sleep, as everything got darker and darker (due to the lack of powered street lighting). The old town in almost pitch black was pretty scary!
We ended up breaking back into the hotel, borrowing a bunch of towels from a laundry cart in the hallway and sleeping in this lockable room we found in the basement.
Besides that somewhat stressful part, it was a really strange but fun experience to see the city without power: no traffic lights, darkened shops with lots of phone lights, cafés still operating just with only outdoor seating and limited menus, the occasional loud generator, and most of all the people seemingly having a great time in spite of it.
I would've loved to have stayed out all night exploring the city, but finding somewhere to sleep that night was a bit more pressing!