If anyone here can't wait (as it looks like there's very little info on this at the moment..)
I wrote up detailed instructions for Ant Group's KVM-PVM patches. Performance is OK for background servers/tasks, but it takes a hit of up to 50% on heavy builds like the Linux kernel or Go programs using the K8s client.
DIY/detailed option:
https://blog.alexellis.io/how-to-run-firecracker-without-kvm...
Fully working, pre-built host and guest kernel and rootfs:
https://docs.slicervm.com/tasks/pvm/
I'll definitely be testing this and comparing as soon as it's available. Hopefully it'll be somewhat faster than the PVM approach. There's still no sign of whether those patches will ever be merged upstream into the Linux kernel. If you know differently, I'd appreciate a link.
Azure, OCI, DigitalOcean, and GCE all support nested virt as an option; they all take a bit of a performance hit, but it makes for very easy testing/exploration. Bare metal on Hetzner now has a setup fee of up to 350 EUR. You can find some servers with no setup fee, but it's usually quite old kit.
Edit: this doesn't look quite as good as the headline suggests. Options for instances look a bit limited. Someone found some more info here: https://x.com/nanovms/status/2022141660143165598/photo/1
I don't understand what you are paying for here; nested virtualization doesn't need any extra hardware setup compared to regular virtualization.
... or are you saying Hetzner wants 350 EUR just to turn on the normal virtualization option in the BIOS?
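For what it's worth, on a stock Linux host nested KVM really is just a module parameter on top of VT-x/AMD-V in the firmware, nothing you'd normally pay extra for. A rough sketch of a check, assuming the standard kvm_intel/kvm_amd sysfs parameter paths:

    /* Rough sketch: is nested KVM available on this host?
     * Assumes the standard sysfs parameter paths. */
    #include <stdio.h>

    static int param_enabled(const char *path) {
        FILE *f = fopen(path, "r");
        if (!f) return -1;              /* module not loaded */
        int c = fgetc(f);
        fclose(f);
        return (c == 'Y' || c == '1');  /* Y/N on Intel, 1/0 on AMD */
    }

    int main(void) {
        int intel = param_enabled("/sys/module/kvm_intel/parameters/nested");
        int amd   = param_enabled("/sys/module/kvm_amd/parameters/nested");

        if (intel == 1 || amd == 1)
            puts("nested KVM enabled");
        else if (intel == 0 || amd == 0)
            puts("KVM loaded, nesting off: modprobe kvm_intel nested=1");
        else
            puts("no KVM module: VT-x/AMD-V disabled in firmware?");
        return 0;
    }

If the last branch fires on rented hardware, that's the case where the provider would have to flip the BIOS switch for you.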
So they literally want money to fix what they fucked up the first time
https://www.hetzner.com/dedicated-rootserver/matrix-ex/ https://www.hetzner.com/dedicated-rootserver/matrix-ax/
Worst case I ever had: a hard drive failed and I had to wait, I think, a week for OVH to physically replace it.
GCP has had nested virtualization for a while.
Renting a server from cheaper hosting providers can mean massive savings, but you now need to reinvent all of the AWS APIs you use or might use, and that's a big up-front time investment. And any new capability you need, whether that's a queue, a mail gateway, or a thousand other APIs, has to be deployed and managed before you can even start testing.
It's less work now than it used to be, thanks to the number of tools available to automate it, but it's still work you could be spending on improving your product.
Or maybe you just never needed most of these in the first place. People got into this "AWS" mentality as if it were the only way to do things. Everything had to be in a queue, event-driven, etc.
I'd argue not using AWS means simplifying things, and it'll be less expensive not just in server costs but in developer time.
https://techcommunity.microsoft.com/blog/azurecompute/scalin...
(I work there)
Nested virtualization can mean a lot of things. Not just full VMs.
Good use-case for what?
The technical details are a lot more complex than most realize.
Single-level VMX virtualization is relatively straightforward, even if there are a lot of details to juggle with VMCS setup and handling exits.
Nested virtualization is a whole other animal: one now has to handle not just the extra level but many things the hardware normally does, plus juggle internal state during transitions between levels.
The LKML is filled with discussions and debates where very sharp contributors are trying to make sense of how it would work.
Amazon turning the feature on is one thing. It working 100% perfectly is quite another…
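To illustrate that gap: whether a host's KVM can even serialize nested state (what live migration of a VM that itself runs VMs depends on) is a separate capability from nesting working at all. A hedged sketch using the standard /dev/kvm ioctls, nothing AWS-specific:

    /* Sketch: probe whether KVM can snapshot nested (L1-runs-L2)
     * state, which live migration of nested guests requires. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/kvm.h>

    int main(void) {
        int kvm = open("/dev/kvm", O_RDWR);
        if (kvm < 0) { perror("/dev/kvm"); return 1; }

        printf("KVM API version: %d\n",
               ioctl(kvm, KVM_GET_API_VERSION, 0));

        /* returns the nested-state buffer size, or 0 if unsupported */
        int cap = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_NESTED_STATE);
        printf("KVM_CAP_NESTED_STATE: %d (%s)\n", cap,
               cap > 0 ? "nested state is migratable" : "unsupported");

        close(kvm);
        return 0;
    }

Plenty of hosts happily run nested guests while failing the second check, which is exactly the "on is one thing, perfect is another" distinction.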
The more interesting signal is that AWS is restricting this to 8th-gen Intel instances only (c8i/m8i/r8i). They're likely leveraging specific microarchitectural improvements in those chips for VMCS shadowing, picking the hardware generation where they can meet their reliability bar rather than enabling it broadly and dealing with errata on older silicon. That's actually the careful engineering approach you'd want from a cloud provider.
AWS is just late to the game because they've rolled so much of their own stack instead of adapting open source solutions and contributing back to them.
This is emphatically not true. Contributing to KVM and the kernel (which AWS does anyway) would not have accelerated the availability.
EC2 is not just a data center with commodity equipment. They have customer demands for security and performance that far exceed what one can build with a pile of OSS, to the extent that they build their own compute and networking hardware. They even have CPU and other hardware SKUs not available to the general public.
If my sources are correct, GCP did not launch on dedicated hardware like EC2 did, which raised customer concerns about isolation guarantees. (Not sure if that’s still the case.) And Azure didn’t have hardware-assisted I/O virtualization ("Azure Boost") until just a few years ago and it's not as mature as Nitro.
Even today, Azure doesn’t support nested virtualization the way one might ordinarily expect them to. It's only supported with Hyper-V on the guest, i.e., Windows.
> While nested virtualization is technically possible while using runners, it is not officially supported. Any use of nested VMs is experimental and done at your own risk, we offer no guarantees regarding stability, performance, or compatibility.
https://docs.github.com/en/actions/concepts/runners/github-h...
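In practice, the pre-flight check such a runner job does before attempting this is trivial; a minimal sketch:

    /* Minimal sketch: the usual CI pre-flight check before trying
     * nested VMs: is /dev/kvm present and usable? */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        if (access("/dev/kvm", R_OK | W_OK) == 0) {
            puts("/dev/kvm usable: hardware-accelerated nesting OK");
            return 0;
        }
        puts("no /dev/kvm: fall back to emulation or skip the job");
        return 1;
    }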
This is really big news for micro-VM sandbox solutions like E2B, which I work on.
I remember playing with nested virty some years ago and deciding it was a backwards step except for PoCs and the like. Given I haven't personally run out of virty gear, I never needed to do a PoC.
The place I've probably wanted it the most though is in CI/CD systems: it's always been annoying to build and test system images in EC2 in a generic way.
It also allows for running other third party appliances unmodified in EC2.
But also, almost every other execution environment offers this: GCP, VMware, KVM, etc., so it's frustrating that EC2 has only offered it on their bare-metal instance types. When EC2 was using Xen 10+ years ago, it made sense, but they've been on KVM since the inception of Nitro.
Basically you set up a small LAN with Hyper-V or something similar (I have only done it with Hyper-V).
There is no real reason to use it on hardware you own, but in the cloud you don't always have enough to do to justify paying for an entire server.
If EC2 were like your home server, you might be right. And an EC2 bare metal instance is the closest approximation to that. On bare metal, you've always been free to run your own VMs, and we had some customers who rolled their own nested VM implementations on it.
But EC2 is not like your home server. There are some nontrivial considerations and requirements to offer nested virtualization at cloud scale:
1. Ensuring virtualized networking (VPC) works with nested VMs as well as with the primary VM
2. Making sure the environment (VMM etc) is sufficiently hardened to meet AWS's incredibly stringent security standards so that nesting doesn't pose unintended threats or weaken EC2's isolation properties. EC2 doesn't use libvirt or an off-the-shelf KVM. See https://youtu.be/cD1mNQ9YbeA?si=hcaZaV2W_hcEIn9L&t=1095 and https://youtu.be/hqqKi3E-oG8?si=liAfollyupYicc_L&t=501
3. Ensuring performance and reliability meets customer standards
4. Building a rock-solid control plane around it all
It's not a trivial matter of flipping a bit.
Thanks for the well-reasoned response.
A few of the best technical presentations that I've watched were at a pre-SKO event. Nitro, Graviton and Firecracker.
Great engineering pieces, the three of them.
and in Xen (which they used to run) for at least as long
* you are right, it just works
* but there were scary notes about the stuff which might happen when you live migrate a virtual machine between hypervisors and the machine has nested virtual machines inside it. I remember the words "neither safe nor secure"
Specifically, in this case: https://github.com/aws/api-models-aws/commit/8bca88a33592ca4...
I don't know if this applies to the specific nested virtualisation AWS are providing though.
pure CPU should be essentially unaffected, if they're not emulating the MMU/page tables in software
the difference in IO ranges from barely measurable to absolutely horrible, depending on their implementation
traps/vmexits have another layer to pass through (and back)
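You can eyeball that last point yourself: CPUID unconditionally vmexits in a guest, so timing it in a loop approximates the exit round-trip, and running the same binary bare-metal, in L1, and in L2 shows the added layer. A rough x86-64-only sketch (loop count arbitrary):

    /* Rough micro-benchmark: average cost of CPUID, which always
     * vmexits in a guest. Compare bare-metal vs L1 vs L2. */
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    static inline void cpuid0(void) {
        uint32_t a, b, c, d;
        __asm__ volatile("cpuid"
                         : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                         : "a"(0), "c"(0));
    }

    int main(void) {
        enum { N = 1000000 };
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            cpuid0();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (double)(t1.tv_nsec - t0.tv_nsec);
        printf("avg CPUID: %.0f ns\n", ns / N);
        return 0;
    }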
spoiler though: I'm referencing the part where gimp is running in Wine running in asm.js in a Chrome browser running in another asm.js in Firefox
Remember, “customer obsession”.
But “protect revenue first”.
You can tell people to just do something else, there's probably a separate natural solution, etc., but sometimes you're willing to sacrifice some peak performance just to have that uniformity of operations and control.