This is a great idea and something I would love to use, but it's a lot less docker compatible and user friendly than it may seem.
For instance, there is no way to automatically run the image's specified command, effectively leaving you with a dead VM: https://github.com/weaveworks/ignite/issues/874. You also simply cannot share directories with the host (out of scope for Firecracker).
microVMs for running containers are definitely a great idea; another project aiming to do it (a little differently) is Kata Containers. It has a lot more industry support, can also run on Firecracker (though QEMU microVMs are just as good) and can (theoretically, I haven't gotten it to work) interact with many container runtimes that are not Docker (the focus appears to be on Kubernetes).
While more powerful, Kata Containers are also much, much more complicated (and honestly under- and misleadingly documented). Ignite could fill the role of something simpler and easier to get started with. But please, if you aim to attract Docker users, actually make it compatible with Docker principles.
There's still a lot of unused potential in containers as VMs; Kata+Kubernetes probably realizes the most of it right now.
> Why not just make a base image and some simple setup scripts?
Two major reasons:
- Reusing the vast availability of prebuilt OCI images. I don't need to create scripts at all if I find a good image someone already created a Dockerfile for. Dockerfiles are scripts anyway, especially tailored for provisioning.
- Docker's main benefit is a friendly UI over low-level Linux primitives. In this case, Ignite offers a similarly familiar UI over Firecracker/KVM, which is also cumbersome to use directly.
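To make the "similarly familiar UI" point concrete, here's roughly how the workflow maps, going by the Ignite README (the image name and flags are taken from that README and may differ between releases):

```
# Docker workflow:
docker run -it ubuntu

# Ignite workflow: same mental model, but each "run" boots a Firecracker microVM
sudo ignite run weaveworks/ignite-ubuntu --cpus 2 --memory 1GB --ssh --name my-vm
sudo ignite ps            # list running VMs, docker-ps style
sudo ignite ssh my-vm     # shell into the microVM
sudo ignite rm my-vm      # tear it down
```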
> Why not just make a base image and some simple setup scripts?
That makes it possible for you to create an image (VM or container?), which is a good start. But what Firecracker and containers solve is isolation and ease of distribution.
General advice: when smart people are investing time and energy into a solution, and your alternative solution involves the word "just", it's usually a code smell that you're not giving the people doing the work enough credit.
OTOH, smart people seem to constantly try to reinvent existing products or over-engineer tools to set themselves apart and create a new business. See the endless stream of new JS frameworks popping up every day. Or the myriad ways that people try to create clouds on top of clouds (Yo dawg! I heard you like clouds!). Or new databases (although I confess I follow these). Or reinventing static HTML page delivery. ... and it goes on and on. All of these are smart people investing time and energy into a "solution" ... where there may not actually be a problem to begin with.
> See the endless stream of new JS frameworks popping up every day
Citation needed. E.g.
2010: AngularJS, Backbone
2011: EmberJS
2012: MeteorJS
2013: ReactJS, HexoJS
2014: CycleJS, VueJS
2015: MithrilJS, PolymerJS, Serverless Framework
2016: Angular2, AureliaJS, NextJS, Svelte
Those are a lot of the most popular ones, and some not very popular. Do you really consider this an endless stream of new JS frameworks popping up every day?
Yes!! Thank you!! If anything, webdev is very boring (in a good way) right now: React is the de facto standard, alongside Angular and Vue.js.
It's such an outdated stereotype. And it's especially funny considering how "greybeard oldschool" stuff like Linux distros, utils and packaging seem to have a lot more of an "endless stream of new stuff" right now.
I think I'm also a pretty smart person in certain areas, or at least I should be by now having been doing this stuff for almost 30 years.
While I definitely always want to check myself, it's also part of my job to help the newer/younger engineers remember the golden rule of business technology:
Innovate in the business domain, not in the technical one. Keep technology simple and boring.
It's great to experiment, and sometimes the results of the experiments are winners. But generally you can tell winners from losers based on whether they make everything less or more complex.
NodeJS, for example, reduced complexity because it's one language everywhere and has a very simple single-threaded execution model. Now, with ES6 modules taking over and CommonJS fading, we simplify even more.
Rollup/vite reduce webpack's complexity, but will also be replaced as native solutions emerge.
REST reduced SOAP/xml-rpc's complexity.
HTML5 replaced flash (and most of the browsers plugins), because flash had grown too complex with a massive attack surface and was impossible to properly secure. Remember ActiveX controls that would load and run in the browser with full permissions? I do. I wrote several.
Linux as an app server reduced Windows' cost and complexity.
Anything that adds needless complexity to the stack, no matter how interesting, will end up being a fad and replaced within a few years.
So, wait. You’re saying the people who developed that squeeze juice thing or the drop of blood thing shouldn’t be criticized because it is obviously a stupid idea from the get-go? C’mon man, some of us have critical thinking skills and some of us will tell you when an idea is stupid or over-engineered.
Containers as VMs is usually peak over-engineering. It's unnecessary most of the time. In fact, there are only a few use-cases where it's desired to have that isolation:
- different clock rates in the container than outside (deliberate skew)
- high security applications
- require certain kernel modules that you don’t want installed in all containers
Isolation can already be guaranteed by the kernel (excepting some bugs), so I don’t think that is a valid use-case, but I could be wrong.
"guarantees". cgroups are indeed designed for isolation, but they are a feature that's been added on over time and still exposes the entire kernel as attack surface. It's considered way less trustworthy than VMs, especially minimal-surface ones like Firecracker.
Firecracker is essentially just KVM with some extremely opinionated choices to allow it to be as fast as it is. I'm hand-waving over a lot, but differentiating it from KVM isn't very useful in this discussion.
I think lightweight VMs will have a major impact. VMs are powerful and solve legions of problems that arise trying to make containers match them, which is exactly the road currently being paved by so much of Kubernetes.
Firecracker has a number of limitations, however. It is tailored to the highly vertical 'lambda' use case, and too much of the power of the kernel and userspace is stripped away. You can't even live-migrate a Firecracker VM right now.
So yes, VMs are great and will ultimately resurge to the detriment of 'everything is a container' serverless orchestration, but Firecracker in its current form won't cut it. A gulf exists between the giant cloud operators, with their lucrative serverless-only model, and everyone else. Everyone else can't afford to abandon the immense value that stateful VMs provide, and the minute something emerges that delivers a seamless VM orchestration system, it will skyrocket.
Which kernel powers are you specifically talking about having been stripped away, and how are they relevant to serverside workloads? GPUs are the common one that gets brought up, but I'm not familiar with many others. Firecracker doesn't strip anything out of the guest kernel --- it boots a vanilla Linux kernel, which can be as heavy or light as you compile it to be.
> Firecracker is an alternative to QEMU that is purpose-built for running serverless functions and containers safely and efficiently, and nothing more. Firecracker is written in Rust, provides a minimal required device model to the guest operating system while excluding non-essential functionality (only 5 emulated devices are available: virtio-net, virtio-block, virtio-vsock, serial console, and a minimal keyboard controller used only to stop the microVM). This, along with a streamlined kernel loading process, enables a < 125 ms startup time and a < 5 MiB memory footprint.
Basically they removed anything that wasn't needed to run a Lambda. Your I/O, CPU, memory, etc. are all going to be limited to one very simplistic implementation, in addition to whatever KVM exports. So rather than asking "what does it limit", it's more like "imagine anything that might take advantage of/depend on hardware, or even host<->guest or guest<->guest, and just forget about it".
Can you be specific about what those hardware things are, and why they matter for serverside workloads? Yes: the premise of Firecracker is that the hypervisor only supports the virtio devices, and doesn't emulate real hardware. Where does that end up being problematic?
I'm not sure what you mean by your "CPU and memory" being "limited", either.
- Firecracker networking uses a tap device. Want any kind of advanced networking hardware, say, to offload packet processing, do network inspection, etc, and access it from the containerized app? Not gonna work... IPSec/IKE VPN? May not work (iirc this requires some specific paravirt features? I might be misremembering). Have some advanced network controller that transparently handles mesh networks, or some other fancy shmancy system designed to manage complex interactions between containers and networks? Probably not gonna work due to assumptions about what lies between the layers, what components use what tricks to handle advanced routing (Netfilter, eBGP, sidecars, etc). To say nothing of link-level differences (what if your network isn't Ethernet?). And all traffic is copied from an I/O thread of an emulated network device to a host TAP device, before it makes it to the real device; my guess is network latency and maximum PPS were not a priority.
- The exposed CPU is based on what KVM supports exposing to guests; presumably QEMU supports a much wider selection. Clocksources, of which I know absolutely nothing (:D), are only allowed as kvm-clock.
- All I/O is rate-limited by a custom scheme built by the Firecracker authors (there's a configuration sketch after this list). I'm sure this is fine for most use-cases, but some weird high-performance outlier is gonna hit some sort of bottleneck with this thing, I'm sure.
- Firecracker emulated block devices are backed by files on the host. Ergo, any app that wants to control a disk directly, or use some fancy shmancy SAN directly, etc is out of luck.
- The guest requires a balloon driver to use balloon support, which means the guest needs this special software, and compromising the guest driver could be a serious issue. I don't know if Kata does this differently.
- Aarch64 support has a bunch of errata currently, so x86_64 is the only fully-supported platform; I dunno if Kata does any better, but this is a real hardware limitation, esp. if you're trying to buy a shit-ton of cheap powerful machines.
- This is not hardware-specific, but Firecracker seems limited to specific kernel releases; right now the latest it supports is 5.10, according to https://github.com/firecracker-microvm/firecracker/blob/main... They emphasize that they want your host and guest to run "a supported kernel", even if it's possible to run different ones. To me that says that there's the potential for Firecracker-specific bugs in newer (or older?) guest kernels. From that page: "Firecracker represents a component in a larger stack, one in which it is tightly coupled with the guest and host kernels on which it is run."
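To make the rate-limiting point concrete: each drive (and NIC) definition can carry a token-bucket `rate_limiter`. A rough sketch against Firecracker's API socket; the numbers are invented and the field names are as of the versions of the docs I've read:

```
# attach the root drive with roughly 100 MB/s bandwidth and 2000 ops/s budgets
curl --unix-socket /tmp/firecracker.socket -X PUT 'http://localhost/drives/rootfs' \
  -H 'Content-Type: application/json' \
  -d '{
        "drive_id": "rootfs",
        "path_on_host": "./rootfs.ext4",
        "is_root_device": true,
        "is_read_only": false,
        "rate_limiter": {
          "bandwidth": { "size": 104857600, "refill_time": 1000 },
          "ops":       { "size": 2000, "refill_time": 1000 }
        }
      }'
```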
I would add that containers are used in far more settings than server-side, and would be great to have on the edge, in IoT, and on desktops, if they were a little less... funky. In general, containers require a lot of extra steps to be "usable" as general-purpose applications, and access to hardware will absolutely be a barrier we need to cross in order for "general purpose containerized apps" to become commonplace.
I'm not trying to be argumentative, but rather just to clear things up for you:
All serverside cloud-style VMs get tap devices. That's not a Firecracker thing.
IPSEC works just fine from within a VM. Firecracker doesn't care; it's just a hypervisor.
I'm not sure what you're trying to say about the CPU thing; Firecracker is a hypervisor, not an emulator. Linux QEMU VMs are KVM, too.
SANs work fine from within VMs. The point of a SAN is that the disk isn't attached. Firecracker talks to host block devices the same way other hypervisors do.
Here's `uname -a` from a Firecracker:
Linux 0a581153 5.12.2 #1 SMP Thu Jun 30 19:35:04 UTC 2022 x86_64 GNU/Linux
Unless you want to PCI pass-through an SRIOV VF into a guest? If you're paying out the ass for Cisco cloud gear you probably want the guest to have direct access to a card...
Again, I don't remember the specifics, but depending on the underlying networking (on the host etc), there may be some issues with IPSec (but I could be wrong). From some random searching: https://linux-ipsec.org/wp-content/uploads/slides/2018/quest...
> SANs work fine from within VMs. The point of a SAN is that the disk isn't attached. Firecracker talks to host block devices the same way other virtualizers do.
But you may want to talk to an attached HBA? Firecracker's docs (https://github.com/firecracker-microvm/firecracker/blob/main...) say: "Firecracker emulated block devices are backed by files on the host. To be able to mount block devices in the guest, the backing files need to be pre-formatted with a filesystem that the guest kernel supports."
Based on everything else in the Firecracker docs, I don't see any vHBA support provided, so the guest cannot access an HBA on the host. And a virtual device backed by a file is going to have crap performance.
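For reference, "backed by files on the host" means something like the following; paths and sizes are placeholders, and the attach step goes through Firecracker's /drives API (or whatever orchestrator sits on top):

```
truncate -s 10G data.ext4   # sparse backing file on the host filesystem
mkfs.ext4 data.ext4         # pre-format with a filesystem the guest kernel supports
# attach as a virtio-block drive via the API socket, then inside the guest:
mount /dev/vdb /mnt/data
```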
Yep, PCI pass-through is 100% something you can't do with Firecracker. To me, that's a perfect example of a feature that's not really there to support cloud-style serverside workloads, but that might just be my bias.
I think the other way to look at it is there's just different workloads that need different solutions. Firecracker is perfect for Lamba-style workloads.
But some companies (who don't want to manage datacenter resources, but do want the large scale and flexibility of cloud computing) want a whiz-bang multi-tenant enterprise containerized solution (say, using K8s as orchestrator). And maybe one of their critical applications has certain requirements that necessitate an HBA, deep packet inspection, an HSM, a high-precision clock, etc. They need strong isolation guarantees, so they want a VM, but they want K8s to manage it.
Kata Containers seems to fit the latter, as it can change hypervisors, supports more architectures, advanced networking features, device assignment, etc. Seems to align most with on-prem cloud as it's maintained by the OpenStack people.
I'll bet you it's more like 85%. AWS has a good amount of hardware designed to pass through into the VM, various custom guest drivers, some advanced high-performance networking stuff, and the GPU support like you mentioned. Customers probably pay a hefty premium, so we shouldn't discount it from either a business or technical perspective.
>"All serverside cloud-style VMs get tap devices."
I'm having trouble understanding this sentence. Specifically "serverside cloud-style VMs." Does "serverside" mean from the host OS that's running the Hypervisor? Like if I were logged into that host an "ifconfig" would show me tap devices?
Why would you want to migrate a firecracker vm? They are supposed to be for small, short lived processes. Or in this case, Docker containers (which are likely to be controlled by some kind of orchestration).
If you want to be able to migrate a running VM, do you still really need it to start super fast? It’s going to be a long lived service (otherwise you wouldn’t migrate it), so a longer spin up time should be acceptable. If that’s the case, then you can use a different VM tech that supports migration (KVM/Xen/etc).
> If you want to be able to migrate a running VM, do you still really need it to start super fast?
Yes. "You" is two different people in this sentence. The person doing the migration is often not the person who owns whatever service is running in the VM. They have uncorrelated needs. As a platform operator, I want to migrate VMs to be able to balance load and evacuate dying hardware. As a service operator on that platform I want high uptime - or, more accurately to this case, low downtime.
One of the reasons someone goes for a long-lived service on a VM platform is because the service is in some sense critical: it's Bad News when it drops a connection or doesn't respond. So the operator of that service wants something with a fast restart so that if it ever does need to be bounced, it's back quickly and they return to whatever redundancy level they originally designed to as fast as possible.
"Long uptime means a slow boot is fine" has never been true, in my experience.
Assume any suggestion starting with "just" is equally obvious to the operators as it is to you.
In this case, it would be helpful to start from the observation that live migration is non-negotiable, which immediately rules out any form of "turn it on then off again."
In this case, I think the “just” is justified. (Normally I’d agree with you)
If you’re starting from the premise that live migration is non-negotiable, then I’d argue that you likely don’t need a quick start. Long lived services can afford a slower start using a different VM technology. We’re not talking hours to provision a server, but the difference between 100ms and a few minutes.
My question to you is — what kind of service would need to start fast and be able to migrate? I just don’t see that Venn diagram having a big overlap. But maybe I’m wrong…
Doesn't matter what the service is. The features are used and therefore valued by two different audiences (who may in some circumstances be the same individuals wearing different hats). Live migration: platform ops. Fast boot: platform users.
The platform ops people want to offer the best service possible to the platform users. Fast boot is a feature users may want for their services, so the operators want to offer it. It's directly user-facing, and is a feature users can evaluate on when selecting the platform.
The platform ops people also want to offer high uptime and good performance. So behind the scenes they use live migration to mitigate host risk and balance load. Live migration is not user-facing, and is not a feature users can or should evaluate on when selecting the platform, because they shouldn't have to care.
> what kind of service would need to start fast and be able to migrate?
For ops, all VMs need to be able to migrate. For users, some services need fast boot. Therefore all services that need fast boot also need to migrate.
I wouldn't. I pointed out the lack of live migration as an example of why firecracker can't cut it as the basis of a general purpose VM orchestration platform.
> then you can use a different VM tech that supports migration (KVM/Xen/etc)
"KVM/Zen/etc" are in the dark ages compared to the state of the art of Kubernetes container orchestration.
The reason for that isn't technical; KVM is incredible and full-featured VMs conceivably could be orchestrated with the same fidelity as containers. The reason is business: the cloud behemoths invest only in their serverless business models; thus k8s, firecracker, etc. I contend there is an enormous market for orchestration of more complex workloads that VMs suit well.
Can you try to make that a little more specific? What does "general-purpose" mean here? "Secure containerized workloads" tends to imply serverside. It's hard to think of things --- other than GPU access --- that Firecracker takes away here.
No, it isn't. Investigate the K8S world and observe the backflips and somersaults being performed there to deal with networking and storage. Much of that is a consequence of people trying to make containers do things that should run in VMs. A full-featured Linux kernel and certain key user space tools can obviate much of that nonsense with ease, deliver better performance, use fewer resources and be vastly less complex.
The answer given is appropriate for firecracker use cases but insufficient otherwise. I'm not anti-firecracker; it's the right choice for many things. Just not all things.
The sort of VM I want orchestrated has encrypted (by contract) multi-pathed network block devices to encrypted storage volumes. 3-10 per tenant. This is trivial for a full-featured kernel; multi-path just works, encryption just works.
Again, trivial for a full-featured Linux kernel. Has been for ages.
I think you're missing the point. It's not about what hypothetical thing firecracker can or can't do. It's about elevating VM orchestration to some degree of parity with what has been created for container orchestration. These VMs and their complex storage and networking requirements should be modeled as we model containers now; through an orchestration system that makes management easy and as foolproof as possible. The fact that firecracker isn't sufficient to be the Micro-VM of choice for this isn't relevant.
You can encrypt a drive from within Firecracker trivially. It's just Linux, and they're just block devices.
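For example, a minimal guest-side sketch using plain dm-crypt; the device name is an assumption, and nothing here is Firecracker-specific:

```
cryptsetup luksFormat /dev/vdb            # a secondary virtio-block device attached to the microVM
cryptsetup open /dev/vdb secure-data
mkfs.ext4 /dev/mapper/secure-data
mount /dev/mapper/secure-data /srv/data
```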
You can do all the standard Linux tap interface networking with Firecracker; it just presents as virtio ethernet to the guest.
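Concretely, the host side is the same tap plumbing you'd use with any KVM guest, plus one API call to hand the device to the microVM. A sketch with made-up names and addresses, following the shape of Firecracker's network-interfaces API:

```
ip tuntap add dev fc-tap0 mode tap
ip addr add 172.16.0.1/24 dev fc-tap0
ip link set fc-tap0 up
# expose it to the guest, where it shows up as a virtio-net NIC:
curl --unix-socket /tmp/firecracker.socket -X PUT 'http://localhost/network-interfaces/eth0' \
  -H 'Content-Type: application/json' \
  -d '{ "iface_id": "eth0", "host_dev_name": "fc-tap0" }'
```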
This is the second time you've compared Firecracker to a "full-featured Linux kernel". Again: Firecracker is a hypervisor. It's not a kernel. It runs Linux kernels. "Full-featured" Linux kernels. Whatever kernel you compile.
No, you're linking to tickets that I don't think you really grok. That's someone asking for an extra encryption feature on the host side for Firecracker images, not someone saying it's impossible to encrypt a drive from within a Firecracker VM, which is obviously possible. You can just, you know, boot one up and try.
The other link you provided is to an Ignite ticket that's talking about integrating Firecracker with Linux namespaces. That has nothing to do with what Firecracker itself is capable of doing. If you want to slag Ignite, that's fine with me; I don't know much about the project, and am just here because weird things are being said about Firecracker, like that it doesn't run "full Linux kernels".
Azure is famous for two things: being extremely slow (including hitting timeouts in connected software), and, since a year or so ago, having trivial yet highly critical security vulnerabilities that escape the tenant barrier.
Regarding the speed, urban legend has it Azure is basically a giant mess of Powershell duct tape.
Also, it's not great redundancy-wise: most regions are not composed of multiple Availability Zones, and their control plane relied on a single DC (which, like their security failures, shows that redundancy and security simply aren't priorities there).
The Big Cloud Providers (Amazon, Microsoft, Google) are all terribly slow. This is probably because of the (administrative) overhead of provisioning a VM to a customer, setting up billing, any private networking you may need, redundant storage in three timezones, and probably tons of scripts and hooks I don't even know about.
In contrast, Firecracker restores a VM state to do session resumption. The OS has practically already been booted the moment execution for your specific VM begins.
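That snapshot/resume flow is exposed through Firecracker's API. A rough sketch with made-up paths; the field names come from the snapshot docs and may differ between releases:

```
# pause the running microVM and take a full snapshot (VM state + guest memory)
curl --unix-socket /tmp/fc.sock -X PATCH 'http://localhost/vm' \
  -d '{ "state": "Paused" }'
curl --unix-socket /tmp/fc.sock -X PUT 'http://localhost/snapshot/create' \
  -d '{ "snapshot_type": "Full", "snapshot_path": "./snap.json", "mem_file_path": "./mem.bin" }'
# later, a fresh firecracker process resumes from the snapshot instead of booting
curl --unix-socket /tmp/fc2.sock -X PUT 'http://localhost/snapshot/load' \
  -d '{ "snapshot_path": "./snap.json", "mem_file_path": "./mem.bin", "resume_vm": true }'
```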
That said, creating a brand-new VM takes less than a second on my machine, and then maybe 30 seconds for a full first boot; surely the super smart people at these huge cloud companies can figure out a way to get up and running in less than a minute. The ability to spawn quick, disposable VMs is one of the reasons I stick to local VMs for dev work rather than some expensive cloud solution.
15 minutes is on the high end, but 5-10 minutes is typical for Windows. You can’t deploy a ready image, usually, you have to deploy one that is sysprepped so Windows can do device discovery on first boot.
You can deploy them with the devices already set up, but sometimes (literally sometimes, it’s hard to tell when it happens) Windows gets really upset if any of the hardware changes. I think Windows 10/11 (and Server equivalents) has made that easier though (MS still recommends sysprep).
There's your problem. I can boot a tinyconfig Linux kernel that's 600K in QEMU using the microvm machine type in milliseconds. A basic Windows install is tens of gigabytes, and doesn't have the capability of being stripped down to the barebones essentials.
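For comparison, here's roughly what that QEMU invocation looks like; the kernel path is a placeholder for a `make tinyconfig` build plus whatever drivers you actually need:

```
qemu-system-x86_64 -M microvm \
  -enable-kvm -cpu host -m 128m -smp 1 \
  -kernel ./bzImage -append "console=ttyS0" \
  -nodefaults -no-user-config -nographic -serial stdio
```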
You should be able to provision, customize and boot a full VM in 60 seconds or so. Maybe 2-3 minutes if it involves copying in a lot of data and SELinux relabelling. If it's taking much longer than that, something is wrong with the cloud or environment you're using.
For me, this is the deal breaker, but this is still exciting to me:
> Note: At the moment ignite and ignited need root privileges on the host to operate due to certain operations (e.g. mount). This will change in the future.
I’d love it if this ends up getting changed. It’s a hard line to walk, but this would make Docker containers feasible on multi-user systems. (Without using Singularity, which has its own downsides).
So, you’ve been waiting for something like this too?
I’d love to be able to submit Slurm jobs backed by a Docker container. That would be a reproducibility dream for me. And make it so I didn’t have to install so much software or still maintain Environment modules.
As it is, I can get partially there with Singularity, which isn’t terrible.
Absolutely. With firecracker, AWS seems to have solved the multi-tenancy problem that has plagued HPC since its inception.
I've just been waiting for some way to implement that in a sane way for on-prem infrastructure. If not for the root requirement, this would be perfect.
Singularity does get us part of the way there, but it's just another abstraction layer that researchers have to work around. And it's far from perfect.
Damn, this is going to eat the world. Docker but with full host isolation making it suitable for untrusted workloads, hot damn. This could for real make fully managed multi-tenant completely serverless k8s possible once it gets to the point where it can be used as a containerd and CRI implementation.
Been doing that by running kubernetes inside SmartOS zones via bhyve VMs on multi-tenant systems. Works pretty well, although they are slightly more resource hungry VMs than the ones in this article.
I like the way this sounds, but... is the project still alive? There haven't been any commits to the main branch since July, and the last release was over a year ago.
Talos no longer supports creating a cluster automatically in a firecracker VM (which you can still do in docker, via `talosctl cluster create`) but you can certainly run Talos Linux in firecracker VMs and create a cluster that way.
> Ignite makes Firecracker easy to use by adopting its developer experience from containers. With Ignite, you pick an OCI-compliant image (Docker image) that you want to run as a VM, and then just execute ignite run instead of docker run.
Damn, that's quite a sales pitch. Excited to follow along!
Does this work for anybody?
I've just tried it in a KVM Debian 11 VM on Proxmox. Deploying from a YAML manifest does not like my `ssh: true`, and removing an existing /etc/firecracker/manifest/*.yml file segfaults the daemon. Trying `ignite run ubuntu:latest` or `ignite run alpine:latest` dies on trying to run init ... where's the benefit of leveraging existing containers?
The major reason for the initial excitement about containers was the relative efficiency as compared to VMs. I understand that Firecracker has a low memory overhead and a fast startup time but I'd be interested to know what the virtualization overhead is in terms of cpu performance. I poked around on google a bit for a few minutes but didn't find anything.
My understanding is that it's just KVM under the hood, so cpu perf should match KVM. Firecracker is mostly just optimizing startup time and memory overhead.
Running Docker in a VM is not new at all. This is even how Docker for Windows/Mac works.
The GitHub page states that Ignite does more than just "wrapping a container in a VM layer" but I'm not sure how that matters, since the notion of a "container" is purely about configuration. It seems like a distinction without a difference.
Docker for Windows/Mac uses a single VM for all your containers. Firecracker/Ignite has a dedicated micro-VM for each container - a completely different architecture, with much better isolation between containers. And the 125ms boot time is not something you get with a typical VM.
But I do agree that they don't explain how Ignite differs from the "wrapping a container in a VM layer" approach that Kata/gVisor take.
The latter. If you typed `docker ps` on your host there would be nothing showing up. The application is running inside of a micro VM.
It should also be noted that containerized processes that are running on a host will still show up in the host's process tree if you look at the output of `ps`, `htop` or similar commands. Containers aren't real boundaries, just namespaces on the host machine. This is not the case when using Ignite, as the process from the container image is running in an actual micro VM, rather than running on the host inside of a namespace.
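A quick way to see the difference on a host with both installed (image names and flags follow the respective READMEs and may vary):

```
docker run -d --name web nginx
ps aux | grep nginx         # the containerized nginx is right there in the host's process tree

sudo ignite run weaveworks/ignite-ubuntu --name demo --ssh
ps aux | grep firecracker   # the host only sees the firecracker VMM process;
                            # whatever runs inside the VM lives under the guest's own kernel
```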