Erase your darlings: immutable infrastructure for mutable systems

kreetx · on April 13, 2020

Since my move to nixos I've considered reinstalling my system every morning, since the entire system config is a few .nix files (+ home-manager). Haven't quite found the determinism yet. Perhaps I should do it tomorrow morning. :)

edit: If I get three more upvotes I'll do it tomorrow.

edit: two more.

edit: one more.

edit: all filled up, it's going to happen!

edit: Some more context. Although I have a laptop (a 13" 2015 mbp) I don't bring it to work anymore, but have two desktops, one at home and one for work. All three run the same nix configuration, shared through a git repo. Whenever I discover there is some program I need, I add it to the config, and after switching machines I run `sudo nixos-rebuild switch` so the thing I added becomes available on the current machine, too. All this just works (tm) and I'm fairly confident the re-install will be painless.

sp332 · on April 13, 2020

This reminds me of The Tiny C Compiler "tcc" and its ability to recompile the kernel from source on every boot. https://en.wikipedia.org/wiki/Tiny_C_Compiler

delusional · on April 13, 2020

I wonder if this could be used instead of image based computer management for enterprise settings. Basically the team managing the systems have some settings, employees have their own overlay (to account for personal preference in tools or directory structure), and then you just install the entire machine at the start of every day.

It seems really complex though, so I'm not entirely convinced it's a good idea.

coredog64 · on April 14, 2020

My previous employer did this weekly with thousands of Linux hosts. You can’t have an APT without persistence.

GordonS · on April 14, 2020

How would this work from a bandwidth and caching perspective? I'm thinking it would be problematic if hundreds of workstations need to download gigs of software at 9am every morning.

kreetx · on April 15, 2020

At least for nix you can set up local caches, so this would save most of the bandwidth. But you could also just keep the /nix/store folder - as it's an immutable store for all the packages.

kreetx · on April 14, 2020

I would like to report that this is now done, thank you for the encouragement.

AgentME · on April 13, 2020

Is using nixos not enough to have a deterministic install?

arianvanp · on April 13, 2020

Read the article. It's the author exactly telling what parts aren't deterministic for them yet. E.g. Bluetooth devices, Networkmanager configs added by hand, wireguard keys, ssh host IDs.

kreetx · on April 14, 2020

It depends on what you mean. If you're setting up a development machine (i.e. non-server), there is the home-manager package, which configures users on top of the system (what packages they have, home folder content).

Secondly, for an exactly deterministic install you'd also pin the nixpkgs to some commit (the package repository is a git repo). But I guess many don't do this unless they have specific needs.

robbintt · on April 13, 2020

*determination :-D

jedberg · on April 13, 2020

I would love to get to a point where my laptop can be managed as immutable infrastructure.

All the big chunks of data are already isolated into redundant partitions, but it’s the system config that’s tough.

I have a time machine backup, but that’s still not the same as being able to say “I’m gonna wipe my hard drive and start over today”.

So does anyone have good suggestions on maintaining a MacOS laptop in an immutable way?

joshspankit · on April 13, 2020

In the world of commercial software that’s essentially impossible.

Any immutable infrastructure lights up license keys and demo restrictions like a spotlight.

Take, for example, Apple’s desktop OS: You used to be able to drag-drop an application to install it. To uninstall you would delete it. Simple. Easy. Stateless. They talked about it a lot and so did Mac evangelizers. But it also meant you could walk in to an Apple store, connect a USB drive in your sleeve to a demo computer, drag, drop, and walk away with full versions of very expensive software. So, the OS fell prey to the same stateful pitfalls as Windows: places to hide keys, system hooks, etc, etc, etc.

Your best bet these days is likely to manage all software config through a system management tool, keep your data backed up in Time Machine, and “reprovision” your laptop every X days or months.

jkachmar · on April 13, 2020

You can work around this in the event that your commercial software provides a consistent interface for the license key stuff; e.g. via the CLI, a key file (ideally plain text) on-disk, or some scriptable interface.

Then you can store this info in separate key management software (e.g. Unix pass for your local machine) at which point bootstrapping your system could be done relatively automatically.

EDIT: I should add that this is my view of how things should work in the-world-as-it-ought-to-be, which isn’t necessarily the same as the-world-that-we-live-in.

joshspankit · on April 13, 2020

Yes, I agree that would be excellent.

lstamour · on April 13, 2020

Given iCloud/Apple logins, though, which authenticate and create the keys necessary for the Mac to run signed apps, wouldn’t the process of wiping and reinstalling simply include either not deleting preinstalled apps or starting from a state that downloads the apps you need and signs you in/enrols your Mac as necessary? The same would be true for Windows, Active Directory, and any other state required to run the system that has to sync or work with a third-party.

I guess to me that’s what the /persist drive was in the example, the need to customize system data beyond the basic install steps. Think of your Application Support folder, or apps that don’t sync to the cloud, or system kexts you might need to run apps.

The biggest problem isn’t software like Apple’s, which uses the Internet to authenticate how many systems are in use with generous limits; it’s software like iLok and such that uses various stateful properties of your system in an undocumented way, so it’s hard to preserve them across installs. If that software uses timestamps, for example, it might be hard to preserve what it needs. Keychain might be another example; I haven’t fully investigated how that interacts with other chips like the T2 in these scenarios.

The problem is that outside of servers, it can be hard to distinguish between files I care about and files I don’t. What I’m hoping we get in the future is a filesystem and (on Windows) a registry that automatically containerizes all saved state to the app and user it belongs to, ideally with some kind of historical log metadata. I know it’s asking for too much and would never be perfect, but it’s a nice thought. Right now we rely too much on apps to be well-behaved, but it’s the apps that aren’t well-behaved or that share data between themselves and other apps that are the issue. Apps asking for access is one approach, but I’m less concerned for privacy in this scenario than I am for state cleanup. The trouble isn’t reading anywhere, though; it’s when an app shares state by updating data in another “container”. Then you’d have to create containers to represent files shared between apps, or the container idea disappears and you’re left with metadata about file usage and global state as an alternative... At that point you’re limited by how much work you want to do to keep the system clean, and it might be easier to identify anomalies periodically than to keep a whitelist updated with files to persist...

jedberg · on April 13, 2020

> Your best bet these days is likely to manage all software config through a system management tool,

Got any good examples? This is usually where I get stuck and can't find a good solution that works well with MacOS.

joshspankit · on April 13, 2020

I can’t remember the particulars right now, but there’s a small team that remotely manages the macbooks of Google employees. I suspect that their tools would be a great fit here.

joshspankit · on April 14, 2020

Found the resource I was thinking of:

- https://www.usenix.org/conference/lisa13/managing-macs-googl...

Unfortunately that was 2013, and https://github.com/google/macops only has a couple commits, so those devops processes are likely very outdated.

The mailing list is dead, but as of mid-2017 this statement was made:

> The main google/macops project is still alive, but there is more development currently going on in the other projects we link to (like santa).

> We still use puppet, as do our Linux and Windows fleet, but none of us actually use puppet infrastructure. We are all working with a standalone (masterless) model.

santa seems to be the only actively-developed macOS management tool on the google github profile, but other tools listed in the talk are actively developed (puppet of course, munki, etc)

lstamour · on April 13, 2020

They started using Puppet against Mac dev machines, then switched to internal tools I believe. But it was one of the first times it occurred to me that we should manage and use DevOps and SRE practices against user machines and user workloads where possible. The trouble is that the tooling isn’t that mature yet. We can’t assume real-time data or always-connected machines, and we don’t have a herd, because users only have one machine with them at any time. Remote state deletion only works when you know you’re not deleting critical state, which in turn requires better tooling and a greater understanding of user application state persistence behaviour than most are willing to invest time in.

This is what makes Chromebooks so easy to maintain though: web apps and sandboxed Android apps can all easily sync to the cloud and compartmentalize their data.

sneak · on April 13, 2020

I would pay real money for a linux distro that works as well as ChromeOS or macOS that doesn’t have all the phone-home endemic to both (yes, Macs phone home like mad even with iCloud and all of the analytics off).

nightfly · on April 13, 2020

People have been doing this for a long time. My team uses Puppet and Ansible to manage Linux user workstations for our university, and the Windows team uses SCCM + other Windows tools. A big reason why we've stuck with Puppet is that it's flexible enough to manage internal systems and user workstations fairly well, with a lot of the code being shared with all the Linux/Unix systems we manage.

bradly · on April 13, 2020

Does Homebrew not do this for you? You can install pretty much any popular app using cask installs. If Homebrew works for you, you can use a Brewfile, similar to a Gemfile or package.json, to handle adding and removing apps from your system.

jedberg · on April 14, 2020

Homebrew works for a lot of stuff, but not little details. For example:

All my Firefox plugins and their config (I guess this is technically data)

All the settings I've changed through the control panel (I can't figure out every file that changes when I make a change).

Any time I've changed a setting with nvram or defaults

Most of the settings of the default apple apps.

My wireless configs

For most of these, there is probably a shell script one could write that can backup and restore them, but I've never found a holistic solution.

mason55 · on April 13, 2020

nix + home-manager will get you pretty good coverage for non-commercial software. The learning curve is steep but once you get there it's amazing.

mixmastamyk · on April 13, 2020

> But it also meant you could walk in to an Apple store, connect a USB drive in your sleeve to a demo computer, drag, drop, and walk away with full versions of very expensive software

Security permissions/demo versions are the proper solution to this, aren't they?

joshspankit · on April 13, 2020

Permissions fail because if your user has the ability to read the application’s files, they can copy it. If they cannot, then neither can the application itself (aka it cannot run).

Demo versions (as in compiled to be a limited demo) are viable, but were counter to Apple’s image at that time (it doesn’t really show off the experience if you’re getting restricted or nagged).

mixmastamyk · on April 13, 2020

Read yes, but not write to removable drives. This setting is used on every machine in a VFX company I worked for. Although, I don't know how supported it is across platforms.

joshspankit · on April 14, 2020

Right you have a possible win there. Totally slipped my mind since it’s not currently available.

Apple stores still need to allow people to connect their iOS devices and maybe USB sticks for photos, but they could have created a “read-only” policy and applied it to store machines.

Of course, these days they just distribute through the app store, and those apps put tentacles all over the place. No USB copies would function.

girvo · on April 14, 2020

Weirdly, an app I purchased on the App Store ran perfectly fine copied to my friend's Mac. I wonder why?

oefrha · on April 14, 2020

The license check is opt-in on the developer’s side.

https://www.objc.io/issues/17-security/receipt-validation/

gav · on April 13, 2020

I'd love to make further inroads, but I treat my MacBook as something that's disposable; there's nothing on it that I care about.

- Pretty much all my software comes from brew or brew cask

- All my files are stored in either DropBox or Google Drive (split is for legacy reasons mostly)

- All my work is in a remote Git repository

- My local config is also in Git

Anything else isn't backed up. I treat ~/Documents as scratch.

It's not ideal because it still takes some time to configure and log on to a bunch of cloud services, but maybe it says something about moving into management that I spend 90% of my day inside of Firefox, Mail, and Slack...

gowld · on April 13, 2020

But do you have a way to migrate to new hardware when the time comes?

gav · on April 13, 2020

Basically grabbing a saved Brewfile[1], running `brew bundle install` and waiting a while.

[1]https://github.com/Homebrew/homebrew-bundle

jhardy54 · on April 13, 2020

Fun trivia: the original implementation[0] was 17 lines of Ruby, which was absolutely terrible because I didn't know what I was doing. I'm happy to see that it's grown so much!

[0]: https://github.com/Homebrew/legacy-homebrew/pull/24107

gav · on April 13, 2020

Projects are often like stone soup, it takes somebody to kick the process off, and I thank you for that.

I saw that it now supports Whalebrew[1], which seems interesting. One of the issues that I have with Brew is that often my need is temporary and that I want to be able to clean up afterwards (similar to `git stash`).

[1] https://github.com/whalebrew/whalebrew

jedberg · on April 13, 2020

Do you know of an easy way to create a brew bundle from already installed applications?

the_alchemist · on April 13, 2020

It's mentioned in the readme: `brew bundle dump`. https://github.com/Homebrew/homebrew-bundle/blob/master/READ...

jedberg · on April 13, 2020

Thanks. Totally missed that.

Edit: Huh, only 90 packages. I was expecting more!

zimbatm · on April 13, 2020

In the same vein as TA, nix-darwin[1] can be used to configure macOS applications and services. It's not exactly the same as NixOS because the whole system is not entirely managed by Nix but it gets you closer.

There is also Ansible, Chef, Puppet, ... but in my experience the overhead to write and test those config files is even higher than using Nix.

[1]: https://github.com/LnL7/nix-darwin

jkachmar · on April 13, 2020

In practice you can actually get extremely close to having Nix manage your whole macOS system with the exception of actual OS-level updates (and particular tools like Xcode, I suppose).

John Wiegley’s system configuration [0] is an example of this, although it is extremely ambitious and definitely not a good place to start for anyone who hasn't used Nix before.

One of the interesting things that Wiegley does for further reproducibility is to have Nix manage and install DMG-based Applications [1] in addition to CLI apps, services, and other system-level things.

[0] https://github.com/jwiegley/nix-config

[1] https://github.com/jwiegley/nix-config/blob/bbad310daa1106f6...

jimbokun · on April 13, 2020

I find iOS devices get surprisingly close to this.

If you have iCloud backups, do a final sync, then restore to the device you just bought. This has been the process for the last several iPhones I've bought for my family. At the Apple Store, they walk you through this process so everything is set up before you leave.

It's hard to tell the difference between your old and new device, in terms of software and data, for the most part.

jedberg · on April 14, 2020

The iPhone is amazing in this regard. I've upgraded multiple phones this way, and it's almost flawless.

joshspankit · on April 14, 2020

I would temper this slightly as I’ve actually had cases where cruft also follows from one device to another.

It’s much better now than it used to be, but there are still times when I have to sacrifice data by uninstalling/reinstalling or resetting settings.

Top of my wishlist for iOS is Apple letting us manage (and potentially fix) our backup data.

abathur · on April 13, 2020

I'm not as far along this curve with macOS as Graham is with NixOS (and I'm a little jealous), but I'm approaching something like what you ask for. Feel free to ask questions; I'll try to remember to check back.

I have an air from 2013; I moved (regrettably, given the keyboard) to a new air around this time last year and used the process as an excuse to force myself to specify everything well enough that I could recover everything I need from stock macOS from a short bootstrap script.

I use yadm (~git) for my dotfiles, Nix for everything Nix can do readily, brew bundle for a few mac apps around the edges, and 3 small restic backups for some project files and such. My dotfiles contain a longer bootstrap script that takes over the heavy lifting once yadm is installed. This script configures most of the settings I need on the way, and prompts me to do the few bits I haven't found a good way to automate. It also collapses ~/Downloads, ~/Desktop, and ~/Documents into a single directory, so that I don't have out-of-sight/mind places for state to hide.

It has been a lot of work, but it has bought me peace-of-mind that I can replace my system quickly (i.e., be ready to work on a new device in ~2h with maybe 20 minutes of actual babysitting?)

That said, I think the basics here are actually fairly low-hanging fruit. You can also make some really quick progress once you know you've got everything essential safeguarded if you're willing to take a dive like Graham has here and let minor losses roll off your back. Access to a second mac also helps a lot; iterating on problems in the actual bootstrap process is slow, and best done while you have a working system you can stay productive on.

I'm currently most of the way through building out the rough inverse of my bootstrap script--a script that audits as much of my "state" as it can; I hope to have it working for a living as part of my backup runs soon. It cleans up and auto-commits anything it can, tells me what is in a clean state, reassures me not to worry about everything I've worried about during past moves, and gives me an explicit checklist for anything that isn't buttoned up.

I think the main things left are forcing me to deal with downloads/desktop/documents as I go (with measures like Graham's being on the table...), and figuring out if there's a sane way to audit for drift in my macOS system/app settings.

jedberg · on April 14, 2020

Thanks for the details. When you set up a new system, how do you solve for these:

All my Firefox plugins and their config (I guess this is technically data)

All the settings I've changed through the control panel (I can't figure out every file that changes when I make a change).

Any time I've changed a setting with nvram or defaults

Most of the settings of the default apple apps.

My wireless configs

This is always the part I get stuck on.

abathur · on April 14, 2020

I'll number based on your post's paragraphs with a zero index :)

1. Safari is my daily driver, so I'd have to trawl other people's dotfiles/bootstrap scripts for evidence anyone's handling FF in a smart way. But, this might still help: Apple recently changed Safari extensions to be an adjunct to native apps. Before they made this change, I had a section of my bootstrap that used the `open` command to launch the Safari extension page for each extension I used, which made it pretty trivial to click install on each tab.

Another thing you might find tractable is looking for FF config/state files that you can back up and restore (narrowly). This is easier to play with if you have a 2nd system as I mentioned. I figured out, for example, how to restore my macOS Terminal.app windows from my previous install with all of my scrollback history.

2. These are rough, but you're already on the right track. It sucks, so maybe bitch at Apple about it--more voices might move the needle here. I have 3 basic approaches: find a `defaults` command (or, rarely, another command) that does the same thing; figure out which plist file is implicated and diff it before/after changes until I figure out how to force the right change (broadly, I find it better to try preference edits with all involved apps closed); AppleScript it. I have AppleScripts in my bootstrap for stuff like setting my Finder preferences and enabling specific Safari extensions (now that they're bundled with .apps). If you haven't already tried it, UIElementInspector makes it much more plausible to actually write an AppleScript that has to do UI manipulation.

3. No direct comment on nvram as I've never fiddled with it, though I noticed it does have an option for outputting the current settings and a flag for setting them from a file. For defaults, I'm not sure there's anything better than just not doing it unless you add it to your script. (This is where I was wondering about whether there's a good way to diff my delta against my setup script...)

4. Too vague for me to give a good answer. If you can't find a way to set these from the CLI or hacking around with plists, I'm not sure anything shy of AppleScript will help.

5. What do you mean by wireless configs? Passwords and network names? I pull mine in through iCloud keychain and don't otherwise sweat these.
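
A minimal sketch of the before/after plist diff from point 2, written in Python so it runs without macOS-specific tools. It assumes you have captured two snapshots of the same preference file as bytes (for example by copying the file before and after changing a setting). `flatten` and `diff_plists` are hypothetical helper names, and the keys in the usage example are made-up sample data, not real preference keys.

```python
from __future__ import annotations

import plistlib
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0]": leaf} pairs.

    Empty containers produce no entries in this sketch.
    """
    if isinstance(value, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(value)]
    else:
        return {prefix: value}
    flat: dict[str, Any] = {}
    for key, child in items:
        flat.update(flatten(child, key))
    return flat


def diff_plists(before: bytes, after: bytes) -> dict[str, tuple[Any, Any]]:
    """Return {key: (old, new)} for every leaf that differs.

    A key missing on one side is reported with None on that side.
    plistlib.loads accepts both XML and binary plists.
    """
    old = flatten(plistlib.loads(before))
    new = flatten(plistlib.loads(after))
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    # Made-up sample data standing in for two snapshots of one preference file.
    before = plistlib.dumps({"ExampleToggle": False, "ExampleStyle": "icons"})
    after = plistlib.dumps({"ExampleToggle": True, "ExampleStyle": "icons"})
    for key, (old, new) in diff_plists(before, after).items():
        print(f"{key}: {old!r} -> {new!r}")
```

Each key the diff reports is a candidate for a line in a bootstrap script, though whether writing it back actually reproduces the setting still has to be checked by hand.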

memco · on April 13, 2020

I experimented with a version of this: https://github.com/mathiasbynens/dotfiles. It worked ok. Still had a few things I had to tweak, but if you have a few things you want to keep it's easy. I basically only use a brewfile + a shell script to set a few defaults and some configs for iterm + kitty that I keep in my personal repo. Makes the most repetitive bits a lot easier. The MAS homebrew stuff didn't work too well when I migrated machines so I still had to install some stuff by hand.

asdf-asdf-asdf · on April 13, 2020

you can use Nix on macOS.

i was planning to try it out (currently i use homebrew), but with Catalina it seems things are somewhat complicated for nix+macos.

my understanding is that Nix wants to live in "/nix", and Catalina does not like that.

there are some solutions to this problem but they seem somewhat incomplete (you have to make an unencrypted partition for Nix etc..), though i haven't tried it so i might be wrong :)

https://github.com/NixOS/nix/issues/2925 https://github.com/NixOS/nix/pull/3212

bcrosby95 · on April 13, 2020

I don't know if it's usable for a MacOS laptop, but I use Ansible for my various Linux dev machines, and if you handed me a fresh install, once I got my Ansible directory onto the laptop I could get it completely up and running with a single command.

j88439h84 · on April 13, 2020

You can run nix and home-manager on mac.

dirtydroog · on April 13, 2020

Well, going by the functional programming crowd's take on immutability, you just have to buy a new laptop each time.

solatic · on April 13, 2020

As someone who loves NixOS and runs it on my daily-driver laptop -

I can't see running NixOS in production.

We're running 100% Kubernetes, including for databases and other stateful workloads. Kubernetes implements the author's pattern just fine - any OS state is defined within the container image, and any application state is defined within a Persistent Volume. Unfortunately, NixOS doesn't have a good story yet for service management (Disnix isn't nearly as featureful as the Kubernetes scheduler and doesn't see nearly the same activity / community buy-in as Nix / NixOS) let alone ensuring that networked storage is re-attached to the particular node that runs the service in the same reliable manner that Kubernetes offers.

IMO the way forward for Nix / NixOS in production is to:

a) develop a container runtime that would allow a Kubernetes node to run pods that specify Nix expressions directly in the image field, instead of the current workaround of creating Docker containers from Nix expressions and dealing with the overhead of external registries

b) improve the experience of running Kubernetes on NixOS such that ease of installation more closely approaches that offered by managed Kubernetes providers.

tazjin · on April 13, 2020

> develop a container runtime that would allow a Kubernetes node to run pods that specify Nix expressions directly in the image field

https://github.com/google/nixery

I'm the author of this, and here is a talk about it: https://www.youtube.com/watch?v=pOI9H4oeXqA

solatic · on April 13, 2020

I'm familiar with Nixery and I think it's a really cool project. It's extremely close to what I'd like but it's not quite it - it requires either relying on nixery.dev to be online (unacceptable for production considering there are no availability guarantees) or running my own instance (which essentially means that I'm maintaining a type of registry).

Why is there a need for an image registry? Part of the beauty of Nix is that Nix benefits from remote binary caches, but they are not required. Why not have a container runtime that, instead of downloading image layers, instead fetches from a Nix binary cache if possible and builds from source if not (with the caveat that production nodes should basically never be building from source)?

(Also Nixery is GCE-only and we're on AWS but leave that aside).
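
The cache-first, build-as-fallback behaviour described above could be sketched roughly like this. It is not how any existing container runtime or Nix itself is implemented: `realise`, the fetcher callables, and `allow_build` are all hypothetical names, and plain functions stand in for real binary caches.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional

# A fetcher returns the artifact for a store path, or None on a cache miss.
Fetcher = Callable[[str], Optional[bytes]]


def realise(
    store_path: str,
    caches: Iterable[Fetcher],
    build: Callable[[str], bytes],
    allow_build: bool = False,
) -> tuple[bytes, str]:
    """Try each cache in order; build from source only if explicitly allowed.

    Returns the artifact plus a label saying where it came from.
    """
    for index, fetch in enumerate(caches):
        artifact = fetch(store_path)
        if artifact is not None:
            return artifact, f"cache[{index}]"
    if not allow_build:
        # Mirrors the caveat that production nodes should not build from source.
        raise RuntimeError(f"{store_path}: not in any cache and building is disabled")
    return build(store_path), "built"


if __name__ == "__main__":
    local_cache = {}.get  # always misses
    team_cache = {"/nix/store/example-hello": b"prebuilt"}.get
    print(realise("/nix/store/example-hello", [local_cache, team_cache], build=lambda p: b"from-source"))
```

Defaulting `allow_build` to off is a deliberate choice in this sketch: a miss on a production node fails loudly instead of silently compiling.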

tazjin · on April 13, 2020

> which essentially means that I'm maintaining a type of registry

Mhm, there's no state that can't be thrown away and recreated, so I'd argue the overhead of running it is much lower than a full-blown registry.

> Why not have a container runtime that, instead of downloading image layers, instead fetches from a Nix binary cache

It depends on where you want to do this - Kubernetes for example has lots of opinions about images and how they're downloaded, so just replacing the runtime wouldn't be enough.

Nixery is an incremental step towards the end-goal, but there's a lot of mindset shifting that needs to happen first I think.

> Also Nixery is GCE-only

Nope, you can use a disk as the storage backend and then there's no dependency on GCS. S3 support would also be relatively easy to add by just implementing this interface: https://github.com/google/nixery/blob/master/storage/storage...

ecnahc515 · on April 13, 2020

Not OP, but a registry works anywhere, whereas a custom runtime doesn't (managed node groups in EKS, GKE, AKS etc).

Additionally, once you use a custom runtime, now you have to deal with multiple runtimes in your cluster. You can no longer easily just run pods, you have to ensure they run on the nodes with the runtime for the images you want.

ghuntley · on April 13, 2020

Use nixos to run your k8 cluster if you are insane enough (or have the $$$ for salaries) to run your own k8.

See https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/s...

Specifically "services.kubernetes.apiserver.enable = true"

My strong recommendation is to use nix to generate super optimised and amazing docker images then use them with k8 on a PaaS.

https://grahamc.com/blog/nixos-on-zfs

If using k8 (self host or PaaS), keep an eye on https://github.com/xtruder/kubenix as it'll blow your mind. Noyaml, infrastructure testing framework, deployment etc using nix.

solatic · on April 13, 2020

I'm familiar with that flag, I tried to use it to set up a local development environment on my NixOS laptop. It forces you to use easyCerts and Flannel, neither of which you should use in production on AWS as a default. Disabling them to have more control over rotating certificates and to use AWS VPC CNI networking takes you well far away from a managed experience.

Additionally, the channel ecosystem as it exists today does not allow you to choose your minor version of Kubernetes, which is another issue if you want to keep your underlying system up to date but also want to make sure that you're controlling when you adopt a new minor version so that you can deal with deprecations as necessary.

ghuntley · on April 13, 2020

Ah yeah, that was a toy example to highlight how services are defined/enabled as k8 is notoriously known to be f'ing hard, involved, complex and intense to setup.

"Services.<service>.enable" is very similar to freebsd and /usr/local/etc except with standardised language to configure every daemon.

As for channel ecosystem and having control. Pin and override using layers — https://github.com/digital-asset/daml/blob/master/nix/nixpkg...

Then use https://cachix.org/ for stupid fast builds via caching.

arianvanp · on April 14, 2020

There is https://github.com/moretea/kubernix but it's an experiment. However, it seems to do exactly what you're looking for.

solatic · on April 14, 2020

Yes it does... but no commits in three years :(

ris · on April 13, 2020

> develop a container runtime that would allow a Kubernetes node to run pods that specify Nix expressions directly in the image field

Do you really want to give kubernetes the added responsibility of building your images?

solatic · on April 13, 2020

Kubernetes isn't building the image, really, it's just passing the Nix expression directly to the container runtime that Nix would provide. This is more or less how Nix works already, as the Nix tooling takes Nix expressions and builds derivations which are stored in the Nix store.

Spivak · on April 13, 2020

I mean it's kinda the logical conclusion or else your images become your darlings.

CaveTech · on April 13, 2020

This is basically the setup we've been using with vagrant (and in production) for years.

Vagrant launches a bare-bones VM. Local files mount on /vagrant. Chef-zero uses the local mount to provision the systems and configurations necessary.

On every vagrant reload, this process repeats. Chef's idempotent nature means that any manual drift is automatically repaired.

In this setup there's no difference between `vagrant reload` and `vagrant destroy && vagrant up`.

It's possible to package this so that it's simultaneously "Infrastructure as Code" while also satisfying "Immutable Infrastructure". Our stab at this is now 6 years old and we're surely not the first to do it.
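
The drift repair described above rests on convergence: each run compares declared state with actual state and acts only on the differences. A toy sketch of that idea follows, with a dict standing in for the machine. It is not Chef's API, and `converge` is a hypothetical name.

```python
from __future__ import annotations


def converge(actual: dict[str, str], desired: dict[str, str]) -> list[str]:
    """Bring every declared key in `actual` to its desired value.

    Mutates `actual` in place and returns the list of actions taken.
    Keys that are not declared in `desired` are left untouched.
    """
    actions = []
    for key, value in desired.items():
        if actual.get(key) != value:
            actions.append(f"set {key}")
            actual[key] = value
    return actions


if __name__ == "__main__":
    desired = {"/etc/motd": "welcome", "nginx": "installed"}
    machine = {"/etc/motd": "edited by hand"}  # manual drift plus a missing package

    print(converge(machine, desired))  # first run repairs the differences
    print(converge(machine, desired))  # second run finds nothing to do
```

Running it twice shows the idempotence: once the machine matches the declaration, further runs take no action.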

sargun · on April 13, 2020

I feel pretty strongly against the idea of immutable infrastructure when you're "infrastructure" (shared systems, running other people's software), but this article isn't about that.

The beef I have with this article is the idea of:

> New computer smell

> Getting a new computer is this moment of cleanliness. The keycaps don’t have oils on them, the screen is perfect, and the hard drive is fresh and unspoiled — for about an hour or so.

In my observation (and in datasets that I have access to), computer systems tend to follow the "infant-mortality" curve. This means that if they run for a little bit, they're likely to run for a long time (and in addition, if you have many of them, they tend to die around the same time). My conjecture is that many computer systems have initialization routines which are not as thoroughly tested as the normal operating state of the system. Due to this, we tend to run into more issues in "immutable" systems than we otherwise would in "mutable" systems.
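
One common way to model an infant-mortality curve is a Weibull distribution with a shape parameter below 1, whose failure rate falls with age. The sketch below assumes that model. The shape and scale values are arbitrary and not drawn from any dataset mentioned here, and it does not model the later phase where machines die around the same time.

```python
import math


def survival(hours: float, shape: float, scale: float) -> float:
    """Weibull survival function: probability a unit is still running at `hours`."""
    return math.exp(-((hours / scale) ** shape))


def survives_next(age: float, window: float, shape: float = 0.5, scale: float = 1000.0) -> float:
    """Probability that a unit already `age` hours old survives `window` more hours."""
    return survival(age + window, shape, scale) / survival(age, shape, scale)


if __name__ == "__main__":
    # With shape < 1, older units are more likely to survive the next day
    # than freshly started ones.
    for age in (0, 24, 240):
        print(f"age {age:>3}h: P(survive next 24h) = {survives_next(age, 24):.3f}")
```

Under this assumption, a system that is rebuilt every morning spends all of its time at the young, riskier end of the curve.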

ScottBurson · on April 14, 2020

Horror story: https://thedailywtf.com/articles/Designed-For-Reliability

outworlder · on April 13, 2020

So this is mostly about Nix, but I've used (in production!) CoreOS, which implemented some of the same concepts.

You couldn't just update anything easily. Well, anything is possible, but CoreOS made it very hard to do it the wrong way, with the readonly system partitions.

But it made upgrades really easy. And you have a second, backup system partition to boot from, if the update messed up things.

We had to move back to a 'standard' Linux distribution and now all those old habits are creeping up. It takes a lot of discipline (and enforcement) to avoid applying 'fixes' which eventually get forgotten.

ghuntley · on April 13, 2020

Updates are easy. For an advanced example of overriding nixpkgs and using pinning/override layers (i.e. some mandate or reason to run a super old version of grpc), see https://github.com/digital-asset/daml/blob/master/nix/nixpkg...

yjftsjthsd-h · on April 13, 2020

> We had to move back to a 'standard' Linux distribution and now all those old habits are creeping up

That sounds very relevant: Why did you have to go back?

ecnahc515 · on April 13, 2020

Most likely because Container Linux (previously CoreOS) is nearing end-of-life. Your options are to move to Fedora CoreOS or Flatcar Linux, so it's likely they decided to go with a more vanilla distro instead of migrating to one of the more similar options.

yjftsjthsd-h · on April 13, 2020

That's actually a fairly encouraging failure mode, given that it means there's nothing inherently wrong with the approach, just that particular implementation.

ghuntley · on April 13, 2020

Here's my dotfiles for some of my nixos servers and home computers.

https://github.com/ghuntley/dotfiles-nixos

Steal away and enjoy.

downerending · on April 13, 2020

For years I've been doing this in a lightweight way with a shell script (and cache of auxiliary data files). I suspect many others do this as well.

The basic idea is to image with some lean/vanilla image, then run the script to put the system into the desired state. Kind of like Puppet, except far easier to understand and change.

Properly done, one can reimage pretty much at will, which is nice if there are a lot of people making local 'root' changes on boxes.

edit: typo

eeZah7Ux · on April 13, 2020

> The basic idea is to image with some lean/vanilla image, then run the script to put the system into the desired state.

Spot on. Much better than most CM tools.

The Nix setup is pretty hackish and does not track files changed by running applications.

What we really need is automatic version history on whole filesystems.

pas · on April 13, 2020

ZFS (and btrfs) snapshot diff does that easily. Also any overlay filesystem. But after a point combing through the diff becomes an insurmountable task. Nix forces the user to declare what goes where up front. (And yes, it's not exactly a user friendly syntax/style. systemd has support for WorkingDirectory, RootDirectory, RootImage, RuntimeDirectory, StateDirectory, CacheDirectory, LogsDirectory and ConfigurationDirectory, but so far most unit files don't take advantage of that.)

downerending · on April 13, 2020

This is nice to have, but it doesn't really distinguish between the "logical" differences and the "physical" differences.

As a trivial example, one step in one's setup script might be "install package X". That can end up creating/modifying a lot of files, but many of those differences aren't necessarily meaningful or something one would want to carry into future reimages. And as things progress over time, those diffs might even be "wrong".

I liken it to whittling vs CNC. Either can be the right way to go, but usually at scale we end up doing better with CNC. And the best CNC program is a compact one.

pas · on April 13, 2020

That's the advantage of Nix's top-down approach. You declare everything on as high level as you want/can/wish. You specify the package plus your customization for it.

It's not that much different than a one-liner that installs the package plus uses echo to write your config file. The magic is the "forced" reproducibility by default.

downerending · on April 14, 2020

Conceptually, I think nix is great. Most of us can appreciate the value of a perfectly reproducible system, etc.

That said, unfortunately nix is a practical option for about zero percent of commercial shops. Sigh.

I did actually work at a place that tried it once. Summarizing, the introduction failed through some combination of politics and the perception that nix would introduce a lot of complexity vs a standard vendor distro.

eeZah7Ux · on April 13, 2020

No, it's not automated and does not track changes with the required granularity: what file is changed by what application, what OS package and what reason.

pas · on April 13, 2020

If you use docker (which uses overlay2) for every application you can easily track these changes. I don't think even Nix can/does track anything on this level.

That said, yes, something like that would be great. Using overlay + namespaces/cgroups could get us close to that. Then we'd have to detect which packages are completely passive (read only, shared dependency of many others) which are shared but have their own state, which are shared but their state is a mix of per-dependant states (so if multiple packages/apps depend on a DB service then the schemas/tables ought to be isolated), and so on.

eeZah7Ux · on April 20, 2020

> If you use docker (which uses overlay2) for every application you can easily track these changes

No: it can only create a serialized sequence of changes that cannot be reordered. Also it's not aware of the context and meaning of a change.

> That said, yes, something like that would be great. Using overlay + namespaces/cgroups could get us close to that. Then we'd have to detect which packages are completely passive (read only, shared dependency of many others) which are shared but have their own state, which are shared but their state is a mix of per-dependant states (so if multiple packages/apps depend on a DB service then the schemas/tables ought to be isolated), and so on.

apt-get based distributions already track which files and directories are related to which package.

Also systemd unit files already can ensure that daemons are not making changes elsewhere.

Indeed, what's missing is tracking dependencies across applications and across hosts.

Linux distributions got most of it right decades ago (before the container dumpster fire)

eeZah7Ux · on April 13, 2020

How come every time I post from this account I get a few immediate downvotes, and when I use other accounts it does not happen?

Interesting...

downerending · on April 14, 2020

Perhaps you mixed tabs and spaces in a past life.

infogulch · on April 14, 2020

As a tiny effort to tame the immense mutability of Windows, at least for the question of "what programs are installed at which version?", I'm using a combination of Chocolatey (an unofficial package manager for Windows, https://chocolatey.org/) and Git.

I maintain a 'packages.config' xml file in a git repo that lists all the packages I decide I want to be installed and their versions. I intentionally don't list dependencies of the packages I want in this file, so I can remove a package later and I don't have to trawl through the huge list of potentially unnecessary packages now that I've changed what I want. I don't (typically) manage the version number listed in packages.config, I have a few scripts to help me do that:

install.ps1 has two uses: 1. install on a new system by cloning (or downloading the zip of) the repo and running the script, which will download and install the exact versions of the software that is declared in packages.config. 2. If I want a new package I can edit the packages.config to add the line then run install.ps1 (this second capability could use some UX work).

update.ps1 lists all of the packages that are currently out of date, gives you the option to update all of them, and then (regardless of which option you chose) rewrites packages.config with the currently installed versions. This allows you to use git to identify and manage the version differences between multiple Windows installations and also upgrade all of them easily.

This doesn't configure any of the applications, but at least they're installed and they're the same version when you move from one computer to another. You can also quickly identify version differences between installations which makes debugging application version problems much easier. And you also don't forget what you have installed.
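
As an illustration of the "identify version differences between installations" step, here is a minimal Python sketch that compares two packages.config files. It assumes the usual Chocolatey layout of `<package id="..." version="..." />` elements under a `<packages>` root; the helper names `read_packages` and `diff_versions` and the sample package versions are made up for the example, and this is not what install.ps1 or update.ps1 actually do.

```python
import xml.etree.ElementTree as ET


def read_packages(xml_text: str) -> dict:
    """Parse packages.config text into {package id: version}."""
    root = ET.fromstring(xml_text)
    return {p.get("id"): p.get("version") for p in root.iter("package")}


def diff_versions(a: dict, b: dict) -> dict:
    """Return {id: (version in a, version in b)} for every mismatch.

    A package missing on one side shows up with None for that side.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Hypothetical contents of packages.config on two machines.
laptop = """<packages>
  <package id="git" version="2.25.1" />
  <package id="7zip" version="19.0" />
  <package id="vscode" version="1.44.0" />
</packages>"""

desktop = """<packages>
  <package id="git" version="2.26.0" />
  <package id="vscode" version="1.44.0" />
</packages>"""

differences = diff_versions(read_packages(laptop), read_packages(desktop))
for name, (left, right) in differences.items():
    print(f"{name}: laptop={left} desktop={right}")
```

With the files in git, a plain `git diff` between branches or commits gives the same information; the sketch only shows that the file is simple enough to compare programmatically.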

greymeister · on April 13, 2020

Haha, nuke and pave as a first resort rather than a last resort.

rconti · on April 13, 2020

Over the years I went from making fun of Windows Admins rebooting to fix everything, to, say, 12 years ago rebooting my unix boxes any time I made a change.

Virtually by definition if I was changing something, it wasn't in production at the moment, and I just learned my life was so much easier if I made sure my changes were really committed to state at the moment I made the change rather than learning it the hard way at 2am 6 months down the road, and desperately trying to remember what I had "fixed" and why.

Granted, these were still pets, but at least they were well-trained pets.

imhoguy · on April 13, 2020

How could that work with suspend/hibernation, when one doesn't reboot for weeks?

I rarely shut down my Linux desktop. I also keep desktop VMs with project context suspended, so I can just reopen them the next day or in a month and be right where I stopped working.

pas · on April 13, 2020

Why wouldn't it work with suspend? NixOS works because it's brutally up-front about its declarativeness. Since you have to specify everything in your config that's not what the default gives you, you'll have your own changes in your config file (or files).

It's great because it separates /etc into vendor-provided defaults and your own customizations. It's not so great because it's not that automatic, you need to script it.

systemd/Lennart also explored this topic a bit: https://www.youtube.com/watch?v=pL0AMLiwPj8

https://www.freedesktop.org/software/systemd/man/systemd-vol...

lmm · on April 14, 2020

It couldn't. You need to enforce that you really do reboot pretty often (e.g. by scheduling a weekly reboot).

zeveb · on April 14, 2020

Sounds pretty neat. I wonder if the same is achievable with Guix, which is based on Nix. Now I am gonna have to spend some of my precious free time noodling around the Guix docs!

nojvek · on April 14, 2020

Wouldn’t using k8s, which needs Docker images built from Dockerfiles, achieve this?

jackcviers3 · on April 13, 2020

Git. Vagrantfile. Last pass. G Suite. I can work anywhere if I've got a vm.

ggm · on April 14, 2020

wipes the bits they don't consider worth persisting, but designs a persistence model in ZFS to say "ok: next time, keep this bit"

takes discipline, but interesting.

ghuntley · on April 13, 2020

Check out the NixOS for existing sysadmins workshop over at https://github.com/ghuntley/workshops/tree/master/nixos-work...

If you want a TL;DR overview of NixOS then start here https://github.com/ghuntley/workshops/tree/master/nixos-work...

ghuntley · on April 13, 2020

For an advanced example of overriding nixpkgs and using pinning/override layers (i.e. some mandate or reason to run a super old version of grpc), see https://github.com/digital-asset/daml/blob/master/nix/nixpkg...

crimsonalucard · on April 13, 2020

I love immutability it makes things ten times easier. This isn't really immutability? It looks like an infrastructure refresh on deploy. If you are refreshing something by definition it's probably not immutable. I guess the author means getting rid of the mutation of state.

I'm just waiting for the day they can make the database immutable. It'll probably look something like git.

elbear · on April 13, 2020

It's immutable in the sense that the system files are generated by Nix and they're read-only. So, if you want to make a change to your system, you change the Nix files and rebuild the entire system. The old version of the system remains in /nix/store and you can roll back to it.

You could also call it "declarative infrastructure", because you declare in your configuration.nix what the system should look like: users, packages, file paths, etc.

OJFord · on April 13, 2020

> This isn't really immutability? It looks like an infrastructure refresh on deploy.

The title says 'erase' rather than 'refresh', and does acknowledge they're 'mutable systems'.

crimsonalucard · on April 13, 2020

The HN title also says immutability, so I'm referring to that. "Erase" and "immutability" are contradictory. The author clearly can only mean one.

jedberg · on April 13, 2020

> I'm just waiting for the day they can make the database immutable. It'll probably look something like git.

Netflix had immutable databases back in 2012. I mean, as close as you can get I suppose. We could lose 1/3 of the nodes and keep running normally, and lose 2/3 and keep reading.

It was built with Cassandra and based on the Dynamo model.

There was also an open source, fully in memory database with the same reliability: https://github.com/Netflix/dynomite
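
The "lose 1/3, lose 2/3" figures are consistent with quorum arithmetic in a Dynamo-style store. Here is a minimal Python sketch, assuming a replication factor of 3 with one replica per zone, writes that need a quorum, and reads that are allowed to succeed with a single replica. These settings are assumptions for illustration, not something stated above about the actual configuration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def can_serve(replication_factor: int, replicas_down: int, required: int) -> bool:
    """True if enough replicas are still up to satisfy `required` acks."""
    return replication_factor - replicas_down >= required


RF = 3                    # assumed: one replica in each of three zones
WRITE_ACKS = quorum(RF)   # assumed: quorum writes, so 2 of 3
READ_ACKS = 1             # assumed: reads may fall back to a single replica

for down in range(RF + 1):
    writes = can_serve(RF, down, WRITE_ACKS)
    reads = can_serve(RF, down, READ_ACKS)
    print(f"{down}/{RF} replicas down: writes={writes} reads={reads}")
```

Under these assumptions, one replica down leaves both reads and writes available, two down leaves only reads, and three down leaves nothing.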

seb314 · on April 13, 2020

you seem to imply that it is no longer like this? If so, why?

jedberg · on April 13, 2020

Oh no I'm saying it was like that then when I was there, and I can't speak to how it is now.

I assume it is still that way, because it would be odd otherwise. But I just can't say for certain because I'm not there anymore.

jiveturkey · on April 13, 2020

agreed. this is reproducibility, not immutability. When dealing with such things, precision in language is important. I don't understand the downvotes for you.

But, more than anything this is an ad for nixOS.

Nuzzerino · on April 13, 2020

> I'm just waiting for the day they can make the database immutable.

Wouldn't event stores with CQRS effectively do this?

crimsonalucard · on April 13, 2020

No. You still need a regular database due to read performance. If I want to read the "latest" data, a search on an event store for a piece of data that wasn't changed in the past year would take an inordinate amount of time.

Additionally what if I want to look up a "row" of data that was modified 5 times on 5 different "columns" at various dates across 3 years? That's an aggregation job across 3 years of event data.

For event sourcing you still need to turn the "event" into an actual operation and record that database operation in a classic database.

Event stores just make the "event" the source of truth. It doesn't get rid of regular databases.

Traditionally other services read a single entity database/service and use that as a source of truth. Now a single button click records data across several databases and several services. It's not necessarily a better architecture, just different/buzz-wordy, and definitely more complicated.

waheoo · on April 13, 2020

Look up Rich Hickey's talk "The Database as a Value" on Datomic, which essentially does this.

It bottlenecks writes through an ACID layer and makes reads against any single database connection (the value) immutable.

Scarbutt · on April 13, 2020

Querying history(values of attributes in time) is very very slow in datomic. It is meant for auditing/troubleshooting, not to be used for your application domain.

Nuzzerino · on April 13, 2020

> For event sourcing you still need to turn the "event" into an actual operation and record that database operation in a classic database.

That's where the CQRS architecture comes into play. An event is recorded to the event store, the projection database is updated to the latest point in time, then any further reads are done from the projection database.
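
A minimal Python sketch of that flow, with an in-memory append-only log and a dict standing in for the projection database. The `Event` and `Store` classes and their fields are hypothetical names for illustration; a real system would persist both sides and usually update the projection asynchronously.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """An immutable fact: one column of one entity was set to a value."""
    entity_id: str
    column: str
    value: object


@dataclass
class Store:
    events: list = field(default_factory=list)      # append-only event log
    projection: dict = field(default_factory=dict)  # read model: latest rows
    applied: int = 0                                # events already projected

    def append(self, event: Event) -> None:
        # Command side: events are only ever appended, never edited.
        self.events.append(event)

    def catch_up(self) -> None:
        # Fold the not-yet-applied events into the projection.
        for event in self.events[self.applied:]:
            row = self.projection.setdefault(event.entity_id, {})
            row[event.column] = event.value
        self.applied = len(self.events)

    def read(self, entity_id: str) -> dict:
        # Query side: a single lookup, with no scan over the event history.
        return self.projection.get(entity_id, {})


store = Store()
store.append(Event("user-1", "name", "Ada"))
store.append(Event("user-1", "email", "ada@example.com"))
store.append(Event("user-1", "name", "Ada L."))
store.catch_up()
latest = store.read("user-1")
```

The projection is disposable here: it can be rebuilt at any time by replaying the log into an empty dict, while the log itself is never modified.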

cachehit · on April 13, 2020

>I love immutability it makes things ten times easier.

Me too. Mutable State is to Software as Moving Parts are to Hardware :-)

[link redacted]

crimsonalucard · on April 13, 2020

The next level after making your variables immutable is to get rid of variables altogether. Just function pipes from IO in to IO out.

It's called the point-free style.
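
For readers who haven't seen it, a small Python sketch of the same function written in pointed and point-free style. `compose` is a hypothetical helper defined here, not a standard-library function, and Python is not a natural fit for this style, so treat it as an illustration of the idea only.

```python
from functools import reduce


def compose(*funcs):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


# Pointed style: the argument `text` is named and threaded through by hand.
def shout_pointed(text):
    return "{}!".format(text.strip().upper())


# Point-free style: the pipeline is defined without ever naming its input.
shout = compose(str.strip, str.upper, "{}!".format)

result = shout("  hello  ")
```

Both versions compute the same thing; the point-free one just describes the pipe of functions rather than what happens to a particular variable.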

ashishb · on April 13, 2020

Why not use docker containers which will do this by default for stateless portion of your infrastructure?

ghuntley · on April 13, 2020

Docker containers are not reproducible and building the containers takes bloody forever/poor cacheability.

Look at every Dockerfile

FROM Ubuntu/Ubuntu

RUN apt-get update (or apt-get install) # BOOM — no longer possible to reproduce the build

GordonS · on April 14, 2020

Sure, if you always rebuild your container images - but surely a more typical approach is to build them and put them in a container registry?

Of course, they would be rebuilt when base layers change, but if you really want exactly the same image, you reference it by a digest, which will give you a point-in-time image.
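
The difference between a tag and a digest can be sketched with a toy content-addressed store in Python. The `Registry` class is hypothetical and is not the Docker registry API; real image digests are computed over the image manifest, while this sketch simply hashes a byte string.

```python
import hashlib


class Registry:
    """Toy model: tags are mutable pointers, digests are content hashes."""

    def __init__(self):
        self.blobs = {}  # digest -> content, never overwritten with new content
        self.tags = {}   # tag -> digest, free to move

    def push(self, tag: str, content: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.blobs[digest] = content
        self.tags[tag] = digest  # the tag now points at the newest push
        return digest

    def pull(self, ref: str) -> bytes:
        # Resolve a tag to its digest; a digest resolves to itself.
        digest = self.tags.get(ref, ref)
        return self.blobs[digest]


registry = Registry()
monday = registry.push("app:latest", b"image built on Monday")
friday = registry.push("app:latest", b"image built on Friday")

by_tag = registry.pull("app:latest")  # whatever was pushed last
by_digest = registry.pull(monday)     # always the Monday image
```

Pulling by digest gives the point-in-time image regardless of later rebuilds. It pins the artifact, though; it does not make rerunning the build produce the same bytes.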