With Terraform, the open source CLI and libraries for defining and deploying infrastructure are fully usable, even for large systems.
What I would like to know is how usable Pulumi is without a paid subscription.
When I first looked at Pulumi, I thought the only non-local state backend was the paid Pulumi Service. But looking at it now, they seem to support the same object store backends that Terraform does (https://www.pulumi.com/docs/intro/concepts/state/). The talk of a lack of concurrency control, though, seems to imply they don't support locking like Terraform does.
Pulumi is fully functional in open source form. The analogy I like to draw is Git and GitHub. You can use Git fully independent of GitHub, or you can choose to use them together for a seamless experience within a team. (Not a perfect analogy, since we built both Pulumi open source and the Pulumi SaaS, which causes this very confusion!) We don't hold anything back: if it's in the SDK, it's open.
We recently added concurrency control to the alternative backends. I'm sorry the docs are confusing on this matter -- we will get that fixed up. We also have many large customers in production on the open source alone. It's easier with the SaaS just because we handle security, reliability, and sharing with your team along with access controls, auditing, etc. But if you prefer to roll your own there we are entirely happy to have you in the community and help out. Admittedly our marketing materials aren't super clear here and we are working to fix this.
Hope this helps to clear things up and again apologies for the confusion.
You don't need to use the service, although it solves a lot of problems. You can use just the Pulumi SDK and handle the state management issues that their service takes care of in some other way.
I think it's a reasonable business model. The SDK is completely open and free. The service is how they make money.
State is a snapshot of what is deployed at a given time. It is used on the next run to check whether anything has drifted since, e.g. "Did anyone delete a server manually?"
By default it is stored with Pulumi, but if you are on a budget you can just use S3 with a few tradeoffs (concurrent deployments would need a lock that you implement yourself).
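If you did end up rolling your own lock, the core idea is an atomic "create if absent" marker. Here is a rough Python sketch that uses a local lock file as a stand-in. The names deploy_lock and LockHeld are hypothetical, and a real object-store backend would need an equivalent atomic primitive, which is not shown here.

```python
import os
from contextlib import contextmanager


class LockHeld(Exception):
    """Raised when another deployment already holds the lock."""


@contextmanager
def deploy_lock(lock_path):
    # O_CREAT | O_EXCL makes creation atomic: the call fails if the file exists,
    # so only one process can acquire the lock at a time.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeld(lock_path) from None
    try:
        # Record the owner's PID so a stuck lock can be investigated by hand.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield
    finally:
        # Release the lock even if the deployment raised.
        os.remove(lock_path)
```

You would wrap the deployment in "with deploy_lock(path):" and treat LockHeld as "someone else is deploying, try again later". Note this sketch has no expiry, so a crashed process leaves the lock behind.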
Side rant: Terraform has somehow convinced people that mandatory state tracking is a good idea, because it is a useful feature sometimes. It's often more harmful than good.
These tools are just configuration management for the cloud. No other configuration management tool requires your changes conform to something that was pre-existing in a state file. They simply modify the system state to reach what your intended configuration is, without needing to track state.
Orchestration tools that rely entirely on state files to make all changes are poorly designed. State snapshots are a limited view of the past, and do not reflect the actual current state. So you have to grab the state, then see what has changed, and wonder if what has changed was intentional and should be preserved, or if it needs to be overwritten. This is basically distributed change consensus, like merging Git trees, or Paxos. But tools like Terraform basically throw away the current real-world state, like ignoring what's in the mainline branch, because merging is, like, hard, man.
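To make the "grab the state, then see what has changed" step concrete, here is a minimal Python sketch of drift detection between a recorded snapshot and the live system. It is not how Terraform or Pulumi implement it; the dict-based resource shape and the name detect_drift are made up for illustration.

```python
def detect_drift(snapshot, live):
    """Compare the last recorded snapshot against what is actually deployed.

    Both arguments map a resource id to a dict of its properties.
    Returns a dict of resource id -> "deleted" | "modified" | "untracked".
    """
    drift = {}
    for rid in snapshot.keys() | live.keys():
        before, now = snapshot.get(rid), live.get(rid)
        if before == now:
            continue  # unchanged since the snapshot was taken
        if now is None:
            drift[rid] = "deleted"    # in the snapshot, gone from the real world
        elif before is None:
            drift[rid] = "untracked"  # exists for real, never recorded
        else:
            drift[rid] = "modified"   # changed outside the tool
    return drift
```

The sketch only reports the differences. Deciding whether a "modified" entry was intentional and should be preserved is exactly the merge problem described above, and nothing here solves it.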
State files are useful when your CM tool cannot understand which resources need to exist in what form. Sometimes you may need to maintain certain infrastructure, but your tool doesn't have an easy way to determine what resources it is maintaining versus the resources that are not managed by it. But in most cases, there are various ways to detect that in code, rather than recording it in a state file. Every CM tool in the world does this, because you don't really care about the previous state as much as reaching your desired state!
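One way to "detect that in code" is to mark ownership on the resources themselves, for example with tags, on providers that support them. A minimal Python sketch, assuming a hypothetical tag key and value and a made-up list-of-dicts resource shape:

```python
MANAGED_TAG = "managed-by"  # hypothetical tag key
TOOL_NAME = "my-cm-tool"    # hypothetical tag value


def managed_resources(live_resources):
    """Return the ids of live resources tagged as owned by this tool."""
    return {
        r["id"]
        for r in live_resources
        if r.get("tags", {}).get(MANAGED_TAG) == TOOL_NAME
    }


def plan(desired_ids, live_resources):
    """Work out creates and deletes from live data only, with no state file."""
    desired = set(desired_ids)
    live_ids = {r["id"] for r in live_resources}
    owned = managed_resources(live_resources)
    return {
        # Create whatever is declared but does not exist yet.
        "create": sorted(desired - live_ids),
        # Delete only resources this tool owns and no longer declares;
        # untagged resources are never touched.
        "delete": sorted(owned - desired),
    }
```

The trade-off is that every run has to list the live resources, and it only works for resource types that can carry a tag or similar marker.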
The other way state files are useful is as a log of past actions. But that's not the way tools like Terraform use it; they lean on it like a crutch. Rather than just telling you what has changed since the last Terraform run, the state file can often cause Terraform to just refuse to apply changes, or to fail to detect and import existing resources if they weren't put into the state file manually (terraform import) or during terraform apply.
I think state management is unavoidable for something like Pulumi. If it had to query AWS's API for all of your thousands of resources every time you deployed, deploys would be unbearably slow. So it just assumes your hundreds of DNS records from yesterday are all still there, instead of querying AWS for all of them every deploy. The state file is how that happens.
If you want, you can run Pulumi with the --refresh flag and see just how much slower it is.
That's more of a cache than a state file. It's great that it supports a cache (other CM tools do too) but you obviously need to refresh it before you deploy changes, or you don't know if your changes will break. Even if you're not changing those specific records, some other change you're trying to make may depend on the latest DNS record values, and so you'd want the latest version of them before you deploy.
Ansible does not need a state file to manage AWS R53 records, but it does support a cache. (This is not an endorsement of the tire-fire that is Ansible.)
Terraform's .plan file is like a cache before you apply changes. Since it's synced up to the state file too, it will fail to apply if the state has changed, so it's useful to "safely" apply only changes that seem like they might work. The problem is the state file acts like a giant boat anchor tied to the HCL code the rest of the time.
One of the strengths of Pulumi is that you can export and reimport state, which is represented as easy-to-digest JSON. You can edit the state, and Pulumi even suggests doing so when it's stuck.
You can do that with Terraform too, but that is missing the point. The tool should ignore the state and simply apply the desired configuration, like every other CM tool that has ever existed.
You don't need a state file for Ansible, or Puppet, or Chef, or Salt, or CFEngine, or literally any other such tool.
Ansible will not clean up after itself when you revert a diff. Once added, you can't remove a resource from the desired state unless the tool has a way to remember resources it created so that it can decommission them.
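The difference can be sketched in a few lines of Python. This is an illustration of the argument only, not the behaviour of any particular tool; the function names and the set-of-ids model are made up.

```python
def stateless_plan(desired, live):
    """Converge toward `desired`; anything else found in `live` is left alone,
    because the tool cannot tell its own leftovers from someone else's resources."""
    return {"create": sorted(desired - live), "delete": []}


def stateful_plan(desired, live, previously_created):
    """Same as above, but also remove what an earlier run created
    and the configuration no longer declares."""
    orphans = (previously_created & live) - desired
    return {"create": sorted(desired - live), "delete": sorted(orphans)}
```

With desired = {"web"}, live = {"web", "db", "legacy"} and previously_created = {"web", "db"}, the stateless plan deletes nothing, while the stateful plan decommissions "db" and leaves "legacy" untouched.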
It's funny you mention this. Removing resources from AWS is the one thing that reliably breaks in Terraform all the time.
If you have never tried to destroy all your Terraform resources before, it will probably not work. You'll have to modify your Terraform to get it to jump through the correct hoops in the correct order in order to destroy without dying.
At this very moment I am fighting with Terraform to destroy some resources as part of a code change. AWS wants them destroyed in a very particular way, and Terraform won't create the resources I do want because apply is dying because the destroy is failing.
That's a good point, but sadly not how tools like Terraform market themselves. If Terraform marketed itself as "we make cleanup easier for the 10% of the time when you need it, and to achieve that goal, we give you 3 times more headache by asking you to keep state", I am sure far fewer people would make the trade-off.
In my quick read, I didn't see a way not to use the service.
"If this is your first time running pulumi new or most other pulumi commands, you will be prompted to log in to the Pulumi service. The Pulumi CLI works in tandem with the Pulumi service in order to deliver a reliable experience."[0]
I think Pulumi is similar to Terraform in that you can use multiple backends for state management, of which their "service" is one[0]. Meaning you could use the CLI backed by an S3 bucket for free.
Because it says: «the next major version of the Pulumi open source project». Apparently, either not very complete, or not very open-source, if self-hosting requires a license.
[0] https://www.pulumi.com/docs/guides/self-hosted/