Case Study: Why Ably Switched from Terraform Cloud to Scalr
Scalr is a cost-effective, drop-in replacement for Terraform Cloud, with feature parity and many quality-of-life improvements.
About Ably
Ably provides the capabilities to deliver live experiences without go-live delays, runaway costs, and unhappy users. Ably's Serverless WebSocket platform reliably handles high-scale realtime data distribution to web and mobile apps at the edge, so engineering teams can focus on core product innovation without having to provision and maintain complex realtime infrastructure.
Developers at companies like HubSpot, Toyota, and Webflow use Ably's APIs and global edge network to power things like business-critical live chat, food order delivery tracking, and document collaboration for more than 300 million people each month.
The Challenge: Please Wait in the Queue
As Ably grew, so did its infrastructure footprint, which meant more Terraform workspaces and more Terraform operations. The team began noticing that runs were getting stuck in a queue. Digging into the details of their Terraform Cloud (TFC) plan, they found that it included only one runner. That was sufficient when they first started using Terraform, but it was time for an upgrade.
HashiCorp does not provide detailed pricing information on their website, so Ably set up a call with their sales team to find out more. The quotes that came back led them to evaluate other providers.
Ably's Terraform Requirements
When evaluating different Terraform service offerings, the key areas Ably looked at were:
- Migration effort — will we need to drastically change our current code base and ways of working?
- State management — is state storage part of the service?
- SLA — Terraform is critical to our workflow, so we need a reliable service with an SLA to back it.
- Support terms — if something goes wrong, how quickly can we expect a response?
- Cost, including future growth — if we grow by 10x, will the pricing still be competitive?
- Disaster recovery — are we going to be dead in the water if the provider suffers an outage?
Why Scalr
Ably found that, apart from Terraform Cloud itself, Scalr was the only Terraform service provider that supported the remote and cloud Terraform backend configurations. This was vital both for the developer experience and from an operational perspective. Without this support, they would have had to spend significant resources building their own state management system and designing access control so that engineers could still run terraform plan locally without having permission to run terraform apply.
Being able to run terraform plan against a production workspace without committing code keeps feedback loops short, which is vital for the developer experience.
Scalr also ticked all the other boxes:
- 99.9% uptime SLA
- 2-hour ticket response time
- Transparent pricing with an easy-to-use calculator on their pricing page
- Custom hooks — the ability to run scripts at specific events in the Terraform workflow, such as post-apply for automatic DR state export and Terraform CDK support
The Migration
There were two stages to Ably's switch from TFC to Scalr:
- Build-up: Creating all the prerequisite resources, such as Scalr environments and workspaces, and AWS IAM roles.
- State Migration: Migrating the state for all workspaces from TFC to Scalr.
The goal throughout this process was to minimize any disruption to engineers and allow them to use TFC until the last second.
Build-up
Scalr supports AWS IAM role delegation, which meant Ably could use temporary credentials and could remove the AWS IAM user they had to use for TFC.
They used the Scalr Terraform Provider to effectively make Scalr manage itself after an initial bootstrap step. All TFC workspaces were already managed via Terraform code, so creating the Scalr equivalents was straightforward: branch off main, add the Scalr Terraform provider, change tfe_workspace to scalr_workspace, and update a few variable names.
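The rename step described above can be illustrated with a small text-level helper. This is a minimal sketch, not Ably's actual tooling: the function name rename_identifiers and the sample Terraform snippet are hypothetical, and only the tfe_workspace to scalr_workspace mapping comes from the text. It replaces whole identifiers only, and any argument differences between the two providers would still need manual review.

```python
import re


def rename_identifiers(hcl_text: str, mapping: dict) -> str:
    """Replace whole-word identifiers in Terraform source text.

    `mapping` maps old identifier -> new identifier. Matching is purely
    textual (no HCL parsing), so the result should be reviewed in a diff.
    """
    if not mapping:
        return hcl_text
    # Try longer names first so a longer identifier wins over a shorter
    # one that happens to be its prefix.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], hcl_text)


# Hypothetical input: a resource block plus a reference to it.
source = (
    'resource "tfe_workspace" "app" {}\n'
    'output "id" { value = tfe_workspace.app.id }'
)
print(rename_identifiers(source, {"tfe_workspace": "scalr_workspace"}))
```

Because underscores count as word characters, an identifier such as tfe_workspace_settings is left untouched rather than partially rewritten. Variable renames could be added to the same mapping once the actual old and new names are known.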
State Migration
Using both Scalr's and TFC's APIs, Ably created a bash script to migrate state across all workspaces:
- Parsed the workspace name from code
- Found the corresponding workspace_id in both TFC and Scalr
- Downloaded the existing state from TFC and Scalr
- Compared the serial and lineage as safety checks
- Uploaded the state to Scalr if the TFC serial was larger or the state didn't exist in Scalr
Because of the serial checks, they could run this script on demand to transfer the latest state files. They could then run a plan on each Scalr workspace to verify permissions and variables, and to confirm that Terraform showed no differences compared to TFC.
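The upload decision in the steps above can be sketched as a single function. Ably's script was written in bash and is not shown in the source, so this Python version is only an illustration under stated assumptions: the function name should_upload is hypothetical, the states are assumed to be parsed Terraform state JSON with top-level serial and lineage keys, and raising an error on a lineage mismatch is an assumed way of treating lineage as a safety check.

```python
from typing import Optional


def should_upload(tfc_state: dict, scalr_state: Optional[dict]) -> bool:
    """Decide whether the TFC state should be pushed to Scalr.

    Both arguments are parsed state files; `scalr_state` is None when the
    Scalr workspace has no state yet.
    """
    # No state in Scalr yet: always upload.
    if scalr_state is None:
        return True
    # Assumption: a differing lineage means the two states do not share a
    # history, so stop instead of overwriting the Scalr state.
    if tfc_state["lineage"] != scalr_state["lineage"]:
        raise ValueError("lineage mismatch: refusing to overwrite Scalr state")
    # The serial increases with each state change; upload only when TFC is ahead.
    return tfc_state["serial"] > scalr_state["serial"]
```

Since the function returns False when the serials are equal, rerunning the migration against an already-synced workspace is a no-op, which is what makes on-demand reruns safe.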
Going Live
On the go-live day, Ably locked all TFC workspaces, ran the state migration script once more, and executed plans on all Scalr workspaces one last time. After they passed, they merged the pre-prepared PRs and asked engineers to update any feature branches they were working on.
Total downtime: approximately 20 minutes. It all went without a hitch.
Results
Ably has been using Scalr for over 3 months and is very happy with the experience. As well as finding Scalr reasonably priced, the team has a great relationship with the Scalr team, who are always eager to hear product suggestions and help with any questions. Several features that Ably requested have since made their way into the product, such as Git submodule support and pre-init custom hooks.
Key Takeaways
| Metric | Detail |
|---|---|
| Migration Downtime | ~20 minutes |
| Migration Approach | Phased — build-up via Terraform provider, state via API bash script |
| Primary Driver | Run concurrency (queue bottleneck) and cost |
| Key Requirement | remote/cloud backend support for local terraform plan |
| Result | No disruption to engineers, responsive support, feature requests shipped |
Further Reading
For a step-by-step guide to planning and executing your own migration, see Migrating Off Terraform Cloud/Enterprise: A Complete Guide.