We use Terraform to manage the majority of our infrastructure as code. Terraform enables us to automate provisioning and easily review changes. But because we didn't manage all of our infrastructure in Terraform, and didn't monitor our AWS environment for unmanaged resources, our list of unmanaged resources continued to grow.
Resources that aren't managed via Terraform are hard to keep track of and can have security implications. Resources such as Route53 records, S3 buckets, and KMS keys caused several problems when they weren't managed in Terraform:
- We needed to ask a couple of engineers to perform actions on our production environment that involved these resources. One example was adding DNS records for domain verification. These changes didn't reap the benefits of GitOps, such as review and revertability.
- Resources not in Terraform differed across our staging and production environments, and we didn't have a clear picture of how they differed.
- It was possible for an engineer to make temporary changes to resources and forget to revert them. This was a particular problem for our staging environment.
- Any static analysis we applied to our Terraform configuration wouldn’t apply to these unmanaged resources.
Moving existing resources to Terraform
The ability to create and destroy resources is well-supported in Terraform, but importing resources isn’t as intuitive. The team at Segment worried about inadvertently destroying resources when getting started, so they created a separate AWS account for Terraform and gradually removed resources from their old AWS account while creating the equivalent resource in their Terraform-managed AWS account. For our use case, this felt too heavy-handed. We already had an environment that was mostly, but not wholly, Terraform-managed.
Within the same AWS environment, we also considered creating an equivalent resource in Terraform, followed by the deletion of the corresponding, manually-created resource. For resources that had to have particular identifiers, such as S3 buckets or Route53 records, this was a non-starter—we would have had to delete the manually-created resources first and incur some downtime. For resources that didn’t have this problem, it wasn’t clear how we could mechanically verify that the resource described in Terraform was identical to the resource we created manually beforehand.
Thankfully, Terraform already has a command to add existing resources to Terraform state without creating or destroying anything: terraform import. The import command adds a resource to state given a resource address and ID, but it will not generate the corresponding configuration in your .tf files for you. A subsequent terraform apply after the resource is imported will change the resource to reflect that configuration, so it is important to make sure the Terraform configuration reflects the current state of the resource.
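For example, importing an existing S3 bucket (bucket name and resource address hypothetical) means first writing configuration that matches the bucket, then attaching the real resource to that address in state:

```shell
# In s3.tf, configuration matching the existing bucket:
#   resource "aws_s3_bucket" "assets" {
#     bucket = "example-assets-bucket"
#   }

# Attach the existing bucket to that address in Terraform state:
terraform import aws_s3_bucket.assets example-assets-bucket

# If the configuration matches the real resource, this plan is clean:
terraform plan
```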
We then broke up this problem into a few steps:
- Regularly monitor our environments for resources that are not managed via Terraform
- Import these resources via the import command
- Add an observability monitor to maintain the invariant that all resources that should be managed via Terraform are managed via Terraform
Thankfully, the monitoring tooling already existed. We ran driftctl as a GitHub Action on a regular cadence to determine which resources were in our AWS environment but not in our Terraform state. The team at driftctl is also quite responsive. One caveat, however, is that driftctl doesn't yet monitor all possible AWS resources. Here is the current list of resources that they support.
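As a sketch, a scheduled driftctl run against a remote state file looks something like this (state bucket and key hypothetical):

```shell
# Compare the AWS account against our Terraform state and report unmanaged resources
driftctl scan --from tfstate+s3://example-terraform-state/terraform.tfstate
```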
Next, we focused on when to run the import command. If you run the import out-of-band of your continuous deployment (CD) pipeline, a terraform apply at the wrong time could:
- Delete your newly imported resources
- Attempt to create redundant copies of the resources that you intend to import
If applying Terraform changes automatically in CD, it is important to atomically merge the new Terraform configuration into the CD pipeline and perform the Terraform import. Only then will the subsequent terraform apply become a safe operation.
In our case, we temporarily disabled applying any Terraform changes via CD before running our import step.
Since this was disruptive to our existing CD workflows, we wanted to go through this process as few times as possible. We aimed to perform dry runs of imports in development. Once we were sure that we described our configurations and import commands correctly, running our actual imports should be as simple as running a dry run script with the --do-it flag enabled.
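The interface we were aiming for looks roughly like this (script name and input format hypothetical; only the --do-it flag comes from our actual workflow):

```shell
# Dry run: simulate the imports against a throwaway copy of state and show the plan
./run-imports.sh imports.txt

# Real run: perform the same imports against the real state
./run-imports.sh imports.txt --do-it
```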
Making Terraform imports dry-runnable
To verify that our Terraform configuration matched the imported resources, we wanted a clean terraform plan. That plan in turn needs to be run against a Terraform state that contains the imported resources. Terraform workspaces provide tooling to manage duplicate, temporary state files. After finishing the dry run, we can simply delete the workspace.
Be careful when working with state files. State files can store sensitive information, such as database passwords.
Our workflow looked something like this:
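A sketch of that workflow, with hypothetical resource names and IDs:

```shell
# Copy the current state into a throwaway workspace
terraform state pull > dry-run.tfstate
terraform workspace new import-dry-run
terraform state push dry-run.tfstate

# Import against the copy, then check that the plan is clean
terraform import aws_route53_zone.internal Z0000000EXAMPLE
terraform plan

# Tear down the workspace and the local state copy
terraform workspace select default
terraform workspace delete -force import-dry-run
rm dry-run.tfstate   # state can contain secrets; don't leave copies behind
```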
We pieced together a script to abstract away these details so that developers only need to focus on the address-ID pairs (and the accompanying Terraform configuration).
Determining Terraform configuration
We've now reduced importing resources into Terraform to a process of guess-and-check. Even if our configuration is completely wrong, the dry run's terraform plan gives us a diff that we can read in reverse: it tells us what configuration would match the resource as it exists.
Let’s say we were trying to import a Route53 zone. We wrote up an attempt at Terraform configuration and now run our dry run script:
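A first attempt at the configuration (zone name hypothetical) might look like:

```hcl
resource "aws_route53_zone" "internal" {
  name = "internal.example.com"
}
```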
Oops! This configuration attempt forgot to associate the resource with a VPC, as it is a private hosted zone. The comment attribute was also missing.

Even this process can feel somewhat tedious, so we used Terraformer to generate resource configurations for us. The generated attribute values are string literals, so they still require some tweaking to reference attributes of resources that are already managed in Terraform.
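For instance, Terraformer can generate configuration and state for all Route53 resources in a region (flags from Terraformer's AWS provider; region illustrative):

```shell
terraformer import aws --resources=route53 --regions=us-east-1
```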
What resources are we responsible for?
Not all resources should be managed via Terraform. Here are some examples of resources driftctl marked as drifted which we decided not to import:
- IAM roles, policies, and policy attachments generated by AWS. Some examples include roles with the prefixes AmazonSSMRoleFor, AWSServiceRoleFor, AWS_InspectorEvents, and AWSReservedSSO. The AWS console's policies page can give you some hints about whether certain IAM roles are customer managed versus AWS managed.
- Resources managed via a separate CloudFormation stack
- Default AWS Network ACL rules
- AWS IAM access keys, which cannot be changed once the resource is imported.
Thankfully, driftctl supports a .driftignore file for excluding resources from its scans. Once we had imported all of our resources, we found that deciding whether to manage a new resource in Terraform became easier. We now have more temporal context when a resource that wasn't there before suddenly appears in driftctl, so we can make a reasonable guess about why it was created.
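As an illustration, a .driftignore covering resources like the ones above could look like this (entries are type.id patterns, and wildcards are supported; the exact patterns here are illustrative):

```
aws_iam_role.AWSServiceRoleFor*
aws_iam_role.AWSReservedSSO*
aws_default_network_acl.*
```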
What we learned
After importing, deleting, or ignoring all the resources in our production and staging AWS environments that were not managed via Terraform, we added a Datadog Monitor to notify us via Slack if a new resource appears in driftctl. This monitoring gives us the added benefit of learning when AWS automatically creates certain resources in our AWS environment.
Making changes with Terraform can feel risky. Making changes replicable in a development environment helps us build confidence in the safety of these changes. Since Terraform imports are primarily changes in Terraform state, we were able to simulate imports by creating throwaway duplicates of our Terraform state. This enabled us to find a solution that was more effective than trying to create and then import a resource in a development AWS environment.
Ready to join the team?
Want to improve how we manage our infrastructure? Vanta’s engineering team is hiring. Join us!