Engineering at Vanta: How we imported our AWS environment into Terraform
We use Terraform to manage the majority of our infrastructure as code. Terraform enables us to automate provisioning and easily review changes. But because we didn't manage all of our infrastructure in Terraform, and didn't monitor our AWS environment for unmanaged resources, our list of unmanaged resources continued to grow.
Resources that aren’t managed via Terraform are hard to keep track of and can potentially have security implications. Resources such as Route53 records, S3 buckets, and KMS keys were causing some problems when not managed in Terraform:
- We needed to ask a couple of engineers to perform actions on our production environment that involved these resources. One example was adding DNS records for domain verification. These changes didn't reap the benefits of GitOps, such as code review and revertibility.
- Resources not in Terraform differed across our staging and production environments, and we didn't have a deep grasp of how they differed.
- It was possible for an engineer to make temporary changes to resources and forget to revert them. This was a particular problem for our staging environment.
- Any static analysis we applied to our Terraform configuration wouldn’t apply to these unmanaged resources.
Moving existing resources to Terraform
The ability to create and destroy resources is well-supported in Terraform, but importing resources isn’t as intuitive. The team at Segment worried about inadvertently destroying resources when getting started, so they created a separate AWS account for Terraform and gradually removed resources from their old AWS account while creating the equivalent resource in their Terraform-managed AWS account. For our use case, this felt too heavy-handed. We already had an environment that was mostly, but not wholly, Terraform-managed.
Within the same AWS environment, we also considered creating an equivalent resource in Terraform, followed by the deletion of the corresponding, manually-created resource. For resources that had to have particular identifiers, such as S3 buckets or Route53 records, this was a non-starter—we would have had to delete the manually-created resources first and incur some downtime. For resources that didn’t have this problem, it wasn’t clear how we could mechanically verify that the resource described in Terraform was identical to the resource we created manually beforehand.
Thankfully, Terraform already has a command to add existing resources to Terraform state without creating or destroying anything: terraform import. The import command adds a resource to state given its address and ID, but it will not generate the corresponding configuration in your .tf files for you. A subsequent terraform apply after the resource is imported will change the resource to match the configuration, so it is important to make sure that the Terraform configuration reflects the current state of the resource.
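For example, bringing a manually created S3 bucket under management looks roughly like the sketch below (the bucket name and resource address are hypothetical):

```shell
# 1. Write configuration describing the existing bucket in a .tf file:
#
#    resource "aws_s3_bucket" "assets" {
#      bucket = "example-assets-bucket"
#    }
#
# 2. Import the real bucket into state by resource address and ID:
terraform import aws_s3_bucket.assets example-assets-bucket

# 3. Verify: a plan with no changes means the configuration
#    matches the live resource.
terraform plan
```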
We then broke up this problem into a few steps:
- Regularly monitor our environments for resources that are not managed via Terraform
- Import these resources via the import command
- Add an observability monitor to maintain the invariant that all resources that should be managed via Terraform are managed via Terraform
Thankfully, the monitoring tooling already existed. We ran driftctl as a GitHub Action on a regular cadence to determine which resources are in our AWS environment but not in our Terraform state. The team at driftctl is also quite responsive. One caveat, however, is that driftctl doesn't yet monitor all possible AWS resources; their documentation lists the resources they currently support.
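A minimal driftctl invocation, assuming Terraform state stored in an S3 bucket (bucket and key names here are illustrative), looks something like:

```shell
# Compare what exists in AWS against what our Terraform state knows about
driftctl scan \
  --from tfstate+s3://example-terraform-state/terraform.tfstate \
  --output console://stdout
```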
Next, we focused on the timing of the import command. If you run the import out-of-band of your continuous deployment (CD) pipeline, a terraform apply at the wrong time could:
- Delete your newly imported resources
- Attempt to create redundant copies of the resources that you intend to import
If applying Terraform changes automatically in CD, it is important to atomically merge the new Terraform configuration into the CD pipeline and perform the Terraform import. Only then will the subsequent terraform apply become a safe operation.
In our case, we temporarily disabled applying any Terraform changes via CD before running our import step.
Since this was disruptive to our existing CD workflows, we wanted to go through this process as few times as possible. We aimed to perform dry runs of the imports in development. Once we were sure that we had described our configurations and import commands correctly, running the actual imports would be as simple as re-running the dry run script with the --do-it flag enabled.
Making Terraform imports dry-runnable
To verify that our Terraform configuration matches an imported resource, we want a clean terraform plan. That plan, in turn, needs to run against a Terraform state containing the imported resources. Terraform workspaces provide tooling to manage duplicate, temporary state files; after finishing the dry run, we can delete the workspace.
Be careful when working with state files: they can store sensitive information, such as database passwords.
Our workflow looked something like this:
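In shell terms, the dry run amounted to something like the following sketch (the workspace name, resource address, and zone ID are illustrative, and the exact flags may differ from our internal script):

```shell
# Snapshot the real state, then replay the import in a throwaway workspace
terraform state pull > default.tfstate
terraform workspace new import-dry-run
terraform state push default.tfstate          # seed the workspace with a copy

terraform import aws_route53_zone.internal Z0123456789EXAMPLE
terraform plan                                # a clean plan means config matches reality

# Tear down the throwaway workspace when done
terraform workspace select default
terraform workspace delete -force import-dry-run
```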
We pieced together a script to abstract away these details so that developers need only focus on the address-ID pairs (and the accompanying Terraform configuration).
Determining Terraform configuration
We've now reduced importing resources into Terraform to a process of guess-and-check. Even if our configuration is completely wrong, the dry run's terraform plan will produce a diff that we can read as reversed instructions for describing the resource as it currently exists.
Let’s say we were trying to import a Route53 zone. We wrote up an attempt at Terraform configuration and now run our dry run script:
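A first attempt (the zone name and IDs below are hypothetical) might look like:

```hcl
resource "aws_route53_zone" "internal" {
  name = "internal.example.com"
}
```

The dry run's terraform plan then flags whatever the configuration fails to describe, along the lines of:

```
# aws_route53_zone.internal will be updated in-place
  ~ comment = "Our private hosted zone" -> null
  - vpc {
      - vpc_id = "vpc-0123456789abcdef0" -> null
    }
```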
Oops! This configuration attempt forgot to associate the resource with a VPC, as it is a private hosted zone. The comment attribute was also missing.

Even this process can feel somewhat tedious, so we used Terraformer to generate resource configurations for us. The generated attribute values are string literals, so they still require some tweaking to reference attributes of resources that are already managed in Terraform.
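A Terraformer run scoped to Route53 (the region is illustrative) can bootstrap that configuration:

```shell
# Generate .tf configuration and state for existing Route53 resources
terraformer import aws --resources=route53 --regions=us-east-1
```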
What resources are we responsible for?
Not all resources should be managed via Terraform. Here are some examples of resources driftctl marked as drifted which we decided not to import:
- IAM roles, policies, and policy attachments generated by AWS. Some examples include roles with the prefixes AmazonSSMRoleFor, AWSServiceRoleFor, AWS_InspectorEvents, and AWSReservedSSO. The AWS console's policies page can give you some hints about whether certain IAM roles are customer-managed versus AWS-managed.
- Resources managed via a separate CloudFormation stack
- Default AWS Network ACL rules
- AWS IAM access keys, which cannot be changed if the resource is imported
Thankfully, driftctl supports a .driftignore file that can ignore resources. Once we imported all of our resources, we found that deciding whether to manage a new resource in Terraform became easier. We now have more temporal context when a resource suddenly appears in driftctl that wasn’t there before, so we can make a guess about why the resource was created.
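Our .driftignore ended up with entries along these lines (the patterns are illustrative; driftctl supports wildcards):

```
# .driftignore
aws_iam_role.AWSServiceRoleFor*
aws_iam_role.AWSReservedSSO*
aws_default_network_acl.*
```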
What we learned
After importing, deleting, or ignoring all the resources in our production and staging AWS environments that were not managed via Terraform, we added a Datadog Monitor to notify us via Slack if a new resource appears in driftctl. This monitoring gives us the added benefit of learning when AWS automatically creates certain resources in our AWS environment.
Making changes with Terraform can feel risky. Making changes replicable in a development environment helps us build confidence in the safety of these changes. Since Terraform imports are primarily changes in Terraform state, we were able to simulate imports by creating throwaway duplicates of our Terraform state. This enabled us to find a solution that was more effective than trying to create and then import a resource in a development AWS environment.
Ready to join the team?
Want to improve how we manage our infrastructure? Vanta’s engineering team is hiring. Join us!