Infrastructure as Code (IaC) is a powerful practice: it replaces manual, error-prone, and expensive operations with automated, consistent, and quick provisioning of resources. In many cases, IaC depends on existing infrastructure, typically including a configuration management system. Chef, Puppet, and SaltStack are commonly cited players in this market, each requiring its own resources to be in place and each with its own setup and maintenance difficulties. As we move to microservices and container orchestration, our need for resource-intensive and complex tooling to provision infrastructure and application dependencies diminishes. So how do you solve the chicken-and-egg problem of standing up IaC without relying on other infrastructure?
Our solution in Amazon Web Services (AWS) was Terraform, cloud-init, Minimal Ubuntu, and Ansible. Terraform was an easy choice given our existing use of and expertise with the product for provisioning in AWS. Previously, we built Amazon Machine Images (AMIs) using Packer with a minimal set of software packages, which bootstrapped systems so that our configuration management system could configure them dynamically based on their role. However, every change, however small, required building a new AMI. The approach also didn't save much boot time, since an agent would still configure the system dynamically at first boot. We were also spending a lot of time maintaining a configuration management system and scripts, as well as keeping up with Domain Specific Languages (DSLs).
Enter Minimal Ubuntu: images designed for automated deployment at scale, with an optimized kernel and boot process. Because we need to install only a small set of packages and keep most of our tooling at the orchestration layer, we can still provision a system that is ready for production traffic in under four minutes. The simplicity of these images also provides greater security and ease of administration.
Cloud-init is installed on Minimal Ubuntu, which allows further configuration of the system through user data. Because cloud-init lacks thorough documentation and the more sophisticated features of other configuration management systems, we were still looking for something else. Ansible became an attractive option for several reasons: a simple yet powerful approach to automation, readable configuration and templating using YAML and Jinja2 rather than a DSL, and strong community contributions and industry adoption.
Most of the documentation for Ansible, though, focuses on a master server that pushes configuration to clients. This doesn't solve the problem of IaC without relying on infrastructure. Maintaining dynamic inventories of clients and pushing configuration to systems in auto scaling groups, which need to be ready for production traffic as soon as possible, did not make sense either. Ansible has a concept of local playbooks, but their power and simplicity get little attention. This blog post walks you through combining these tools to build a bastion host configured with Duo Multi-Factor Authentication (MFA) for SSH, along with a framework that makes it easy to add further host roles. For brevity, other configuration of our bastion hosts is left out. You will want to perform further tuning and hardening depending on your environment.
Starting with Terraform at the account/IAM level (note that all examples use version 0.12.x), you will need an EC2 instance profile with access to an S3 bucket where the Ansible playbook tarball will be stored. Terraform for creating the S3 bucket is left to the reader; it is straightforward, and many examples exist. We recommend enabling encryption at rest on the S3 bucket, as sensitive information may be required to bootstrap a host:
With a policy to read the S3 bucket and an instance profile the bastion host can assume, define the bastion host EC2 instance:
Most variables are self-explanatory. For this exercise, we will bring attention to the ami and user_data values. The ami value can be found by selecting the version of Ubuntu and the Amazon region for your instance here: https://wiki.ubuntu.com/Minimal.
The user_data value defines the cloud-init configuration:
The cloud-init.cfg specifies a minimal configuration - installing the AWS CLI tool and Ansible to handle the rest of the process:
The shell script following the cloud-init template downloads the Ansible playbook tarball and executes it. Variables for the environment (dev, stage, prod), VPC name, and AWS region are passed in to customize the configuration based on those settings. The role variable is passed as a tag to define what role the host will play, which loosely corresponds to Ansible roles (explained later):
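As a rough sketch of what such a bootstrap script might look like (not the original listing), the function below downloads the tarball, unpacks it, and runs the playbook locally with the host role as a tag. The bucket name `example-ansible-bucket`, the tarball name `ansible.tar.gz`, the `/opt` working directory, and the `run`/`DRY_RUN` helper are all assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative bootstrap sketch. The bucket, tarball name, and paths are hypothetical.
set -eu

BUCKET="${BUCKET:-example-ansible-bucket}"   # hypothetical S3 bucket
WORKDIR="${WORKDIR:-/opt}"                   # hypothetical extraction directory

# Run a command, or only print it when DRY_RUN=1 (handy for checking the script).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bootstrap ROLE ENV VPC REGION
bootstrap() {
    role="$1" env="$2" vpc="$3" region="$4"

    # Fetch the playbook tarball from S3 and unpack it (it extracts to ansible/).
    run aws s3 cp "s3://${BUCKET}/ansible.tar.gz" "${WORKDIR}/ansible.tar.gz" --region "$region"
    run tar -xzf "${WORKDIR}/ansible.tar.gz" -C "$WORKDIR"

    # Run the playbook against the local host only, selecting tasks by host role tag.
    run ansible-playbook --connection=local --inventory localhost, \
        --tags "$role" \
        --extra-vars "env=${env} vpc=${vpc} region=${region}" \
        "${WORKDIR}/ansible/site.yml"
}
```

In user data, the last line would be a call such as `bootstrap bastion dev main us-west-2`, with the four values filled in by the Terraform template. Running it with `DRY_RUN=1` prints the three commands without executing them.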
The Ansible tarball is created from another Git repository with the Ansible playbook and uploaded to the secure S3 bucket. The directory layout is as follows:
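One way to picture the layout is as a scaffold script. The sketch below only creates the paths that this post mentions by name (site.yml, the common, bastion, and duo roles, and the site, dev, and prod variable files); the function name `scaffold_playbook` is hypothetical, and a real repository would contain more files, such as templates and further variable levels.

```shell
#!/bin/sh
# Illustrative scaffold: creates only the paths named in this post.

# scaffold_playbook PARENT_DIR  -> creates PARENT_DIR/ansible/...
scaffold_playbook() {
    root="$1/ansible"
    mkdir -p "$root/roles/common/tasks" \
             "$root/roles/bastion/tasks" \
             "$root/roles/duo/tasks" \
             "$root/vars/dev" \
             "$root/vars/prod"
    for f in site.yml \
             roles/common/tasks/main.yml \
             roles/bastion/tasks/main.yml \
             roles/duo/tasks/main.yml \
             vars/main.yml \
             vars/dev/main.yml \
             vars/prod/main.yml; do
        # Create missing files as empty YAML documents; never overwrite existing ones.
        [ -e "$root/$f" ] || printf '%s\n' '---' > "$root/$f"
    done
}
```

Calling `scaffold_playbook .` in an empty Git repository produces an `ansible/` directory with this skeleton, which matches the directory name the tarball is expected to extract to.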
Ansible roles provide convention over configuration to simplify units of work. We break out each package into its own role so that it can be reused. We use Ansible tags to associate Ansible roles with our concept of a host "role," e.g., bastion. This keeps site.yml simple and clear:
always is a special tag: a task tagged with it runs regardless of which tags are specified at execution. It provides the mechanism for running common tasks regardless of the host "role." For this example, we will only use roles/common/tasks/main.yml to load our variable hierarchy, but it could also include tasks for creating admin users, installing default packages, etc.:
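To make the idea of a variable hierarchy concrete, here is a small sketch of the order in which variable files could be looked up, from least to most specific, using the environment, VPC, and region values passed to the playbook. The nested `vars/<env>/<vpc>/<region>/` layout and the function names are assumptions for illustration; the actual loading happens in Ansible tasks, not in shell.

```shell
#!/bin/sh
# Illustrative only: the nested directory layout below is an assumption.

# vars_files ENV VPC REGION -> prints variable files, least specific first.
# Files later in the list would override values from earlier ones.
vars_files() {
    env="$1" vpc="$2" region="$3"
    printf '%s\n' \
        "vars/main.yml" \
        "vars/${env}/main.yml" \
        "vars/${env}/${vpc}/main.yml" \
        "vars/${env}/${vpc}/${region}/main.yml"
}

# check_vars_files ROOT ENV VPC REGION -> returns 1 if any level is missing.
check_vars_files() {
    root="$1"
    shift
    status=0
    for f in $(vars_files "$@"); do
        [ -f "${root}/${f}" ] || { echo "missing: ${f}" >&2; status=1; }
    done
    return "$status"
}
```

A check like `check_vars_files ansible dev main us-west-2` could run before packaging the playbook, so that a missing level is caught before a host tries to boot with it.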
This provides a powerful and flexible framework for defining variables at different levels. Site-level variables apply to all hosts. Variables that might differ between dev and prod (e.g., a logging host) can be defined at the environment level in vars/dev/main.yml and vars/prod/main.yml. A main.yml must exist for each environment, VPC, and AWS region, even if its only content is "---". In this example, we will define one site-level variable in vars/main.yml:
This defines the variable aws.secrets, an S3 bucket and path for downloading files that need to be kept secure, outside of the Ansible playbook Git repository. The value can be customized per environment, VPC, and/or region by moving it down the variable hierarchy. Moving on to bastion, roles/bastion/tasks/main.yml disables selective TCP ACKs and pulls in the Ansible roles for software, which in this example is limited to duo:
Lastly, we have duo in roles/duo/tasks/main.yml:
The duo configuration file contains secrets, so it is downloaded from the encrypted S3 bucket in the secrets/bastion path:
The remaining files are kept in version control for auditing:
Create the Ansible playbook tarball that extracts to ansible/ and upload it to the S3 bucket specified in Terraform. Apply the Terraform for IAM first, and then continue to the EC2 instances. Minutes later, you will be able to log in to your bastion hosts with Duo MFA.
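The packaging and upload step could be scripted along these lines. This is a sketch under assumptions: the function names, the `ansible.tar.gz` object key, and the choice of `--sse AES256` for server-side encryption are illustrative, and the bucket name must match whatever your Terraform defines.

```shell
#!/bin/sh
# Illustrative packaging sketch. Names and the object key are hypothetical.

# package_playbook PARENT_DIR TARBALL
# PARENT_DIR must contain the ansible/ directory, so the archive extracts to ansible/.
package_playbook() {
    parent_dir="$1"
    tarball="$2"
    tar -czf "$tarball" -C "$parent_dir" ansible
}

# upload_playbook TARBALL BUCKET
# Uploads with server-side encryption requested explicitly.
upload_playbook() {
    tarball="$1"
    bucket="$2"
    aws s3 cp "$tarball" "s3://${bucket}/ansible.tar.gz" --sse AES256
}
```

For example, `package_playbook . /tmp/ansible.tar.gz` followed by `upload_playbook /tmp/ansible.tar.gz your-bucket-name` from the repository root would publish a new version of the playbook for hosts to pick up on their next boot.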
You now have a framework that is easy to extend: you can add software packages to existing host roles, customize configuration, and add new host roles that consume software packages. A special thanks to @_p0pr0ck5_ for his work on the variable hierarchy loading in Ansible.