Introduction to Terraform
January 13th 2018 - Cory Finger
Over the years, a wide variety of tools have emerged for managing infrastructure. They vary greatly in complexity, utility, customizability, and price. Why does Terraform matter? What makes it interesting and useful?
Before getting into how to use it, it's worth stepping back to look at why you should use Terraform when building the infrastructure for your web applications.
In future Terraform tutorials, I'll go into more depth on how to use many aspects of Terraform.
Infrastructure as Code
"Infrastructure as code" is here used to mean "the important state of the infrastructure at any given time is completely contained in human-readable text files."
When the engineering team consists of a lone engineer, everything can live within that person's head. This can often go on for a while without incident.
But as the company grows, and the engineering team with it, a knowledge-transfer problem starts to reveal itself. The initial engineer needs to teach the new team about all the servers, configurations, and interacting machinery of the infrastructure.
This comes at the worst possible time. Website traffic often grows with the company. Servers start to fail and fall apart under the strain. Out of the entire engineering team, only one person knows which servers to kick to keep everything running. It's hard to transfer knowledge while constantly putting out server fires.
By keeping the entire state of the infrastructure as code, any engineer can read it and understand every piece of machinery in the infrastructure. They can use this to make informed decisions on how to scale to keep up with the company's traffic. Instead of learning the intricacies of the company's infrastructure via word of mouth and memory, an engineer can learn how to read Terraform files and know everything they need to be productive.
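As a rough illustration of what "readable" means here, a minimal Terraform configuration for a single web server might look like the sketch below. It assumes AWS as the provider; the region, the AMI ID, the instance size, and the names `web` and `web-1` are placeholders chosen for illustration, not values from any real setup.

```hcl
# Which cloud to talk to. Credentials are assumed to come from the
# environment (for example AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY).
provider "aws" {
  region = "us-east-1" # placeholder region
}

# One virtual server, described entirely in text.
resource "aws_instance" "web" {
  ami           = "ami-00000000" # placeholder AMI ID; substitute a real image
  instance_type = "t2.micro"     # placeholder size

  tags = {
    Name = "web-1" # hypothetical name shown in the AWS console
  }
}
```

Anyone who opens this file can see that the infrastructure contains one small server in one region, without logging in to anything.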
Even more importantly, keeping the infrastructure as code allows the team to keep the state of the infrastructure in source control (such as Git or SVN). If anything starts breaking, engineers can see the current state of the system and every recent change that could have caused the breakage. They can roll the infrastructure back to a stable state much faster than if they had to play detective on every server or rely on a coworker's memory.
Infrastructure as code can mean the difference between hours of downtime and minutes of downtime. It can mean the difference between months of onboarding new employees and days of onboarding new employees. Weighing the benefits against the (learning) cost of the concept, infrastructure as code is a must-have for any technology company that plans to scale (or has scaled) its infrastructure beyond a single server or its engineering team beyond a single engineer.
Immutable Infrastructure
"Immutable infrastructure" is used here to mean "infrastructure with a state that, once defined, is not changed through any unobserved/undocumented process."
Over the past few years, the concept of immutable infrastructure has been gaining a lot of steam. This is due in part to advances in virtualization and container technology (à la Docker), efforts from the open source community, and the increased popularity of topics like big data analysis, multi-threading, and functional programming within the corporate world.
The reason immutability is useful is that it allows us to think of problems and components in a simpler way. If a server can be in State A, B, C, or D, then when interacting with that server we might have to know which state it is in before deciding what to do. Immutability removes that variable from the situation and lets us focus on the important parts of a problem.
This problem with mutability becomes even more apparent when dealing with multiple servers. When deploying a change to 10, 20 or 40 servers, things can get complicated fast if every possible server state hasn't been taken into account. Do some of the servers have Ruby installed? Did some have the Ruby installation silently fail when you deployed it 6 months ago? Will that cause a catastrophic failure on this deploy? What else could be wrong with them? Did some of the servers get their RAM downgraded to decrease costs? Are they all connected to the load balancer?
Some of these problems are helped by the infrastructure as code concept mentioned earlier. But that concept is somewhat meaningless if the servers can change out from under the code. If the configuration only represents the initial state of the infrastructure, the real state can silently drift over time until it becomes completely divorced from the initial configuration. This often leads to servers that randomly fail or servers that magically succeed when they should be failing. This sense of magic leaves engineers terrified to make changes and potentially ruin the magic that's holding the company together.
Terraform deals with this problem by destroying and recreating resources that require substantial changes. While that sounds scary at first, in practice it is often harmless and very useful. If anything goes wrong, the person making the change can revert the configuration and put the infrastructure back into a stable state. Furthermore, Terraform shows the engineer every change it is about to make before it starts making those changes, so the engineer can make sure that nothing is needlessly or accidentally destroyed at an inopportune time.
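To make this concrete, here is a sketch based on an AWS server, where changing the machine image (`ami`) is one example of a change that cannot be applied to a running instance. The image IDs and the resource name are placeholders. The `lifecycle` block is optional and is shown only as one way to soften the replacement.

```hcl
resource "aws_instance" "web" {
  # Editing this value (say, from "ami-00000000" to "ami-11111111", both
  # placeholders) cannot be applied to the running server. Terraform
  # plans to replace the instance: destroy the old one, create a new one.
  ami           = "ami-11111111"
  instance_type = "t2.micro" # placeholder size

  lifecycle {
    # Optional: build the replacement first, then remove the original,
    # instead of the default destroy-then-create order.
    create_before_destroy = true
  }
}
```

Running `terraform plan` lists the proposed changes, including any resource that would be replaced, without modifying anything. That review step is what lets an engineer catch an unwanted destroy before it happens.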
In practice, immutable infrastructures can be more easily understood and modified by the engineers in charge of them. By recreating and destroying resources that require substantial changes, Terraform ensures that the configuration always represents the state of the infrastructure at any given time. By knowing that the configuration is correct, engineers can more effectively change it and respond to evolving business needs.
Unified PaaS/SaaS Interface
Whether you're using Amazon Web Services (AWS), Google Cloud, DigitalOcean, or Microsoft Azure, it all looks the same to Terraform. Want to set up your company's GitHub or Bitbucket accounts? It's the exact same process.
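As a sketch of what "the same" looks like, the two resources below target unrelated services but share one syntax. The names, region, image, and size are placeholders, and API tokens are assumed to be supplied through each provider's environment variables (`DIGITALOCEAN_TOKEN` and `GITHUB_TOKEN`) rather than written into the file. The `required_providers` block assumes Terraform 0.13 or later, which needs it to locate providers published outside the default namespace.

```hcl
terraform {
  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
    }
    github = {
      source = "integrations/github"
    }
  }
}

# A server on DigitalOcean.
resource "digitalocean_droplet" "web" {
  name   = "web-1"            # hypothetical name
  region = "nyc3"             # placeholder region
  size   = "s-1vcpu-1gb"      # placeholder size
  image  = "ubuntu-22-04-x64" # placeholder image
}

# A repository on GitHub.
resource "github_repository" "app" {
  name        = "example-app" # hypothetical repository name
  description = "Application source code"
}
```

The resource types and their arguments differ from service to service, but the workflow of writing a block, reviewing the plan, and applying it stays the same.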
This can seem like a hassle at first: you have to learn a second way to interact with tools you've been using for years. But after a while it becomes second nature and very liberating.
Does it make sense for your company to switch from AWS to Google Cloud? Without Terraform, engineers would have to learn an entirely new set of tools and web interfaces before they could even start considering that transition. The entire engineering team would need to be trained in this new interface, and basic tasks would become very hard to execute. The same would be true when hiring new engineers who have used other cloud services in the past and need to learn the company's primary service.
With Terraform, it's a matter of looking up the right providers and resource types and applying them. A multiple-month transition can turn into a week-long transition. A day's work turns into an hour's work. Your company can use whatever service makes the most sense for your use case (and financial resources). Engineers who used other services can be onboarded faster. When it comes time to cut costs by changing the stack, the change can be quickly executed and entirely documented.
Even setting aside the service-agnostic interface, modifying a text-based configuration file is much faster than navigating a web interface. Even for an experienced engineer in an established infrastructure, getting a server started with the proper configuration might take navigating through 10 different pages of settings and a number of memorized commands, with a lot of loading screens and sanity checks in between. Compare this with the few seconds it takes to change a text file or the couple of minutes it takes to write a new configuration file.
These productivity benefits of Terraform don't end with PaaS. New, freely available, open-source providers are regularly being created to manage a wide range of tools, from source control (through providers for GitHub, Bitbucket, and AWS CodeCommit) to monitoring tools (through providers for Datadog and New Relic) to Continuous Integration and Continuous Deployment pipelines.
The productivity and stability gains attainable through Terraform make this tool a must-have for a stack and an engineering team of any size.
The fact that it is a free, open-source, and platform-agnostic tool has cemented Terraform as a tool that's likely here to stay, worthy of a place in any engineer's toolbox.
Future chapters will go into more depth on how you can add this tool to your toolbox for use in your web applications!
In the meantime, is there anything you'd like me to include in these chapters? Are there any tools you'd like me to write about in the future?
Let me know in the comments below!