Self-Hosting a Ghost Blog on AWS

Self-hosting your own website can be a bit overwhelming. What if you mess up the server configurations? What if you accidentally destroy your database? What if your server crashes and you have to remember all the little bits and pieces that went into it?

We're going to use Terraform to help solve those problems. This tutorial will get you set up with the infrastructure you need to host your own blog. We'll also take the extra step of putting it behind Cloudfront. That'll give your blog the speed it needs to rank more competitively in SEO!

This is also the infrastructure that hosts this blog! So we know it works!

Feel free to follow along with the Github repository associated with this post!


New to Terraform or new to AWS? Check out my posts introducing Terraform or AWS!


File Structure

We'll be creating a Terraform directory that looks like this:

- ghost-database
- ghost-server
- .gitignore
- main.tf
- outputs.tf
- providers.tf
- security_groups.tf
- variables.tf

.gitignore

# Compiled files
*.tfstate
*.tfstate.backup

# Module directory
.terraform/

.DS_Store
/variables.tf

Nothing too special here. The main thing to keep in mind is that your variables.tf file should be in your .gitignore so you don't accidentally upload your AWS credentials to Github!

How you share those credentials among your team is up to you.
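One option, if you'd rather not keep real credentials in variables.tf at all, is to leave placeholder defaults in the file and supply the real values through environment variables instead. Terraform reads any variable from an environment variable named TF_VAR_<variable_name>, so a sketch like this (values are placeholders, of course) keeps the secrets out of the repository entirely:

# Terraform picks up variables from TF_VAR_<name> environment variables,
# which take precedence over the defaults in variables.tf
export TF_VAR_aws_access_key='YOUR_ACCESS_KEY_ID'
export TF_VAR_aws_secret_key='YOUR_SECRET_KEY'
export TF_VAR_db_pass='your-db-password'

terraform plan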

variables.tf

I always like to start with the variables.tf file. It helps show the configuration options available in the setup and allows people to jump right in and terraform apply whenever they feel the need!

Let's take a look at the file:

# AWS Config
variable "aws_access_key" {
  default = "YOUR_ACCESS_KEY_ID"
}

variable "aws_secret_key" {
  default = "YOUR_SECRET_KEY"
}

variable "aws_region" {
  default = "us-west-2"
}

variable "key_pair_name" {
  default = "your_key_name"
}

variable "key_pair_location" {
  default = "~/Documents/your_key_name.pem"
}

# This ACM certificate MUST be in us-east-1
variable "cloudfront_ssl_acm_arn" {
  default = "arn:aws:acm:us-east-1:YOUR_AWS_ID:certificate/YOUR_CERTIFICATE_ID"
}

# Ghost Config
variable "domain_name" {
  default = "my-ghost-blog.com"
}

variable "db_name" {
  default = "ghost_blog_db"
}

variable "db_user" {
  default = "ghost_user"
}

variable "db_pass" {
  default = "593!MyB!og!Ghost!Pass"
}

aws_access_key, aws_secret_key and aws_region - The access keys for our AWS IAM user and the region where you'd like the servers to be created.

key_pair_name - The name of the SSH Key Pair we'd like to use to provision the servers

key_pair_location - Where on the hard drive our key pair's PEM file can be found

cloudfront_ssl_acm_arn - The ARN of our SSL certificate stored in AWS Certificate Manager (ACM). This is how we'll get HTTPS on our blog for free, to secure our readers' communication and possibly give us an SEO boost. If you don't have one created, you can take a look here (there's also a CLI sketch after this list). Make sure it's in us-east-1 so that we can use it with Cloudfront!

domain_name - The domain name from which our blog will be accessed, without the www. In my case, this would be pragmacoders.com.

db_name - This is the name of the database where all the Ghost data will be stored.

db_user - The username that we'll use to access the Ghost database

db_pass - The password we'll use to access the database. You should change this for security purposes, just in case, even though we'll be keeping the database inaccessible from the open internet via a firewall. Keep in mind that you can't use $ in your password due to a weird Terraform issue.
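If you don't already have the key pair or the certificate handy, a rough AWS CLI sketch for creating them might look like this. The names and domain are just the placeholder defaults from variables.tf above, and the certificate request has to go to us-east-1 (the key pair goes in whatever region you set as aws_region):

# Create an EC2 key pair in your blog's region and save the private key locally
aws ec2 create-key-pair \
  --key-name your_key_name \
  --region us-west-2 \
  --query 'KeyMaterial' \
  --output text > ~/Documents/your_key_name.pem
chmod 400 ~/Documents/your_key_name.pem

# Request an ACM certificate for the blog domain in us-east-1 (required for Cloudfront)
aws acm request-certificate \
  --domain-name my-ghost-blog.com \
  --subject-alternative-names www.my-ghost-blog.com \
  --validation-method DNS \
  --region us-east-1

The request command prints the certificate's ARN; once you've validated it with the DNS records ACM gives you, that ARN is what goes into cloudfront_ssl_acm_arn.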

ghost-database module

Let's set up the database that we'll be using to store our blog's data. This will be done in a Terraform module to keep everything nice and organized.

main.tf

resource "aws_db_instance" "ghost" {
  allocated_storage         = 20
  storage_type              = "gp2"
  engine                    = "mysql"
  engine_version            = "5.7.19"
  instance_class            = "db.t2.micro"
  identifier                = "${var.name}"
  name                      = "${var.db_name}"
  username                  = "${var.db_user}"
  password                  = "${var.db_pass}"
  backup_retention_period   = 7


  vpc_security_group_ids    = ["${var.security_groups}"]
  final_snapshot_identifier = "${var.name}-final-snapshot"

  lifecycle {
    prevent_destroy = true
  }

  tags {
    Name = "${var.name}"
  }
}

We're setting allocated_storage (hard drive size) to 20GB because that's a pretty reasonable default. Setting storage_type to gp2 gives us nice, SSD-backed storage for quick loading.

A db.t2.micro is fine to start out with for a smaller blog. It's the least expensive option. But if you know that your blog is getting a good amount of traffic, feel free to bump it up to a db.t2.small or db.t2.medium. Any more than that and you might want to look at adding more Cloudfront caching for your static pages, in the future.

lifecycle.prevent_destroy will make it so that Terraform doesn't allow you to destroy the database (unless you get rid of that line). There's not much worse you can do than accidentally destroying your database.

final_snapshot_identifier will create a snapshot if you end up accidentally destroying your database. This will allow you to create a new database, filled in with all your data, by restoring the snapshot.
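For reference, restoring from that final snapshot with the AWS CLI could look something like this (the identifiers are hypothetical, based on the names used later in this post):

# Create a brand new database instance from the final snapshot
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier ghost-db-restored \
  --db-snapshot-identifier ghost-db-final-snapshot \
  --db-instance-class db.t2.micro

You'd then point the blog at the new instance's endpoint.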

backup_retention_period is set to 7 just so that we will always have a week's worth of database backups on hand. You can never be too careful with your data!

security_groups is how we'll attach the firewall rules to the database and make sure only our blog's server can log into it.

The rest of the attributes in this file will be defined through the variables.tf file we'll talk about next!

variables.tf

variable "name" { type = "string" }
variable "db_name" { type = "string" }
variable "db_user" { type = "string" }
variable "db_pass" { type = "string" }
variable "security_groups" { type = "list" }

Given the previous sections, these variables are all pretty self-explanatory! These variables define the attributes the module will take in when we instantiate it later in the configuration.

outputs.tf

Here we'll define all the variables from the database that other parts of our configuration might care about:

output "db-host" { value = "${aws_db_instance.ghost.address}" }
output "db-name" { value = "${aws_db_instance.ghost.name}" }
output "db-user" { value = "${var.db_user}" }
output "db-pass" { value = "${var.db_pass}" }

We'll want db-host so that the blog server knows where it can connect to the database. The other outputs of db-name, db-user and db-pass are just there because they make the rest of the code cleaner. You'll see what I mean!

ghost-server module

This is where things get a little complex. But, keep in mind that each of these is an individual piece of the puzzle. It might help to think about them one at a time before looking at the whole thing.

ami.tf

What operating system will we want our blog to run on? Ubuntu is generally a great choice!

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

All this data source does is find the latest Ubuntu 16.04 AMI in our region. We'll use that when initializing the server.

variables.tf

variable "name" { type = "string" }
variable "domain_name" { type = "string" }

variable "db_host" { type = "string" }
variable "db_name" { type = "string" }
variable "db_user" { type = "string" }
variable "db_pass" { type = "string" }

variable "key_pair_name" { type = "string" }
variable "key_pair_loc" { type = "string" }
variable "cloudfront_ssl_acm_arn" { type = "string" }

variable "security_groups" { type = "list" }

We want to pass in a lot of the variables we defined earlier, so that we can initialize the blog configuration with things like domain_name, set up SSL with cloudfront_ssl_acm_arn, configure the firewall with security_groups and log in to the database.

We also want the key_pair information so that we can tell Terraform to SSH into the machine and install the proper software.

template_files.tf

Template files in Terraform can load raw text files and replace variable placeholders with information from our Terraform-created servers and configuration.

We need this for things like db-host, which we couldn't know in advance. It's also convenient to use it for domain-name so that we don't have to do anything outside of our configuration files.

# Configuration file
data "template_file" "ghost-config" {
    template = "${file("${path.module}/configs/config.production.json.tpl")}"

    vars {
      mysql-host      = "${var.db_host}"
      mysql-db-name   = "${var.db_name}"
      mysql-user      = "${var.db_user}"
      mysql-pass      = "${var.db_pass}"
      domain-name     = "${var.domain_name}"
    }
}

data "template_file" "nginx-site-config" {
    template = "${file("${path.module}/configs/nginx-site.tpl")}"

    vars {
      domain-name   = "${var.domain_name}"
    }
}

data "template_file" "service-config" {
    template = "${file("${path.module}/configs/ghost.service.tpl")}"
}

Here we're loading the files off the hard drive (we'll show those next) and defining the variables that'll be replaced within them. Through string interpolation, the variables will be replaced with their proper values.

configs/config.production.json.tpl
{
    "url": "https://${domain-name}",
    "server": {
      "host": "127.0.0.1",
      "port": "2368"
    },
    "database": {
        "client": "mysql",
        "connection": {
            "host"     : "${mysql-host}",
            "user"     : "${mysql-user}",
            "password" : "${mysql-pass}",
            "database" : "${mysql-db-name}"
        }
    },
    "paths": {
        "contentPath": "content/"
    },
    "logging": {
        "level": "info",
        "rotation": {
            "enabled": true
        },
        "transports": ["stdout"]
    }
}

This is the file we'll use to configure the Ghost application. domain-name along with the database credentials are interpolated into this template file.

configs/ghost.service.tpl
[Unit]
Description=Ghost
After=network.target

[Service]
Type=simple

WorkingDirectory=/var/www/ghost
User=ghost
Group=ghost


Environment=NODE_ENV=production

ExecStart=/usr/bin/yarn start --production
ExecStop=/usr/bin/yarn stop --production
Restart=always
SyslogIdentifier=Ghost

[Install]
WantedBy=multi-user.target

This is the systemd unit file. systemd will use it to make sure that Ghost launches on startup and keeps running, and we'll manage it with the systemctl command.

configs/nginx-site.tpl
server {
    listen 80 default_server;
    listen [::]:80 default_server ipv6only=on;

    server_name _;

    root /usr/share/nginx/html;
    index index.html index.htm;

    client_max_body_size 1G;

    location / {
        proxy_pass http://localhost:2368;
        proxy_set_header X-Forwarded-For ${replace("%proxy_add_x_forwarded_for", "%", "\\$")};
        proxy_set_header Host ${replace("%http_host", "%", "\\$")};
        proxy_set_header X-Forwarded-Proto https;
        proxy_buffering off;
    }

    location ~ ^/img_responsive/([0-9]+)(?:/(.*))?${replace("%", "%", "\\$")} {
      proxy_pass http://localhost:2368/${replace("%2", "%", "\\$")};
      proxy_set_header X-Forwarded-For ${replace("%proxy_add_x_forwarded_for", "%", "\\$")};
      proxy_set_header Host ${replace("%http_host", "%", "\\$")};
      proxy_set_header X-Forwarded-Proto https;
      proxy_buffering off;
      image_filter_buffer 10M;
      image_filter_jpeg_quality 80;
      image_filter resize ${replace("%1", "%", "\\$")} -;
    }

    location = /health {
      return 200;
      #access_log off;
    }
}

We'll use this to configure NGINX. We won't go too in-depth into this configuration, but its main purpose is taking traffic from port 80 and proxying it through to port 2368, where your Ghost application is listening.

You might notice some weird stuff in there like ${replace("%proxy_add_x_forwarded_for", "%", "\\$")}. That's because Terraform really didn't like dollar signs ($) for some reason. So in order to do simple things like $proxy_add_x_forwarded_for I had to do that crazy workaround. Comment on this post if you know a better way around that!
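For what it's worth, Terraform's template syntax does have an escape hatch: a double dollar sign. Writing $${name} in a template renders as a literal ${name}, and NGINX accepts that braced form for its variables, so the headers above could probably (untested in this setup) be written more simply as:

# Possible alternative inside nginx-site.tpl, using Terraform's $$ escape instead of replace()
proxy_set_header X-Forwarded-For $${proxy_add_x_forwarded_for};
proxy_set_header Host $${http_host};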

Notice that we are setting up a custom subdirectory called img_responsive/. This is a directory that allows us to utilize an awesome NGINX feature that resizes images on the fly. I might write a post about this in the future. But combining this with Cloudfront and the HTML5 srcset attribute will allow your site to perform really well on mobile. I encourage you to check it out!

Lastly, we set up an endpoint at /health. This is just a URL that can be used to easily check whether or not the NGINX server is still running.
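Once everything is up, you can hit that endpoint from your own machine to confirm NGINX is serving traffic (substitute your server's public IP):

# Expect an HTTP 200 with an empty body if NGINX is healthy (IP is a placeholder)
curl -i http://YOUR_SERVER_IP/health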

main.tf

This is a large one. It will bring together all the configurations we've just defined and create the server for our blog. Let's post it here and break it down piece by piece.

resource "aws_instance" "ghost" {
  ami                     = "${data.aws_ami.ubuntu.id}"
  instance_type           = "t2.small"
  key_name                = "${var.key_pair_name}"
  vpc_security_group_ids  = ["${var.security_groups}"]

  tags {
    Name = "${var.name}"
  }

  lifecycle {
    ignore_changes  = ["ami"]
  }

  root_block_device {
    volume_type = "gp2"
    volume_size = "50"
    delete_on_termination = "false"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo apt-get update",
      "curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -",
      "sudo apt-get install -y unzip nginx nginx-full nodejs",
      "curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -",
      "echo \"deb https://dl.yarnpkg.com/debian/ stable main\" | sudo tee /etc/apt/sources.list.d/yarn.list",
      "sudo apt-get update && sudo apt-get install yarn",
    ]

    connection {
      type          = "ssh"
      user          = "ubuntu"
      private_key   = "${file("${var.key_pair_loc}")}"
    }

    on_failure = "fail"
  }

  provisioner "remote-exec" {
    inline = [
      "wget https://ghost.org/zip/ghost-latest.zip",
      "sudo apt-get install unzip",
      "sudo unzip -d /var/www/ghost ghost-latest.zip",
      "cd /var/www/ghost/",
      "sudo yarn install --production",
      "sleep 1"
    ]

    connection {
      type          = "ssh"
      user          = "ubuntu"
      private_key   = "${file("${var.key_pair_loc}")}"
    }

    on_failure = "fail"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo chown -R ubuntu:ubuntu /var/www/ghost/",
      "cat <<FILEXXX > /var/www/ghost/core/server/config/env/config.production.json",
      "${data.template_file.ghost-config.rendered}",
      "FILEXXX",
      "cd /var/www/ghost/",
      "yarn global add knex-migrator",
      "NODE_ENV=production yarn run knex-migrator init",
      "sleep 1",
    ]

    connection {
      type          = "ssh"
      user          = "ubuntu"
      private_key   = "${file("${var.key_pair_loc}")}"
    }

    on_failure = "fail"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo adduser --disabled-password --shell /bin/bash --gecos 'Ghost application' ghost",
      "sudo chown -R ghost:ghost /var/www/ghost/",
      "sudo touch /etc/nginx/sites-available/ghost && sudo chown ubuntu:ubuntu /etc/nginx/sites-available/ghost",
      "sudo touch /etc/systemd/system/ghost.service && sudo chown ubuntu:ubuntu /etc/systemd/system/ghost.service",
      "cat <<FILEXXX > /etc/nginx/sites-available/ghost",
      "${data.template_file.nginx-site-config.rendered}",
      "FILEXXX",
      "sudo cat <<FILEXXX > /etc/systemd/system/ghost.service",
      "${data.template_file.service-config.rendered}",
      "FILEXXX",
      "sudo ln -s /etc/nginx/sites-available/ghost /etc/nginx/sites-enabled/ghost",
      "sudo rm /etc/nginx/sites-enabled/default",
      "sudo systemctl enable ghost.service",
      "sudo systemctl start ghost.service",
      "sudo service nginx restart",
      "sleep 1",
    ]

    connection {
      type          = "ssh"
      user          = "ubuntu"
      private_key   = "${file("${var.key_pair_loc}")}"
    }

    on_failure = "fail"
  }
}
What that all does:

ami and instance_type define the operating system the server will use as well as the size of the server. We're using a t2.small as a default. But feel free to use something bigger if you get a lot of traffic.

key_name is the SSH key we'll use to connect to the server and provision it with the proper software.

vpc_security_group_ids lets us define the firewall security rules for the server.

lifecycle.ignore_changes tells the server configuration to ignore changes to the AMI. If Amazon gets a new Ubuntu AMI, we don't want to destroy the server, and all our uploaded files, to update it.

root_block_device defines our hard drive. A 50GB SSD. We set delete_on_termination to false so that, just in case we destroy the server, our data can be recovered.

Our provisioners connect through SSH when the server is created:

  1. Install all the necessary ubuntu packages

  2. Download Ghost, unzip it and install the NPM modules

  3. Migrate the database (initialize it, in our case)

  4. Install all the NGINX and systemd configurations and start the services (a quick verification sketch follows this list).
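If any of those steps misbehave, a quick way to poke around is to SSH in with the same key pair Terraform used and check the services directly. A rough sketch (the IP is a placeholder; it comes from the apply output at the end of this post):

# SSH in with the provisioning key
ssh -i ~/Documents/your_key_name.pem ubuntu@YOUR_SERVER_IP

# On the server: confirm Ghost and NGINX are running
sudo systemctl status ghost.service
sudo journalctl -u ghost.service --no-pager | tail -n 50
curl -I http://localhost:2368   # Ghost itself
curl -I http://localhost/health # through NGINX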

cloudfront.tf

We have our server set up. That's awesome! You could stop there for this module, if you'd like.

But, I wanted my blog to be fast around the world and for mobile devices. I also wanted an HTTPS-only blog. I wanted that without much manual work on my end. So I took an extra step and put a Cloudfront instance in front of it!

resource "aws_cloudfront_distribution" "ghost-blog" {
  origin {
    domain_name = "${aws_instance.ghost.public_dns}"
    origin_id   = "${var.name}-origin"

    custom_origin_config {
      http_port                 = 80
      https_port                = 443
      origin_protocol_policy    = "http-only"
      origin_ssl_protocols      = ["TLSv1.1"]
    }
  }

  enabled             = true
  is_ipv6_enabled     = true
  default_root_object = "/"

  lifecycle {
    prevent_destroy = true
  }

  aliases = ["${var.domain_name}", "www.${var.domain_name}"]

  default_cache_behavior {
    allowed_methods   = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
    cached_methods    = ["GET", "HEAD"]
    target_origin_id  = "${var.name}-origin"
    compress          = true

    forwarded_values {
      query_string  = true
      headers       = ["*"]

      cookies {
        forward = "all"
      }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
  }

  cache_behavior {
    path_pattern      = "assets/*"
    allowed_methods   = ["GET", "HEAD"]
    cached_methods    = ["GET", "HEAD"]
    target_origin_id  = "${var.name}-origin"
    compress          = true

    forwarded_values {
      query_string  = true
      cookies { forward = "none" }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
  }

  cache_behavior {
    path_pattern      = "content/*"
    allowed_methods   = ["GET", "HEAD"]
    cached_methods    = ["GET", "HEAD"]
    target_origin_id  = "${var.name}-origin"
    compress          = true

    forwarded_values {
      query_string  = true
      cookies { forward = "none" }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
  }

  cache_behavior {
    path_pattern      = "public/*"
    allowed_methods   = ["GET", "HEAD"]
    cached_methods    = ["GET", "HEAD"]
    target_origin_id  = "${var.name}-origin"
    compress          = true

    forwarded_values {
      query_string  = true
      cookies { forward = "none" }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
  }

  cache_behavior {
    path_pattern      = "img_responsive/*"
    allowed_methods   = ["GET", "HEAD"]
    cached_methods    = ["GET", "HEAD"]
    target_origin_id  = "${var.name}-origin"
    compress          = true

    forwarded_values {
      query_string  = true
      cookies { forward = "none" }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
  }

  tags {
    Environment = "${var.name}-production"
  }

  restrictions {
    geo_restriction {
      restriction_type  = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn       = "${var.cloudfront_ssl_acm_arn}"
    ssl_support_method        = "sni-only"
    minimum_protocol_version  = "TLSv1.1_2016"
  }
}

origin tells Cloudfront where to fetch the files. We can point this at our web server's public DNS name that EC2 instances initialize with. Cloudfront will use HTTP to fetch from the EC2 instance and will use HTTPS when talking to our users.

lifecycle.prevent_destroy is enabled so that we don't accidentally destroy the Cloudfront instance and have to change our DNS records to a new CNAME.

The cache configurations are set up to work well with Ghost (not caching any dynamic pages, only assets).

We've disabled geo_restrictions so that anyone in the world can access the blog.

viewer_certificate is set up to use the SSL certificate we created in AWS Certificate Manager.

outputs.tf

Cool! That takes care of setting up the server and the additional Cloudfront instance! Let's output the variables from the module so that we can use them in our CLI output (for our own convenience).

output "cloudfront-dns" { value = "${aws_cloudfront_distribution.ghost-blog.domain_name}" }
output "server-ip" { value = "${aws_instance.ghost.public_ip}" }

security_groups.tf

This file will define the firewall rules that our servers will use. They will secure our server ports from prying eyes and only open the ones we want to be open.

# Get local machine's IP
data "http" "my-ip" {
  url = "http://icanhazip.com"
}

resource "aws_security_group" "ghost-server" {
  name        = "ghost-server"
  description = "Allow SSH inbound, all HTTP inbound, and all outbound traffic"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["${chomp(data.http.my-ip.body)}/32"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port = 0
    to_port = 0
    protocol = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "ghost-db" {
  name        = "ghost-db"
  description = "Allow SSH inbound, all HTTP inbound, and all outbound traffic"

  ingress {
    from_port   = 3306
    to_port     = 3306
    protocol    = "tcp"
    security_groups = ["${aws_security_group.ghost-server.id}"]
  }
}

ghost-server will only allow SSH connections (port 22) from our IP address. It will allow HTTP connections (port 80) from anywhere. It will also be allowed to call out to the open internet to download any files it needs.

ghost-db will only allow MySQL connections (port 3306) from anything in the ghost-server security group (only our blog server). This will prevent anyone on the open internet from accessing our DB without hacking our server first.
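If you ever want to sanity-check that rule, you can try connecting from the blog server itself (and only from there). A small sketch, assuming you install the MySQL client first since it isn't part of this setup, with a placeholder endpoint:

# Run these on the Ghost EC2 instance; the endpoint comes from the db-host output
sudo apt-get install -y mysql-client
mysql -h YOUR_DB_ENDPOINT.us-west-2.rds.amazonaws.com -u ghost_user -p ghost_blog_db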

main.tf

Here's where the magic happens! We bring all the modules we created together to create our final infrastructure.

module "ghost-db" {
  source              = "./ghost-database"
  name                = "ghost-db"

  db_name             = "${var.db_name}"
  db_user             = "${var.db_user}"
  db_pass             = "${var.db_pass}"
  security_groups     = ["${aws_security_group.ghost-db.id}"]
}

# Set up the Ghost Server
module "ghost-blog" {
  source                  = "./ghost-server"
  name                    = "ghost-server"
  domain_name             = "${var.domain_name}"

  db_host                 = "${module.ghost-db.db-host}"
  db_name                 = "${module.ghost-db.db-name}"
  db_user                 = "${module.ghost-db.db-user}"
  db_pass                 = "${module.ghost-db.db-pass}"

  key_pair_name           = "${var.key_pair_name}"
  key_pair_loc            = "${var.key_pair_location}"
  security_groups         = ["${aws_security_group.ghost-server.id}"]

  cloudfront_ssl_acm_arn  = "${var.cloudfront_ssl_acm_arn}"
}

Looks pretty, doesn't it? Because of the modules we made earlier, everything here is pretty self-explanatory!

outputs.tf

output "cloudfront-dns" { value = "${module.ghost-blog.cloudfront-dns}" }
output "server-ip" { value = "${module.ghost-blog.server-ip}" }

Let's output some useful information to the command line so that we can easily point our DNS at the Cloudfront instance. We can also SSH into the server's IP if we want to.

Putting it all together!

Run terraform init to pull down the modules and providers, then terraform apply, and you should have yourself a blog! It should output something like this:

Outputs:

cloudfront-dns = d2pua5lemutsdm.cloudfront.net
server-ip = 54.218.75.56

Point a CNAME DNS record at the cloudfront-dns output and you should be good to go!
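If your DNS happens to live in Route 53, you could even keep that last step inside Terraform. A minimal sketch, assuming a hosted zone ID variable that isn't part of this post's configuration:

# Hypothetical addition to the root main.tf: point www at the Cloudfront distribution
resource "aws_route53_record" "blog-www" {
  zone_id = "${var.route53_zone_id}" # assumed variable, define it yourself
  name    = "www.${var.domain_name}"
  type    = "CNAME"
  ttl     = "300"
  records = ["${module.ghost-blog.cloudfront-dns}"]
}

Keep in mind the bare domain can't be a CNAME, so for that you'd want a Route 53 alias A record instead.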

Moving forward

One thing I don't like about this setup is that the hard drive (everything uploaded to the blog) is tied to the EC2 instance. I'd like to disconnect the two so that I can upgrade or modify the EC2 instance without destroying the data.

But that's a topic for another blog post! Enjoy your blog!
