Service Development Kit: Terraform AWS ECS Setup with Rust Actix App, Postgres RDS, LLM, RAG, Cloudflare, and more.

10 min read · Mar 7, 2024

Prologue

A few weeks ago, I was reviewing my AWS bills for my pet projects and found myself genuinely amused by the amount I spend on infrastructure 🤣. Notably, the bulk of my AWS expenses 💰 are tied to numerous RDS instances, despite my traffic hardly utilizing them 🤷‍♂️. Additionally, I realized my infrastructure is quite fragmented, with different designs implemented for various applications.

There is something mesmerizing about starting a new project from scratch, crafting a “state-of-the-art” infrastructure capable of handling XX K QPS with zero downtime and effortlessly managing traffic spikes … and never using it... 🥲

💪 I decided to simplify my infra and cut some costs. I migrated all apps to one ECS cluster with one shared RDS instance; and created a Terraform mono-repo to manage everything in one place.

Along the way, I developed a few templates to help spin off a new service: a slim and fast Rust service with some basic features (DB connection, user management, sessions, auth, LLM APIs 🪄 ); a small container for DB migration, etc.

TL;DR: Links to all repositories are at the very bottom 🔎: Rust Service, DB Migration with Golang-migrate and Terraform monorepo

🛣️ I thought it would be easier, but as usual 🤷‍♂️, there were numerous twists.

Overall I used the following:

Cloudflare — domain registrar and proxy (my favorite infrastructure company ❤️)
Rust & Actix— I want to use the smallest possible machines without sacrificing performance (I used Rust in the past for a mobile and loved it)
AWS ECS (I did not rewrite the old projects in Rust; I just Dockerized them and deployed them to ECS)
Shared Postgres RDS between multiple ECS services
Terraform mono-repo to manage infra in one place (I don’t ever want to open the AWS console again 🙈)
GitLab for CI/CD and Terraform state management (GitLab is ❤️)

Setup

My setup ended up like this:

Now I can deploy a new service in a few simple steps:

In Terraform: (a) create a new ECR repository for your Docker image; (b) create an SSL certificate for *.your-domain.com; (c) create a new ECS service and ECS task definition linked to your ECR image; (d) create a new rule on the ALB to route traffic to the correct ECS service based on your domain name.
On Cloudflare: (a) purchase a new domain; (b) verify SSL certificate domain ownership; (c) set up routing to the ALB & CloudFront.
Set up GitLab CI/CD to upload images to the new ECR repository and force the ECS service to fetch the new image.

DONE ✅

Most of the steps required to spin off a new service are small changes in the *.tf code and simple manipulations in the AWS/Cloudflare consoles. After that, whenever I need to deploy a new version of a service, I run one CI task that pushes a new Docker image to ECR and updates the ECS service. Piece of cake 🍰 😋

aws ecs update-service --cluster existing-app-cluster --service new-app-service  --force-new-deployment

Implementation

Honestly, it should have been easier, but there were so many edge cases that I had to spend quite a few nights to create this. In the examples, I’ll use one of my pet projects: getthedeck.com

Access Keys

Let's start with setting up the required access keys for GitLab and AWS.

In AWS, go to IAM and create:

Policy: “gitlab-ci-policy”
A new user “gitlab-ci-user” and attach the policy
Create access key for the user (AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY — add these two, plus AWS_REGION, to your ~/.zshrc 🥸 )

To find the minimum set of AWS permissions required for the policy, I’m using iamlive.

export AWS_CSM_ENABLED=true

iamlive --set-ini --profile <your profile> --output-file policy.json

This policy.json will allow you to create and destroy infra from the Terraform mono repo.
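The IAM steps above could also be scripted with the AWS CLI instead of the console. This is only a minimal sketch: the function name create_ci_user, the AWS_CLI override and the account ID argument are hypothetical, and it assumes policy.json sits in the current directory.

```shell
#!/bin/sh
# Sketch: create the CI policy, user and access key with the AWS CLI.
# Assumptions: policy.json is in the current directory; $1 is your AWS
# account ID; AWS_CLI may be set to another command (for example "echo")
# to preview the calls without touching AWS.
create_ci_user() {
  # The last call prints AccessKeyId and SecretAccessKey once; store them safely.
  "${AWS_CLI:-aws}" iam create-policy --policy-name gitlab-ci-policy \
    --policy-document file://policy.json &&
  "${AWS_CLI:-aws}" iam create-user --user-name gitlab-ci-user &&
  "${AWS_CLI:-aws}" iam attach-user-policy --user-name gitlab-ci-user \
    --policy-arn "arn:aws:iam::$1:policy/gitlab-ci-policy" &&
  "${AWS_CLI:-aws}" iam create-access-key --user-name gitlab-ci-user
}

# Preview only:  AWS_CLI=echo create_ci_user 000000000000
```

Running it with AWS_CLI=echo first is a cheap way to check the arguments before anything is created.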

⌨️ In GitLab create an access token for your user (not project) with API READ & WRITE access and add it to your env in TF project and ~/.zshrc — GITLAB_ACCESS_TOKEN

➡️ For deployment in GitLab I’m using OpenID instead of the AWS_ACCESS_KEY; the role’s trust can be restricted to the gitlab.com domain.

SSL Certificate

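Next, I’ll create an SSL certificate in AWS ACM that covers the sub-domains: *.getthedeck.com (a wildcard name does not match the bare domain, so add getthedeck.com as an additional name if you need it on the same certificate).

ACM provides a CNAME record for DNS validation, which we will use in Cloudflare later. You can also create the certificate via TF, but I didn’t do it as I don’t ever need to destroy it.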

Terraform

Terraform mono repo looks something like this

In short: we have a VPC with an ALB and ECS clusters in the public subnets. ECS tasks pull Docker images from ECR and are allowed to connect to the RDS in the private subnet. (I intentionally didn’t use any modules to keep it super simple)

Terraform Data

A few things need to be added manually. For the existing SSL certificate, I added a “certificate.tf” with:

data "aws_acm_certificate" "certificate" {
  domain = "*.getthedeck.com"
}

I do the same for the secrets that I want to pass on to my ECS task, tho it is not secure, as they will be stored in plain text in the task definition; better to use the API 🥲:

data "aws_secretsmanager_secret" "secrets_zeus" {
  arn = "arn:aws:secretsmanager:us-east-1:000000000000:secret:prod/app/name"
}

data "aws_secretsmanager_secret_version" "secrets_zeus" {
  secret_id = data.aws_secretsmanager_secret.secrets_zeus.id
}

locals {
  secrets_map         = jsondecode(data.aws_secretsmanager_secret_version.secrets_zeus.secret_string)
  secret_RDS_DB_NAME  = local.secrets_map["RDS_DB_NAME"]
  secret_RDS_USERNAME = local.secrets_map["RDS_USERNAME"]
  secret_RDS_PASSWORD = local.secrets_map["RDS_PASSWORD"]
}

Terraform State

I decided to use GitLab for my Terraform state. After you have created your GitLab project, go to Operate -> Terraform State:

This is what the init from my CI/CD looks like for Terraform:

# setup
variables:
  TF_USERNAME: $GITLAB_USER_LOGIN
  TF_PASSWORD: $GITLAB_ACCESS_TOKEN
  TF_STATE_NAME: default
  TF_ADDRESS: ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${TF_STATE_NAME}
  TF_ROOT: ${CI_PROJECT_DIR}/environments/${TF_STATE_NAME}

# usage
terraform init -reconfigure \
        -backend-config="address=${TF_ADDRESS}" \
        -backend-config="lock_address=${TF_ADDRESS}/lock" \
        -backend-config="unlock_address=${TF_ADDRESS}/lock" \
        -backend-config="username=${TF_USERNAME}" \
        -backend-config="password=${TF_PASSWORD}" \
        -backend-config="lock_method=POST" \
        -backend-config="unlock_method=DELETE" \
        -backend-config="retry_wait_min=5"

$GITLAB_ACCESS_TOKEN is your GitLab Account token, that we created before.
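On a local machine the CI_* variables are not set, so the state address has to be assembled by hand. A minimal sketch, assuming the standard GitLab API URL layout used in TF_ADDRESS above; the helper name tf_state_address and the project ID 12345 are placeholders.

```shell
#!/bin/sh
# Sketch: build the GitLab-managed Terraform state address by hand.
# $1 = GitLab API base URL, $2 = numeric project ID, $3 = state name
# (falls back to "default", matching TF_STATE_NAME above).
tf_state_address() {
  # ${1%/} drops a trailing slash so the result never contains "//projects".
  printf '%s/projects/%s/terraform/state/%s\n' "${1%/}" "$2" "${3:-default}"
}

# Usage (placeholder values):
#   TF_ADDRESS=$(tf_state_address https://gitlab.com/api/v4 12345 default)
#   terraform init -reconfigure -backend-config="address=${TF_ADDRESS}" ...
```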

Since we want to run Terraform from CI/CD and from the local machine, which in my case means Mac & Linux, after adding providers you need to generate lock entries for all platforms and commit the updated “.terraform.lock.hcl” to the repo:

terraform providers lock \
    -platform=darwin_amd64 \
    -platform=linux_amd64 \
    -platform=darwin_arm64 \
    -platform=linux_arm64

ECS Task

Our ECS task running on the ECS cluster has 2 containers in the “container_definitions”:

ECS task for (left) Rust service and (right) DB migration

1. The first image runs a DB migration using pre-compiled Golang-migrate. It exits after applying the migration. To allow the task to keep running after this container exits, you need to explicitly mark it as non-essential:

essential = false

2. The second one is the Rust app, and it depends on the first one finishing with “SUCCESS”:

dependsOn = [
        {
          containerName = var.ecs_db_migaration
          condition     = "SUCCESS"
        }
      ],

RDS

Postgres RDS is located in the private subnet that allows ingress traffic only to port 5432 from the service security group.

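Note: in case you want to connect to the DB for testing, you can add just one flag in Terraform to make it publicly accessible (depending on your subnet routing and security group rules, you may also need to open those up):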

publicly_accessible    = true

Terraform output

We will need outputs to upload our Docker image to ECR

ECR_REGISTRY=000000000000.dkr.ecr.us-east-1.amazonaws.com
ECR_NAME=$ECR_REGISTRY/zeus
aws ecr get-login-password --region $AWS_REGION --profile <your profile> | docker login --username AWS --password-stdin $ECR_REGISTRY
docker build --platform=linux/amd64 -t zeus -f Dockerfile.prd .
docker tag zeus:latest $ECR_NAME
docker push $ECR_NAME

✍️ Note: Your Docker image tag should match your ECR repository name.
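The ECR address follows a fixed pattern, so it can be derived instead of copy-pasted. A small sketch, assuming the standard AWS partition (amazonaws.com); the helper names ecr_registry and ecr_repo_uri are made up for illustration.

```shell
#!/bin/sh
# Sketch: derive the ECR registry host and the repository URI.
# $1 = AWS account ID, $2 = region, $3 = repository name.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

ecr_repo_uri() {
  printf '%s/%s\n' "$(ecr_registry "$1" "$2")" "$3"
}

# Usage (placeholder account ID):
#   docker tag zeus:latest "$(ecr_repo_uri 000000000000 us-east-1 zeus)"
#   docker push "$(ecr_repo_uri 000000000000 us-east-1 zeus)"
```

The registry host alone is what docker login expects, while the full repository URI is what gets tagged and pushed.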

You can also get the cluster name and service name to trigger the deployment of the service from the output:

# first
terraform init # ...
ecs_cluster_name=$(terraform output -json ecs_cluster_name | tr -d '"')
ecs_service_name=$(terraform output -json ecs_service_name | tr -d '"')
# then
aws ecs update-service --cluster "$ecs_cluster_name" --service "$ecs_service_name" --force-new-deployment
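update-service returns as soon as the deployment is requested, not when the new tasks are healthy. If a script or pipeline should block until the rollout settles, the AWS CLI waiter can be chained after it. A minimal sketch; deploy_service and the AWS_CLI override are hypothetical names.

```shell
#!/bin/sh
# Sketch: force a new deployment, then wait until the service is stable.
# $1 = cluster name, $2 = service name.
# AWS_CLI is a hypothetical override so the calls can be previewed with "echo".
deploy_service() {
  "${AWS_CLI:-aws}" ecs update-service --cluster "$1" --service "$2" \
    --force-new-deployment &&
  # Polls the service and returns non-zero if it does not stabilise in time.
  "${AWS_CLI:-aws}" ecs wait services-stable --cluster "$1" --services "$2"
}

# Usage:
#   deploy_service "$ecs_cluster_name" "$ecs_service_name"
```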

Docker Images

Rust Service

I like Rust ❤️. I won’t lie: most of the time it is very hard, but when it finally compiles, I’m not worried that something will go wrong: “0 runtime anxiety” 😎

My project, Zeus, has everything to kick off a new project:

Clean architecture built around separate use cases
Postgres DB integration
User management, JWT session management, Middleware
LLM APIs 🦄: OpenAI, Cloudflare AI, Gemini 🦄
Useful Google APIs: Places & Vision
ACTIX web server (https://actix.rs) — one of the fastest servers ⚡️

You can create an image for ECR from Dockerfile.prd:

docker build --platform=linux/amd64 -t zeus -f Dockerfile.prd .

🔐 Note: You will need to update src/repository/secrets.rs with your keys. Inject them in the ECS task, read them from AWS Secrets Manager, or hardcode them 🤫

LLM APIs🦄

Now we have many alternatives to the OpenAI ChatGPT APIs, including Gemini and Cloudflare Workers AI. I’ve been playing a lot with them, especially with Cloudflare Workers AI. Although it doesn’t offer big models, models like llama-2-7b work quite well for RAG. A RAG setup with PG Vector and LlamaIndex is also available in the repo.

Don’t forget to add API keys to .env 🔐

Bruno

A while ago, I moved all my pet projects from Postman to Bruno. I love that I can keep my requests in the same repo as the source code 💪 It still lacks some functions, but it is good enough for small projects.

Talk with your DB with LlamaIndex

I’m also using natural-language queries to the DB as an alternative to pgAdmin for quickly looking up some data. I’m using a Flask app with LlamaIndex & NLSQLTableQueryEngine. It also requires an OpenAI API key in .env:

OPENAI_API_KEY=x-y-z

Athena: LlamaIndex for NL SQL, web client

The source code is in the same repo: zeus/athena. It will run with docker-compose and will be available at 127.0.0.1:3005/nlsql

DB Migration

To run DB migration scripts I’m using Golang-migrate, and I also use psql with psql-client to test them locally. I also use the migration image as a shortcut to import some .csv files 🙈

COPY migrations ./migrations
COPY import ./import
COPY scripts ./scripts

CMD ["sh", "-c", "\
    if migrate -path ./migrations -database $DATABASE_URL up; then \
        ./scripts/run-data-import.sh; \
        echo 'Import data succeeded'; \
    else \
        echo 'Migration failed'; \
        exit 1; \
fi"]

The source code is available in the Veles repo, linked at the bottom.
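The migrate command reads its target from $DATABASE_URL, which for Postgres is a postgres:// URL. A minimal sketch of assembling it from the individual RDS values; the helper name database_url and the sslmode=require setting are assumptions, and credentials with special characters would need URL-encoding first.

```shell
#!/bin/sh
# Sketch: assemble the Postgres URL passed to `migrate -database`.
# $1 = user, $2 = password, $3 = host, $4 = database, $5 = port (default 5432).
# Assumptions: user and password contain no characters that need
# URL-encoding, and the server accepts sslmode=require.
database_url() {
  printf 'postgres://%s:%s@%s:%s/%s?sslmode=require\n' \
    "$1" "$2" "$3" "${5:-5432}" "$4"
}

# Usage (placeholder values):
#   DATABASE_URL=$(database_url "$RDS_USERNAME" "$RDS_PASSWORD" db.example.internal "$RDS_DB_NAME")
#   migrate -path ./migrations -database "$DATABASE_URL" up
```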

Cloudflare

You may wonder why I used Cloudflare here. I usually use it as my domain registrar. It also provides free proxying and caching, which helps to save some cost and serve content faster.

Note: Cloudflare doesn’t allow you to change the nameservers (NS), so if you purchase a domain there, you will not be able to manage it with Route53.

Domain

Let's start with the domain. I’ll show an example with one of my domains, getthedeck.com; here is what the setup looks like.

I have 3 CNAME records:

api.getthedeck.com — ALB for access to service
SSL certificate ownership verification from AWS
getthedeck.com — pointing to CloudFront distribution for my web page

On the ALB side, we have a rule that redirects HTTP to HTTPS, therefore we need to set the SSL/TLS encryption mode in Cloudflare to “Full”. Otherwise, we end up in an endless redirect loop.

You can remove this rule from Terraform since we also restricted ALB ingress to only Cloudflare IP addresses:

data "http" "cloudflare_ipv4" {
  url = "https://www.cloudflare.com/ips-v4"
}

locals {
  cloudflare_ipv4_cidr_blocks = [for cidr_block in split("\n", trimspace(data.http.cloudflare_ipv4.body)) : cidr_block]
}

//ALB Security group
cidr_blocks      = local.cloudflare_ipv4_cidr_blocks

⚠️ Other Cloudflare customers’ apps, coming from the same IP ranges, will also be able to reach our service. We can attach a custom header on the Cloudflare side and add a listener rule on the ALB to restrict this.
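To eyeball what the security group will receive, the same trim-and-split that the Terraform locals perform can be reproduced in the shell. A small sketch; cidrs_to_json is a hypothetical helper that only reformats the list and does not validate the CIDR blocks.

```shell
#!/bin/sh
# Sketch: turn a newline-separated CIDR list (stdin) into a JSON array.
# Blank lines are skipped, mirroring trimspace() + split("\n") in Terraform.
cidrs_to_json() {
  awk 'NF { printf "%s\"%s\"", (n++ ? "," : "["), $1 }
       END { print (n ? "]" : "[]") }'
}

# Usage:
#   curl -fsS https://www.cloudflare.com/ips-v4 | cidrs_to_json
```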

GitLab CI/CD

As for CI/CD, I’m using OpenID to get credentials for my “Deployment” role and then upload the web app to S3 or force a new deployment of my ECS service:

.assume_role: &assume_role
    - >
      STS=($(aws sts assume-role-with-web-identity
      --role-arn="${ROLE_ARN}"
      --role-session-name "GitLabRunner-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
      --web-identity-token $ID_TOKEN
      --duration-seconds 3600
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text))
    - export AWS_ACCESS_KEY_ID="${STS[0]}"
    - export AWS_SECRET_ACCESS_KEY="${STS[1]}"
    - export AWS_SESSION_TOKEN="${STS[2]}"

deploy:
  stage: deploy
  image:
    name: amazon/aws-cli:latest
    entrypoint: 
      - '/usr/bin/env'
  id_tokens:
      ID_TOKEN:
        aud: the_deck_web_app_s3
  script:
    - *assume_role
    # Web App
    - aws s3 sync build/ s3://$S3_BUCKET
    # Rust Service
    - aws ecs update-service --cluster $ECS_CLUSTER_NAME --service $ECS_SERVICE_NAME --force-new-deployment
  rules:
    - if: $CI_COMMIT_TAG
      when: on_success
    - if: '$CI_COMMIT_REF_NAME == "main"'
      when: manual
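The assume-role step relies on bash arrays to split the three values returned by the CLI. If the runner image only has a plain POSIX sh, the same split can be done with positional parameters. A minimal sketch; the function name export_sts_credentials is hypothetical, and it assumes the text output is a single whitespace-separated line, as produced by the --query and --output text options above.

```shell
#!/bin/sh
# Sketch: split "AccessKeyId SecretAccessKey SessionToken" without arrays.
# $1 = the single-line text output of `aws sts assume-role-with-web-identity`.
export_sts_credentials() {
  # Unquoted on purpose: word-split the line on spaces/tabs into $1 $2 $3.
  set -- $1
  if [ "$#" -ne 3 ]; then
    echo "expected 3 credential fields, got $#" >&2
    return 1
  fi
  export AWS_ACCESS_KEY_ID="$1"
  export AWS_SECRET_ACCESS_KEY="$2"
  export AWS_SESSION_TOKEN="$3"
}

# Usage:
#   export_sts_credentials "$(aws sts assume-role-with-web-identity ... --output text)"
```

Failing loudly when the field count is wrong avoids exporting half-empty credentials after a rejected token.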

🐌 In the past I also used Bitbucket and GitHub for my pet projects, but I’m slowly moving everything to GitLab.

GitHub

I have open-sourced early versions, so they might not be exactly what I use 🫣, tho they are a good place to start:

Zeus — Rust services: https://github.com/xajik/zeus-ai-api-gateway
Dazhbog — Terraform: https://github.com/xajik/dazhbog-tf
Veles — DB Migration: https://github.com/xajik/veles-db-migration

What’s next

It would be substantially cheaper to have a self-hosted DB instead of using RDS, tho I don’t want to spend time on backups… we will see.

If I weren’t so keen on exploring AWS further, I’m confident I could achieve better pricing without sacrificing performance by switching to DigitalOcean 💡