Service Development Kit: Terraform AWS ECS Setup with Rust Actix App, Postgres RDS, LLM, RAG, Cloudflare, and more.
Prologue
A few weeks ago, I was reviewing the AWS bills for my pet projects and found myself genuinely amused by how much I spend on infrastructure 🤣. Notably, the bulk of my AWS expenses 💰 goes to numerous RDS instances, even though my traffic barely uses them 🤷‍♂️. I also realized that my infrastructure is quite fragmented, with a different design for almost every application.
There is something mesmerizing about starting a new project from scratch, crafting a “state-of-the-art” infrastructure capable of handling XX K QPS with zero downtime and effortlessly absorbing traffic spikes… and then never using it 🥲
💪 I decided to simplify my infra and cut some costs: I migrated all apps to one ECS cluster with one shared RDS instance and created a Terraform monorepo to manage everything in one place.
Along the way, I developed a few templates to help spin up a new service: a slim and fast Rust service with some basic features (DB connection, user management, sessions, auth, LLM APIs 🪄), a small container for DB migrations, etc.
TL;DR: Links to all repositories are at the very bottom 🔎: the Rust service, DB migrations with golang-migrate, and the Terraform monorepo.
🛣️ I thought it would be easier, but as usual 🤷‍♂️, there were numerous twists.
Overall, I used the following:
- Cloudflare: domain registrar and proxy (my favorite infrastructure company ❤️)
- Rust & Actix: I want to use the smallest possible machines without sacrificing performance (I used Rust in the past for a mobile app and loved it)
- AWS ECS (I didn't rewrite the older projects in Rust; I just Dockerized them and deployed them to ECS)
- One Postgres RDS instance shared between multiple ECS services
- A Terraform monorepo to manage the infra in one place (I don’t ever want to open the AWS console again 🙈)
- GitLab for CI/CD and Terraform state management (GitLab is ❤️)
Setup
My setup ended up like this:
Now I can deploy a new service in a few simple steps:
- In Terraform: (a) create a new ECR repository for your Docker image; (b) create an SSL certificate for *.your-domain.com; (c) create a new ECS service and ECS task definition linked to your ECR image; (d) create a new ALB listener rule that routes traffic to the correct ECS service based on the domain name.
- On Cloudflare: (a) purchase a new domain; (b) add the record that verifies SSL certificate domain ownership; (c) set up routing to the ALB and CloudFront.
- Set up GitLab CI/CD to push images to the new ECR repository and force the ECS service to redeploy with the new image.
DONE ✅
Most of the steps required to spin up a new service need only small changes to the *.tf code and a few simple manipulations in the AWS/Cloudflare consoles. After that, every deployment is a single CI job that pushes a new Docker image to ECR and updates the ECS service. Piece of cake 🍰 😋
aws ecs update-service --cluster existing-app-cluster --service new-app-service --force-new-deployment
Implementation
Honestly, it should have been easier, but there were so many edge cases that I had to spend quite a few nights on it. As an example, I’ll use one of my pet projects: getthedeck.com
Access Keys
Let's start by setting up the required access keys in GitLab and AWS.
In AWS, go to IAM and create:
- Policy: “gitlab-ci-policy”
- A new user “gitlab-ci-user” and attach the policy
- An access key for the user (AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY; add these two, plus AWS_REGION, to your ~/.zshrc 🥸)
To find the minimum set of permissions the policy needs, I’m using iamlive:
export AWS_CSM_ENABLED=true
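iamlive --set-ini --profile <your profile> --output-file policy.json
While iamlive is running, apply and destroy your infra with Terraform; the resulting policy.json then contains the permissions needed to create and destroy everything in the Terraform monorepo.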
⌨️ In GitLab, create an access token for your user (not the project) with API read & write access, and add it as GITLAB_ACCESS_TOKEN to the environment of the Terraform project and to your ~/.zshrc.
➡️ For deployments from GitLab, I’m using OpenID Connect instead of the AWS_ACCESS_KEY; the AWS role can be restricted so that it only trusts tokens issued by gitlab.com.
SSL Certificate
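Next, I’ll create an SSL certificate in AWS ACM that covers the sub-domains: *.getthedeck.com (a wildcard certificate does not cover the apex getthedeck.com itself, so add it as an additional name if you need it).
ACM gives you a CNAME record for DNS validation, which we will add in Cloudflare later. You can also create the certificate via Terraform, but I didn’t, as I never need to destroy it.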
Terraform
The Terraform monorepo looks something like this:
In short: we have a VPC with an ALB and an ECS cluster in the public subnets. ECS tasks pull Docker images from ECR and are allowed to connect to RDS in the private subnet. (I intentionally didn’t use any modules to keep it super simple.)
Terraform Data
A few things need to be added manually. For the existing SSL certificate, add a “certificate.tf” with:
data "aws_acm_certificate" "certificate" {
domain = "*.getthedeck.com"
}
I do the same for the secrets that I want to pass to my ECS task, though this is not secure: the values end up in plain text in the task definition (and in the Terraform state). It is better to read them at runtime via the Secrets Manager API 🥲:
data "aws_secretsmanager_secret" "secrets_zeus" {
  arn = "arn:aws:secretsmanager:us-east-1:000000000000:secret:prod/app/name"
}
data "aws_secretsmanager_secret_version" "secrets_zeus" {
  secret_id = data.aws_secretsmanager_secret.secrets_zeus.id
}
locals {
  # Decode the JSON secret string into a map and pick out the RDS credentials
  secrets_map         = jsondecode(data.aws_secretsmanager_secret_version.secrets_zeus.secret_string)
  secret_RDS_DB_NAME  = local.secrets_map["RDS_DB_NAME"]
  secret_RDS_USERNAME = local.secrets_map["RDS_USERNAME"]
  secret_RDS_PASSWORD = local.secrets_map["RDS_PASSWORD"]
}
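On the service side, these three values still have to be combined into a Postgres connection string. Below is a minimal Rust sketch, assuming the task receives RDS_DB_NAME, RDS_USERNAME and RDS_PASSWORD as environment variables and that the host comes from the RDS instance's address (for example the `address` attribute of `aws_db_instance`). `database_url` and `percent_encode` are hypothetical helpers, not part of Zeus. The credentials are percent-encoded because characters like `@`, `:` or `/` in a password would otherwise break the URL.

```rust
use std::collections::HashMap;

/// Percent-encodes every byte except RFC 3986 "unreserved" characters,
/// so passwords containing '@', ':' or '/' don't corrupt the URL.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Builds postgres://user:password@host:port/db from the RDS_* values.
/// `vars` stands in for the task's environment (hypothetical helper).
fn database_url(vars: &HashMap<String, String>, host: &str, port: u16) -> Result<String, String> {
    let get = |key: &str| vars.get(key).cloned().ok_or_else(|| format!("missing {key}"));
    let db = get("RDS_DB_NAME")?;
    let user = get("RDS_USERNAME")?;
    let password = get("RDS_PASSWORD")?;
    Ok(format!(
        "postgres://{}:{}@{}:{}/{}",
        percent_encode(&user),
        percent_encode(&password),
        host,
        port,
        db
    ))
}
```

In the service itself you could pass `std::env::vars().collect()` as the map and the RDS endpoint as the host.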
Terraform State
I decided to use GitLab for my Terraform state. After you create your GitLab project, go to Operate -> Terraform states:
This is how the Terraform init looks in my CI/CD:
# setup (.gitlab-ci.yml)
variables:
  TF_USERNAME: $GITLAB_USER_LOGIN
  TF_PASSWORD: $GITLAB_ACCESS_TOKEN
  TF_STATE_NAME: default
  TF_ADDRESS: ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${TF_STATE_NAME}
  TF_ROOT: ${CI_PROJECT_DIR}/environments/${TF_STATE_NAME}
# usage (job script)
terraform init -reconfigure \
  -backend-config="address=${TF_ADDRESS}" \
  -backend-config="lock_address=${TF_ADDRESS}/lock" \
  -backend-config="unlock_address=${TF_ADDRESS}/lock" \
  -backend-config="username=${TF_USERNAME}" \
  -backend-config="password=${TF_PASSWORD}" \
  -backend-config="lock_method=POST" \
  -backend-config="unlock_method=DELETE" \
  -backend-config="retry_wait_min=5"
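To double-check what TF_ADDRESS and the -backend-config flags expand to, here is a small Rust sketch that assembles the same values. The helper names are hypothetical and the project ID in the usage note is a placeholder. Note that the lock and unlock addresses are the same /lock URL; only the HTTP method differs.

```rust
/// HTTP-backend settings for GitLab-managed Terraform state.
struct GitlabBackend {
    address: String,
    lock_address: String,
    unlock_address: String,
}

/// Mirrors TF_ADDRESS = ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${TF_STATE_NAME}
fn gitlab_backend(api_v4_url: &str, project_id: u64, state_name: &str) -> GitlabBackend {
    let address = format!(
        "{}/projects/{}/terraform/state/{}",
        api_v4_url.trim_end_matches('/'),
        project_id,
        state_name
    );
    // Locking and unlocking share one endpoint (POST to lock, DELETE to unlock).
    let lock = format!("{address}/lock");
    GitlabBackend {
        address,
        lock_address: lock.clone(),
        unlock_address: lock,
    }
}

/// Renders the -backend-config arguments passed to `terraform init`.
fn backend_config_args(b: &GitlabBackend, username: &str, token: &str) -> Vec<String> {
    let pairs = [
        ("address", b.address.as_str()),
        ("lock_address", b.lock_address.as_str()),
        ("unlock_address", b.unlock_address.as_str()),
        ("username", username),
        ("password", token),
        ("lock_method", "POST"),
        ("unlock_method", "DELETE"),
        ("retry_wait_min", "5"),
    ];
    pairs
        .iter()
        .map(|(k, v)| format!("-backend-config={k}={v}"))
        .collect()
}
```

On gitlab.com, CI_API_V4_URL is https://gitlab.com/api/v4, so `gitlab_backend("https://gitlab.com/api/v4", 12345, "default")` yields https://gitlab.com/api/v4/projects/12345/terraform/state/default.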
$GITLAB_ACCESS_TOKEN is your GitLab account token, which we created earlier.
Since we want to run Terraform both from CI/CD and from local machines (Mac & Linux in my case), after adding providers you need to generate lock entries for all platforms and commit the updated “.terraform.lock.hcl” to the repo:
terraform providers lock \
-platform=darwin_amd64 \
-platform=linux_amd64 \
-platform=darwin_arm64 \
-platform=linux_arm64
ECS Task
Our ECS task running on the ECS cluster has 2 containers in its “container_definitions”:
- The first image runs the DB migration using a pre-compiled golang-migrate binary and exits after applying the migrations. To let the task keep running after this container stops, you need to explicitly mark it as non-essential:
essential = false
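- The second one is the Rust app, and it starts only after the first one finishes with “SUCCESS” (inside container_definitions the key is dependsOn; depends_on is Terraform's resource-level meta-argument):
dependsOn = [
  {
    containerName = var.ecs_db_migaration
    condition     = "SUCCESS"
  }
],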
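ECS supports four dependency conditions (START, COMPLETE, SUCCESS and HEALTHY), and the choice matters here: COMPLETE would also let the app start after a failed migration. The following Rust sketch is a simplified model of those semantics, not ECS code, showing why SUCCESS is the one we want. The type and function names are hypothetical.

```rust
/// Observed state of the container another container depends on.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ContainerState {
    Pending,
    Running { healthy: bool },
    Exited { code: i32 },
}

/// Dependency conditions accepted in container_definitions.dependsOn.
#[derive(Debug, Clone, Copy)]
enum Condition {
    Start,
    Complete,
    Success,
    Healthy,
}

/// Simplified model: may the dependent container start now?
fn dependency_met(condition: Condition, dep: ContainerState) -> bool {
    match (condition, dep) {
        // START: the dependency has been started (it may already have exited).
        (Condition::Start, ContainerState::Running { .. } | ContainerState::Exited { .. }) => true,
        // COMPLETE: the dependency has exited, whatever its exit code.
        (Condition::Complete, ContainerState::Exited { .. }) => true,
        // SUCCESS: the dependency has exited with exit code 0.
        (Condition::Success, ContainerState::Exited { code }) => code == 0,
        // HEALTHY: the dependency is running and its health check passes.
        (Condition::Healthy, ContainerState::Running { healthy }) => healthy,
        _ => false,
    }
}
```

This is also why the migration container's script must exit non-zero on failure: a migration that prints an error but exits with 0 would still satisfy SUCCESS.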
RDS
Postgres RDS is located in the private subnet, and its security group allows ingress only on port 5432 from the service's security group.
Note: if you want to connect to the DB for testing, you can add one flag in Terraform to make it public (this also requires the DB subnet group to use public subnets and the security group to allow your IP):
publicly_accessible = true
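When you flip that flag, or debug the security group rules in general, a quick TCP probe tells you whether port 5432 is reachable from where you are. A minimal std-only Rust sketch; `port_reachable` is a hypothetical helper, and a successful connect only proves the network path is open, not that your credentials work.

```rust
use std::net::{SocketAddr, TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to host:port succeeds within `timeout`
/// for any of the addresses the host name resolves to.
fn port_reachable(host: &str, port: u16, timeout: Duration) -> bool {
    let addrs: Vec<SocketAddr> = match (host, port).to_socket_addrs() {
        Ok(iter) => iter.collect(),
        // DNS resolution failed: treat as unreachable.
        Err(_) => return false,
    };
    addrs
        .iter()
        .any(|addr| TcpStream::connect_timeout(addr, timeout).is_ok())
}
```

For example, `port_reachable("<your RDS endpoint>", 5432, Duration::from_secs(3))` returning false usually points at the subnet routing or the security group rather than Postgres itself.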
Terraform output
We will need the Terraform outputs to push our Docker image to ECR:
ECR_NAME="000000000000.dkr.ecr.us-east-1.amazonaws.com/zeus"
aws ecr get-login-password --region $AWS_REGION --profile <your profile> | docker login --username AWS --password-stdin "${ECR_NAME%%/*}"
docker build --platform=linux/amd64 -t zeus -f Dockerfile.prd .
docker tag zeus:latest $ECR_NAME
docker push $ECR_NAME
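✍️ Note: The name you tag and push the image with must be the full ECR repository URI (here, $ECR_NAME); that is how docker push knows which registry and repository to push to.
You can also get the cluster name and service name from the Terraform output to trigger a deployment of the service: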
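For ECR in the standard AWS regions, that URI follows a fixed pattern: `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. A small Rust sketch that builds it and extracts the registry host that `docker login` authenticates against; the helper names are hypothetical and the account ID is the placeholder used above.

```rust
/// Full image reference for `docker tag` / `docker push`.
fn ecr_image_uri(account_id: &str, region: &str, repository: &str, tag: &str) -> String {
    format!("{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}")
}

/// Registry host: everything before the first '/'.
fn registry_host(image_uri: &str) -> &str {
    image_uri.split('/').next().unwrap_or(image_uri)
}
```

`ecr_image_uri("000000000000", "us-east-1", "zeus", "latest")` gives 000000000000.dkr.ecr.us-east-1.amazonaws.com/zeus:latest, and its registry host is 000000000000.dkr.ecr.us-east-1.amazonaws.com.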
# first
terraform init # ....
ecs_cluster_name=$(terraform output -raw ecs_cluster_name)
ecs_service_name=$(terraform output -raw ecs_service_name)
# then
aws ecs update-service --cluster "$ecs_cluster_name" --service "$ecs_service_name" --force-new-deployment
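If you prefer to drive this step from a small Rust tool instead of a shell script, the two pieces are reading a string output (which `terraform output -json` prints as a quoted JSON string) and building the `aws ecs update-service` arguments. A sketch under those assumptions: the helpers are hypothetical, and the parser assumes the value contains no escaped characters.

```rust
/// Strips the surrounding quotes from a `terraform output -json` string value.
/// Assumes a plain string without JSON escape sequences.
fn parse_string_output(json: &str) -> &str {
    let trimmed = json.trim();
    trimmed
        .strip_prefix('"')
        .and_then(|s| s.strip_suffix('"'))
        .unwrap_or(trimmed)
}

/// Arguments for `aws ecs update-service ... --force-new-deployment`.
fn update_service_args(cluster: &str, service: &str) -> Vec<String> {
    [
        "ecs",
        "update-service",
        "--cluster",
        cluster,
        "--service",
        service,
        "--force-new-deployment",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}
```

The arguments can be handed to `std::process::Command::new("aws").args(update_service_args(&cluster, &service)).status()`. Passing them as separate arguments avoids any shell quoting issues.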
Docker Images
Rust Service
I like Rust ❤️. I won't lie: most of the time it is very hard, but when it finally compiles, I’m not worried that something will go wrong. “0 runtime anxiety” 😎
My project, Zeus, has everything you need to kick off a new project:
- Clean architecture built around separate use cases
- Postgres DB integration
- User management, JWT session management, Middleware
- LLM APIs 🦄: OpenAI, Cloudflare AI, Gemini 🦄
- Useful Google APIs: Places & Vision
- Actix web server (https://actix.rs), one of the fastest web frameworks ⚡️
You can build an image for ECR from Dockerfile.prd:
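JWT session handling in a service like this starts with pulling the token out of the `Authorization` header before verifying it. Here is a std-only Rust sketch of that first step; it is not Zeus's actual middleware, the function names are hypothetical, and real signature verification needs a JWT library.

```rust
/// Extracts the token from an `Authorization: Bearer <token>` header value.
fn bearer_token(header_value: &str) -> Option<&str> {
    let (scheme, token) = header_value.trim().split_once(' ')?;
    // The auth-scheme name is case-insensitive (RFC 7235).
    if !scheme.eq_ignore_ascii_case("bearer") {
        return None;
    }
    let token = token.trim();
    if token.is_empty() { None } else { Some(token) }
}

/// A JWT is three base64url segments separated by dots (header.payload.signature).
/// This only checks the shape; it does not verify the signature.
fn looks_like_jwt(token: &str) -> bool {
    let parts: Vec<&str> = token.split('.').collect();
    parts.len() == 3
        && parts.iter().all(|p| {
            !p.is_empty()
                && p.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
        })
}
```

A middleware would reject the request with 401 when `bearer_token` returns None, and only then hand the token to the verifier.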
docker build --platform=linux/amd64 -t zeus -f Dockerfile.prd .
🔐 Note: You will need to update src/repository/secrets.rs with your keys. Inject them via the ECS task, read them from AWS Secrets Manager, or hardcode them 🤫
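If you inject the keys through the ECS task, it helps to validate all of them at startup and report every missing one at once instead of crashing on the first. A minimal Rust sketch; the key names in REQUIRED_KEYS are hypothetical, and in the service `lookup` would be `|k| std::env::var(k).ok()`.

```rust
use std::collections::HashMap;

/// Keys the service expects (hypothetical names for illustration).
const REQUIRED_KEYS: [&str; 3] = ["DATABASE_URL", "JWT_SECRET", "OPENAI_API_KEY"];

/// Loads every required key via `lookup`; empty values count as missing.
/// Returns all missing keys together so one failed deploy shows the full list.
fn load_secrets<F>(lookup: F) -> Result<HashMap<&'static str, String>, Vec<&'static str>>
where
    F: Fn(&str) -> Option<String>,
{
    let mut found = HashMap::new();
    let mut missing = Vec::new();
    for key in REQUIRED_KEYS {
        match lookup(key) {
            Some(value) if !value.is_empty() => {
                found.insert(key, value);
            }
            _ => missing.push(key),
        }
    }
    if missing.is_empty() { Ok(found) } else { Err(missing) }
}
```

Failing fast here also plays well with ECS: the task stops immediately and the missing key names show up in the container logs.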
LLM APIs🦄
There are now many alternatives to the OpenAI ChatGPT APIs, including Gemini and Cloudflare Workers AI. I’ve been playing with them a lot, especially with Cloudflare Workers AI. Although it doesn’t offer big models, models like llama-2-7b work quite well for RAG. A RAG example with pgvector and LlamaIndex is also available in the repo.
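The retrieval half of RAG boils down to ranking stored text chunks by how similar their embeddings are to the question's embedding, then pasting the best ones into the prompt. pgvector does the ranking in SQL (its `<=>` operator is cosine distance); the Rust sketch below does the same in memory with toy 2-dimensional vectors, purely to show the idea. The function names and prompt wording are assumptions, and real embeddings would come from an embedding model.

```rust
/// Cosine similarity of two equal-length embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// The `k` chunks most similar to the query embedding, best first.
fn top_k<'a>(query: &[f32], chunks: &'a [(String, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = chunks
        .iter()
        .map(|(text, emb)| (text.as_str(), cosine_similarity(query, emb)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k);
    scored
}

/// Augments the question with the retrieved chunks before calling the LLM.
fn build_prompt(question: &str, context: &[(&str, f32)]) -> String {
    let joined: Vec<&str> = context.iter().map(|(text, _)| *text).collect();
    format!("Answer using only this context:\n{}\n\nQuestion: {}", joined.join("\n"), question)
}
```

With a small model like llama-2-7b, keeping k small and the context focused tends to matter more than the model size, which is the reason RAG works well here.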
Bruno
A while ago, I moved all my pet projects from Postman to Bruno. I love that I can keep my requests in the same repo as the source code 💪 It still lacks some features, but it is good enough for small projects.
Talk with your DB with LlamaIndex
I’m also using natural-language queries against the DB as an alternative to pgAdmin for quickly looking up data. It's a Flask app with LlamaIndex & NLSQLTableQueryEngine, and it requires an OpenAI API key in .env:
OPENAI_API_KEY=x-y-z
The source code is in the same repo: zeus/athena. It runs with docker-compose and is available at 127.0.0.1:3005/nlsql
DB Migration
To run DB migration scripts I’m using golang-migrate, and I use psql (from the PostgreSQL client) to test them locally. I also use the migration image as a shortcut to import some .csv files 🙈
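COPY migrations ./migrations
COPY import ./import
COPY scripts ./scripts
CMD ["sh", "-c", "\
if ! migrate -path ./migrations -database $DATABASE_URL up; then \
echo 'Migration failed'; \
exit 1; \
fi; \
if ./scripts/run-data-import.sh; then \
echo 'Import data succeeded'; \
else \
echo 'Import data failed'; \
exit 1; \
fi"]
Because the Rust container only starts once this one exits with “SUCCESS”, the script exits non-zero if either the migration or the data import fails. The source code is available here.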
Cloudflare
You may wonder why I used Cloudflare here. I usually use it as the registrar for my domains. It provides a free proxy and caching, which also helps save some cost and serve content faster.
Note: Cloudflare Registrar doesn’t allow you to change the nameservers, so if you purchase a domain there, you will not be able to manage its DNS with Route 53.
Domain
Let's start with the domain. I’ll use one of my domains, getthedeck.com, as an example; here is what the setup looks like.
I have 3 CNAME records:
- api.getthedeck.com — ALB for access to service
- SSL certificate ownership verification from AWS
- getthedeck.com — pointing to CloudFront distribution for my web page
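On the ALB side, we have a rule that redirects HTTP to HTTPS, so we need to set Cloudflare's SSL/TLS encryption mode to “Full”. Otherwise, we get stuck in a redirect loop: in “Flexible” mode Cloudflare connects to the origin over plain HTTP, the ALB redirects that request to HTTPS, and the cycle repeats.
You can also remove this redirect rule from Terraform, since we restrict ALB ingress to Cloudflare IP addresses only: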
data "http" "cloudflare_ipv4" {
  url = "https://www.cloudflare.com/ips-v4"
}
locals {
  # response_body replaces the deprecated body attribute (hashicorp/http >= 3.0)
  cloudflare_ipv4_cidr_blocks = split("\n", trimspace(data.http.cloudflare_ipv4.response_body))
}
// ALB security group
cidr_blocks = local.cloudflare_ipv4_cidr_blocks
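The security group check this list feeds into is plain CIDR membership: a request is allowed if its source IP falls inside any of the listed ranges. A Rust sketch of that check, assuming IPv4 only and a newline-separated list in the same format as the ips-v4 file; the function names are hypothetical and the ranges in the test are just samples.

```rust
use std::net::Ipv4Addr;

/// Parses "a.b.c.d/len" into (network address, prefix length).
fn parse_cidr(cidr: &str) -> Option<(Ipv4Addr, u8)> {
    let (addr, len) = cidr.trim().split_once('/')?;
    let addr: Ipv4Addr = addr.parse().ok()?;
    let len: u8 = len.parse().ok()?;
    if len > 32 {
        return None;
    }
    Some((addr, len))
}

/// True if `ip` lies inside the CIDR block; malformed blocks match nothing.
fn cidr_contains(cidr: &str, ip: Ipv4Addr) -> bool {
    let (network, len) = match parse_cidr(cidr) {
        Some(v) => v,
        None => return false,
    };
    // A /0 matches everything; shifting a u32 by 32 would overflow.
    let mask: u32 = if len == 0 { 0 } else { u32::MAX << (32 - len) };
    (u32::from(network) & mask) == (u32::from(ip) & mask)
}

/// Is the source IP inside any range of a newline-separated list?
fn from_cloudflare(ranges_body: &str, ip: Ipv4Addr) -> bool {
    ranges_body
        .lines()
        .filter(|line| !line.trim().is_empty())
        .any(|cidr| cidr_contains(cidr, ip))
}
```

For example, 173.245.50.7 matches 173.245.48.0/20 (which spans 173.245.48.0 to 173.245.63.255), while 8.8.8.8 matches neither sample range.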
⚠️ Traffic from other Cloudflare customers comes from the same IP ranges, so it can also reach our service. To restrict this, we can attach a custom header on the Cloudflare side and add an ALB listener rule that only forwards requests carrying it.
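The ALB listener rule can match on an HTTP header value, and Cloudflare can add a request header to every proxied request (for example with a Transform Rule). If you also want the service itself to enforce the same shared secret, a Rust sketch of the check could look like this. The header name `x-origin-verify` and the helpers are hypothetical; the comparison avoids returning early so that response timing doesn't reveal how much of the secret matched.

```rust
/// Hypothetical header that Cloudflare would add to every proxied request.
const ORIGIN_HEADER: &str = "x-origin-verify";

/// Compares byte strings without stopping at the first mismatching byte.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Accepts a request only if it carries the shared-secret header.
/// Header names are compared case-insensitively, as HTTP requires.
fn is_from_our_proxy(headers: &[(&str, &str)], secret: &str) -> bool {
    headers
        .iter()
        .filter(|(name, _)| name.eq_ignore_ascii_case(ORIGIN_HEADER))
        .any(|(_, value)| constant_time_eq(value.as_bytes(), secret.as_bytes()))
}
```

Doing the check at the ALB keeps unwanted requests from ever reaching the tasks; repeating it in the app is a defense-in-depth choice, not a requirement.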
GitLab CI/CD
For CI/CD, I’m using OpenID Connect to get credentials for my “Deployment” role and then either upload the web app to S3 or force a new deployment of my ECS service:
.assume_role: &assume_role
  - >
    STS=($(aws sts assume-role-with-web-identity
    --role-arn="${ROLE_ARN}"
    --role-session-name "GitLabRunner-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
    --web-identity-token $ID_TOKEN
    --duration-seconds 3600
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
    --output text))
  - export AWS_ACCESS_KEY_ID="${STS[0]}"
  - export AWS_SECRET_ACCESS_KEY="${STS[1]}"
  - export AWS_SESSION_TOKEN="${STS[2]}"
deploy:
  stage: deploy
  image:
    name: amazon/aws-cli:latest
    entrypoint:
      - '/usr/bin/env'
  id_tokens:
    ID_TOKEN:
      aud: the_deck_web_app_s3
  script:
    - *assume_role
    # Web App
    - aws s3 sync build/ s3://$S3_BUCKET
    # Rust Service
    - aws ecs update-service --cluster $ECS_CLUSTER_NAME --service $ECS_SERVICE_NAME --force-new-deployment
  rules:
    - if: $CI_COMMIT_TAG
      when: on_success
    - if: '$CI_COMMIT_REF_NAME == "main"'
      when: manual
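In the assume-role step, `--output text` prints the three queried fields on one tab-separated line, which bash splits into STS[0], STS[1] and STS[2]. For clarity, here is the same parsing as a Rust sketch; the struct and function names are hypothetical.

```rust
/// Temporary credentials from `aws sts assume-role-with-web-identity`.
#[derive(Debug, PartialEq)]
struct StsCredentials {
    access_key_id: String,
    secret_access_key: String,
    session_token: String,
}

/// Parses the text output of the
/// Credentials.[AccessKeyId,SecretAccessKey,SessionToken] query.
fn parse_sts_text(output: &str) -> Option<StsCredentials> {
    let mut fields = output.split_whitespace();
    let creds = StsCredentials {
        access_key_id: fields.next()?.to_string(),
        secret_access_key: fields.next()?.to_string(),
        session_token: fields.next()?.to_string(),
    };
    // More than three fields means the query returned something unexpected.
    if fields.next().is_some() {
        return None;
    }
    Some(creds)
}

/// The environment variables the CI job exports from those fields.
fn to_env(c: &StsCredentials) -> [(&'static str, &str); 3] {
    [
        ("AWS_ACCESS_KEY_ID", c.access_key_id.as_str()),
        ("AWS_SECRET_ACCESS_KEY", c.secret_access_key.as_str()),
        ("AWS_SESSION_TOKEN", c.session_token.as_str()),
    ]
}
```

The order of the fields is fixed by the `--query` expression, which is why the shell version can rely on plain array indexes.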
🐌 In the past, I also used Bitbucket and GitHub for my pet projects, but I’m slowly moving everything to GitLab.
Github
I have open-sourced early versions, so they might not be exactly what I use now 🫣, but they are a good place to start:
- Zeus — Rust services: https://github.com/xajik/zeus-ai-api-gateway
- Dazhbog — Terraform: https://github.com/xajik/dazhbog-tf
- Veles — DB Migration: https://github.com/xajik/veles-db-migration
What’s next
It would be substantially cheaper to self-host the DB instead of using RDS, though I don’t want to spend time on backups… we will see.
If I weren’t so keen on exploring AWS further, I’m confident I could get better pricing without sacrificing performance by switching to DigitalOcean 💡