Terraform AWS Multi-VPC Architecture Guide
This guide walks through building a production-ready multi-tier AWS architecture using Terraform, featuring VPC peering, Auto Scaling with Spot instances, RDS, and ECS Fargate for log aggregation.
Architecture Overview
We’ll build a secure, scalable architecture with network isolation between frontend and backend tiers:
┌─────────────────────────────────────────────────────────────────┐
│                            INTERNET                             │
└────────────────────────────┬────────────────────────────────────┘
                             │
                    ┌────────▼────────┐
                    │   Internet GW   │
                    └────────┬────────┘
                             │
┌────────────────────────────▼──────────────────────────────┐
│                 PUBLIC VPC (10.0.0.0/16)                  │
│   ┌──────────────────────────────────────────────────┐    │
│   │  Public Subnet 1 (AZ-a)   Public Subnet 2 (AZ-b) │    │
│   │   ┌──────────────┐           ┌──────────────┐    │    │
│   │   │     ALB      │◄─────────►│     ALB      │    │    │
│   │   │ (10.0.1.0/24)│           │ (10.0.2.0/24)│    │    │
│   │   └──────┬───────┘           └──────┬───────┘    │    │
│   │          │                          │            │    │
│   │   ┌──────▼───────┐           ┌──────▼───────┐    │    │
│   │   │   Frontend   │           │   Jumphost   │    │    │
│   │   │     EC2      │           │     EC2      │    │    │
│   │   └──────────────┘           └──────────────┘    │    │
│   └──────────────────────────────────────────────────┘    │
└────────────────────────┬──────────────────────────────────┘
                         │
                 ┌───────▼────────┐
                 │  VPC Peering   │
                 └───────┬────────┘
                         │
┌────────────────────────▼──────────────────────────────────┐
│                 PRIVATE VPC (10.1.0.0/16)                 │
│   ┌──────────────────────────────────────────────────┐    │
│   │ Private Subnet 1 (AZ-a)  Private Subnet 2 (AZ-b) │    │
│   │    ┌──────────────┐          ┌──────────────┐    │    │
│   │    │ Backend EC2  │          │ Backend EC2  │    │    │
│   │    │  (ASG+Spot)  │◄────────►│  (ASG+Spot)  │    │    │
│   │    │(10.1.10.0/24)│          │(10.1.11.0/24)│    │    │
│   │    └──────┬───────┘          └──────┬───────┘    │    │
│   │           │                         │            │    │
│   │           │   ┌──────────────────┐  │            │    │
│   │           └───►   ECS Fargate    ◄──┘            │    │
│   │               │ (Log Collector)  │               │    │
│   │               └──────────────────┘               │    │
│   │                                                  │    │
│   │   ┌─────────────────────────────────────────┐    │    │
│   │   │ DB Subnet 1 (AZ-a)   DB Subnet 2 (AZ-b) │    │    │
│   │   │  ┌──────────────┐     ┌──────────────┐  │    │    │
│   │   │  │ RDS Primary  │◄───►│ RDS Standby  │  │    │    │
│   │   │  │   (Writer)   │     │  (Multi-AZ)  │  │    │    │
│   │   │  │(10.1.20.0/24)│     │(10.1.21.0/24)│  │    │    │
│   │   │  └──────────────┘     └──────────────┘  │    │    │
│   │   └─────────────────────────────────────────┘    │    │
│   └──────────────────────────────────────────────────┘    │
└───────────────────────────────────────────────────────────┘
Key Components
- Public VPC: internet-facing tier
  - Application Load Balancer (ALB)
  - Frontend EC2 or Jumphost
  - Internet Gateway for inbound and outbound internet traffic
- Private VPC: backend tier (isolated, with no direct internet route)
  - Auto Scaling Group with Spot instances
  - RDS PostgreSQL (Multi-AZ)
  - ECS Fargate for log aggregation
- VPC Peering: private connection between the two VPCs
- Security Groups: fine-grained access control
- Cost Optimization: Spot instances, right-sizing
Prerequisites
# Install Terraform
brew install terraform # macOS
# or download from: https://www.terraform.io/downloads
# Verify installation
terraform version
# Configure AWS credentials
aws configure
# AWS Access Key ID: YOUR_KEY
# AWS Secret Access Key: YOUR_SECRET
# Default region: us-east-1
# Default output format: json
Project Structure
Organize your Terraform code for maintainability:
terraform-aws-multi-vpc/
├── main.tf # Main configuration
├── variables.tf # Input variables
├── outputs.tf # Output values
├── terraform.tfvars # Variable values (don't commit secrets!)
├── providers.tf # Provider configuration
├── modules/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── security-groups/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── compute/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── database/
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
└── README.md
For this guide, we’ll skip the modules and use a simpler flat layout for clarity: one .tf file per concern, all in the project root.
Step 1: Provider Configuration
Create providers.tf:
# providers.tf
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
# Optional: Remote state storage
# backend "s3" {
# bucket = "my-terraform-state-bucket"
# key = "multi-vpc/terraform.tfstate"
# region = "us-east-1"
# }
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Environment = var.environment
Project = "multi-vpc-architecture"
ManagedBy = "Terraform"
}
}
}
Step 2: Variables Configuration
Create variables.tf:
# variables.tf
variable "aws_region" {
description = "AWS region for resources"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name"
type = string
default = "production"
}
variable "project_name" {
description = "Project name for resource naming"
type = string
default = "multi-vpc"
}
# VPC CIDR blocks
variable "public_vpc_cidr" {
description = "CIDR block for public VPC"
type = string
default = "10.0.0.0/16"
}
variable "private_vpc_cidr" {
description = "CIDR block for private VPC"
type = string
default = "10.1.0.0/16"
}
# Availability Zones
variable "availability_zones" {
description = "List of availability zones"
type = list(string)
default = ["us-east-1a", "us-east-1b"]
}
# EC2 Configuration
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.medium"
}
variable "spot_max_price" {
description = "Maximum price for spot instances"
type = string
default = "0.05" # Adjust based on current spot prices
}
variable "asg_min_size" {
description = "Minimum size of Auto Scaling Group"
type = number
default = 2
}
variable "asg_max_size" {
description = "Maximum size of Auto Scaling Group"
type = number
default = 6
}
variable "asg_desired_capacity" {
description = "Desired capacity of Auto Scaling Group"
type = number
default = 2
}
# RDS Configuration
variable "db_instance_class" {
description = "RDS instance class"
type = string
default = "db.t3.medium"
}
variable "db_name" {
description = "Database name"
type = string
default = "appdb"
}
variable "db_username" {
description = "Database master username"
type = string
default = "dbadmin" # "admin" is a reserved word for RDS PostgreSQL and is rejected
sensitive = true
}
variable "db_password" {
description = "Database master password"
type = string
sensitive = true
}
variable "db_allocated_storage" {
description = "Allocated storage for RDS (GB)"
type = number
default = 100
}
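The variables above accept any string or number, so a typo in a CIDR block only surfaces when AWS rejects it during apply. As a minimal sketch, a validation block can catch that at plan time. The block below would replace the plain public_vpc_cidr declaration (a variable cannot be declared twice); the error message wording is only illustrative.

```hcl
variable "public_vpc_cidr" {
  description = "CIDR block for public VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() fails on a malformed CIDR, and can() turns that failure into false
    condition     = can(cidrhost(var.public_vpc_cidr, 0))
    error_message = "The public_vpc_cidr value must be a valid IPv4 CIDR block, for example 10.0.0.0/16."
  }
}
```

The same pattern could be applied to private_vpc_cidr, and a similar condition could check that the two ranges differ, since VPC peering does not work with overlapping CIDR blocks.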
Create terraform.tfvars:
# terraform.tfvars
aws_region = "us-east-1"
environment = "production"
project_name = "my-app"
# Database credentials (use AWS Secrets Manager in production!)
db_username = "dbadmin" # "admin" is reserved for RDS PostgreSQL
db_password = "ChangeMe123!" # NEVER commit real passwords!
# Spot instance pricing (check current spot prices)
spot_max_price = "0.05"
Step 3: Public VPC Configuration
Create vpc-public.tf:
# vpc-public.tf
# Public VPC - For frontend/jumphost
resource "aws_vpc" "public" {
cidr_block = var.public_vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.project_name}-public-vpc"
Tier = "public"
}
}
# Internet Gateway for public VPC
resource "aws_internet_gateway" "public" {
vpc_id = aws_vpc.public.id
tags = {
Name = "${var.project_name}-public-igw"
}
}
# Public Subnets (for ALB and frontend EC2)
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.public.id
cidr_block = cidrsubnet(var.public_vpc_cidr, 8, count.index + 1)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.project_name}-public-subnet-${count.index + 1}"
Tier = "public"
}
}
# Route Table for public subnets
resource "aws_route_table" "public" {
vpc_id = aws_vpc.public.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.public.id
}
# Route to private VPC via peering
route {
cidr_block = var.private_vpc_cidr
vpc_peering_connection_id = aws_vpc_peering_connection.public_to_private.id
}
tags = {
Name = "${var.project_name}-public-rt"
}
}
# Associate route table with public subnets
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
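The subnet CIDRs in this step come from cidrsubnet(prefix, newbits, netnum): adding 8 bits to the /16 VPC range yields /24 networks, and netnum picks which one. The sketch below makes the computed values visible; the local and output names are illustrative and not part of the configuration above.

```hcl
locals {
  # One /24 per availability zone, starting at network number 1
  public_subnet_cidrs = [
    for i in range(length(var.availability_zones)) :
    cidrsubnet(var.public_vpc_cidr, 8, i + 1)
  ]
}

output "public_subnet_cidrs" {
  description = "CIDR blocks assigned to the public subnets"
  value       = local.public_subnet_cidrs
}
```

With the default variables this evaluates to ["10.0.1.0/24", "10.0.2.0/24"], matching the diagram. The private VPC uses the same function with offsets 10 and 20, which gives 10.1.10.0/24 and 10.1.20.0/24 for the first app and DB subnets.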
Step 4: Private VPC Configuration
Create vpc-private.tf:
# vpc-private.tf
# Private VPC - For backend applications and database
resource "aws_vpc" "private" {
cidr_block = var.private_vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.project_name}-private-vpc"
Tier = "private"
}
}
# Private Subnets for application tier
resource "aws_subnet" "private_app" {
count = length(var.availability_zones)
vpc_id = aws_vpc.private.id
cidr_block = cidrsubnet(var.private_vpc_cidr, 8, count.index + 10)
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.project_name}-private-app-subnet-${count.index + 1}"
Tier = "private-app"
}
}
# Private Subnets for database tier
resource "aws_subnet" "private_db" {
count = length(var.availability_zones)
vpc_id = aws_vpc.private.id
cidr_block = cidrsubnet(var.private_vpc_cidr, 8, count.index + 20)
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.project_name}-private-db-subnet-${count.index + 1}"
Tier = "private-db"
}
}
# Route Table for private subnets
resource "aws_route_table" "private" {
vpc_id = aws_vpc.private.id
# Route to public VPC via peering
route {
cidr_block = var.public_vpc_cidr
vpc_peering_connection_id = aws_vpc_peering_connection.public_to_private.id
}
tags = {
Name = "${var.project_name}-private-rt"
}
}
# Associate route table with private app subnets
resource "aws_route_table_association" "private_app" {
count = length(aws_subnet.private_app)
subnet_id = aws_subnet.private_app[count.index].id
route_table_id = aws_route_table.private.id
}
# Associate route table with private DB subnets
resource "aws_route_table_association" "private_db" {
count = length(aws_subnet.private_db)
subnet_id = aws_subnet.private_db[count.index].id
route_table_id = aws_route_table.private.id
}
# DB Subnet Group for RDS
resource "aws_db_subnet_group" "main" {
name = "${var.project_name}-db-subnet-group"
subnet_ids = aws_subnet.private_db[*].id
tags = {
Name = "${var.project_name}-db-subnet-group"
}
}
Step 5: VPC Peering Configuration
Create vpc-peering.tf:
# vpc-peering.tf
# VPC Peering Connection between public and private VPCs
resource "aws_vpc_peering_connection" "public_to_private" {
vpc_id = aws_vpc.public.id
peer_vpc_id = aws_vpc.private.id
auto_accept = true
tags = {
Name = "${var.project_name}-public-to-private-peering"
}
}
# Note: Routes are defined in the VPC route tables above
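Peering plus routes gives IP connectivity, but by default a private DNS hostname in one VPC does not resolve to a private IP when queried from the other. If instances in the public VPC need to reach private-VPC resources by hostname, the peering options resource can enable that. This is a sketch for the same-account, same-region case used in this guide; the resource label is an arbitrary choice.

```hcl
resource "aws_vpc_peering_connection_options" "public_to_private" {
  vpc_peering_connection_id = aws_vpc_peering_connection.public_to_private.id

  # Let the requester (public) VPC resolve private-VPC hostnames to private IPs
  requester {
    allow_remote_vpc_dns_resolution = true
  }

  # And the reverse direction
  accepter {
    allow_remote_vpc_dns_resolution = true
  }
}
```

Both VPCs already have enable_dns_hostnames and enable_dns_support set to true, which this option relies on.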
Step 6: Security Groups
Create security-groups.tf:
# security-groups.tf
# Security Group for ALB (Public)
resource "aws_security_group" "alb" {
name = "${var.project_name}-alb-sg"
description = "Security group for Application Load Balancer"
vpc_id = aws_vpc.public.id
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTP from internet"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-alb-sg"
}
}
# Security Group for Frontend EC2 (Public VPC)
resource "aws_security_group" "frontend" {
name = "${var.project_name}-frontend-sg"
description = "Security group for frontend EC2 instances"
vpc_id = aws_vpc.public.id
ingress {
description = "HTTP from ALB"
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
ingress {
description = "SSH from specific IP (replace with your IP)"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # CHANGE THIS to your IP!
}
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-frontend-sg"
}
}
# Security Group for Backend EC2 (Private VPC)
resource "aws_security_group" "backend" {
name = "${var.project_name}-backend-sg"
description = "Security group for backend EC2 instances"
vpc_id = aws_vpc.private.id
ingress {
description = "HTTP from public VPC"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = [var.public_vpc_cidr]
}
ingress {
description = "Custom app port from public VPC"
from_port = 8080
to_port = 8080
protocol = "tcp"
cidr_blocks = [var.public_vpc_cidr]
}
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-backend-sg"
}
}
# Security Group for RDS
resource "aws_security_group" "rds" {
name = "${var.project_name}-rds-sg"
description = "Security group for RDS database"
vpc_id = aws_vpc.private.id
ingress {
description = "PostgreSQL from backend instances"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.backend.id]
}
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-rds-sg"
}
}
# Security Group for ECS Fargate
resource "aws_security_group" "ecs_fargate" {
name = "${var.project_name}-ecs-fargate-sg"
description = "Security group for ECS Fargate tasks"
vpc_id = aws_vpc.private.id
ingress {
description = "Allow traffic from backend instances"
from_port = 24224 # Fluent Bit default port
to_port = 24224
protocol = "tcp"
security_groups = [aws_security_group.backend.id]
}
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-ecs-fargate-sg"
}
}
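The frontend security group above leaves SSH open to 0.0.0.0/0 with a reminder to change it. One way to make that reminder enforceable is to take the allowed range from a variable that refuses the open range. The variable name ssh_allowed_cidr is hypothetical and not defined elsewhere in this guide.

```hcl
variable "ssh_allowed_cidr" {
  description = "CIDR block allowed to reach the frontend over SSH (for example, your office IP as x.x.x.x/32)"
  type        = string

  validation {
    condition     = can(cidrhost(var.ssh_allowed_cidr, 0)) && var.ssh_allowed_cidr != "0.0.0.0/0"
    error_message = "The ssh_allowed_cidr value must be a valid CIDR block other than 0.0.0.0/0."
  }
}

# In aws_security_group.frontend, the SSH ingress rule would then use:
#   cidr_blocks = [var.ssh_allowed_cidr]
```

Because the variable has no default, Terraform prompts for it (or reads it from terraform.tfvars), so the open range cannot slip through unnoticed. Since the instance role already includes AmazonSSMManagedInstanceCore, Session Manager is another option that avoids opening port 22 at all.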
Step 7: Application Load Balancer
Create alb.tf:
# alb.tf
# Application Load Balancer in public VPC
resource "aws_lb" "main" {
name = "${var.project_name}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = aws_subnet.public[*].id
enable_deletion_protection = false # Set to true in production
tags = {
Name = "${var.project_name}-alb"
}
}
# Target Group for frontend instances
resource "aws_lb_target_group" "frontend" {
name = "${var.project_name}-frontend-tg"
port = 80
protocol = "HTTP"
vpc_id = aws_vpc.public.id
health_check {
enabled = true
healthy_threshold = 2
interval = 30
matcher = "200"
path = "/health"
port = "traffic-port"
protocol = "HTTP"
timeout = 5
unhealthy_threshold = 2
}
tags = {
Name = "${var.project_name}-frontend-tg"
}
}
# ALB Listener (HTTP)
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.main.arn
port = "80"
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.frontend.arn
}
}
# Optional: HTTPS Listener (requires ACM certificate)
# resource "aws_lb_listener" "https" {
# load_balancer_arn = aws_lb.main.arn
# port = "443"
# protocol = "HTTPS"
# ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06" # newer policy than the legacy 2016-08 default
# certificate_arn = aws_acm_certificate.main.arn
#
# default_action {
# type = "forward"
# target_group_arn = aws_lb_target_group.frontend.arn
# }
# }
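Once the HTTPS listener is enabled, plain HTTP requests are usually redirected rather than forwarded. The sketch below shows a redirecting listener under that assumption. It would replace aws_lb_listener.http, since a load balancer cannot have two listeners on port 80; the resource label http_redirect is illustrative.

```hcl
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301" # permanent redirect
    }
  }
}
```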
Step 8: EC2 Launch Template and Auto Scaling Group
Create compute.tf:
# compute.tf
# Data source for latest Amazon Linux 2023 AMI
data "aws_ami" "amazon_linux_2023" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["al2023-ami-*-x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
# IAM Role for EC2 instances
resource "aws_iam_role" "ec2_role" {
name = "${var.project_name}-ec2-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
}
# Attach managed policies for the CloudWatch agent and SSM
resource "aws_iam_role_policy_attachment" "ec2_cloudwatch" {
role = aws_iam_role.ec2_role.name
policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}
resource "aws_iam_role_policy_attachment" "ec2_ssm" {
role = aws_iam_role.ec2_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
# Instance Profile
resource "aws_iam_instance_profile" "ec2_profile" {
name = "${var.project_name}-ec2-profile"
role = aws_iam_role.ec2_role.name
}
# Launch Template for Backend EC2 (Spot Instances)
resource "aws_launch_template" "backend" {
name_prefix = "${var.project_name}-backend-"
image_id = data.aws_ami.amazon_linux_2023.id
instance_type = var.instance_type
iam_instance_profile {
name = aws_iam_instance_profile.ec2_profile.name
}
vpc_security_group_ids = [aws_security_group.backend.id]
# Spot instance configuration
instance_market_options {
market_type = "spot"
spot_options {
max_price = var.spot_max_price
spot_instance_type = "one-time"
}
}
user_data = base64encode(<<-EOF
#!/bin/bash
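# NOTE: the private subnets have no internet route by default, so the package,
# download, and image-pull steps below only succeed once a NAT Gateway or
# suitable VPC endpoints are in place (see Issue 4 under troubleshooting).
# Update system
yum update -y
# Install Docker
yum install -y docker
systemctl start docker
systemctl enable docker
# Install CloudWatch agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
rpm -U ./amazon-cloudwatch-agent.rpm
# Configure Fluent Bit for log forwarding
# (installing Fluent Bit itself is not shown; the directory is created so the write succeeds)
mkdir -p /etc/fluent-bit
cat > /etc/fluent-bit/fluent-bit.conf <<EOL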
[SERVICE]
Flush 5
Daemon Off
Log_Level info
[INPUT]
Name tail
Path /var/log/app/*.log
Parser json
Tag app.logs
[OUTPUT]
Name forward
Match *
Host ${aws_service_discovery_service.fluent.name}.${aws_service_discovery_private_dns_namespace.main.name}
Port 24224
EOL
# Start application (example)
mkdir -p /var/log/app
docker run -d \
-p 8080:8080 \
-v /var/log/app:/var/log/app \
--name backend-app \
your-backend-app:latest
# Send success signal
echo "Instance initialized successfully" > /var/log/user-data.log
EOF
)
tag_specifications {
resource_type = "instance"
tags = {
Name = "${var.project_name}-backend-instance"
}
}
}
# Auto Scaling Group for Backend
resource "aws_autoscaling_group" "backend" {
name = "${var.project_name}-backend-asg"
vpc_zone_identifier = aws_subnet.private_app[*].id
min_size = var.asg_min_size
max_size = var.asg_max_size
desired_capacity = var.asg_desired_capacity
launch_template {
id = aws_launch_template.backend.id
version = "$Latest"
}
health_check_type = "EC2"
health_check_grace_period = 300
tag {
key = "Name"
value = "${var.project_name}-backend-asg"
propagate_at_launch = true
}
tag {
key = "Environment"
value = var.environment
propagate_at_launch = true
}
}
# Auto Scaling Policy (Target Tracking - CPU)
resource "aws_autoscaling_policy" "backend_cpu" {
name = "${var.project_name}-backend-cpu-scaling"
autoscaling_group_name = aws_autoscaling_group.backend.name
policy_type = "TargetTrackingScaling"
target_tracking_configuration {
predefined_metric_specification {
predefined_metric_type = "ASGAverageCPUUtilization"
}
target_value = 70.0
}
}
# Frontend EC2 Instance (Single instance for demo)
resource "aws_instance" "frontend" {
ami = data.aws_ami.amazon_linux_2023.id
instance_type = "t3.small"
subnet_id = aws_subnet.public[0].id
vpc_security_group_ids = [aws_security_group.frontend.id]
iam_instance_profile = aws_iam_instance_profile.ec2_profile.name
# aws_instance takes plain-text user_data and encodes it itself
# (user_data_base64 is the argument for pre-encoded data)
user_data = <<-EOF
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
# Simple health check page
echo "<h1>Frontend Server</h1>" > /var/www/html/index.html
echo "OK" > /var/www/html/health
EOF
tags = {
Name = "${var.project_name}-frontend"
}
}
# Register frontend instance with target group
resource "aws_lb_target_group_attachment" "frontend" {
target_group_arn = aws_lb_target_group.frontend.arn
target_id = aws_instance.frontend.id
port = 80
}
Step 9: RDS PostgreSQL Database
Create rds.tf:
# rds.tf
# RDS PostgreSQL with Multi-AZ
resource "aws_db_instance" "main" {
identifier = "${var.project_name}-db"
engine = "postgres"
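engine_version = "15" # major version only, so RDS picks a current 15.x minor; pinned minors such as 15.4 are retired over time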
instance_class = var.db_instance_class
allocated_storage = var.db_allocated_storage
storage_type = "gp3"
storage_encrypted = true
db_name = var.db_name
username = var.db_username
password = var.db_password
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
# Multi-AZ for high availability
multi_az = true
# Backup configuration
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "mon:04:00-mon:05:00"
# Log exports to CloudWatch Logs
enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]
# Performance Insights
performance_insights_enabled = true
performance_insights_retention_period = 7
# Disable deletion protection for demo (enable in production!)
deletion_protection = false
skip_final_snapshot = true
tags = {
Name = "${var.project_name}-postgresql"
}
}
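The configuration above passes the master password through a Terraform variable, which also stores it in the state file. As a hedged alternative, recent versions of the AWS provider let RDS generate the password and keep it in Secrets Manager via manage_master_user_password. The sketch below is a trimmed variant of aws_db_instance.main showing only the arguments needed to illustrate the change; the output name is an assumption.

```hcl
resource "aws_db_instance" "main" {
  identifier        = "${var.project_name}-db"
  engine            = "postgres"
  instance_class    = var.db_instance_class
  allocated_storage = var.db_allocated_storage

  db_name  = var.db_name
  username = var.db_username

  # RDS creates and rotates the password in Secrets Manager.
  # This argument conflicts with "password", so that line is removed.
  manage_master_user_password = true

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  skip_final_snapshot    = true
}

output "db_master_secret_arn" {
  description = "ARN of the Secrets Manager secret holding the master password"
  value       = aws_db_instance.main.master_user_secret[0].secret_arn
}
```

With this approach the db_password variable and its terraform.tfvars entry are no longer needed, and applications read the credentials from the secret at runtime.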
Step 10: ECS Fargate for Log Aggregation
Create ecs-fargate.tf:
# ecs-fargate.tf
# ECS Cluster
resource "aws_ecs_cluster" "main" {
name = "${var.project_name}-cluster"
setting {
name = "containerInsights"
value = "enabled"
}
tags = {
Name = "${var.project_name}-ecs-cluster"
}
}
# CloudWatch Log Group for ECS
resource "aws_cloudwatch_log_group" "ecs_logs" {
name = "/ecs/${var.project_name}-fluent-bit"
retention_in_days = 7
tags = {
Name = "${var.project_name}-ecs-logs"
}
}
# IAM Role for ECS Task Execution
resource "aws_iam_role" "ecs_task_execution_role" {
name = "${var.project_name}-ecs-task-execution-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role_policy" {
role = aws_iam_role.ecs_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
# IAM Role for ECS Task
resource "aws_iam_role" "ecs_task_role" {
name = "${var.project_name}-ecs-task-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}
]
})
}
# Policy to allow writing to CloudWatch Logs
resource "aws_iam_role_policy" "ecs_task_cloudwatch" {
name = "${var.project_name}-ecs-cloudwatch-policy"
role = aws_iam_role.ecs_task_role.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "*"
}
]
})
}
# ECS Task Definition for Fluent Bit
resource "aws_ecs_task_definition" "fluent_bit" {
family = "${var.project_name}-fluent-bit"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = "512"
memory = "1024"
execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
task_role_arn = aws_iam_role.ecs_task_role.arn
container_definitions = jsonencode([
{
name = "fluent-bit"
image = "public.ecr.aws/aws-observability/aws-for-fluent-bit:latest"
portMappings = [
{
containerPort = 24224
protocol = "tcp"
}
]
environment = [
{
name = "AWS_REGION"
value = var.aws_region
}
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.ecs_logs.name
"awslogs-region" = var.aws_region
"awslogs-stream-prefix" = "fluent-bit"
}
}
firelensConfiguration = {
type = "fluentbit"
options = {
"config-file-type" = "file"
"config-file-value" = "/fluent-bit/etc/fluent-bit.conf"
}
}
}
])
tags = {
Name = "${var.project_name}-fluent-bit-task"
}
}
# Service Discovery Namespace
resource "aws_service_discovery_private_dns_namespace" "main" {
name = "${var.project_name}.local"
vpc = aws_vpc.private.id
tags = {
Name = "${var.project_name}-service-discovery"
}
}
# Service Discovery Service
resource "aws_service_discovery_service" "fluent" {
name = "fluent-bit"
dns_config {
namespace_id = aws_service_discovery_private_dns_namespace.main.id
dns_records {
ttl = 10
type = "A"
}
routing_policy = "MULTIVALUE"
}
health_check_custom_config {
failure_threshold = 1
}
tags = {
Name = "${var.project_name}-fluent-bit-discovery"
}
}
# ECS Service for Fluent Bit
resource "aws_ecs_service" "fluent_bit" {
name = "${var.project_name}-fluent-bit-service"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.fluent_bit.arn
desired_count = 2
launch_type = "FARGATE"
network_configuration {
subnets = aws_subnet.private_app[*].id
security_groups = [aws_security_group.ecs_fargate.id]
assign_public_ip = false
}
service_registries {
registry_arn = aws_service_discovery_service.fluent.arn
}
tags = {
Name = "${var.project_name}-fluent-bit-service"
}
}
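The service above runs a fixed desired_count of 2. If log volume varies, Application Auto Scaling can adjust the task count in the same way the backend ASG tracks CPU. The following is a sketch under that assumption; the capacity bounds and the 70% target are illustrative values, not tuned recommendations.

```hcl
resource "aws_appautoscaling_target" "fluent_bit" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.fluent_bit.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 6
}

resource "aws_appautoscaling_policy" "fluent_bit_cpu" {
  name               = "${var.project_name}-fluent-bit-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.fluent_bit.service_namespace
  resource_id        = aws_appautoscaling_target.fluent_bit.resource_id
  scalable_dimension = aws_appautoscaling_target.fluent_bit.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70
  }
}
```

When autoscaling manages the task count, it is common to add lifecycle { ignore_changes = [desired_count] } to aws_ecs_service.fluent_bit so that Terraform does not reset the count on every apply.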
Step 11: Outputs
Create outputs.tf:
# outputs.tf
output "alb_dns_name" {
description = "DNS name of the Application Load Balancer"
value = aws_lb.main.dns_name
}
output "public_vpc_id" {
description = "ID of the public VPC"
value = aws_vpc.public.id
}
output "private_vpc_id" {
description = "ID of the private VPC"
value = aws_vpc.private.id
}
output "rds_endpoint" {
description = "RDS instance endpoint"
value = aws_db_instance.main.endpoint
sensitive = true
}
output "ecs_cluster_name" {
description = "Name of the ECS cluster"
value = aws_ecs_cluster.main.name
}
output "frontend_instance_id" {
description = "ID of the frontend EC2 instance"
value = aws_instance.frontend.id
}
output "backend_asg_name" {
description = "Name of the backend Auto Scaling Group"
value = aws_autoscaling_group.backend.name
}
Deployment Steps
Initialize Terraform
# Navigate to project directory
cd terraform-aws-multi-vpc
# Initialize Terraform (downloads providers)
terraform init
Validate Configuration
# Validate syntax
terraform validate
# Format code
terraform fmt -recursive
Plan Deployment
# See what will be created
terraform plan
# Save plan to file
terraform plan -out=tfplan
Apply Configuration
# Apply changes
terraform apply
# Or apply saved plan
terraform apply tfplan
# Type 'yes' when prompted
Verify Resources
# Check outputs
terraform output
# Get ALB DNS
terraform output alb_dns_name
# Get RDS endpoint
terraform output -raw rds_endpoint
Access Application
# Get ALB URL
ALB_URL=$(terraform output -raw alb_dns_name)
# Test frontend
# The ALB may take a few minutes to become active and resolvable in DNS
curl "http://$ALB_URL"
Common Issues and Troubleshooting
Issue 1: “Error creating VPC Peering Connection”
Symptoms:
Error: error creating VPC Peering Connection: InvalidVpcPeeringConnectionID.NotFound
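Cause: auto_accept on the requester only works when both VPCs are in the same account and region. For cross-region or cross-account peering, the request has to be accepted from the peer side.
Solution:
# Same account and region (this guide): auto_accept = true on
# aws_vpc_peering_connection is enough, and no accepter resource is needed.
# Cross-region or cross-account: remove auto_accept from the requester, set
# peer_region / peer_owner_id there, and accept with a separate resource that
# uses a provider configured for the peer side.
resource "aws_vpc_peering_connection_accepter" "peer" {
  provider                  = aws.peer # assumes a provider alias named "peer" is defined
  vpc_peering_connection_id = aws_vpc_peering_connection.public_to_private.id
  auto_accept               = true
}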
Issue 2: “Spot Instance Interrupted”
Symptoms:
EC2 instances in ASG keep terminating
CloudWatch logs: "Spot instance interrupted"
Cause: Spot price exceeded max price or capacity unavailable.
Solution:
# Option 1: Increase max price
variable "spot_max_price" {
default = "0.10" # Check current spot prices
}
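# Option 2: Let the ASG pick any instance type that meets minimum requirements
# instance_type and instance_requirements are mutually exclusive, so instance_type
# is removed here. Attribute-based selection also requires the ASG to use a
# mixed_instances_policy (see Option 3).
resource "aws_launch_template" "backend" {
  # ... other arguments unchanged, instance_type removed ...
  instance_requirements {
    memory_mib {
      min = 4096
    }
    vcpu_count {
      min = 2
    }
  }
}
# Option 3: Mix spot and on-demand
# First remove the instance_market_options block from the launch template: a
# template that requests Spot itself cannot be used with a mixed_instances_policy.
# The policy also replaces the ASG's top-level launch_template block.
resource "aws_autoscaling_group" "backend" {
  # ... other arguments unchanged, top-level launch_template removed ...
  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 25
      spot_allocation_strategy                 = "capacity-optimized"
    }
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.backend.id
        version            = "$Latest"
      }
      override {
        instance_type = "t3.medium"
      }
      override {
        instance_type = "t3a.medium"
      }
    }
  }
}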
Issue 3: “RDS Connection Timeout”
Symptoms:
Application can't connect to RDS
Error: "Connection timed out"
Diagnosis:
# From backend EC2 instance
nc -zv <rds-endpoint> 5432
# Check security group rules
aws ec2 describe-security-groups --group-ids <rds-sg-id>
# Check route tables
aws ec2 describe-route-tables --filters "Name=vpc-id,Values=<vpc-id>"
Solutions:
# 1. Verify security group allows traffic
resource "aws_security_group" "rds" {
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.backend.id] # ✓ Correct
# cidr_blocks = ["10.1.0.0/16"] # ✗ Less secure
}
}
# 2. Ensure RDS is in private subnets
resource "aws_db_instance" "main" {
db_subnet_group_name = aws_db_subnet_group.main.name
publicly_accessible = false # Important!
}
# 3. Check DNS resolution
resource "aws_vpc" "private" {
enable_dns_hostnames = true # Must be true
enable_dns_support = true # Must be true
}
Issue 4: “ECS Task Failed to Start”
Symptoms:
ECS tasks stuck in PENDING state
Events: "CannotPullContainerError"
Diagnosis:
# Check ECS task status
aws ecs describe-tasks --cluster <cluster-name> --tasks <task-arn>
# Check CloudWatch logs
aws logs tail /ecs/<project>-fluent-bit --follow
Solutions:
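# 1. Give the private subnets a path to the image registry
# Option A: NAT Gateway (hourly and per-GB charges)
# The NAT Gateway must sit in a public subnet of the private VPC itself, with
# its own Internet Gateway. A NAT Gateway in the peered public VPC cannot be
# used, because VPC peering does not route traffic through to a gateway.
# aws_subnet.private_vpc_egress and aws_eip.nat are placeholders that this
# guide does not define.
resource "aws_nat_gateway" "private" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.private_vpc_egress.id
}
# Option B: VPC Endpoints (no internet needed)
# These endpoints serve private ECR repositories. Image layers also need an S3
# gateway endpoint, and the awslogs driver needs a CloudWatch Logs endpoint.
# The Fluent Bit image in this guide is pulled from public.ecr.aws, so for
# Option B it would likely have to be copied to a private ECR repository first.
# aws_security_group.vpc_endpoints is a placeholder; it must allow HTTPS (443)
# from the private subnets.
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.private.id
  service_name        = "com.amazonaws.${var.aws_region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private_app[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.private.id
  service_name        = "com.amazonaws.${var.aws_region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private_app[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}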
# 2. Verify IAM role has ECR permissions
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role_policy" {
role = aws_iam_role.ecs_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
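For the NAT Gateway option, the gateway has to live inside the private VPC, because traffic from one VPC cannot leave through a gateway in a peered VPC. The sketch below shows one way to wire that up. Every resource label here, and the choice of network number 100 for the egress subnet (10.1.100.0/24 with the default CIDR), is an assumption made for illustration. Note that this gives the otherwise isolated VPC outbound internet access, which is a deliberate trade-off.

```hcl
# Internet Gateway for the private VPC (used only by the NAT Gateway's subnet)
resource "aws_internet_gateway" "private_egress" {
  vpc_id = aws_vpc.private.id
}

# Small public subnet inside the private VPC to host the NAT Gateway
resource "aws_subnet" "private_vpc_egress" {
  vpc_id            = aws_vpc.private.id
  cidr_block        = cidrsubnet(var.private_vpc_cidr, 8, 100)
  availability_zone = var.availability_zones[0]
}

resource "aws_route_table" "private_vpc_egress" {
  vpc_id = aws_vpc.private.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.private_egress.id
  }
}

resource "aws_route_table_association" "private_vpc_egress" {
  subnet_id      = aws_subnet.private_vpc_egress.id
  route_table_id = aws_route_table.private_vpc_egress.id
}

resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "private" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.private_vpc_egress.id

  # The NAT Gateway needs the Internet Gateway to exist first
  depends_on = [aws_internet_gateway.private_egress]
}

# Finally, add a default route to aws_route_table.private as an inline block,
# since that table already defines its routes inline:
#   route {
#     cidr_block     = "0.0.0.0/0"
#     nat_gateway_id = aws_nat_gateway.private.id
#   }
```

A single NAT Gateway in one AZ keeps the cost down but is a single point of failure for outbound traffic; one gateway per AZ is the more resilient layout.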
Issue 5: “Terraform State Lock”
Symptoms:
Error: Error acquiring the state lock
Lock Info:
ID: xxxxx
Operation: OperationTypeApply
Solution:
# Check if another terraform process is running
ps aux | grep terraform
# If stuck, force unlock (use carefully!)
terraform force-unlock <lock-id>
# Better: Use remote state with locking
# In providers.tf:
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "multi-vpc/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-state-lock"
encrypt = true
}
}
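The S3 backend above expects the bucket and the DynamoDB table to exist before terraform init runs. A minimal bootstrap sketch, meant to live in a separate small configuration with local state, could look like this. The bucket and table names mirror the backend block; S3 bucket names are globally unique, so the bucket name will need adjusting.

```hcl
resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state"
}

# Versioning keeps earlier state files recoverable
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend's DynamoDB locking expects a string hash key named "LockID"
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```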
Cost Optimization Tips
1. Use Spot Instances Wisely
# Mix spot and on-demand for reliability (goes inside aws_autoscaling_group)
mixed_instances_policy {
  instances_distribution {
    on_demand_base_capacity                  = 1  # 1 on-demand always
    on_demand_percentage_above_base_capacity = 25 # 25% on-demand, 75% spot
    spot_allocation_strategy                 = "capacity-optimized"
  }
  launch_template {
    launch_template_specification {
      launch_template_id = aws_launch_template.backend.id
    }
    # Use multiple instance types to increase spot availability
    override {
      instance_type = "t3.medium"
    }
    override {
      instance_type = "t3a.medium" # AMD variant, usually cheaper
    }
    override {
      instance_type = "t2.medium" # Previous generation
    }
  }
}
2. Right-Size Your Resources
# Start small, scale up based on metrics
variable "instance_type" {
default = "t3.small" # Instead of t3.large
}
variable "db_instance_class" {
default = "db.t3.medium" # Instead of db.r5.large
}
# Use auto-scaling to handle peaks
resource "aws_autoscaling_policy" "backend_cpu" {
target_tracking_configuration {
target_value = 70.0 # Scale at 70% CPU
}
}
3. Use VPC Endpoints Instead of NAT Gateway
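# NAT Gateway: ~$32/month per gateway, plus per-GB data processing
# Interface endpoints: ~$7/month per endpoint per AZ, plus per-GB data processing
# Gateway endpoints (S3, DynamoDB): no hourly or data processing charge
# For private subnets accessing S3 (gateway endpoint):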
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.private.id
service_name = "com.amazonaws.${var.aws_region}.s3"
route_table_ids = [aws_route_table.private.id]
}
4. Enable RDS Auto-Pause (for Aurora)
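# Applies only if you replace the RDS instance with Aurora Serverless v1
# (aws_db_instance has no auto-pause). Check that Serverless v1 is still
# offered for your engine version before relying on this.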
resource "aws_rds_cluster" "main" {
engine_mode = "serverless"
scaling_configuration {
auto_pause = true
max_capacity = 4
min_capacity = 2
seconds_until_auto_pause = 300 # Pause after 5 min idle
}
}
5. Use gp3 Instead of gp2
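# gp3 is ~20% cheaper per GB than gp2 for EBS volumes; for RDS, check current
# storage pricing, where the main gain is baseline IOPS independent of volume size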
resource "aws_db_instance" "main" {
storage_type = "gp3" # Instead of "gp2"
}
6. Set Lifecycle Rules for CloudWatch Logs
resource "aws_cloudwatch_log_group" "ecs_logs" {
retention_in_days = 7 # Instead of infinite retention
}
Current Cost Estimate
Based on us-east-1 pricing (approximate monthly costs):
| Resource | Configuration | Estimated Cost |
|---|---|---|
| EC2 (Spot) | 2x t3.medium (75% spot) | $25 |
| RDS | db.t3.medium Multi-AZ | $85 |
| ALB | 1x ALB | $16 |
| ECS Fargate | 2x tasks (512 CPU, 1GB) | $15 |
| Data Transfer | 100 GB/month | $9 |
| CloudWatch | Logs + Metrics | $5 |
| Total | | ~$155/month |
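The total is simply the sum of the line items in the table. As a small worked check, the same arithmetic can be expressed in Terraform itself; the local and output names are illustrative, and the figures are the table's own estimates rather than measured costs.

```hcl
locals {
  # Approximate monthly cost per line item, in USD, copied from the table
  monthly_cost_usd = {
    ec2_spot      = 25
    rds           = 85
    alb           = 16
    ecs_fargate   = 15
    data_transfer = 9
    cloudwatch    = 5
  }

  monthly_total_usd = sum(values(local.monthly_cost_usd))
}

output "estimated_monthly_cost_usd" {
  description = "Sum of the estimated line items"
  value       = local.monthly_total_usd
}
```

This evaluates to 155, matching the total row. A NAT Gateway or interface endpoints, if added, would need their own entries.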
Clean Up Resources
Important: Always destroy resources when done to avoid charges!
# Review what will be deleted
terraform plan -destroy
# Destroy all resources
terraform destroy
# Target specific resources
terraform destroy -target=aws_db_instance.main
# If destroy fails because deletion protection is enabled:
# 1. Set deletion_protection = false in rds.tf (and
#    enable_deletion_protection = false in alb.tf), then apply
terraform apply
# 2. Then destroy
terraform destroy
Best Practices Summary
Security
- ✅ Use security groups with least privilege
- ✅ Enable encryption at rest (RDS, EBS)
- ✅ Use IAM roles instead of access keys
- ✅ Store secrets in AWS Secrets Manager
- ✅ Enable VPC Flow Logs
- ✅ Use private subnets for backend/database
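VPC Flow Logs appear in the checklist but are not part of the configuration built in this guide. A minimal sketch for the private VPC, sending flow records to CloudWatch Logs, might look like the following. All resource labels and the log group name are assumptions, and the IAM policy uses a broad resource scope for brevity.

```hcl
resource "aws_cloudwatch_log_group" "private_vpc_flow" {
  name              = "/vpc/${var.project_name}-private-flow-logs"
  retention_in_days = 7
}

# Role that the VPC Flow Logs service assumes to write log events
resource "aws_iam_role" "flow_logs" {
  name = "${var.project_name}-flow-logs-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "vpc-flow-logs.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy" "flow_logs" {
  name = "${var.project_name}-flow-logs-policy"
  role = aws_iam_role.flow_logs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogStream",
          "logs:PutLogEvents",
          "logs:DescribeLogGroups",
          "logs:DescribeLogStreams"
        ]
        Resource = "*" # could be narrowed to the log group above
      }
    ]
  })
}

resource "aws_flow_log" "private_vpc" {
  vpc_id          = aws_vpc.private.id
  traffic_type    = "ALL"
  log_destination = aws_cloudwatch_log_group.private_vpc_flow.arn
  iam_role_arn    = aws_iam_role.flow_logs.arn
}
```

A second aws_flow_log pointing at aws_vpc.public.id would cover the public VPC in the same way.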
High Availability
- ✅ Multi-AZ deployment for RDS
- ✅ Auto Scaling Groups across 2+ AZs
- ✅ Application Load Balancer with health checks
- ✅ Multiple ECS tasks for log aggregation
Operational Excellence
- ✅ Enable CloudWatch monitoring
- ✅ Use remote state with locking (S3 + DynamoDB)
- ✅ Tag all resources consistently
- ✅ Use modules for reusability
- ✅ Version your Terraform code in Git
Cost Optimization
- ✅ Use Spot instances with on-demand mix
- ✅ Right-size instances based on metrics
- ✅ Use VPC endpoints instead of NAT
- ✅ Set CloudWatch log retention
- ✅ Use gp3 storage
Next Steps
- Add HTTPS: Configure ACM certificate and HTTPS listener
- Secrets Management: Move DB credentials to AWS Secrets Manager
- Monitoring: Set up CloudWatch dashboards and alarms
- CI/CD: Integrate with GitHub Actions or GitLab CI
- Backup: Configure automated backups and disaster recovery
- WAF: Add AWS WAF for application protection
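As a starting point for the monitoring item, a single CloudWatch alarm on database CPU could be sketched as follows. The threshold, period, and evaluation count are illustrative values, and the SNS topic in the commented line is hypothetical.

```hcl
resource "aws_cloudwatch_metric_alarm" "rds_cpu_high" {
  alarm_name        = "${var.project_name}-rds-cpu-high"
  alarm_description = "RDS CPU above 80% for 15 minutes"

  namespace   = "AWS/RDS"
  metric_name = "CPUUtilization"
  dimensions = {
    DBInstanceIdentifier = aws_db_instance.main.identifier
  }

  statistic           = "Average"
  period              = 300 # seconds
  evaluation_periods  = 3   # 3 x 5 minutes
  comparison_operator = "GreaterThanThreshold"
  threshold           = 80

  # alarm_actions = [aws_sns_topic.alerts.arn] # hypothetical SNS topic for notifications
}
```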
This guide provides a solid foundation for building production-ready AWS infrastructure with Terraform!