Prerequisites
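Before deploying PowerSync on AWS ECS, ensure you have:
- An AWS account with permissions for EC2, ECS, ALB, IAM and Secrets Manager (later steps also use ACM, Route 53, SSM Parameter Store, CloudWatch, Application Auto Scaling and EventBridge)
- The AWS CLI installed and configured
- An understanding of the deployment architecture for production vs. development setups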
1. PowerSync Configuration
Create your service.yaml configuration file following the Self-Hosted Configuration Guide.
Your configuration must include:
- Sync Streams (or legacy Sync Rules): Define which data to sync to clients
- Client Auth: Your authentication provider’s JWKS
- Source Database: Connection details for your source database
- Monitoring: Enable the Prometheus metrics endpoint for connection-based auto-scaling (used in the Auto Scaling section) by setting telemetry.prometheus_port: 9090.
- Bucket Storage: Connection details for your bucket storage database. PowerSync supports MongoDB or Postgres as bucket storage databases. In this guide, we focus on MongoDB.
- MongoDB Atlas
- Self-Hosted MongoDB on EC2
For bucket storage, we recommend configuring AWS PrivateLink to establish a secure, private connection between your ECS tasks and MongoDB Atlas that doesn’t traverse the public internet. Follow the AWS PrivateLink guide for MongoDB Atlas to configure the VPC endpoint, then update your MongoDB connection string to use the private endpoint. Store the updated connection string in your PS_MONGO_URI secret (see Step 5, Secrets Manager).
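A minimal service.yaml skeleton covering the items above might look like the following. It is a sketch, not a complete configuration: the key names reflect our reading of the PowerSync self-hosted config format, the Postgres URI and JWKS URL are placeholders, and the Sync Streams section is left out. Check each key against the Self-Hosted Configuration Guide before use.

```bash
set -eu
# Write a skeleton service.yaml. All values are placeholders; verify keys against the guide.
cat > service.yaml <<'EOF'
# Source database. Placeholder URI; the whole file is stored as a secret in Step 5.
replication:
  connections:
    - type: postgresql
      uri: postgresql://user:password@db.example.com:5432/app

# Bucket storage (MongoDB). PS_MONGO_URI is injected from Secrets Manager.
storage:
  type: mongodb
  uri: !env PS_MONGO_URI

# Port that the ALB target group forwards to.
port: 8080

# Client auth: your authentication provider's JWKS endpoint (placeholder).
client_auth:
  jwks_uri: https://auth.example.com/.well-known/jwks.json

# Prometheus metrics for connection-based auto-scaling (scraped on localhost:9090).
telemetry:
  prometheus_port: 9090

# Sync Streams (or legacy Sync Rules) go here; see the Self-Hosted Configuration Guide.
EOF
echo "wrote service.yaml"
```

If you use the PrivateLink endpoint, only the PS_MONGO_URI secret changes; the file itself keeps the !env reference.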
2. VPC and Networking Setup
This guide uses bash variables throughout for easy copy-paste execution.
VPC Architecture Overview
PowerSync on ECS requires a VPC with both public and private subnets:
- Public subnets: Host the Application Load Balancer (ALB) and NAT Gateway, with direct internet access
- Private subnets: Host the ECS tasks for security, with outbound-only internet access via the NAT Gateway
Check Existing Subnets
List the subnets in your VPC. If MapPublicIpOnLaunch is True for a subnet, it is a public subnet. Save the public subnet IDs for the ALB and NAT Gateway.
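The subnet check can be done with describe-subnets. In this sketch VPC_ID is a placeholder, and commands go through a small hypothetical run helper that only prints them (returning a placeholder value) unless APPLY=1 is set, so the script can be reviewed before it touches your account.

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder value so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"   # placeholder

# Every subnet with its AZ, CIDR and MapPublicIpOnLaunch flag.
run aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPC_ID" \
  --query 'Subnets[*].[SubnetId,AvailabilityZone,CidrBlock,MapPublicIpOnLaunch]' --output table

# Only the public subnet IDs (MapPublicIpOnLaunch == true), as whitespace-separated text.
PUBLIC_SUBNETS=$(run aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPC_ID" \
  --query 'Subnets[?MapPublicIpOnLaunch==`true`].SubnetId' --output text)
PUBLIC_SUBNET_1=$(printf '%s\n' "$PUBLIC_SUBNETS" | awk '{print $1}')
PUBLIC_SUBNET_2=$(printf '%s\n' "$PUBLIC_SUBNETS" | awk '{print $2}')
echo "PUBLIC_SUBNET_1=$PUBLIC_SUBNET_1 PUBLIC_SUBNET_2=$PUBLIC_SUBNET_2"
```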
Create Private Subnets
Create two private subnets in different availability zones for high availability.
NAT Gateway Setup
ECS tasks in private subnets need outbound internet access for:
- Pulling container images from Amazon ECR
- Fetching JWKS for authentication (if applicable in your client authentication setup)
- Connecting to external services
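The private subnets, NAT Gateway and private route table described above can be sketched with the AWS CLI as follows. The CIDR blocks, availability zones and VPC/subnet IDs are placeholders (pick CIDRs that are free in your VPC), and run is a hypothetical dry-run wrapper: nothing is created unless APPLY=1 is set.

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

# Placeholders: your VPC, one public subnet, two AZs and two unused CIDR blocks.
VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"
PUBLIC_SUBNET_1="${PUBLIC_SUBNET_1:-subnet-aaaa1111}"
AZ_1="${AZ_1:-us-east-1a}"; AZ_2="${AZ_2:-us-east-1b}"
PRIVATE_CIDR_1="${PRIVATE_CIDR_1:-10.0.101.0/24}"
PRIVATE_CIDR_2="${PRIVATE_CIDR_2:-10.0.102.0/24}"

# Two private subnets in different AZs.
PRIVATE_SUBNET_1=$(run aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block "$PRIVATE_CIDR_1" \
  --availability-zone "$AZ_1" --query Subnet.SubnetId --output text)
PRIVATE_SUBNET_2=$(run aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block "$PRIVATE_CIDR_2" \
  --availability-zone "$AZ_2" --query Subnet.SubnetId --output text)

# NAT Gateway in a public subnet, backed by an Elastic IP.
EIP_ALLOC_ID=$(run aws ec2 allocate-address --domain vpc --query AllocationId --output text)
NAT_GW_ID=$(run aws ec2 create-nat-gateway --subnet-id "$PUBLIC_SUBNET_1" \
  --allocation-id "$EIP_ALLOC_ID" --query NatGateway.NatGatewayId --output text)
run aws ec2 wait nat-gateway-available --nat-gateway-ids "$NAT_GW_ID" >/dev/null

# Private route table: default route through the NAT Gateway (outbound only).
PRIVATE_RT_ID=$(run aws ec2 create-route-table --vpc-id "$VPC_ID" \
  --query RouteTable.RouteTableId --output text)
run aws ec2 create-route --route-table-id "$PRIVATE_RT_ID" \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id "$NAT_GW_ID" >/dev/null
for SUBNET in "$PRIVATE_SUBNET_1" "$PRIVATE_SUBNET_2"; do
  run aws ec2 associate-route-table --route-table-id "$PRIVATE_RT_ID" --subnet-id "$SUBNET" >/dev/null
done
```

A single NAT Gateway keeps costs down; for AZ-level redundancy you would create one per AZ, each with its own route table.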
Create Security Groups
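A sketch of the two security groups, using the hypothetical names powersync-alb-sg and powersync-ecs-sg: the ALB accepts HTTPS from anywhere, and the ECS tasks accept port 8080 only from the ALB's security group. Default outbound rules stay in place, which the tasks rely on for image pulls, JWKS and database connections. The run helper is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"   # placeholder

ALB_SG_ID=$(run aws ec2 create-security-group --group-name powersync-alb-sg \
  --description "PowerSync ALB" --vpc-id "$VPC_ID" --query GroupId --output text)
ECS_SG_ID=$(run aws ec2 create-security-group --group-name powersync-ecs-sg \
  --description "PowerSync ECS tasks" --vpc-id "$VPC_ID" --query GroupId --output text)

# ALB: HTTPS from the internet.
run aws ec2 authorize-security-group-ingress --group-id "$ALB_SG_ID" \
  --protocol tcp --port 443 --cidr 0.0.0.0/0 >/dev/null

# ECS tasks: port 8080 only from the ALB security group.
run aws ec2 authorize-security-group-ingress --group-id "$ECS_SG_ID" \
  --protocol tcp --port 8080 --source-group "$ALB_SG_ID" >/dev/null
```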
3. Application Load Balancer
Domain Setup
PowerSync requires a domain name for SSL certificate provisioning. You can either:
- Use an existing domain by creating a Route 53 hosted zone and updating your registrar’s nameservers
- Register a new domain directly through Route 53
For details, see the AWS guides:
- Configuring DNS routing for a new domain (for existing domains)
- Registering a new domain (to register through Route 53)
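For the existing-domain option, a public hosted zone can be created and its nameservers printed (to enter at your registrar) roughly as follows. DOMAIN is a placeholder, and run is a hypothetical dry-run wrapper that only prints commands unless APPLY=1 is set.

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

DOMAIN="${DOMAIN:-yourdomain.com}"   # placeholder

# The caller reference only has to be unique per request.
HOSTED_ZONE_ID=$(run aws route53 create-hosted-zone --name "$DOMAIN" \
  --caller-reference "powersync-$(date +%s)" --query HostedZone.Id --output text)

# Nameservers to configure at your registrar.
run aws route53 get-hosted-zone --id "$HOSTED_ZONE_ID" \
  --query DelegationSet.NameServers --output text
```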
Request SSL Certificate
For secure HTTPS connections, request an SSL certificate using AWS Certificate Manager (ACM):
- External DNS Provider
- Route 53 DNS
Add the CNAME record using your DNS provider’s management console:
| Type | Name | Value | TTL |
|---|---|---|---|
| CNAME | [VALIDATION_NAME] | [VALIDATION_VALUE] | 300 |
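The certificate request and validation can also be scripted. This sketch requests a DNS-validated certificate for a placeholder domain in the ALB's region, prints the record that supplies [VALIDATION_NAME] and [VALIDATION_VALUE], and then waits for ACM to issue the certificate. run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

AWS_REGION="${AWS_REGION:-us-east-1}"                      # must match the ALB region
DOMAIN_NAME="${DOMAIN_NAME:-powersync.yourdomain.com}"     # placeholder

CERT_ARN=$(run aws acm request-certificate --region "$AWS_REGION" \
  --domain-name "$DOMAIN_NAME" --validation-method DNS --query CertificateArn --output text)

# Name/Value of the validation CNAME to add at your DNS provider.
run aws acm describe-certificate --region "$AWS_REGION" --certificate-arn "$CERT_ARN" \
  --query 'Certificate.DomainValidationOptions[0].ResourceRecord' --output table

# Blocks until the certificate status is ISSUED (after the CNAME is in place).
run aws acm wait certificate-validated --region "$AWS_REGION" --certificate-arn "$CERT_ARN"
```

The validation record can take a few seconds to appear after the request; if describe-certificate returns nothing, run it again.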
Create ALB
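A sketch of the ALB setup: an IP-type target group on port 8080 (IP targets are required for Fargate tasks in awsvpc mode), an internet-facing ALB in the public subnets, a 3600-second idle timeout for long-lived sync connections, and an HTTPS listener using the ACM certificate. Resource names, IDs and the /probes/liveness health-check path are assumptions; confirm the health endpoint for your PowerSync version. run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

# Placeholders from the earlier steps.
VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"
PUBLIC_SUBNET_1="${PUBLIC_SUBNET_1:-subnet-aaaa1111}"
PUBLIC_SUBNET_2="${PUBLIC_SUBNET_2:-subnet-bbbb2222}"
ALB_SG_ID="${ALB_SG_ID:-sg-0aaaaaaaaaaaaaaa0}"
CERT_ARN="${CERT_ARN:-arn:aws:acm:us-east-1:123456789012:certificate/placeholder}"

TG_ARN=$(run aws elbv2 create-target-group --name powersync-tg --protocol HTTP --port 8080 \
  --vpc-id "$VPC_ID" --target-type ip --health-check-path /probes/liveness \
  --query 'TargetGroups[0].TargetGroupArn' --output text)

ALB_ARN=$(run aws elbv2 create-load-balancer --name powersync-alb --type application \
  --scheme internet-facing --subnets "$PUBLIC_SUBNET_1" "$PUBLIC_SUBNET_2" \
  --security-groups "$ALB_SG_ID" --query 'LoadBalancers[0].LoadBalancerArn' --output text)

# Sync connections are long-lived: raise the idle timeout to 3600 seconds.
run aws elbv2 modify-load-balancer-attributes --load-balancer-arn "$ALB_ARN" \
  --attributes Key=idle_timeout.timeout_seconds,Value=3600 >/dev/null

run aws elbv2 create-listener --load-balancer-arn "$ALB_ARN" --protocol HTTPS --port 443 \
  --certificates CertificateArn="$CERT_ARN" \
  --default-actions Type=forward,TargetGroupArn="$TG_ARN" >/dev/null

# DNS name to point your domain at in the next step.
ALB_DNS=$(run aws elbv2 describe-load-balancers --load-balancer-arns "$ALB_ARN" \
  --query 'LoadBalancers[0].DNSName' --output text)
echo "ALB_DNS=$ALB_DNS"
```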
4. DNS Configuration
Point your domain to the load balancer:
- External DNS Provider
- Using Route 53
Create a CNAME record pointing to the ALB DNS name:
| Type | Name | Value | TTL |
|---|---|---|---|
| CNAME | powersync.yourdomain.com | [ALB_DNS] | 300 |
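With Route 53, the CNAME from the table above can be created with a change batch. The hosted zone ID, record name and ALB DNS name are placeholders, and run is a hypothetical dry-run wrapper (set APPLY=1 to execute). A Route 53 alias A record pointing at the ALB is an alternative that also works at the zone apex.

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

HOSTED_ZONE_ID="${HOSTED_ZONE_ID:-Z0123456789EXAMPLE}"
RECORD_NAME="${RECORD_NAME:-powersync.yourdomain.com}"
ALB_DNS="${ALB_DNS:-powersync-alb-123456789.us-east-1.elb.amazonaws.com}"

# UPSERT creates the record, or updates it if it already exists.
cat > dns-change.json <<EOF
{
  "Comment": "Point PowerSync at the ALB",
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "${RECORD_NAME}",
      "Type": "CNAME",
      "TTL": 300,
      "ResourceRecords": [{ "Value": "${ALB_DNS}" }]
    }
  }]
}
EOF

run aws route53 change-resource-record-sets --hosted-zone-id "$HOSTED_ZONE_ID" \
  --change-batch file://dns-change.json
```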
5. Secrets Manager
Store your PowerSync configuration and connection strings securely in AWS Secrets Manager. This allows you to reference them in your ECS task definition without hardcoding sensitive information.
AWS Secrets Manager automatically appends a 6-character suffix to secret ARNs (e.g., powersync/config-AbCdEf). ECS task definitions support prefix matching, allowing you to reference secrets using just the base name:
- Created as: powersync/config-AbCdEf (with suffix)
- Referenced as: arn:aws:secretsmanager:region:account:secret:powersync/config (without suffix)
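A sketch of creating the two secrets. It assumes the PowerSync container can read its whole configuration, base64-encoded, from a POWERSYNC_CONFIG_B64 environment variable; verify this for your image version. The secret name powersync/mongo-uri, the region, account ID and connection string are placeholders, and run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

AWS_REGION="${AWS_REGION:-us-east-1}"
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
CONFIG_FILE="${CONFIG_FILE:-service.yaml}"

# Assumption: the container reads its config from POWERSYNC_CONFIG_B64.
if [ -f "$CONFIG_FILE" ]; then
  CONFIG_B64=$(base64 < "$CONFIG_FILE" | tr -d '\n')
elif [ "${APPLY:-0}" = "1" ]; then
  echo "$CONFIG_FILE not found" >&2; exit 1
else
  CONFIG_B64="<base64 of $CONFIG_FILE>"   # dry-run placeholder only
fi
run aws secretsmanager create-secret --region "$AWS_REGION" --name powersync/config \
  --secret-string "$CONFIG_B64" >/dev/null

# Bucket storage URI (use the PrivateLink connection string if you configured it).
PS_MONGO_URI="${PS_MONGO_URI:-mongodb+srv://user:password@cluster.example.mongodb.net/powersync}"
run aws secretsmanager create-secret --region "$AWS_REGION" --name powersync/mongo-uri \
  --secret-string "$PS_MONGO_URI" >/dev/null

# Task definitions reference the base name, without the random suffix.
CONFIG_SECRET_ARN="arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:powersync/config"
echo "$CONFIG_SECRET_ARN"
```

Passing a connection string on the command line can leave it in your shell history; exporting PS_MONGO_URI from a protected file avoids that.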
6. ECS Task Definition
The ECS task definition specifies how to run the PowerSync container, including environment variables, secrets, resource limits, and health checks.
Create IAM Role
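A sketch of the two roles the tasks need, under hypothetical names: an execution role (pull images, write logs, read the powersync/* secrets and /ecs/powersync/* SSM parameters at task start) and a task role that lets the CloudWatch Agent sidecar publish metrics via the AWS-managed CloudWatchAgentServerPolicy. The wildcard resources are a convenience for this sketch; narrow them for production. run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

# Trust policy: allow ECS tasks to assume the role.
cat > ecs-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ecs-tasks.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

# Inline policy: read PowerSync secrets and the CloudWatch Agent SSM parameters.
cat > read-secrets-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:*:*:secret:powersync/*" },
    { "Effect": "Allow", "Action": "ssm:GetParameters",
      "Resource": "arn:aws:ssm:*:*:parameter/ecs/powersync/*" }
  ]
}
EOF

# Execution role.
run aws iam create-role --role-name powersyncTaskExecutionRole \
  --assume-role-policy-document file://ecs-trust-policy.json >/dev/null
run aws iam attach-role-policy --role-name powersyncTaskExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy >/dev/null
run aws iam put-role-policy --role-name powersyncTaskExecutionRole \
  --policy-name powersync-read-secrets --policy-document file://read-secrets-policy.json >/dev/null

# Task role for the CloudWatch Agent sidecar.
run aws iam create-role --role-name powersyncTaskRole \
  --assume-role-policy-document file://ecs-trust-policy.json >/dev/null
run aws iam attach-role-policy --role-name powersyncTaskRole \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy >/dev/null
```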
Create Cluster
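Creating the cluster is a single call. This sketch also creates the log groups that appear in the Troubleshooting commands (/ecs/powersync and /ecs/powersync-api/cwagent), which the awslogs driver needs unless the task definition sets it to create them. run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

CLUSTER_NAME="${CLUSTER_NAME:-powersync-cluster}"
run aws ecs create-cluster --cluster-name "$CLUSTER_NAME" >/dev/null

# Log groups for the PowerSync containers and the CloudWatch Agent sidecar.
for LOG_GROUP in /ecs/powersync /ecs/powersync-api/cwagent; do
  run aws logs create-log-group --log-group-name "$LOG_GROUP" >/dev/null
done
```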
Register Task Definition
The task definitions in this guide allocate 2 vCPU and 4GB memory per task. You can adjust resources based on your workload; see Deployment Architecture for scaling guidance (recommended baseline: 1 vCPU, 2GB memory). Note that AWS Fargate only supports specific CPU/memory combinations: for example, 2 vCPU (2048 CPU units) requires at least 4GB (4096 MiB) memory.
- High Availability Setup
- Basic Setup (Single Instance)
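As a reference point for the High Availability Setup, a replication task definition could be generated and registered roughly as follows. Several details are assumptions rather than confirmed PowerSync settings: the journeyapps/powersync-service image tag, the "start -r replication" command, the POWERSYNC_CONFIG_B64 variable, and the secret and role names. stopTimeout is set to 120 seconds, the Fargate maximum. run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

AWS_REGION="${AWS_REGION:-us-east-1}"
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
IMAGE="${IMAGE:-journeyapps/powersync-service:latest}"   # assumed image name
SECRET_PREFIX="arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret"

cat > replication-task.json <<EOF
{
  "family": "powersync-replication",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "2048",
  "memory": "4096",
  "executionRoleArn": "arn:aws:iam::${ACCOUNT_ID}:role/powersyncTaskExecutionRole",
  "containerDefinitions": [{
    "name": "powersync",
    "image": "${IMAGE}",
    "command": ["start", "-r", "replication"],
    "essential": true,
    "stopTimeout": 120,
    "secrets": [
      { "name": "POWERSYNC_CONFIG_B64", "valueFrom": "${SECRET_PREFIX}:powersync/config" },
      { "name": "PS_MONGO_URI", "valueFrom": "${SECRET_PREFIX}:powersync/mongo-uri" }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/powersync",
        "awslogs-region": "${AWS_REGION}",
        "awslogs-stream-prefix": "replication"
      }
    }
  }]
}
EOF

run aws ecs register-task-definition --cli-input-json file://replication-task.json
```

The API task definition follows the same pattern with a different command, a port mapping for 8080, the task role, and the CloudWatch Agent sidecar container.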
For production deployments, run separate replication and API processes to enable zero-downtime rolling updates. This also allows the API containers to scale independently.
Create Replication Task Definition
Create API Task Definition
The API task definition includes a CloudWatch Agent sidecar that scrapes Prometheus metrics from the PowerSync container and publishes them to CloudWatch. This enables connection-based auto-scaling.
First, create the CloudWatch Agent configuration. This tells the agent to scrape the PowerSync Prometheus endpoint on localhost:9090 and publish the powersync_concurrent_connections metric to CloudWatch.
The Prometheus port (9090) is not exposed through the ALB; it is only accessible within the task via localhost (ECS awsvpc networking). The CloudWatch Agent sidecar scrapes metrics locally every 30 seconds and publishes them to CloudWatch.
Store the CloudWatch Agent config in SSM Parameter Store, then create the API task definition with both the PowerSync container and the CloudWatch Agent sidecar.
The CloudWatch Agent sidecar adds ~256MB of memory overhead. The API task definition allocates 4096MB in total, shared between both containers. If you need more headroom, increase the task memory to 5120MB or 6144MB.
7. Deploy ECS Service
Create the ECS service to run the PowerSync tasks.
- High Availability Setup
- Basic Setup (Single Instance)
For production deployments, run separate replication and API processes to enable zero-downtime rolling updates. This also allows the API containers to scale independently.
Deploy Replication Service (1 Instance)
Deploy API Service (2+ Instances)
Verify HA Deployment
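A sketch of the two HA services and a quick verification. It assumes task definition families powersync-replication and powersync-api, a container named powersync in the API task, and placeholder subnet, security group and target group values. The replication service runs exactly one task and is not attached to the ALB; the API service starts with two tasks behind the target group. run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

CLUSTER="${CLUSTER:-powersync-cluster}"
PRIVATE_SUBNET_1="${PRIVATE_SUBNET_1:-subnet-cccc3333}"
PRIVATE_SUBNET_2="${PRIVATE_SUBNET_2:-subnet-dddd4444}"
ECS_SG_ID="${ECS_SG_ID:-sg-0bbbbbbbbbbbbbbb0}"
TG_ARN="${TG_ARN:-arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/powersync-tg/0123456789abcdef}"

# Tasks run in the private subnets with no public IP (egress via the NAT Gateway).
NETWORK="awsvpcConfiguration={subnets=[$PRIVATE_SUBNET_1,$PRIVATE_SUBNET_2],securityGroups=[$ECS_SG_ID],assignPublicIp=DISABLED}"

# Replication: exactly one task, no load balancer.
run aws ecs create-service --cluster "$CLUSTER" --service-name powersync-replication \
  --task-definition powersync-replication --desired-count 1 --launch-type FARGATE \
  --network-configuration "$NETWORK" >/dev/null

# API: two or more tasks behind the ALB target group on port 8080.
run aws ecs create-service --cluster "$CLUSTER" --service-name powersync-api \
  --task-definition powersync-api --desired-count 2 --launch-type FARGATE \
  --network-configuration "$NETWORK" \
  --load-balancers "targetGroupArn=$TG_ARN,containerName=powersync,containerPort=8080" \
  --health-check-grace-period-seconds 120 >/dev/null

# Verify: runningCount should reach desiredCount for both services.
run aws ecs describe-services --cluster "$CLUSTER" \
  --services powersync-replication powersync-api \
  --query 'services[*].[serviceName,desiredCount,runningCount]' --output table
```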
Production Enhancements
For production deployments, consider adding the following enhancements.
Daily Compact Job (Recommended)
PowerSync requires daily compaction to optimize bucket storage. Schedule it as an ECS task with EventBridge.
Compact Job Configuration
Generate the compact task definition, then create an IAM role for EventBridge and schedule the task with EventBridge (daily at 2 AM UTC).
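The three parts of the compact job can be sketched as follows. Assumptions to verify: the image runs compaction and exits when started with a "compact" command, it reads its config from POWERSYNC_CONFIG_B64, and the role, secret and task names are hypothetical. The schedule uses EventBridge's six-field cron syntax, so cron(0 2 * * ? *) fires daily at 02:00 UTC. The Resource "*" entries are for brevity; scope them to the cluster, task definition and roles in production. run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

AWS_REGION="${AWS_REGION:-us-east-1}"; ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
PRIVATE_SUBNET_1="${PRIVATE_SUBNET_1:-subnet-cccc3333}"; ECS_SG_ID="${ECS_SG_ID:-sg-0bbbbbbbbbbbbbbb0}"
IMAGE="${IMAGE:-journeyapps/powersync-service:latest}"   # assumed image name
SECRET_PREFIX="arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret"

# 1. Compact task definition (1 vCPU / 2GB baseline; add logConfiguration as needed).
cat > compact-task.json <<EOF
{
  "family": "powersync-compact",
  "requiresCompatibilities": ["FARGATE"], "networkMode": "awsvpc",
  "cpu": "1024", "memory": "2048",
  "executionRoleArn": "arn:aws:iam::${ACCOUNT_ID}:role/powersyncTaskExecutionRole",
  "containerDefinitions": [{
    "name": "powersync-compact", "image": "${IMAGE}", "essential": true,
    "command": ["compact"],
    "secrets": [
      { "name": "POWERSYNC_CONFIG_B64", "valueFrom": "${SECRET_PREFIX}:powersync/config" },
      { "name": "PS_MONGO_URI", "valueFrom": "${SECRET_PREFIX}:powersync/mongo-uri" }
    ]
  }]
}
EOF
run aws ecs register-task-definition --cli-input-json file://compact-task.json >/dev/null

# 2. Role that lets EventBridge start the task (RunTask + PassRole for the task roles).
cat > events-trust.json <<'EOF'
{"Version":"2012-10-17","Statement":[{"Effect":"Allow",
  "Principal":{"Service":"events.amazonaws.com"},"Action":"sts:AssumeRole"}]}
EOF
cat > events-run-task.json <<'EOF'
{"Version":"2012-10-17","Statement":[
  {"Effect":"Allow","Action":"ecs:RunTask","Resource":"*"},
  {"Effect":"Allow","Action":"iam:PassRole","Resource":"*"}]}
EOF
run aws iam create-role --role-name powersyncEventsRole \
  --assume-role-policy-document file://events-trust.json >/dev/null
run aws iam put-role-policy --role-name powersyncEventsRole \
  --policy-name run-compact-task --policy-document file://events-run-task.json >/dev/null

# 3. Daily schedule at 02:00 UTC, targeting one Fargate task in a private subnet.
run aws events put-rule --name powersync-daily-compact \
  --schedule-expression 'cron(0 2 * * ? *)' >/dev/null
cat > compact-targets.json <<EOF
[{
  "Id": "powersync-compact",
  "Arn": "arn:aws:ecs:${AWS_REGION}:${ACCOUNT_ID}:cluster/powersync-cluster",
  "RoleArn": "arn:aws:iam::${ACCOUNT_ID}:role/powersyncEventsRole",
  "EcsParameters": {
    "TaskDefinitionArn": "arn:aws:ecs:${AWS_REGION}:${ACCOUNT_ID}:task-definition/powersync-compact",
    "TaskCount": 1, "LaunchType": "FARGATE",
    "NetworkConfiguration": { "awsvpcConfiguration": {
      "Subnets": ["${PRIVATE_SUBNET_1}"], "SecurityGroups": ["${ECS_SG_ID}"], "AssignPublicIp": "DISABLED" } }
  }
}]
EOF
run aws events put-targets --rule powersync-daily-compact --targets file://compact-targets.json
```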
Auto Scaling (High-Availability Setup)
PowerSync API containers are limited to 200 concurrent connections each, with a recommended target of 100 connections or fewer per container (see Deployment Architecture). Because PowerSync sync connections are long-lived (hours or days), CPU utilization alone may not reflect the actual connection load: a container can be near its connection limit while CPU remains relatively low. For this reason, we recommend scaling on both CPU utilization and concurrent connections.
Prerequisites
Connection-based auto-scaling requires:
- Prometheus metrics enabled in your service.yaml (see Step 1)
- The CloudWatch Agent sidecar deployed in the API task definition (configured in Step 6). The sidecar scrapes the powersync_concurrent_connections metric from the PowerSync Prometheus endpoint and publishes it to CloudWatch under the PowerSync namespace.
- IAM permissions for the task role to publish CloudWatch metrics (configured in Step 6)
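If the sidecar configuration is not stored yet, it can be sketched as two SSM parameters: a standard Prometheus scrape config (localhost:9090 every 30 seconds) and a CloudWatch Agent config that reads it from a PROMETHEUS_CONFIG_CONTENT environment variable and publishes only powersync_concurrent_connections to the PowerSync namespace via embedded metric format. The agent schema shown, the job=powersync dimension and the /ecs/powersync/prometheus-config parameter name are assumptions to check against the CloudWatch Agent Prometheus documentation. run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

# Standard Prometheus scrape config: the PowerSync endpoint inside the same task.
cat > prometheus-scrape.yaml <<'EOF'
global:
  scrape_interval: 30s
  scrape_timeout: 10s
scrape_configs:
  - job_name: powersync
    static_configs:
      - targets: ['localhost:9090']
EOF

# CloudWatch Agent config (assumed schema): keep one metric, dimensioned by job.
cat > cwagent-config.json <<'EOF'
{
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "prometheus_config_path": "env:PROMETHEUS_CONFIG_CONTENT",
        "emf_processor": {
          "metric_namespace": "PowerSync",
          "metric_declaration": [{
            "source_labels": ["job"],
            "label_matcher": "^powersync$",
            "dimensions": [["job"]],
            "metric_selectors": ["^powersync_concurrent_connections$"]
          }]
        }
      }
    },
    "force_flush_interval": 5
  }
}
EOF

run aws ssm put-parameter --name /ecs/powersync/cwagent-config --type String \
  --value file://cwagent-config.json --overwrite >/dev/null
run aws ssm put-parameter --name /ecs/powersync/prometheus-config --type String \
  --value file://prometheus-scrape.yaml --overwrite >/dev/null
```

In the API task definition, the sidecar would then receive these parameters through its secrets list (for example as CW_CONFIG_CONTENT and PROMETHEUS_CONFIG_CONTENT).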
Register Scalable Target
Set the minimum and maximum number of API tasks:
- min-capacity: We recommend at least 2 for high availability, ensuring your service stays available if one task fails. Auto-scaling handles load increases from there.
- max-capacity: Set this to the upper bound of tasks you want auto-scaling to provision.
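A sketch of registering the API service as a scalable target. To illustrate the capacity guidance, it derives min-capacity from a hypothetical expected peak using about 100 connections per task, rounding up and never going below 2; with the default of 450 peak connections that gives ceil(450 / 100) = 5 tasks. For non-spiky workloads, simply use 2. The resource ID assumes the cluster and service names used elsewhere in this guide, and run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

EXPECTED_PEAK_CONNECTIONS="${EXPECTED_PEAK_CONNECTIONS:-450}"   # hypothetical input
CONNECTIONS_PER_TASK=100   # ~100 per task leaves headroom below the 200 limit
MAX_CAPACITY="${MAX_CAPACITY:-10}"

# min-capacity = ceil(peak / per-task target), but at least 2 for high availability.
MIN_CAPACITY=$(( (EXPECTED_PEAK_CONNECTIONS + CONNECTIONS_PER_TASK - 1) / CONNECTIONS_PER_TASK ))
if [ "$MIN_CAPACITY" -lt 2 ]; then MIN_CAPACITY=2; fi
if [ "$MAX_CAPACITY" -lt "$MIN_CAPACITY" ]; then MAX_CAPACITY=$MIN_CAPACITY; fi

run aws application-autoscaling register-scalable-target --service-namespace ecs \
  --resource-id service/powersync-cluster/powersync-api \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity "$MIN_CAPACITY" --max-capacity "$MAX_CAPACITY"
```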
Choosing your minimum capacity: A minimum of 2 works well for most workloads, letting auto-scaling adjust capacity as needed. However, if your traffic is very spiky (e.g., many users connecting simultaneously at a predictable time), you may want a higher min-capacity to avoid waiting for new tasks to start. New Fargate tasks take 1-3 minutes to launch and pass health checks, so a larger baseline reduces the risk of connection overload during sudden spikes. As a guideline, each API task handles up to 200 concurrent connections (target ~100 for headroom).
Scaling Policy 1: CPU Utilization
This policy scales based on average CPU utilization across API tasks.
Scaling Policy 2: Concurrent Connections
This policy scales based on the average number of concurrent sync connections per task, using the custom metric published by the CloudWatch Agent sidecar.
How dual policies work: Both policies operate independently. Application Auto Scaling scales out when either policy calls for more tasks, using whichever policy demands the higher task count, and scales in only when both policies allow it. For example, if CPU-based scaling wants 3 tasks but connection-based scaling wants 5, ECS runs 5 tasks.
| Parameter | Value | Rationale |
|---|---|---|
| TargetValue (connections) | 80 | 40% of the 200 max connection limit per container. This matches PowerSync Cloud’s scaling strategy and provides headroom before the hard limit. |
| TargetValue (CPU) | 70.0 | Scale before CPU saturation impacts sync stream performance. |
| ScaleOutCooldown | 120s | New Fargate tasks take 1–3 minutes to start, pass health checks, and begin accepting connections. A shorter cooldown risks triggering multiple scale-out events before the first new task is ready. |
| ScaleInCooldown | 300s | Prevents rapid scale-in oscillations. When a task is removed, its clients reconnect to the remaining tasks, causing a temporary connection spike. The cooldown allows this spike to settle. |
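The two target-tracking policies with the values from the table can be sketched as follows. The CPU policy uses the predefined ECSServiceAverageCPUUtilization metric. For the connections policy, the Dimensions entry must match exactly what the sidecar publishes; the job=powersync dimension here is an assumption, so check it with aws cloudwatch list-metrics --namespace PowerSync first. run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

RESOURCE_ID="service/powersync-cluster/powersync-api"

# Policy 1: keep average CPU around 70%.
cat > cpu-policy.json <<'EOF'
{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": { "PredefinedMetricType": "ECSServiceAverageCPUUtilization" },
  "ScaleOutCooldown": 120,
  "ScaleInCooldown": 300
}
EOF

# Policy 2: keep the average of the per-task connection metric around 80.
cat > connections-policy.json <<'EOF'
{
  "TargetValue": 80,
  "CustomizedMetricSpecification": {
    "MetricName": "powersync_concurrent_connections",
    "Namespace": "PowerSync",
    "Dimensions": [{ "Name": "job", "Value": "powersync" }],
    "Statistic": "Average"
  },
  "ScaleOutCooldown": 120,
  "ScaleInCooldown": 300
}
EOF

for POLICY in cpu connections; do
  run aws application-autoscaling put-scaling-policy --service-namespace ecs \
    --resource-id "$RESOURCE_ID" --scalable-dimension ecs:service:DesiredCount \
    --policy-name "powersync-api-${POLICY}" --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration "file://${POLICY}-policy.json" >/dev/null
done
```

Using the Average statistic over one data point per task is what makes 80 a per-task target rather than a total across the service.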
Scale-In Behavior
Scaling in (removing tasks) terminates active sync connections on the affected tasks. PowerSync client SDKs handle reconnection automatically, but affected clients will see a brief interruption. What happens during scale-in:
- ECS deregisters the task from the ALB target group, so new connections are routed to other tasks
- The ALB deregistration delay allows existing connections to drain (default: 300s). Because sync connections stay open and never complete on their own, any remaining connections are forcefully closed after this timeout.
- ECS sends SIGTERM to the container, and PowerSync closes all active sync connections gracefully
- After the stopTimeout period (configured to 120s in the task definition), ECS sends SIGKILL
- Disconnected clients automatically reconnect to the remaining healthy tasks
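Both timeouts in the sequence above can be inspected or tuned from the CLI. This sketch sets the target group's deregistration_delay.timeout_seconds (the ALB allows 0 to 3600 seconds) and prints the stopTimeout of each container in the API task definition. The target group ARN and the powersync-api family are placeholders, and run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

TG_ARN="${TG_ARN:-arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/powersync-tg/0123456789abcdef}"
DRAIN_SECONDS="${DRAIN_SECONDS:-300}"   # ALB default

if [ "$DRAIN_SECONDS" -lt 0 ] || [ "$DRAIN_SECONDS" -gt 3600 ]; then
  echo "deregistration delay must be between 0 and 3600 seconds" >&2; exit 1
fi

run aws elbv2 modify-target-group-attributes --target-group-arn "$TG_ARN" \
  --attributes Key=deregistration_delay.timeout_seconds,Value="$DRAIN_SECONDS" >/dev/null

# Window between SIGTERM and SIGKILL for each container in the API task.
run aws ecs describe-task-definition --task-definition powersync-api \
  --query 'taskDefinition.containerDefinitions[*].[name,stopTimeout]' --output table
```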
Verify Auto-Scaling
After configuring both policies, verify they are active.
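A sketch of the checks: the scalable target with its min/max capacity, the policies attached to the API service, and the most recent scaling activities. The resource ID assumes the cluster and service names used elsewhere in this guide, and run is a hypothetical dry-run wrapper (set APPLY=1 to execute).

```bash
set -eu
# Hypothetical helper: runs the command when APPLY=1; otherwise prints it to stderr
# and echoes a placeholder ID so the rest of the script can be previewed.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"
  else printf '+ %s\n' "$*" >&2; echo "DRYRUN-${3:-id}"; fi
}

RESOURCE_ID="service/powersync-cluster/powersync-api"

# Min/max capacity registered for the API service.
run aws application-autoscaling describe-scalable-targets --service-namespace ecs \
  --resource-ids "$RESOURCE_ID" --output table

# Both target-tracking policies should be listed.
run aws application-autoscaling describe-scaling-policies --service-namespace ecs \
  --resource-id "$RESOURCE_ID" --query 'ScalingPolicies[*].[PolicyName,PolicyType]' --output table

# Recent scale-out/scale-in events and their causes.
run aws application-autoscaling describe-scaling-activities --service-namespace ecs \
  --resource-id "$RESOURCE_ID" --max-items 5
```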
Alternative: CPU-Only Scaling (No Custom Metrics)
If you prefer not to set up the CloudWatch Agent sidecar and custom Prometheus metrics, you can scale based on CPU utilization alone. This approach is simpler but less responsive to connection spikes, because CPU may not increase proportionally with new sync connections. Without connection-aware scaling, consider increasing min-capacity if your traffic is spiky, to provide a larger baseline while auto-scaling reacts.
Troubleshooting
| Symptom | Solution |
|---|---|
| Tasks fail health checks | Check logs: aws logs tail /ecs/powersync --follow; increase startPeriod in the health check to 120 |
| 502 Bad Gateway | Verify security groups allow ALB→ECS on port 8080; check tasks are running: aws ecs list-tasks --cluster powersync-cluster |
| WebSocket disconnects | Verify the ALB idle timeout is 3600s (set in Step 3) |
| Can’t pull image | Verify the NAT Gateway exists and the private route table is configured correctly; check the NAT Gateway has internet access |
| Secrets not loaded | Check the IAM role has secretsmanager:GetSecretValue permission; verify the secrets exist: aws secretsmanager list-secrets |
| Sync Rule lock errors during deploy | Cause: multiple instances running without the HA setup; use the High Availability Setup for production |
| CIDR block conflicts | Adjust the CIDR blocks in Step 2 to match the available VPC address space |
| Certificate validation fails | Verify DNS nameservers are updated and propagated; check the validation CNAME record exists in Route 53 |
| CloudWatch metric not appearing | Verify telemetry.prometheus_port: 9090 is set in service.yaml; check CW Agent logs: aws logs tail /ecs/powersync-api/cwagent --follow; confirm the SSM parameter exists: aws ssm get-parameter --name /ecs/powersync/cwagent-config |
| Connection-based scaling not triggering | Verify the metric in CloudWatch: aws cloudwatch list-metrics --namespace PowerSync; check the scaling policy: aws application-autoscaling describe-scaling-policies --service-namespace ecs; the metric may take 2-3 minutes to appear after task startup |
| Clients disconnecting during scale-in | Expected behavior: sync connections on terminated tasks are closed and clients reconnect automatically. Increase deregistration_delay.timeout_seconds on the target group for a longer drain period |
Additional Resources
- AWS ECS Best Practices - AWS’s official guide covering security, networking, monitoring, and performance optimization for ECS deployments
- Self-Host Demo Repository - Working example implementations of PowerSync self-hosting across different platforms and configurations