Version: 2.0 Last Updated: January 29, 2026 Target Scale: 10,000+ concurrent users, 99.9% uptime
Phase 1: 0-1,000 users (Current)
Phase 2: 1,000-5,000 users (6 months)
Phase 3: 5,000-10,000 users (12 months)
Phase 4: 10,000+ users (18+ months)
┌────────────────────────────────────────────────────────────┐
│ Performance SLAs │
├────────────────────────────────────────────────────────────┤
│ Uptime: 99.9% (8.7 hours downtime/year) │
│ API Response (p50): < 100ms │
│ API Response (p95): < 300ms │
│ API Response (p99): < 500ms │
│ WebSocket Latency: < 50ms │
│ Database Query: < 10ms (simple), < 100ms (complex) │
│ Trade Execution: < 2 seconds (signal to order) │
│ Dashboard Load: < 2 seconds (first contentful paint) │
└────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────┐
│ Current Deployment (Railway/Render) │
└────────────────────────────────────────────────────────────┘
Services:
- API Gateway (1 instance) 2 vCPU, 4GB RAM
- Trading Engine (1 instance) 4 vCPU, 8GB RAM
- Dashboard API (1 instance) 2 vCPU, 4GB RAM
- PostgreSQL (managed) 4 vCPU, 8GB RAM, 100GB SSD
- Redis (managed) 2 vCPU, 4GB RAM
Capacity:
- ~500 concurrent users
- ~50 requests/second
- ~5,000 trades/day
Bottlenecks:
- Single database (no read replicas)
- No caching layer (Redis underutilized)
- Single-region deployment
- Manual scaling only
┌────────────────────────────────────────────────────────────┐
│ Microservices Deployment │
└────────────────────────────────────────────────────────────┘
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ API Gateway │ │ API Gateway │ │ API Gateway │
│ Instance 1 │ │ Instance 2 │ │ Instance 3 │
│ (2 vCPU, 4GB) │ │ (2 vCPU, 4GB) │ │ (2 vCPU, 4GB) │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
└────────────────────┴────────────────────┘
│
┌────────────────────┴────────────────────┐
│ │
┌────────▼────────┐ ┌─────────────────┐ ┌──────▼──────────┐
│ Trading Engine │ │ Trading Engine │ │ Trading Engine │
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
│ (4 vCPU, 8GB) │ │ (4 vCPU, 8GB) │ │ (4 vCPU, 8GB) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Dashboard API │ │ Dashboard API │ │ Subscription │
│ Instance 1 │ │ Instance 2 │ │ Service │
│ (2 vCPU, 4GB) │ │ (2 vCPU, 4GB) │ │ (2 vCPU, 4GB) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
┌─────────────────┐ ┌─────────────────┐
│ WebSocket Srv │ │ WebSocket Srv │
│ Instance 1 │ │ Instance 2 │
│ (2 vCPU, 4GB) │ │ (2 vCPU, 4GB) │
└─────────────────┘ └─────────────────┘
API Gateway: 3-5 instances
Trading Engine: 3-10 instances
Dashboard API: 2-4 instances
WebSocket Service: 2-5 instances
Subscription Service: 2 instances
┌────────────────────────────────────────────────────────────┐
│ Recommended Server Specs (per instance) │
├─────────────────┬──────────┬──────────┬────────┬──────────┤
│ Service │ vCPU │ Memory │ Disk │ Network │
├─────────────────┼──────────┼──────────┼────────┼──────────┤
│ API Gateway │ 2 │ 4GB │ 20GB │ 1 Gbps │
│ Trading Engine │ 4 │ 8GB │ 50GB │ 1 Gbps │
│ Dashboard API │ 2 │ 4GB │ 20GB │ 1 Gbps │
│ WebSocket │ 2 │ 4GB │ 20GB │ 1 Gbps │
│ Subscription │ 2 │ 4GB │ 20GB │ 500Mbps │
│ PostgreSQL │ 8 │ 16GB │ 500GB │ 2 Gbps │
│ Redis │ 4 │ 8GB │ 50GB │ 1 Gbps │
│ TimescaleDB │ 4 │ 16GB │ 1TB │ 2 Gbps │
└─────────────────┴──────────┴──────────┴────────┴──────────┘
CPU-based:
70% sustained for 5 minutes → scale up
Memory-based:
80% sustained for 5 minutes → scale up
Request-based:
100 requests/second per instance → scale up
Custom Metrics:
┌────────────────────────────────────────────────────────────┐
│ PostgreSQL Replication │
└────────────────────────────────────────────────────────────┘
┌─────────────────┐
│ Master DB │
│ (Writes Only) │
│ 8 vCPU, 16GB │
└────────┬────────┘
│ Replication
┌────────┴────────┐
│ │
┌─────────▼──────┐ ┌──────▼─────────┐
│ Read Replica │ │ Read Replica │
│ 1 │ │ 2 │
│ 8 vCPU, 16GB │ │ 8 vCPU, 16GB │
└────────────────┘ └────────────────┘
Read/Write Split:
- Writes → Master
- Reads → Load balanced across replicas
- Replication lag: <1 second
┌────────────────────────────────────────────────────────────┐
│ Database Sharding by User ID │
└────────────────────────────────────────────────────────────┘
Shard 1 (Users 0-2499) Shard 2 (Users 2500-4999)
┌─────────────────┐ ┌─────────────────┐
│ Master │ │ Master │
│ + 2 Replicas │ │ + 2 Replicas │
└─────────────────┘ └─────────────────┘
Shard 3 (Users 5000-7499) Shard 4 (Users 7500-9999)
┌─────────────────┐ ┌─────────────────┐
│ Master │ │ Master │
│ + 2 Replicas │ │ + 2 Replicas │
└─────────────────┘ └─────────────────┘
Sharding Key: user_id % 4
Shard Router: PgPool-II or application-level
┌────────────────────────────────────────────────────────────┐
│ Redis Cluster (6 nodes) │
└────────────────────────────────────────────────────────────┘
Master 1 (Slots 0-5461) Replica 1
┌─────────────────┐ ┌─────────────────┐
│ 4 vCPU, 8GB │──────────────│ 4 vCPU, 8GB │
└─────────────────┘ └─────────────────┘
Master 2 (Slots 5462-10922) Replica 2
┌─────────────────┐ ┌─────────────────┐
│ 4 vCPU, 8GB │──────────────│ 4 vCPU, 8GB │
└─────────────────┘ └─────────────────┘
Master 3 (Slots 10923-16383) Replica 3
┌─────────────────┐ ┌─────────────────┐
│ 4 vCPU, 8GB │──────────────│ 4 vCPU, 8GB │
└─────────────────┘ └─────────────────┘
Features:
- Automatic failover
- Data sharding across masters
- High availability
┌────────────────────────────────────────────────────────────┐
│ TimescaleDB for Trade Analytics │
└────────────────────────────────────────────────────────────┘
Hypertables:
- trades (1-day chunks)
- positions (1-day chunks)
- market_data (1-hour chunks)
Compression:
- Compress data older than 7 days
- 10x compression ratio
- Reduces storage from 1TB → 100GB
Retention Policy:
- Keep detailed data: 90 days
- Keep aggregated data: 2 years
- Delete data older than 2 years
Continuous Aggregates:
- Hourly summaries (trades, P&L)
- Daily summaries
- Weekly summaries
- Refresh every 1 hour
┌────────────────────────────────────────────────────────────┐
│ Cache Hierarchy │
└────────────────────────────────────────────────────────────┘
Level 1: Browser Cache (Client-side)
- Static assets (JS, CSS, images)
- Duration: 7 days
- Invalidation: Hash-based URLs
Level 2: CDN Cache (Cloudflare)
- API responses (GET only)
- Static files
- Duration: 5 minutes (API), 7 days (static)
- Purge on deploy
Level 3: Redis Cache (Server-side)
- User sessions
- Active positions
- Market data
- User settings
- Duration: 30 seconds - 5 minutes
Level 4: Database Query Cache
- PostgreSQL internal cache
- Query results
- Duration: Automatic
from redis import Redis
from functools import wraps
import pickle
redis_client = Redis(host='redis', port=6379, db=0)
def cache(ttl=300):
"""Cache decorator with TTL"""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
# Generate cache key
cache_key = f"{func.__name__}:{pickle.dumps((args, kwargs))}"
# Try to get from cache
cached = redis_client.get(cache_key)
if cached:
return pickle.loads(cached)
# Execute function
result = func(*args, **kwargs)
# Store in cache
redis_client.setex(cache_key, ttl, pickle.dumps(result))
return result
return wrapper
return decorator
# Usage
@cache(ttl=60) # Cache for 60 seconds
def get_user_positions(user_id):
return db.query(Position).filter_by(user_id=user_id).all()
@cache(ttl=300) # Cache for 5 minutes
def get_user_performance(user_id, period='30d'):
return calculate_performance_metrics(user_id, period)
def invalidate_user_cache(user_id):
"""Invalidate all cache entries for a user"""
patterns = [
f"get_user_positions:{user_id}*",
f"get_user_performance:{user_id}*",
f"get_user_stats:{user_id}*"
]
for pattern in patterns:
keys = redis_client.keys(pattern)
if keys:
redis_client.delete(*keys)
# Invalidate on position change
@app.post("/api/v1/trading/close-position")
def close_position(user_id, position_id):
# Close position
broker.close_position(position_id)
# Invalidate cache
invalidate_user_cache(user_id)
return {"success": True}
┌────────────────────────────────────────────────────────────┐
│ Load Balancer Configuration │
└────────────────────────────────────────────────────────────┘
Algorithm: Round Robin (API), Least Connections (WebSocket)
Upstream: api_gateway
- api-1.nija.com:8000
- api-2.nija.com:8000
- api-3.nija.com:8000
Health Check:
- Endpoint: /health
- Interval: 10 seconds
- Timeout: 5 seconds
- Unhealthy threshold: 3 consecutive failures
Upstream: websocket
- ws-1.nija.com:8004
- ws-2.nija.com:8004
Sticky Sessions: IP hash (WebSocket)
upstream api_gateway {
least_conn;
server api-1.nija.com:8000 max_fails=3 fail_timeout=30s;
server api-2.nija.com:8000 max_fails=3 fail_timeout=30s;
server api-3.nija.com:8000 max_fails=3 fail_timeout=30s;
}
upstream websocket_servers {
ip_hash; # Sticky sessions
server ws-1.nija.com:8004 max_fails=3 fail_timeout=30s;
server ws-2.nija.com:8004 max_fails=3 fail_timeout=30s;
}
server {
listen 443 ssl http2;
server_name api.nija.com;
ssl_certificate /etc/nginx/ssl/nija.com.crt;
ssl_certificate_key /etc/nginx/ssl/nija.com.key;
# API Gateway
location /api/ {
proxy_pass http://api_gateway;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts
proxy_connect_timeout 10s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
# WebSocket
location /ws/ {
proxy_pass http://websocket_servers;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
# Timeouts
proxy_connect_timeout 7d;
proxy_send_timeout 7d;
proxy_read_timeout 7d;
}
}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-gateway-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: nija-api-gateway
minReplicas: 3
maxReplicas: 10
metrics:
# CPU-based scaling
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
# Memory-based scaling
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
# Request-based scaling
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "100"
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 25
periodSeconds: 60
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: trading-engine-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: nija-trading-engine
minReplicas: 3
maxReplicas: 20
metrics:
- type: External
external:
metric:
name: active_trading_users
selector:
matchLabels:
service: nija-trading-engine
target:
type: AverageValue
averageValue: "800" # 800 users per instance
behavior:
scaleUp:
stabilizationWindowSeconds: 120
policies:
- type: Pods
value: 2
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 600
policies:
- type: Pods
value: 1
periodSeconds: 120
┌────────────────────────────────────────────────────────────┐
│ Global Infrastructure │
└────────────────────────────────────────────────────────────┘
Primary Region: US-East (Virginia)
- Full stack (API, Trading, Database)
- PostgreSQL Master
- 60% of traffic
Secondary Region: US-West (Oregon)
- Full stack (API, Trading)
- PostgreSQL Read Replica
- 30% of traffic
Tertiary Region: EU-West (Ireland)
- API Gateway only
- PostgreSQL Read Replica
- 10% of traffic
Edge Locations: Cloudflare CDN (200+ POPs)
- Static assets
- API caching (short TTL)
- DDoS protection
┌────────────────────────────────────────────────────────────┐
│ GeoDNS Routing (Route 53 / Cloudflare) │
└────────────────────────────────────────────────────────────┘
User Location → Nearest Region
North America → US-East (primary) or US-West (failover)
Europe → EU-West (primary) or US-East (failover)
Asia → US-West (primary) or US-East (failover)
Health Checks:
- Check every 30 seconds
- Failover if unhealthy
- DNS TTL: 60 seconds
# 1. Database Query Optimization
# Bad: N+1 query problem
users = User.query.all()
for user in users:
trades = user.trades # Separate query for each user
# Good: Eager loading
users = User.query.options(joinedload(User.trades)).all()
# 2. Pagination
# Bad: Loading all data
trades = Trade.query.all() # Could be 100,000+ rows
# Good: Paginated
trades = Trade.query.paginate(page=1, per_page=50)
# 3. Field selection
# Bad: Returning entire objects
return trades
# Good: Return only needed fields
return [
{
'id': t.id,
'symbol': t.symbol,
'pnl': t.pnl
}
for t in trades
]
# Use binary protocol (MessagePack) instead of JSON
import msgpack
@websocket.on('position_update')
def send_position_update(position):
# JSON: ~500 bytes
# json_data = json.dumps(position)
# MessagePack: ~300 bytes (40% smaller)
msgpack_data = msgpack.packb(position)
websocket.send(msgpack_data, binary=True)
# Batch updates
updates_buffer = []
@websocket.on_tick()
def buffer_updates():
if len(updates_buffer) >= 10 or time_elapsed > 1.0:
websocket.send_batch(updates_buffer)
updates_buffer.clear()
from sqlalchemy.pool import QueuePool
engine = create_engine(
DATABASE_URL,
poolclass=QueuePool,
pool_size=20, # Max 20 connections per instance
max_overflow=10, # Allow 10 more during peak
pool_timeout=30, # Wait 30s for connection
pool_recycle=3600, # Recycle connections every hour
pool_pre_ping=True # Check connection before use
)
┌────────────────────────────────────────────────────────────┐
│ Monthly Cost Breakdown (AWS) │
├────────────────────────────────────────────────────────────┤
│ 1,000 users: │
│ - Compute (EC2): $800/month │
│ - Database (RDS): $400/month │
│ - Redis (ElastiCache): $200/month │
│ - CDN (CloudFront): $100/month │
│ - Load Balancer: $50/month │
│ Total: $1,550/month ($1.55/user/month) │
│ │
│ 5,000 users (with optimizations): │
│ - Compute: $2,500/month │
│ - Database: $1,200/month │
│ - Redis: $500/month │
│ - CDN: $300/month │
│ - Load Balancer: $100/month │
│ Total: $4,600/month ($0.92/user/month) │
│ │
│ 10,000 users (fully optimized): │
│ - Compute: $4,500/month │
│ - Database: $2,000/month │
│ - Redis: $800/month │
│ - CDN: $500/month │
│ - Load Balancer: $150/month │
│ Total: $7,950/month ($0.80/user/month) │
└────────────────────────────────────────────────────────────┘
1. Reserved Instances
2. Spot Instances
3. Right-Sizing
4. Storage Optimization
5. CDN Optimization
┌────────────────────────────────────────────────────────────┐
│ Backup Schedule │
├────────────────────────────────────────────────────────────┤
│ Database: │
│ - Full backup: Daily at 2 AM UTC │
│ - Incremental: Every 6 hours │
│ - Transaction log: Continuous (every 5 minutes) │
│ - Retention: 30 days │
│ │
│ Redis: │
│ - AOF (Append-Only File): Continuous │
│ - RDB snapshot: Every 6 hours │
│ - Retention: 7 days │
│ │
│ Code & Config: │
│ - Git repository (GitHub) │
│ - Docker images (ECR/Docker Hub) │
│ - Infrastructure as Code (Terraform) │
└────────────────────────────────────────────────────────────┘
RTO (Recovery Time Objective): 15 minutes RPO (Recovery Point Objective): 5 minutes
Scenario 1: Database Failure
Scenario 2: Region Failure
Scenario 3: Complete Data Loss
┌────────────────────────────────────────────────────────────┐
│ Alert Rules (PagerDuty / Slack) │
├────────────────────────────────────────────────────────────┤
│ Critical (Page immediately): │
│ - Service down (any region) │
│ - Database down │
│ - Error rate >5% │
│ - API p95 latency >1s │
│ │
│ Warning (Slack notification): │
│ - CPU >80% for 10 minutes │
│ - Memory >85% │
│ - Disk >90% │
│ - Error rate >1% │
│ │
│ Info (Log only): │
│ - Deployment completed │
│ - Auto-scaling event │
│ - Scheduled maintenance │
└────────────────────────────────────────────────────────────┘
Q1 2026: Phase 1 (0-1,000 users)
Q2 2026: Phase 2 (1,000-5,000 users)
Q3 2026: Phase 3 (5,000-10,000 users)
Q4 2026: Phase 4 (10,000+ users)
Business Metrics:
- Active users
- MRR/ARR
- Churn rate
- LTV:CAC
Technical Metrics:
- Uptime (target: 99.9%)
- API latency (p50, p95, p99)
- Error rate
- Request throughput
Infrastructure Metrics:
- CPU utilization
- Memory utilization
- Disk I/O
- Network bandwidth
- Database connections
Cost Metrics:
- Cost per user
- Cost per request
- Infrastructure spend vs revenue
Version: 2.0 Last Updated: January 29, 2026 Maintained By: NIJA Infrastructure Team