Nija

NIJA Scaling Blueprint - Infrastructure & Growth Strategy

Version: 2.0 Last Updated: January 29, 2026 Target Scale: 10,000+ concurrent users, 99.9% uptime

Overview
Current State
Horizontal Scaling
Vertical Scaling
Database Scaling
Caching Strategy
Load Balancing
Auto-Scaling
Geographic Distribution
Performance Optimization
Cost Optimization
Disaster Recovery

Overview

Scaling Goals

Phase 1: 0-1,000 users (Current)

Single-region deployment
Manual scaling
Basic monitoring

Phase 2: 1,000-5,000 users (6 months)

Multi-region deployment
Auto-scaling enabled
Advanced monitoring

Phase 3: 5,000-10,000 users (12 months)

Global CDN
Database sharding
Full redundancy

Phase 4: 10,000+ users (18+ months)

Multi-cloud deployment
Edge computing
AI-powered auto-scaling

Performance Targets

┌────────────────────────────────────────────────────────────┐
│  Performance SLAs                                          │
├────────────────────────────────────────────────────────────┤
│ Uptime:              99.9% (8.7 hours downtime/year)       │
│ API Response (p50):  < 100ms                               │
│ API Response (p95):  < 300ms                               │
│ API Response (p99):  < 500ms                               │
│ WebSocket Latency:   < 50ms                                │
│ Database Query:      < 10ms (simple), < 100ms (complex)    │
│ Trade Execution:     < 2 seconds (signal to order)         │
│ Dashboard Load:      < 2 seconds (first contentful paint)  │
└────────────────────────────────────────────────────────────┘

Current State

Infrastructure (Phase 1)

┌────────────────────────────────────────────────────────────┐
│  Current Deployment (Railway/Render)                       │
└────────────────────────────────────────────────────────────┘

Services:
  - API Gateway (1 instance)        2 vCPU, 4GB RAM
  - Trading Engine (1 instance)     4 vCPU, 8GB RAM
  - Dashboard API (1 instance)      2 vCPU, 4GB RAM
  - PostgreSQL (managed)            4 vCPU, 8GB RAM, 100GB SSD
  - Redis (managed)                 2 vCPU, 4GB RAM

Capacity:
  - ~500 concurrent users
  - ~50 requests/second
  - ~5,000 trades/day

Bottlenecks:
  - Single database (no read replicas)
  - No caching layer (Redis underutilized)
  - Single-region deployment
  - Manual scaling only

Horizontal Scaling

Microservices Architecture

┌────────────────────────────────────────────────────────────┐
│  Microservices Deployment                                  │
└────────────────────────────────────────────────────────────┘

┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  API Gateway    │  │  API Gateway    │  │  API Gateway    │
│   Instance 1    │  │   Instance 2    │  │   Instance 3    │
│  (2 vCPU, 4GB)  │  │  (2 vCPU, 4GB)  │  │  (2 vCPU, 4GB)  │
└────────┬────────┘  └────────┬────────┘  └────────┬────────┘
         │                    │                    │
         └────────────────────┴────────────────────┘
                              │
         ┌────────────────────┴────────────────────┐
         │                                         │
┌────────▼────────┐  ┌─────────────────┐  ┌──────▼──────────┐
│ Trading Engine  │  │ Trading Engine  │  │ Trading Engine  │
│   Worker 1      │  │   Worker 2      │  │   Worker 3      │
│  (4 vCPU, 8GB)  │  │  (4 vCPU, 8GB)  │  │  (4 vCPU, 8GB)  │
└─────────────────┘  └─────────────────┘  └─────────────────┘

┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Dashboard API   │  │ Dashboard API   │  │ Subscription    │
│   Instance 1    │  │   Instance 2    │  │   Service       │
│  (2 vCPU, 4GB)  │  │  (2 vCPU, 4GB)  │  │  (2 vCPU, 4GB)  │
└─────────────────┘  └─────────────────┘  └─────────────────┘

┌─────────────────┐  ┌─────────────────┐
│ WebSocket Srv   │  │ WebSocket Srv   │
│   Instance 1    │  │   Instance 2    │
│  (2 vCPU, 4GB)  │  │  (2 vCPU, 4GB)  │
└─────────────────┘  └─────────────────┘

Service Replication Strategy

API Gateway: 3-5 instances

Stateless (easy to scale)
Load balanced round-robin
Scale based on CPU (>70% → add instance)

Trading Engine: 3-10 instances

User-based sharding (hash user_id % num_instances)
Each instance manages ~500-1000 users
Scale based on active users

Dashboard API: 2-4 instances

Stateless
Cache-heavy (Redis)
Scale based on request rate

WebSocket Service: 2-5 instances

Sticky sessions (user always connects to same instance)
Scale based on concurrent connections
Each instance handles ~2000 concurrent connections

Subscription Service: 2 instances

Low traffic (billing events)
High redundancy for reliability
Failover configuration

Vertical Scaling

Resource Allocation by Service

┌────────────────────────────────────────────────────────────┐
│  Recommended Server Specs (per instance)                   │
├─────────────────┬──────────┬──────────┬────────┬──────────┤
│ Service         │  vCPU    │  Memory  │  Disk  │ Network  │
├─────────────────┼──────────┼──────────┼────────┼──────────┤
│ API Gateway     │    2     │   4GB    │  20GB  │  1 Gbps  │
│ Trading Engine  │    4     │   8GB    │  50GB  │  1 Gbps  │
│ Dashboard API   │    2     │   4GB    │  20GB  │  1 Gbps  │
│ WebSocket       │    2     │   4GB    │  20GB  │  1 Gbps  │
│ Subscription    │    2     │   4GB    │  20GB  │  500Mbps │
│ PostgreSQL      │    8     │  16GB    │ 500GB  │  2 Gbps  │
│ Redis           │    4     │   8GB    │  50GB  │  1 Gbps  │
│ TimescaleDB     │    4     │  16GB    │  1TB   │  2 Gbps  │
└─────────────────┴──────────┴──────────┴────────┴──────────┘

Scaling Triggers

CPU-based:

70% sustained for 5 minutes → scale up
<30% sustained for 15 minutes → scale down

Memory-based:

80% sustained for 5 minutes → scale up
<40% sustained for 15 minutes → scale down

Request-based:

100 requests/second per instance → scale up
<20 requests/second per instance → scale down

Custom Metrics:

Active trading users > 800 per Trading Engine → add instance
WebSocket connections > 1800 per instance → add instance
Database connection pool >80% utilized → add read replica

Database Scaling

PostgreSQL Scaling Strategy

Phase 1: Master + Read Replicas

┌────────────────────────────────────────────────────────────┐
│  PostgreSQL Replication                                    │
└────────────────────────────────────────────────────────────┘

                    ┌─────────────────┐
                    │  Master DB      │
                    │  (Writes Only)  │
                    │  8 vCPU, 16GB   │
                    └────────┬────────┘
                             │ Replication
                    ┌────────┴────────┐
                    │                 │
          ┌─────────▼──────┐  ┌──────▼─────────┐
          │  Read Replica  │  │  Read Replica  │
          │       1        │  │       2        │
          │  8 vCPU, 16GB  │  │  8 vCPU, 16GB  │
          └────────────────┘  └────────────────┘

Read/Write Split:
  - Writes → Master
  - Reads → Load balanced across replicas
  - Replication lag: <1 second

Phase 2: Sharding (10,000+ users)

┌────────────────────────────────────────────────────────────┐
│  Database Sharding by User ID                              │
└────────────────────────────────────────────────────────────┘

Shard 1 (Users 0-2499)          Shard 2 (Users 2500-4999)
┌─────────────────┐              ┌─────────────────┐
│  Master         │              │  Master         │
│  + 2 Replicas   │              │  + 2 Replicas   │
└─────────────────┘              └─────────────────┘

Shard 3 (Users 5000-7499)       Shard 4 (Users 7500-9999)
┌─────────────────┐              ┌─────────────────┐
│  Master         │              │  Master         │
│  + 2 Replicas   │              │  + 2 Replicas   │
└─────────────────┘              └─────────────────┘

Sharding Key: user_id % 4
Shard Router: PgPool-II or application-level

Redis Scaling

┌────────────────────────────────────────────────────────────┐
│  Redis Cluster (6 nodes)                                   │
└────────────────────────────────────────────────────────────┘

Master 1 (Slots 0-5461)         Replica 1
┌─────────────────┐              ┌─────────────────┐
│  4 vCPU, 8GB    │──────────────│  4 vCPU, 8GB    │
└─────────────────┘              └─────────────────┘

Master 2 (Slots 5462-10922)     Replica 2
┌─────────────────┐              ┌─────────────────┐
│  4 vCPU, 8GB    │──────────────│  4 vCPU, 8GB    │
└─────────────────┘              └─────────────────┘

Master 3 (Slots 10923-16383)    Replica 3
┌─────────────────┐              ┌─────────────────┐
│  4 vCPU, 8GB    │──────────────│  4 vCPU, 8GB    │
└─────────────────┘              └─────────────────┘

Features:
  - Automatic failover
  - Data sharding across masters
  - High availability

TimescaleDB (Time-Series Data)

┌────────────────────────────────────────────────────────────┐
│  TimescaleDB for Trade Analytics                           │
└────────────────────────────────────────────────────────────┘

Hypertables:
  - trades (1-day chunks)
  - positions (1-day chunks)
  - market_data (1-hour chunks)

Compression:
  - Compress data older than 7 days
  - 10x compression ratio
  - Reduces storage from 1TB → 100GB

Retention Policy:
  - Keep detailed data: 90 days
  - Keep aggregated data: 2 years
  - Delete data older than 2 years

Continuous Aggregates:
  - Hourly summaries (trades, P&L)
  - Daily summaries
  - Weekly summaries
  - Refresh every 1 hour

Caching Strategy

Multi-Level Cache

┌────────────────────────────────────────────────────────────┐
│  Cache Hierarchy                                           │
└────────────────────────────────────────────────────────────┘

Level 1: Browser Cache (Client-side)
  - Static assets (JS, CSS, images)
  - Duration: 7 days
  - Invalidation: Hash-based URLs

Level 2: CDN Cache (Cloudflare)
  - API responses (GET only)
  - Static files
  - Duration: 5 minutes (API), 7 days (static)
  - Purge on deploy

Level 3: Redis Cache (Server-side)
  - User sessions
  - Active positions
  - Market data
  - User settings
  - Duration: 30 seconds - 5 minutes

Level 4: Database Query Cache
  - PostgreSQL internal cache
  - Query results
  - Duration: Automatic

Cache Implementation

from redis import Redis
from functools import wraps
import pickle

redis_client = Redis(host='redis', port=6379, db=0)

def cache(ttl=300):
    """Cache decorator with TTL"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Generate cache key
            cache_key = f"{func.__name__}:{pickle.dumps((args, kwargs))}"

            # Try to get from cache
            cached = redis_client.get(cache_key)
            if cached:
                return pickle.loads(cached)

            # Execute function
            result = func(*args, **kwargs)

            # Store in cache
            redis_client.setex(cache_key, ttl, pickle.dumps(result))

            return result
        return wrapper
    return decorator

# Usage
@cache(ttl=60)  # Cache for 60 seconds
def get_user_positions(user_id):
    return db.query(Position).filter_by(user_id=user_id).all()

@cache(ttl=300)  # Cache for 5 minutes
def get_user_performance(user_id, period='30d'):
    return calculate_performance_metrics(user_id, period)

Cache Invalidation

def invalidate_user_cache(user_id):
    """Invalidate all cache entries for a user"""
    patterns = [
        f"get_user_positions:{user_id}*",
        f"get_user_performance:{user_id}*",
        f"get_user_stats:{user_id}*"
    ]

    for pattern in patterns:
        keys = redis_client.keys(pattern)
        if keys:
            redis_client.delete(*keys)

# Invalidate on position change
@app.post("/api/v1/trading/close-position")
def close_position(user_id, position_id):
    # Close position
    broker.close_position(position_id)

    # Invalidate cache
    invalidate_user_cache(user_id)

    return {"success": True}

Load Balancing

Layer 4 Load Balancer (HAProxy/Nginx)

┌────────────────────────────────────────────────────────────┐
│  Load Balancer Configuration                               │
└────────────────────────────────────────────────────────────┘

Algorithm: Round Robin (API), Least Connections (WebSocket)

Upstream: api_gateway
  - api-1.nija.com:8000
  - api-2.nija.com:8000
  - api-3.nija.com:8000

Health Check:
  - Endpoint: /health
  - Interval: 10 seconds
  - Timeout: 5 seconds
  - Unhealthy threshold: 3 consecutive failures

Upstream: websocket
  - ws-1.nija.com:8004
  - ws-2.nija.com:8004

Sticky Sessions: IP hash (WebSocket)

Nginx Configuration

upstream api_gateway {
    least_conn;
    server api-1.nija.com:8000 max_fails=3 fail_timeout=30s;
    server api-2.nija.com:8000 max_fails=3 fail_timeout=30s;
    server api-3.nija.com:8000 max_fails=3 fail_timeout=30s;
}

upstream websocket_servers {
    ip_hash;  # Sticky sessions
    server ws-1.nija.com:8004 max_fails=3 fail_timeout=30s;
    server ws-2.nija.com:8004 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name api.nija.com;

    ssl_certificate /etc/nginx/ssl/nija.com.crt;
    ssl_certificate_key /etc/nginx/ssl/nija.com.key;

    # API Gateway
    location /api/ {
        proxy_pass http://api_gateway;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 10s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # WebSocket
    location /ws/ {
        proxy_pass http://websocket_servers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;

        # Timeouts
        proxy_connect_timeout 7d;
        proxy_send_timeout 7d;
        proxy_read_timeout 7d;
    }
}

Auto-Scaling

Kubernetes Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nija-api-gateway
  minReplicas: 3
  maxReplicas: 10
  metrics:
  # CPU-based scaling
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Memory-based scaling
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  # Request-based scaling
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60

Trading Engine Auto-Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: trading-engine-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nija-trading-engine
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: active_trading_users
        selector:
          matchLabels:
            service: nija-trading-engine
      target:
        type: AverageValue
        averageValue: "800"  # 800 users per instance
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120

Geographic Distribution

Multi-Region Deployment

┌────────────────────────────────────────────────────────────┐
│  Global Infrastructure                                     │
└────────────────────────────────────────────────────────────┘

Primary Region: US-East (Virginia)
  - Full stack (API, Trading, Database)
  - PostgreSQL Master
  - 60% of traffic

Secondary Region: US-West (Oregon)
  - Full stack (API, Trading)
  - PostgreSQL Read Replica
  - 30% of traffic

Tertiary Region: EU-West (Ireland)
  - API Gateway only
  - PostgreSQL Read Replica
  - 10% of traffic

Edge Locations: Cloudflare CDN (200+ POPs)
  - Static assets
  - API caching (short TTL)
  - DDoS protection

DNS-based Geographic Routing

┌────────────────────────────────────────────────────────────┐
│  GeoDNS Routing (Route 53 / Cloudflare)                   │
└────────────────────────────────────────────────────────────┘

User Location → Nearest Region

North America → US-East (primary) or US-West (failover)
Europe → EU-West (primary) or US-East (failover)
Asia → US-West (primary) or US-East (failover)

Health Checks:
  - Check every 30 seconds
  - Failover if unhealthy
  - DNS TTL: 60 seconds

Performance Optimization

API Response Optimization

# 1. Database Query Optimization
# Bad: N+1 query problem
users = User.query.all()
for user in users:
    trades = user.trades  # Separate query for each user

# Good: Eager loading
users = User.query.options(joinedload(User.trades)).all()

# 2. Pagination
# Bad: Loading all data
trades = Trade.query.all()  # Could be 100,000+ rows

# Good: Paginated
trades = Trade.query.paginate(page=1, per_page=50)

# 3. Field selection
# Bad: Returning entire objects
return trades

# Good: Return only needed fields
return [
    {
        'id': t.id,
        'symbol': t.symbol,
        'pnl': t.pnl
    }
    for t in trades
]

WebSocket Optimization

# Use binary protocol (MessagePack) instead of JSON
import msgpack

@websocket.on('position_update')
def send_position_update(position):
    # JSON: ~500 bytes
    # json_data = json.dumps(position)

    # MessagePack: ~300 bytes (40% smaller)
    msgpack_data = msgpack.packb(position)

    websocket.send(msgpack_data, binary=True)

# Batch updates
updates_buffer = []

@websocket.on_tick()
def buffer_updates():
    if len(updates_buffer) >= 10 or time_elapsed > 1.0:
        websocket.send_batch(updates_buffer)
        updates_buffer.clear()

Database Connection Pooling

from sqlalchemy.pool import QueuePool

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=20,          # Max 20 connections per instance
    max_overflow=10,       # Allow 10 more during peak
    pool_timeout=30,       # Wait 30s for connection
    pool_recycle=3600,     # Recycle connections every hour
    pool_pre_ping=True     # Check connection before use
)

Cost Optimization

Resource Optimization

┌────────────────────────────────────────────────────────────┐
│  Monthly Cost Breakdown (AWS)                              │
├────────────────────────────────────────────────────────────┤
│ 1,000 users:                                               │
│   - Compute (EC2):        $800/month                       │
│   - Database (RDS):       $400/month                       │
│   - Redis (ElastiCache):  $200/month                       │
│   - CDN (CloudFront):     $100/month                       │
│   - Load Balancer:        $50/month                        │
│   Total: $1,550/month ($1.55/user/month)                   │
│                                                            │
│ 5,000 users (with optimizations):                         │
│   - Compute:              $2,500/month                     │
│   - Database:             $1,200/month                     │
│   - Redis:                $500/month                       │
│   - CDN:                  $300/month                       │
│   - Load Balancer:        $100/month                       │
│   Total: $4,600/month ($0.92/user/month)                   │
│                                                            │
│ 10,000 users (fully optimized):                           │
│   - Compute:              $4,500/month                     │
│   - Database:             $2,000/month                     │
│   - Redis:                $800/month                       │
│   - CDN:                  $500/month                       │
│   - Load Balancer:        $150/month                       │
│   Total: $7,950/month ($0.80/user/month)                   │
└────────────────────────────────────────────────────────────┘

Cost Saving Strategies

1. Reserved Instances

Purchase 1-year reserved instances
Save 30-40% on compute costs
Use for stable baseline capacity

2. Spot Instances

Use for non-critical workloads
Save up to 90% on compute
Good for backtesting, analytics

3. Right-Sizing

Monitor actual resource usage
Downsize over-provisioned instances
Typical savings: 20-30%

4. Storage Optimization

Use S3 for old backups (cheaper)
Compress time-series data
Archive logs after 90 days

5. CDN Optimization

Aggressive caching for static assets
Reduce origin requests
Typical savings: 40-60% on bandwidth

Disaster Recovery

Backup Strategy

┌────────────────────────────────────────────────────────────┐
│  Backup Schedule                                           │
├────────────────────────────────────────────────────────────┤
│ Database:                                                  │
│   - Full backup: Daily at 2 AM UTC                        │
│   - Incremental: Every 6 hours                            │
│   - Transaction log: Continuous (every 5 minutes)          │
│   - Retention: 30 days                                     │
│                                                            │
│ Redis:                                                     │
│   - AOF (Append-Only File): Continuous                    │
│   - RDB snapshot: Every 6 hours                           │
│   - Retention: 7 days                                      │
│                                                            │
│ Code & Config:                                            │
│   - Git repository (GitHub)                               │
│   - Docker images (ECR/Docker Hub)                        │
│   - Infrastructure as Code (Terraform)                    │
└────────────────────────────────────────────────────────────┘

Disaster Recovery Plan

RTO (Recovery Time Objective): 15 minutes RPO (Recovery Point Objective): 5 minutes

Scenario 1: Database Failure

Automatic failover to read replica (30 seconds)
Promote replica to master (2 minutes)
Update DNS/connection strings (3 minutes)
Total downtime: ~5 minutes

Scenario 2: Region Failure

Route 53 health check detects failure (30 seconds)
DNS failover to secondary region (60 seconds)
Manual intervention to promote replica (5 minutes)
Total downtime: ~7 minutes

Scenario 3: Complete Data Loss

Restore from latest backup (10 minutes)
Replay transaction logs to minimize data loss (5 minutes)
Total downtime: ~15 minutes
Data loss: <5 minutes

Monitoring & Alerts

┌────────────────────────────────────────────────────────────┐
│  Alert Rules (PagerDuty / Slack)                           │
├────────────────────────────────────────────────────────────┤
│ Critical (Page immediately):                               │
│   - Service down (any region)                             │
│   - Database down                                         │
│   - Error rate >5%                                        │
│   - API p95 latency >1s                                   │
│                                                            │
│ Warning (Slack notification):                             │
│   - CPU >80% for 10 minutes                               │
│   - Memory >85%                                           │
│   - Disk >90%                                             │
│   - Error rate >1%                                        │
│                                                            │
│ Info (Log only):                                          │
│   - Deployment completed                                  │
│   - Auto-scaling event                                    │
│   - Scheduled maintenance                                 │
└────────────────────────────────────────────────────────────┘

Summary

Scaling Roadmap

Q1 2026: Phase 1 (0-1,000 users)

✅ Single-region deployment (Railway/Render)
✅ Basic monitoring (Prometheus + Grafana)
✅ Manual scaling

Q2 2026: Phase 2 (1,000-5,000 users)

Move to Kubernetes (AWS EKS / GCP GKE)
Implement auto-scaling
Add read replicas
Multi-region deployment (US-East + US-West)

Q3 2026: Phase 3 (5,000-10,000 users)

Database sharding
Redis cluster
Global CDN (Cloudflare)
Edge locations (EU, Asia)

Q4 2026: Phase 4 (10,000+ users)

Multi-cloud (AWS + GCP)
Advanced auto-scaling (ML-powered)
Edge computing
Full global distribution

Key Metrics to Monitor

Business Metrics:
  - Active users
  - MRR/ARR
  - Churn rate
  - LTV:CAC

Technical Metrics:
  - Uptime (target: 99.9%)
  - API latency (p50, p95, p99)
  - Error rate
  - Request throughput

Infrastructure Metrics:
  - CPU utilization
  - Memory utilization
  - Disk I/O
  - Network bandwidth
  - Database connections

Cost Metrics:
  - Cost per user
  - Cost per request
  - Infrastructure spend vs revenue

Version: 2.0 Last Updated: January 29, 2026 Maintained By: NIJA Infrastructure Team