System Design: Real-Time Stock Trading Platform on AWS
What We’re Building
We are building a real-time stock trading platform that serves millions of users, processes thousands of trades per second, delivers market data with millisecond-level latency, and guarantees zero data loss, all while remaining available 24/7.
In this post, we walk through the full system design in the following phases: Requirements, Capacity Estimation, API Design, Data Modeling, High-Level Architecture, and Scalability Strategy, all grounded in AWS-native services.
Phase 1: Requirements
Functional Requirements
- Market Data Feed: Stream real-time stock prices, bid/ask spreads, and order book updates to users
- Order Management: Allow users to place, modify, and cancel buy/sell orders (market, limit, stop)
- Order Matching: Match buy and sell orders using a price-time priority engine
- Portfolio Management: Track user holdings, P&L, and transaction history
- User Authentication: Secure login, KYC verification, and session management
- Notifications: Real-time alerts for order fills, price thresholds, and account events
- Historical Data: Access to OHLCV (Open, High, Low, Close, Volume) data for charting
Non-Functional Requirements
- Latency: Order placement to acknowledgment under 10ms (P99); market data updates under 50ms end-to-end
- Throughput: Sustain 1,000 orders per second at peak (the design target derived in the capacity estimation); ingest 1 million market data ticks per second
- Availability: 99.99% uptime during market hours (6.5 hours/day, 5 days/week)
- Consistency: Strong consistency for order state and account balances (ACID guarantees)
- Durability: Zero trade data loss; every order event must be persisted
- Scalability: Horizontal scaling to support 10x growth without redesign
- Security: End-to-end encryption, regulatory compliance (MiFID II, SEC Rule 15c3-5)
Out of Scope
- Cryptocurrency trading
- Margin lending and options/derivatives
- High-frequency trading (HFT) co-location infrastructure
Phase 2: Capacity Estimation
User Scale
| Metric | Value |
|---|---|
| Total registered users | 5,000,000 |
| Daily Active Users (DAU) | 500,000 |
| Peak concurrent users | 100,000 |
| Active sessions (market open) | 50,000 |
Traffic Estimation
Order Traffic:
- Average orders per user per day: 5
- Total daily orders: 500,000 × 5 = 2.5 million orders/day
- Average orders per second (QPS): 2.5M / 86,400 ≈ 29 orders/sec
- Peak QPS (market open surge, 10x factor): ~290 orders/sec
- Design target with safety margin: 1,000 orders/sec
Market Data Feed:
- Tracked symbols: 10,000
- Tick updates per symbol per second: 10
- Total ticks/sec: 100,000 ticks/sec
- Peak (volatile sessions): 1,000,000 ticks/sec
Read Traffic:
- Portfolio reads: 500,000 users × 10 reads/day = 5M reads/day → ~58 reads/sec
- Price quote lookups: 500,000 users × 50 lookups/day = 25M/day → ~290 reads/sec
- Peak read QPS (combined): ~5,000 reads/sec
Storage Estimation
Order Data:
- Average order record size: 512 bytes
- Daily orders: 2.5 million
- Daily order storage: 2.5M × 512B ≈ 1.2 GB/day
- 5-year retention: 1.2 GB × 365 × 5 ≈ 2.2 TB
Market Tick Data:
- Average tick size: 128 bytes
- Daily ticks: 100,000 ticks/sec × 23,400 market seconds ≈ 2.3 billion ticks
- Daily storage: 2.3B × 128B ≈ 295 GB/day
- 1-year hot storage + 10-year cold archival on S3 Glacier
User & Portfolio Data:
- Average user record: 2 KB
- 5 million users: ~10 GB (negligible)
Bandwidth Estimation
- Inbound (orders): 1,000 orders/sec × 512B = ~500 KB/sec
- Outbound (market data push): 50,000 concurrent users × 5 updates/sec × 128B = ~32 MB/sec
- Peak outbound: ~320 MB/sec (factoring 10x burst)
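The back-of-envelope numbers above are easy to sanity-check in a few lines of Python. The constants below simply mirror the assumptions stated in this section:

```python
# Capacity sanity check mirroring the assumptions in this section.

DAU = 500_000                 # daily active users
ORDERS_PER_USER = 5
SECONDS_PER_DAY = 86_400
ORDER_SIZE_B = 512
TICK_SIZE_B = 128
MARKET_SECONDS = 23_400       # 6.5 trading hours
TICKS_PER_SEC = 100_000

daily_orders = DAU * ORDERS_PER_USER               # 2.5M orders/day
avg_order_qps = daily_orders / SECONDS_PER_DAY     # ~29 orders/sec
peak_order_qps = avg_order_qps * 10                # ~290 orders/sec

daily_order_gb = daily_orders * ORDER_SIZE_B / 1e9       # ~1.28 GB/day
five_year_tb = daily_order_gb * 365 * 5 / 1e3            # ~2.3 TB

daily_ticks = TICKS_PER_SEC * MARKET_SECONDS             # ~2.34B ticks/day
daily_tick_gb = daily_ticks * TICK_SIZE_B / 1e9          # ~300 GB/day

print(f"avg order QPS: {avg_order_qps:.0f}, peak: {peak_order_qps:.0f}")
print(f"order storage: {daily_order_gb:.2f} GB/day, 5y: {five_year_tb:.2f} TB")
print(f"tick storage: {daily_tick_gb:.0f} GB/day")
```

Small rounding differences against the tables above come from rounding intermediate values; the orders of magnitude are what matter for sizing.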
Infrastructure Estimates
| Component | Count | Justification |
|---|---|---|
| API Gateway + Load Balancer | 3 (multi-AZ) | 5,000 peak read/write QPS |
| Order Service (ECS) | 10 tasks | 1,000 orders/sec with headroom |
| Market Data Service (ECS) | 20 tasks | 1M ticks/sec fan-out |
| Matching Engine (EC2 c6i) | 2 (primary + hot standby) | Single-threaded, low-latency |
| Redis Cluster (ElastiCache) | 6 nodes (3 primary, 3 replica) | 5,000 read/sec, sub-ms latency |
| Amazon MSK (Kafka) | 6 brokers | 1M events/sec durability |
Phase 3: API Design
REST APIs (via Amazon API Gateway)
Order Management
POST /v1/orders
Body: { symbol, side, type, quantity, price?, timeInForce }
Response: { orderId, status, timestamp }
GET /v1/orders/{orderId}
Response: { orderId, symbol, side, status, filledQty, avgPrice }
DELETE /v1/orders/{orderId}
Response: { orderId, status: "CANCELLED" }
GET /v1/orders?status=OPEN&symbol=AAPL
Response: { orders: [...] }
Portfolio
GET /v1/portfolio
Response: { holdings: [...], cashBalance, totalValue, dayPnL }
GET /v1/portfolio/history?from=2026-01-01&to=2026-03-01
Response: { trades: [...] }
Market Data (REST)
GET /v1/quotes/{symbol}
Response: { symbol, bid, ask, last, volume, timestamp }
GET /v1/history/{symbol}?interval=1d&from=...&to=...
Response: { candles: [{ open, high, low, close, volume, timestamp }] }
WebSocket API (via AWS API Gateway WebSocket)
Real-time price streaming and order status updates are delivered via persistent WebSocket connections.
// Subscribe to price updates
{ "action": "subscribe", "symbols": ["AAPL", "TSLA", "NVDA"] }
// Server push — market data tick
{ "type": "TICK", "symbol": "AAPL", "price": 213.45, "bid": 213.44, "ask": 213.46, "ts": 1741180800000 }
// Server push — order fill notification
{ "type": "ORDER_UPDATE", "orderId": "ord_123", "status": "FILLED", "avgPrice": 213.45, "filledQty": 10 }
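On the client side, these push messages are typically routed by their `type` field. A minimal pure-Python dispatcher is sketched below; the handler names and registry shape are illustrative, not part of the platform's API:

```python
import json

# Minimal client-side dispatcher for the WebSocket push messages above.
# Handler names and the HANDLERS registry are illustrative assumptions.

def handle_tick(msg: dict) -> str:
    return f"{msg['symbol']} last={msg['price']}"

def handle_order_update(msg: dict) -> str:
    return f"{msg['orderId']} -> {msg['status']}"

HANDLERS = {"TICK": handle_tick, "ORDER_UPDATE": handle_order_update}

def dispatch(raw: str) -> str:
    """Parse a raw server push and route it to the matching handler."""
    msg = json.loads(raw)
    handler = HANDLERS.get(msg.get("type"))
    if handler is None:
        raise ValueError(f"unknown message type: {msg.get('type')}")
    return handler(msg)

tick = ('{"type": "TICK", "symbol": "AAPL", "price": 213.45, '
        '"bid": 213.44, "ask": 213.46, "ts": 1741180800000}')
print(dispatch(tick))  # AAPL last=213.45
```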
Phase 4: Data Modeling
Order Table (Amazon Aurora PostgreSQL)
-- PostgreSQL has no inline ENUM column syntax; enums are declared as types.
CREATE TYPE order_side AS ENUM ('BUY', 'SELL');
CREATE TYPE order_kind AS ENUM ('MARKET', 'LIMIT', 'STOP');
CREATE TYPE order_status AS ENUM
  ('PENDING', 'OPEN', 'PARTIALLY_FILLED', 'FILLED', 'CANCELLED');

CREATE TABLE orders (
  order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,
  symbol VARCHAR(10) NOT NULL,
  side order_side NOT NULL,
  order_type order_kind NOT NULL,
  status order_status NOT NULL,
  quantity DECIMAL(18, 8) NOT NULL,
  filled_qty DECIMAL(18, 8) DEFAULT 0,
  price DECIMAL(18, 4),
  avg_fill_price DECIMAL(18, 4),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_symbol ON orders(symbol, status);
CREATE INDEX idx_orders_created ON orders(created_at DESC);
Portfolio / Holdings (Amazon DynamoDB)
{
"PK": "USER#user_id_xyz",
"SK": "HOLDING#AAPL",
"symbol": "AAPL",
"quantity": 150,
"avgCostBasis": 198.32,
"lastUpdated": "2026-03-05T10:30:00Z"
}
DynamoDB is chosen for portfolio reads due to its single-digit millisecond latency at scale and ability to handle 50,000 concurrent reads without capacity planning headaches.
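The single-table key scheme above can be captured in a couple of helper functions. This is an illustrative sketch; in production these dicts would feed boto3's `put_item`/`get_item`, and the helper names are assumptions:

```python
# Sketch of the DynamoDB single-table key scheme shown above.
# Helper names are illustrative; the dicts map directly onto
# boto3 put_item / get_item parameters.

def holding_item(user_id: str, symbol: str, qty: float,
                 avg_cost: float, ts: str) -> dict:
    return {
        "PK": f"USER#{user_id}",
        "SK": f"HOLDING#{symbol}",
        "symbol": symbol,
        "quantity": qty,
        "avgCostBasis": avg_cost,
        "lastUpdated": ts,
    }

def holding_key(user_id: str, symbol: str) -> dict:
    # Key for fetching one holding. Querying on PK alone with
    # begins_with(SK, "HOLDING#") returns the whole portfolio in one request.
    return {"PK": f"USER#{user_id}", "SK": f"HOLDING#{symbol}"}

item = holding_item("user_id_xyz", "AAPL", 150, 198.32, "2026-03-05T10:30:00Z")
print(item["PK"], item["SK"])  # USER#user_id_xyz HOLDING#AAPL
```

Collapsing a user's holdings under one partition key is what makes the "read my whole portfolio" path a single-request, single-partition query.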
Market Tick Data (Amazon Timestream)
measure_name: "price_tick"
dimensions: { symbol: "AAPL", exchange: "NASDAQ" }
time: 2026-03-05T14:30:00.123Z
measures: { price: 213.45, bid: 213.44, ask: 213.46, volume: 1240 }
Amazon Timestream is purpose-built for time-series data, providing automatic tiering from in-memory hot storage to magnetic storage for historical data—ideal for the billions of daily ticks generated by this platform.
Order Book (Amazon ElastiCache for Redis)
Key: ORDERBOOK:AAPL:BUY (Sorted Set — price as score)
Key: ORDERBOOK:AAPL:SELL (Sorted Set — price as score)
ZADD ORDERBOOK:AAPL:BUY 213.44 "order_id_abc|qty:100"
ZRANGE ORDERBOOK:AAPL:BUY 0 9 WITHSCORES # Top 10 bids
The in-memory order book in Redis enables the matching engine to retrieve the best bid/ask in O(log N) time, a core requirement for low-latency matching.
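For the hot path inside the engine process, the same structure can be held locally. The sketch below is a pure-Python stand-in for the Redis sorted set (not redis-py), keeping price levels sorted so best-bid/ask lookup stays cheap; the class and field names are assumptions:

```python
import bisect

# Pure-Python stand-in for the Redis sorted-set order book: price levels
# kept sorted, each level holding a FIFO queue of resting orders.

class BookSide:
    def __init__(self, is_buy: bool):
        self.is_buy = is_buy
        self.prices = []      # price levels, sorted ascending
        self.levels = {}      # price -> list of (order_id, qty), FIFO

    def add(self, price: float, order_id: str, qty: int) -> None:
        if price not in self.levels:
            # bisect finds the slot in O(log N); the list shift is O(N),
            # which is fine for the bounded number of live price levels.
            bisect.insort(self.prices, price)
            self.levels[price] = []
        self.levels[price].append((order_id, qty))

    def best(self):
        """Best bid is the highest price; best ask is the lowest."""
        if not self.prices:
            return None
        return self.prices[-1] if self.is_buy else self.prices[0]

bids = BookSide(is_buy=True)
bids.add(213.44, "order_id_abc", 100)
bids.add(213.40, "order_id_def", 50)
print(bids.best())  # 213.44
```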
Phase 5: High-Level Architecture
Core Components
┌─────────────────────────────────────────────────────────────────┐
│ AWS Cloud (Multi-AZ) │
│ │
│ ┌─────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Route53 │───▶│ CloudFront + │───▶│ API Gateway │ │
│ │ DNS │ │ WAF │ │ (REST + WebSocket) │ │
│ └─────────┘ └──────────────┘ └───────────┬──────────┘ │
│ │ │
│ ┌────────────────────┬──────────────────┘ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌────────────────┐ │
│ │ Order │ │ Market Data │ │
│ │ Service │ │ Service (ECS) │ │
│ │ (ECS + ALB) │ │ │ │
│ └──────┬──────┘ └───────┬────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Amazon MSK (Kafka) │ │
│ │ Topics: orders, trades, ticks, notifications│ │
│ └──────────────┬──────────────────────────────┘ │
│ │ │
│ ┌────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌──────────────────┐ │
│ │ Matching │ │ Portfolio │ │ Notification │ │
│ │ Engine │ │ Service │ │ Service │ │
│ │ (EC2 c6i) │ │ (ECS) │ │ (Lambda + SNS) │ │
│ └─────┬─────┘ └─────┬─────┘ └──────────────────┘ │
│ │ │ │
│ ┌────▼────┐ ┌────▼──────────────────────────────────┐ │
│ │ Redis │ │ Aurora PostgreSQL │ DynamoDB │ │
│ │ (Order │ │ (Orders, Trades) │ (Portfolio, │ │
│ │ Book) │ │ │ User Data) │ │
│ └─────────┘ └─────────────────────┴───────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Amazon Timestream (Tick History + OHLCV Data) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ S3 + Glacier (Long-term Trade Archive, Compliance) │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
AWS Services Breakdown
Compute & Orchestration
- Amazon ECS (Fargate): Runs Order Service, Market Data Service, Portfolio Service as containerized microservices with auto-scaling
- EC2 c6i instances: Dedicated compute-optimized instances for the matching engine, requiring consistent low-latency CPU performance rather than serverless variability
- AWS Lambda: Event-driven notification delivery, trade confirmations, and compliance event logging
Messaging & Streaming
- Amazon MSK (Managed Kafka): The central event bus. All order events, trade fills, and market ticks flow through MSK topics, providing durability, replay capability, and decoupling between services
- Amazon SNS + SQS: Fan-out push notifications (email, SMS, mobile push) for order fills and alerts
Data Storage
- Amazon Aurora PostgreSQL (Multi-AZ): ACID-compliant storage for orders and trades—the source of truth for all financial transactions
- Amazon DynamoDB: Sub-millisecond portfolio reads with global tables for multi-region active-active reads
- Amazon ElastiCache for Redis: In-memory order book for the matching engine; also serves as session cache and leaderboard
- Amazon Timestream: Time-series market tick data with automatic data lifecycle management
- Amazon S3 + Glacier: Long-term archival of trade history and compliance reports at low cost
Networking & Edge
- Amazon CloudFront + AWS WAF: CDN for static assets; WAF protects against DDoS, SQL injection, and rate-limit abuse
- Amazon Route 53: DNS with health check-based failover between regions
- API Gateway (WebSocket): Manages 50,000+ concurrent WebSocket connections for real-time market data push
Security & Compliance
- AWS IAM + Cognito: Authentication, authorization, and user identity management
- AWS KMS: Encryption at rest for all sensitive financial data
- AWS CloudTrail + Security Hub: Audit trail for every API call; required for SEC and MiFID II compliance
Phase 6: Deep Dive — The Matching Engine
The matching engine is the most critical and latency-sensitive component of the system. It must process orders in strict price-time priority.
Design Principles
- Single-threaded per symbol: Each stock symbol runs its own matching engine thread to eliminate lock contention. A single c6i instance can handle 500+ symbols with independent threads.
- In-memory order book: The buy and sell queues live entirely in RAM (Redis Sorted Sets for the distributed view, plus local in-process maps for the hot path).
- Event-sourced state: Every state change (order placed, matched, cancelled) is published to MSK. The engine can reconstruct any historical order book state by replaying events.
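The "single-threaded per symbol" principle hinges on routing every event for a given symbol to the same worker, so one thread owns that symbol's book and no locks are needed. A minimal sketch of that routing (the shard count and hashing scheme are assumptions):

```python
import zlib

# Deterministic symbol -> shard routing: every event for a symbol lands
# on the same worker queue, so one thread owns that symbol's order book.
# NUM_SHARDS and the crc32 scheme are illustrative assumptions.

NUM_SHARDS = 8

def shard_for(symbol: str) -> int:
    # crc32 is stable across processes and restarts, unlike Python's
    # salted built-in hash(), so routing never changes under redeploys.
    return zlib.crc32(symbol.encode()) % NUM_SHARDS

print(shard_for("AAPL"), shard_for("TSLA"), shard_for("NVDA"))
```

The same idea is what Kafka partitioning by symbol key gives you for free: per-key ordering into a single consumer.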
Order Flow
1. User submits order via REST API
2. Order Service validates (sufficient balance, valid symbol, risk checks)
3. Order Service publishes ORDER_RECEIVED event to MSK topic: "orders"
4. Matching Engine consumes event, checks order book
5a. If match found → publish TRADE_EXECUTED to MSK "trades" topic
5b. If no match → place order in Redis order book, publish ORDER_OPEN event
6. Portfolio Service consumes TRADE_EXECUTED → update holdings, debit/credit balance
7. Notification Service consumes TRADE_EXECUTED → push WebSocket notification to user
8. Aurora write: persist final order state and trade record
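The flow above can be sketched as a tiny in-memory pub/sub, with topics as lists and consumers as plain functions. This is a stand-in for MSK to show the decoupling, not an MSK client; only the topic names mirror the steps, everything else is an illustrative assumption:

```python
from collections import defaultdict

# In-memory stand-in for the MSK-based flow: topics are lists, consumers
# are plain functions. Topic names mirror the steps above.

topics = defaultdict(list)
subscribers = defaultdict(list)

def publish(topic: str, event: dict) -> None:
    topics[topic].append(event)          # durable log (here: just a list)
    for fn in subscribers[topic]:
        fn(event)

def subscribe(topic: str, fn) -> None:
    subscribers[topic].append(fn)

holdings = defaultdict(int)
notifications = []

def portfolio_service(event):            # step 6: update holdings
    holdings[event["symbol"]] += event["qty"]

def notification_service(event):         # step 7: push to user
    notifications.append(f"FILLED {event['orderId']}")

subscribe("trades", portfolio_service)
subscribe("trades", notification_service)

# Steps 3-5a: order received, engine matches it, trade published.
publish("orders", {"orderId": "ord_123", "symbol": "AAPL", "qty": 10})
publish("trades", {"orderId": "ord_123", "symbol": "AAPL", "qty": 10})

print(holdings["AAPL"], notifications)  # 10 ['FILLED ord_123']
```

The point of the shape: the matching engine never calls the portfolio or notification services directly; both simply consume the `trades` topic, which is what lets each scale and fail independently.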
Matching Algorithm
def match_order(incoming_order, order_book):
    if incoming_order.side == "BUY":
        best_asks = order_book.get_asks()     # sorted ascending by price
        for ask in best_asks:
            if ask.price > incoming_order.price:
                break     # asks are sorted: nothing further can match
            execute_trade(incoming_order, ask)
            if incoming_order.is_filled():
                break
    # (the SELL side mirrors this against the bids, sorted descending)
    # Remaining unfilled qty rests on the order book
    if incoming_order.remaining_qty > 0:
        order_book.add(incoming_order)
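The sketch above leaves `execute_trade` and the book abstract. A self-contained, runnable version of the BUY side is shown below, using a min-heap keyed on `(price, arrival_seq)` so price-time priority falls out of the heap ordering; all names here are illustrative:

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Runnable BUY-side matching sketch: a min-heap of (price, seq, order)
# yields resting asks in price-time priority order. Names are illustrative.

_seq = count()

@dataclass
class Order:
    order_id: str
    side: str          # "BUY" or "SELL"
    price: float
    qty: int
    seq: int = field(default_factory=lambda: next(_seq))  # arrival order

def match_buy(incoming: Order, asks: list) -> list:
    """Match a BUY limit order against a heap of resting asks."""
    fills = []
    while incoming.qty > 0 and asks and asks[0][0] <= incoming.price:
        price, _, ask = heapq.heappop(asks)
        traded = min(incoming.qty, ask.qty)
        fills.append((ask.order_id, price, traded))
        incoming.qty -= traded
        ask.qty -= traded
        if ask.qty > 0:                  # partially filled ask rests again
            heapq.heappush(asks, (price, ask.seq, ask))
    return fills

asks = []
for o in (Order("a1", "SELL", 213.45, 60), Order("a2", "SELL", 213.45, 60),
          Order("a3", "SELL", 213.50, 100)):
    heapq.heappush(asks, (o.price, o.seq, o))

fills = match_buy(Order("b1", "BUY", 213.45, 100), asks)
print(fills)  # [('a1', 213.45, 60), ('a2', 213.45, 40)]
```

Note that `a1` fills fully before `a2` is touched even though both rest at 213.45: equal prices tie-break on the arrival sequence number, which is exactly the "time" half of price-time priority.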
Phase 7: Scalability & Reliability Strategy
Horizontal Scaling
- Order Service: ECS task auto-scaling based on CPU > 70% or SQS queue depth > 1,000 messages. Scales from 5 to 50 tasks within 90 seconds.
- Market Data Fan-out: Kafka consumer groups allow adding more market data service tasks without rebalancing risk—each task picks up additional symbol partitions automatically.
- WebSocket connections: API Gateway WebSocket handles connection state, removing the need for sticky sessions on the application layer. Connection state is stored in DynamoDB, allowing any backend task to push to any client.
Fault Tolerance
- Multi-AZ deployment: All ECS services, Aurora (synchronous standby), and MSK brokers span 3 Availability Zones. An AZ failure causes < 30 seconds of disruption.
- Matching engine hot standby: A passive replica consumes all events from MSK and maintains a synchronized in-memory order book, ready to take over within 5 seconds via Route 53 DNS failover.
- Circuit breakers: Each microservice implements circuit breaker patterns using AWS App Mesh, preventing cascade failures when downstream services are degraded.
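The circuit breaker pattern itself is simple enough to show inline. This is a minimal illustrative sketch, not the App Mesh implementation; the thresholds are assumptions:

```python
import time

# Minimal circuit-breaker sketch. Real deployments use a mesh or a
# resilience library; the thresholds here are illustrative assumptions.

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # half-open: let one probe through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0                # success resets the count
        return result

cb = CircuitBreaker(max_failures=2)

def flaky():
    raise ConnectionError("downstream degraded")

for _ in range(2):                       # two failures trip the breaker
    try:
        cb.call(flaky)
    except ConnectionError:
        pass

try:
    cb.call(flaky)                       # now fails fast, no downstream call
except RuntimeError as e:
    print(e)  # circuit open: failing fast
```

Failing fast is the point: once the breaker opens, callers stop queueing work against a degraded dependency, which is what prevents the cascade.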
Multi-Region Strategy
For disaster recovery and reduced latency for global users:
| Region | Role | RPO | RTO |
|---|---|---|---|
| us-east-1 | Primary (active) | 0 seconds | — |
| us-west-2 | Hot standby | < 5 seconds | < 30 seconds |
| eu-west-1 | Read replica + EU users | < 10 seconds | < 60 seconds |
DynamoDB Global Tables provide active-active replication for portfolio data. Aurora Global Database replicates to the standby region with < 1 second lag. MSK MirrorMaker 2 keeps Kafka topics synchronized across regions.
Caching Strategy
| Layer | Technology | TTL | Purpose |
|---|---|---|---|
| Quote cache | Redis | 500ms | Latest prices for REST polling |
| Order book snapshot | Redis | Real-time | Matching engine hot path |
| Historical OHLCV | CloudFront | 1 minute | Chart data CDN edge caching |
| User session | ElastiCache | 30 minutes | Auth token validation |
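The 500ms quote cache in the table above behaves like any TTL cache. In production this is a Redis GET/SET with a PX expiry; the in-process class below is just an illustration of the expiry semantics:

```python
import time

# In-process illustration of the 500 ms quote cache: entries expire on a
# monotonic-clock TTL. Production uses Redis SET ... PX instead.

class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}           # key -> (value, expires_at)

    def set(self, key, value) -> None:
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or time.monotonic() >= entry[1]:
            return None           # miss or expired: caller re-fetches
        return entry[0]

quotes = TTLCache(ttl_seconds=0.5)
quotes.set("AAPL", {"bid": 213.44, "ask": 213.46})
print(quotes.get("AAPL"))         # fresh hit within 500 ms
time.sleep(0.6)
print(quotes.get("AAPL"))         # None: expired, forces a re-fetch
```

A 500ms TTL bounds staleness to half a second for REST pollers while absorbing almost all of the repeated reads for hot symbols.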
Phase 8: Monitoring & Observability
A trading platform demands more than basic uptime monitoring—every millisecond of latency and every failed order must be captured and acted upon.
- Amazon CloudWatch: Custom dashboards for order throughput (orders/sec), matching latency (P50/P95/P99), and Kafka consumer lag
- AWS X-Ray: Distributed tracing across microservices to pinpoint latency sources in the order flow
- Service Level Objectives (SLOs): Automated alarms trigger when P99 order acknowledgment latency exceeds 10ms or order fill rate drops below 99.5%
- Dead Letter Queues: Every MSK consumer sends failed messages to SQS DLQs, ensuring no trade event is silently lost
- Compliance Audit Trail: CloudTrail logs every API call; AWS Config tracks infrastructure changes—both required for regulatory reporting
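The P99 alarm above reduces to a percentile computation over a latency window. A nearest-rank sketch (synthetic sample data; in production the samples come from CloudWatch metrics or X-Ray traces):

```python
# Nearest-rank percentile over a latency window, as used by the P99
# order-ack SLO alarm above. The sample data is synthetic.

def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [2, 3, 3, 4, 4, 5, 5, 6, 8, 12]   # order ack latencies
p99 = percentile(latencies_ms, 99)
print(f"P99 = {p99} ms, SLO breach: {p99 > 10}")
```

Note why the SLO targets P99 rather than the mean: the mean of this window is under 6ms and looks healthy, while the tail a real user can hit is 12ms, already past the 10ms budget.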
Conclusion: Lessons from the Design
Designing a real-time stock trading platform forces every hard trade-off in distributed systems into the spotlight. Strong consistency (Aurora ACID for trades) versus high availability (DynamoDB for portfolios). Low latency (Redis order book) versus durability (MSK event log). Single-threaded simplicity (matching engine) versus horizontal scalability (everything else).
The architecture presented here leans into AWS managed services to minimize operational overhead while meeting the stringent latency, consistency, and compliance requirements of financial markets. The event-driven backbone through Amazon MSK is the key architectural decision—it decouples services, provides replay capability for disaster recovery, and creates a natural audit trail for every trade.
The best trading platforms are not just fast—they are provably correct, observable at every layer, and designed to fail gracefully. Build those properties in from day one, and scale becomes an engineering problem rather than a business risk. 🚀