Design Uber/Ride Sharing
Why Is Uber Unique? The Real-Time Geospatial Challenge
The hardest problem in Uber is real-time geospatial matching: track the locations of millions of moving drivers and match the nearest driver to a rider within milliseconds. This is not a regular database problem; it is location-aware computing.
📌 Core Challenge
Geospatial Problem: Each driver's location updates every 4 seconds. When a rider sends a request, the system must find available drivers within a 5 km radius. Running that query against raw lat/lng columns in a standard SQL database is impractical at this scale, because without a spatial index every request scans every driver. A specialized data structure is needed: Geohash or QuadTree.
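To make the cost of the naive approach concrete, here is a minimal sketch of a brute-force radius search. The function names and the sample coordinates are illustrative assumptions, not part of any Uber system; the point is only that every request computes a distance for every driver.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def naive_nearby(drivers, lat, lng, radius_km=5.0):
    """Linear scan: one distance computation per driver, per request."""
    return [
        driver_id
        for driver_id, (d_lat, d_lng) in drivers.items()
        if haversine_km(lat, lng, d_lat, d_lng) <= radius_km
    ]

# Hypothetical sample data around Dhaka
drivers = {
    "D1": (23.8103, 90.4125),  # same spot as the rider
    "D2": (23.8200, 90.4200),  # a little over 1 km away
    "D3": (24.0000, 90.4125),  # roughly 21 km north
}
print(naive_nearby(drivers, 23.8103, 90.4125))
```

With three drivers this is instant; with millions of drivers and many requests per second, the scan is repeated in full each time, which is what a spatial index avoids.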
Uber Architecture Overview — High Level Flow
What Are the Features?
✅ Functional Requirements
- →Rider: request a ride
- →Driver: see nearby riders
- →Real-time location tracking
- →Driver-Rider matching
- →Fare calculation
- →Surge pricing
- →Trip history
- →Payment processing
⚡ Non-Functional Requirements
- →10M+ daily active users
- →Location update every 4 sec
- →Rider matched with a driver within 1 minute (end to end, including driver acceptance)
- →1 km precision
- →Low latency (< 200 ms for the nearby-driver lookup)
- →99.99% availability
Uber's Numbers: Back-of-Envelope
🔢 Location Update Load
5M active drivers × 1 update/4 sec = 1.25 million location updates/second. This is massive write throughput. A standard relational database cannot absorb it on its own; specialized storage such as a time-series store is needed.
Storage Calculation
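A rough sketch of the arithmetic, using the 5M drivers and 4-second interval stated above. The 64-byte payload per update is an assumption made here for illustration (driver id, lat, lng, timestamp), and the calculation treats all 5M drivers as online all day, so the result is an upper bound rather than a measured figure.

```python
ACTIVE_DRIVERS = 5_000_000
UPDATE_INTERVAL_SEC = 4
BYTES_PER_UPDATE = 64  # assumption: driver id + lat + lng + timestamp

updates_per_sec = ACTIVE_DRIVERS / UPDATE_INTERVAL_SEC
updates_per_day = updates_per_sec * 86_400
raw_bytes_per_day = updates_per_day * BYTES_PER_UPDATE

print(f"{updates_per_sec:,.0f} updates/sec")                    # 1,250,000 updates/sec
print(f"{updates_per_day / 1e9:.0f} billion updates/day")       # 108 billion updates/day
print(f"{raw_bytes_per_day / 1e12:.2f} TB/day of raw history")  # 6.91 TB/day of raw history
```

Only the latest position per driver has to live in the hot store; the daily volume matters for the history and analytics side.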
Uber Architecture — Services Breakdown
Uber is built on microservices. Each core function is a separate service. The key services are: Location Service, Geospatial Index, Trip Service, Matching Service, Surge Pricing, Notification, and Payment.
Driver App: GPS Update (every 4 sec)
The driver app maintains a WebSocket connection. Every 4 seconds it pushes its lat/lng to the Location Service, which writes the update to Redis GEO.
Rider App: Ride Request
The rider enters a pickup point and destination. The Trip Service receives the request and queries the Geospatial Index for available drivers within a 5 km radius.
Matching Service: Select the Best Driver
From the list of nearby drivers, it picks the best driver using multi-factor scoring that considers distance, rating, and acceptance rate. The request is then pushed to the driver over WebSocket.
Driver Response: Accept/Decline (15 sec window)
The driver accepts or declines within 15 seconds. Accept = the trip starts. Decline or timeout = try the next best driver. The driver's acceptance rate is tracked for future matching.
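The accept/decline window can be sketched as a loop over the ranked drivers with a timeout on each offer. This is a minimal illustration, not Uber's dispatch code: `offer_ride`, `ask_driver`, and `fake_driver` are hypothetical names, and the demo shrinks the 15-second window so it finishes quickly.

```python
import asyncio

ACCEPT_WINDOW_SEC = 15  # from the text: the driver has 15 seconds to respond

async def offer_ride(ranked_drivers, ask_driver, window_sec=ACCEPT_WINDOW_SEC):
    """Offer the trip to drivers in ranked order; return the first who accepts.

    ask_driver is a hypothetical coroutine: ask_driver(driver_id) -> bool.
    """
    for driver_id in ranked_drivers:
        try:
            accepted = await asyncio.wait_for(ask_driver(driver_id), timeout=window_sec)
        except asyncio.TimeoutError:
            accepted = False  # no answer inside the window counts as a decline
        if accepted:
            return driver_id
    return None  # nobody accepted

async def fake_driver(driver_id):
    """Demo stand-in: D1 declines, D2 never answers in time, D3 accepts."""
    behaviour = {"D1": "decline", "D2": "ignore", "D3": "accept"}[driver_id]
    if behaviour == "ignore":
        await asyncio.sleep(1)  # longer than the demo window below
    return behaviour == "accept"

winner = asyncio.run(offer_ride(["D1", "D2", "D3"], fake_driver, window_sec=0.05))
print(winner)  # D3
```

Offering to one driver at a time keeps the logic simple; a real dispatcher would also record each decline or timeout so the acceptance rate can feed later matching.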
Trip in Progress: Real-time Tracking
While the trip is active, the driver's location is shown in real time in the rider app. Surge pricing is calculated continuously. When the trip completes, the Payment service charges the rider.
Tech Stack
Core Services
Geospatial & Data
Real-time & Analytics
Geohash: A Way to Encode Location
Any location on Earth can be encoded as a short string; that string is its Geohash. The length of the Geohash determines its precision: each extra character narrows the cell.
| Geohash Length | Cell Size (approx) | Use Case |
|---|---|---|
| 1 character | 5000 × 5000 km | Country level |
| 4 characters | 40 × 20 km | City level |
| 6 characters | 1.2 × 0.6 km | Neighborhood (Uber uses ~6) |
| 8 characters | 38 × 19 meters | Street level |
| 12 characters | ~3.7 cm | Exact location |
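📌 Geohash Magic
A location in Dhaka has a geohash that begins with wh0r. Nearby locations usually share the same prefix, so all drivers whose geohash starts with the same 5-character prefix are in the same roughly 5 km cell of Dhaka. A string prefix query in the database is enough to find those nearby drivers, although points just across a cell boundary have a different prefix, so neighboring cells must be checked as well.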
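The encoding itself is short enough to sketch. The function below follows the standard Geohash scheme (alternate longitude and latitude bits, emit one base-32 character per 5 bits); the function name and the sample coordinates are illustrative choices, not taken from any particular library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lng: float, length: int = 6) -> str:
    """Encode a lat/lng pair as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    chars = []
    value, bit_count = 0, 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < length:
        rng, coord = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # 5 bits make one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)

print(geohash_encode(23.8, 90.4, 4))  # wh0r
```

Because each extra character only refines the previous cell, a longer hash of the same point always starts with the shorter one, which is why prefix matching works as a coarse proximity test.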
    import time

    import redis

    # decode_responses=True returns member names as str instead of bytes
    r = redis.Redis(decode_responses=True)

    # Driver location update (every 4 sec)
    def update_driver_location(driver_id: str, lat: float, lng: float) -> None:
        # Redis GEO: GEOADD key longitude latitude member
        # redis-py 4.x+ takes the values as one flat sequence: (lng, lat, member)
        r.geoadd("drivers:active", (lng, lat, driver_id))
        # Driver status update
        r.hset(f"driver:{driver_id}", mapping={
            "lat": lat, "lng": lng,
            "status": "available",
            "updated_at": time.time(),  # Unix timestamp of this update
        })

    # Rider request: find nearby drivers
    def find_nearby_drivers(rider_lat: float, rider_lng: float, radius_km: float = 5):
        # Redis GEORADIUS: all drivers within the radius of the center.
        # GEORADIUS is deprecated since Redis 6.2 in favor of GEOSEARCH, but still works.
        nearby = r.georadius(
            "drivers:active",
            rider_lng, rider_lat,
            radius_km,
            unit="km",
            withcoord=True,
            withdist=True,
            sort="ASC",  # Closest first
            count=10,    # Top 10 closest
        )
        # Each entry is [member, distance, (lng, lat)]
        return [
            {"driver_id": d[0], "distance_km": d[1], "location": d[2]}
            for d in nearby
        ]

    # Example output:
    # [{"driver_id": "D123", "distance_km": 0.8, "location": (90.4, 23.8)},
    #  {"driver_id": "D456", "distance_km": 1.2, ...}]

💡 QuadTree vs Geohash
QuadTree: Divide the map into 4 quadrants, then divide each of those into 4 again. Denser areas are split deeper. To find a driver, traverse the tree.
Geohash: Encode the location as a string. Prefix = neighborhood. Simpler implementation.
Internally, Uber uses H3, its own hexagonal grid system, rather than plain Geohash.
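For comparison with the string-based approach, here is a minimal QuadTree sketch: a leaf splits into four children once it holds more than `capacity` points, so dense areas end up deeper. The class layout, the capacity of 4, and the `MAX_LEVEL` cap are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

MAX_LEVEL = 16  # assumed cap so identical points cannot split forever

@dataclass
class QuadTree:
    min_lat: float
    min_lng: float
    max_lat: float
    max_lng: float
    capacity: int = 4  # assumed: points a leaf holds before splitting
    level: int = 0
    points: list = field(default_factory=list)    # (driver_id, lat, lng)
    children: list = field(default_factory=list)  # empty while this is a leaf

    def contains(self, lat, lng):
        return (self.min_lat <= lat < self.max_lat
                and self.min_lng <= lng < self.max_lng)

    def insert(self, driver_id, lat, lng):
        if not self.contains(lat, lng):
            return False
        if self.children:
            return any(c.insert(driver_id, lat, lng) for c in self.children)
        self.points.append((driver_id, lat, lng))
        if len(self.points) > self.capacity and self.level < MAX_LEVEL:
            self._split()
        return True

    def _split(self):
        mid_lat = (self.min_lat + self.max_lat) / 2
        mid_lng = (self.min_lng + self.max_lng) / 2
        boxes = [
            (self.min_lat, self.min_lng, mid_lat, mid_lng),
            (self.min_lat, mid_lng, mid_lat, self.max_lng),
            (mid_lat, self.min_lng, self.max_lat, mid_lng),
            (mid_lat, mid_lng, self.max_lat, self.max_lng),
        ]
        self.children = [
            QuadTree(*b, capacity=self.capacity, level=self.level + 1) for b in boxes
        ]
        pending, self.points = self.points, []
        for point in pending:  # push existing points down into the new children
            self.insert(*point)

    def query(self, min_lat, min_lng, max_lat, max_lng):
        """Return driver ids inside the half-open box [min, max)."""
        if (max_lat <= self.min_lat or min_lat >= self.max_lat
                or max_lng <= self.min_lng or min_lng >= self.max_lng):
            return []  # the box does not overlap this node: prune the subtree
        found = [d for d, lat, lng in self.points
                 if min_lat <= lat < max_lat and min_lng <= lng < max_lng]
        for child in self.children:
            found.extend(child.query(min_lat, min_lng, max_lat, max_lng))
        return found

    def depth(self):
        return 1 + max((c.depth() for c in self.children), default=0)

# Hypothetical bounding box around Dhaka with a small cluster of drivers
tree = QuadTree(23.6, 90.2, 24.0, 90.6)
for name, lat, lng in [("D1", 23.81, 90.41), ("D2", 23.82, 90.42),
                       ("D3", 23.83, 90.43), ("D4", 23.84, 90.44),
                       ("D5", 23.86, 90.46), ("D6", 23.95, 90.25)]:
    tree.insert(name, lat, lng)

print(sorted(tree.query(23.805, 90.405, 23.825, 90.425)))  # ['D1', 'D2']
```

The query prunes whole subtrees whose box does not overlap the search area, which is where the structure earns its keep. Moving drivers would also need a remove or update operation, omitted here to keep the sketch short.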
Geohash Visualization — Dhaka City
How Is a Driver Selected?
Finding the nearest driver is not enough; the best driver has to be chosen. Uber's matching algorithm considers multiple factors.
| Factor | Weight | Why? |
|---|---|---|
| Distance to rider | Most important | Faster pickup |
| Driver rating | High | Quality assurance |
| Car type match | Critical | UberX vs UberXL |
| Driver acceptance rate | Medium | Reliable drivers first |
| ETA accuracy | Medium | Historical performance |
    class NoDriverAvailable(Exception):
        """Raised when no nearby driver matches the requested ride type."""

    # `websocket` is assumed to be the app's push gateway exposing push(driver_id, payload).
    # Each driver dict is expected to carry distance_km, rating, acceptance_rate and car_type.
    def match_driver(rider_request, nearby_drivers, websocket):
        def score_driver(driver):
            # Multi-factor scoring; every component is scaled to (0, 1]
            # so that the weights below are comparable
            distance_score = 1.0 / (1.0 + driver["distance_km"])
            rating_score = driver["rating"] / 5.0
            acceptance_score = driver["acceptance_rate"] / 100.0
            # Weighted final score
            return (
                distance_score * 0.5 +    # 50% weight: distance is most important
                rating_score * 0.3 +      # 30% weight
                acceptance_score * 0.2    # 20% weight
            )

        # Filter by car type first
        compatible = [
            d for d in nearby_drivers
            if d["car_type"] == rider_request["ride_type"]
        ]
        if not compatible:
            raise NoDriverAvailable()

        # Best score = best match
        best_driver = max(compatible, key=score_driver)

        # Push the request to the driver (WebSocket)
        websocket.push(best_driver["driver_id"], {
            "type": "RIDE_REQUEST",
            "rider": rider_request,
            "expires_in": 15,  # 15 seconds to accept
        })
        return best_driver

⚠️ Surge Pricing Formula
Surge multiplier = f(demand / supply). Within one geohash cell, the ratio is open_requests / available_drivers. A higher ratio means higher surge. During Dhaka rush hour, surge can reach 2x-3x.
When the price rises: (a) more drivers come online, so supply increases, and (b) some riders wait, so demand falls. The market moves toward equilibrium.
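The text leaves f unspecified, so the sketch below picks one simple choice as an assumption: use the demand/supply ratio directly, clamped between 1.0 and a cap of 3.0 (matching the 2x-3x range mentioned above). The function name, the cap, and the cell labels are illustrative only.

```python
def surge_multiplier(open_requests: int, available_drivers: int, cap: float = 3.0) -> float:
    """Map a cell's demand/supply ratio to a price multiplier in [1.0, cap]."""
    if open_requests <= 0:
        return 1.0  # no demand, no surge
    if available_drivers <= 0:
        return cap  # demand with no supply: use the maximum
    ratio = open_requests / available_drivers
    return round(min(cap, max(1.0, ratio)), 1)

# Hypothetical per-cell counts: cell id -> (open_requests, available_drivers)
cells = {"cell_a": (10, 20), "cell_b": (30, 20), "cell_c": (100, 10)}
for cell, (demand, supply) in cells.items():
    print(cell, surge_multiplier(demand, supply))
```

A production formula would likely smooth the ratio over time so prices do not jump with every single request; that is outside this sketch.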
WebSocket + Kafka — Real-time Communication
Uber's core real-time infrastructure relies on two technologies: WebSocket (bidirectional communication) and Apache Kafka (event streaming).
Real-Time Data Flow — Location Updates
📌 Why WebSocket Instead of HTTP Polling?
HTTP Polling: If every driver app checks once per second, that is 5M drivers × 1 req/sec = 5M requests/sec, most of them unnecessary.
WebSocket: The server pushes whenever it has something to send. The driver app keeps a persistent connection. No polling overhead, instant delivery.
💡 Where Is Kafka Used?
Location updates are published to a Kafka topic. Multiple consumers read them: (1) one updates Redis GEO, (2) one feeds the analytics pipeline, (3) the surge pricing calculation service listens. The design is decoupled, scalable, and replayable.
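The fan-out and replay behaviour can be illustrated without a broker. The sketch below is a toy in-memory stand-in for a single topic partition, not the Kafka client API: `TopicLog` and the group names are hypothetical, and the only idea carried over is that each consumer group keeps its own read offset into a shared append-only log.

```python
from collections import defaultdict

class TopicLog:
    """Toy stand-in for one topic partition: an append-only list of events."""

    def __init__(self):
        self.events = []
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, event):
        self.events.append(event)

    def poll(self, group):
        """Return events this group has not seen yet and advance its offset."""
        start = self.offsets[group]
        batch = self.events[start:]
        self.offsets[group] = len(self.events)
        return batch

    def rewind(self, group, offset=0):
        """Move a group's offset back so it can replay earlier events."""
        self.offsets[group] = offset

log = TopicLog()
log.publish({"driver_id": "D1", "lat": 23.81, "lng": 90.41})
log.publish({"driver_id": "D2", "lat": 23.82, "lng": 90.42})

# Consumer 1: keeps the latest position per driver (stand-in for the Redis GEO writer)
latest = {e["driver_id"]: (e["lat"], e["lng"]) for e in log.poll("geo-index")}

# Consumer 2: analytics reads the same events independently
analytics_count = len(log.poll("analytics"))

# Replay: rewind the analytics group and read everything again
log.rewind("analytics")
replayed = len(log.poll("analytics"))
print(latest, analytics_count, replayed)
```

Because the producer only appends and never knows who reads, adding another consumer requires no change on the publishing side.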
Which Data Goes Where? How to Scale
| Data | Database | Why? |
|---|---|---|
| Driver real-time location | Redis GEO | In-memory, GEO commands, fast update |
| Trip data (active) | Redis / Cassandra | Hot data, fast access |
| Trip history | Cassandra | Time-series, append-only, massive scale |
| User/Driver profiles | PostgreSQL | Structured, ACID |
| Payments | PostgreSQL | Financial, ACID |
| Maps/Routes | Google Maps API | External service |
| Analytics/ML | Hadoop + Spark | Batch processing, model training |
Scaling Strategies
City-based Sharding: Uber operates in 63 countries. Shard each city separately: requests from Dhaka go to the Dhaka shard, and requests from London go to the London shard. Cross-city matching is never needed.
Redis GEO for Location: 5M drivers × 1 update/4 sec = 1.25M writes/sec. Redis is in-memory, offers millisecond-level writes, and has GEO commands built in. Spread across city shards, it is a good fit.
WebSocket for Real-time: The driver app keeps a persistent WebSocket connection. Ride requests, matching notifications, and trip updates are all pushed. Polling is impractical.
ETA Accuracy: Combine real-time traffic data with historical patterns. The Google Maps API is expensive at scale, so Uber has built up its own map data (Uber Movement).
Driver Ghosting: What if a driver accepts and then never shows up? Use a timeout with automatic reassignment, plus a penalty system, and feed the tracked acceptance rate into the matching algorithm.
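The city-based sharding strategy described above can be sketched as a lookup from coordinates to a shard. The bounding boxes are rough illustrative values and the shard addresses are made-up hostnames; a real system would use proper city polygons and a service registry.

```python
# Hypothetical bounding boxes: (min_lat, min_lng, max_lat, max_lng)
CITY_BOUNDS = {
    "dhaka": (23.6, 90.2, 24.0, 90.6),
    "london": (51.3, -0.5, 51.7, 0.3),
}

# Hypothetical shard addresses, one Redis GEO instance per city
CITY_SHARDS = {
    "dhaka": "redis-dhaka:6379",
    "london": "redis-london:6379",
}

def city_for(lat: float, lng: float):
    """Return the city whose bounding box contains the point, or None."""
    for city, (min_lat, min_lng, max_lat, max_lng) in CITY_BOUNDS.items():
        if min_lat <= lat <= max_lat and min_lng <= lng <= max_lng:
            return city
    return None

def shard_for(lat: float, lng: float) -> str:
    """Pick the shard that should handle a location update or ride request."""
    city = city_for(lat, lng)
    if city is None:
        raise LookupError(f"no city shard covers ({lat}, {lng})")
    return CITY_SHARDS[city]

print(shard_for(23.81, 90.41))  # redis-dhaka:6379
print(shard_for(51.50, -0.12))  # redis-london:6379
```

Since a rider and the drivers who can serve them are always in the same city, both the write path and the nearby-driver query touch a single shard.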
🎯 Interview Tips — Uber System Design
1) Say this first: "This is a geospatial problem". It will impress the interviewer.
2) Mention the Redis GEO commands GEOADD and GEORADIUS.
3) City-based sharding comes up naturally; explain why cross-city data joins are not needed.
4) Mentioning the driver ghosting problem and its solution shows senior-level knowledge.
5) Explain the surge pricing formula (demand/supply ratio) confidently.
SUMMARY: What We Learned Today
| Geospatial Approach | How It Works | Uber Use Case | Pros | Cons |
|---|---|---|---|---|
| Geohash | Encodes lat/lng as a string. Shared prefix = proximity. | 6-char ≈ 1.2 km cell. Query by prefix. | Simple, string DB query, built into Redis | Boundary problem: nearby points across a cell edge have different prefixes |
| QuadTree | Splits the map into 4 quadrants recursively. Denser areas split deeper. | Alternative to Geohash. Traverse the tree. | Adaptive density, dynamic split | Complex implementation, rebalancing |
| S2 Cells (Google) | Sphere → 2D projection → hierarchical cells | Used by Google Maps, ridesharing | Uniform coverage, great for global | Complex math, library dependency |
| H3 (from Uber) | Hexagonal grid with roughly equal-area hexagons. | Uber's actual system. Surge pricing. | Uniform neighbor distance, no corner distortion | Extra library dependency, learning curve |
| Redis GEO | Uses Geohash internally. GEOADD/GEORADIUS commands. | 1.25M writes/sec. In-memory. Production use. | Built-in commands, millisecond latency | Memory cost, single node limit |
🎉 Phase 4 Complete! Real-World Systems Mastered
You have completed 7 major real-world system designs. These are among the most popular interview questions.
Phase 5: Advanced Topics
Security in System Design · Observability & Monitoring · CI/CD & DevOps · Cost Optimization · Multi-Region Architecture · Performance Engineering