ZNYThe Complete Guide to Database Sharding Strategies in 2026 When vertical scaling hits its...
When vertical scaling hits its ceiling, sharding is the next step. But getting it wrong means migrating data under pressure with zero margin for error.
Signs you need sharding:
Single primary DB CPU > 70% sustained
DB disk I/O at capacity
Replication lag > 1 second
Table exceeds 500GB with heavy writes
Sharding too early adds complexity; too late means downtime.
Split by ID ranges or date:
-- Shard by user_id range
CREATE TABLE users_0 (CHECK (user_id BETWEEN 0 AND 999999)) INHERITS (users);
CREATE TABLE users_1 (CHECK (user_id BETWEEN 1000000 AND 1999999)) INHERITS (users);
Pros: Simple, sequential reads are efficient
Cons: Hot spots on newer shards, uneven distribution
def get_shard(user_id: int, num_shards: int = 4) -> int:
return hash(str(user_id)) % num_shards
# Consistent hashing for reshard without full remapping
def consistent_hash(key: str, nodes: list, replicas: int = 150) -> str:
circle = {}
for node in nodes:
for i in range(replicas):
circle[hash(f"{node}:{i}")] = node
sorted_keys = sorted(circle.keys())
return circle[min(sorted_keys, key=lambda k: abs(k - hash(key)))]
Pros: Even distribution, no hot spots
Cons: Range queries span multiple shards
Lookup table maps keys to shards:
CREATE TABLE shard_directory (
entity_type VARCHAR(50),
entity_id BIGINT,
shard_id INT,
PRIMARY KEY (entity_type, entity_id)
Pros: Flexible, can change assignment without rebalancing
Cons: Directory becomes a single point of failure
The hardest problem. Solutions:
Scatter-gather: Query all shards, merge results (slow)
Denormalization: Store redundant copies (complex updates)
Search engine: Use Elasticsearch for queries, DB for writes
Read replicas: Dedicated reporting replica that receives all data
Live resharding:
# Dual-write during transition
def write_sharded(key, value, phase="migrate"):
if phase in ("migrate", "write"):
old_shard = get_old_shard(key)
write_to_shard(old_shard, key, value)
if phase in ("migrate", "write"):
new_shard = get_new_shard(key)
write_to_shard(new_shard, key, value)
Citus: Native horizontal sharding for PostgreSQL
PostgreSQL FDW: Foreign data wrappers for cross-shard queries
pg_partman: Automatic partition management
Sharding is a last resort. Exhaust caching, read replicas, and vertical scaling first. When you do shard, start with hash-based and plan for at least 2x your expected data.
Need a simpler scaling path? Systeme.io handles infrastructure scaling automatically — so you can focus on your application logic.
This article contains affiliate links. If you sign up through the links above, I may earn a commission at no additional cost to you.
Get started with Systeme.io for free — All-in-one platform for building your online business with AI tools.