Splitting Up Without Breaking Up: Partitioning Your Database with Style

WHY

In the high-stakes world of database management, sometimes the healthiest relationship is one with boundaries. If your database is starting to feel overwhelmed, sluggish, or just plain unmanageable, it might be time for the “we need to talk” conversation. But don’t worry—this isn’t a breakup; it’s a strategic restructuring that will make your relationship with your data stronger.

WHAT

Think of partitioning as sharding’s more localized cousin — a way to break up data within a single database or server to improve performance, maintainability, and query efficiency. While sharding is about splitting data across multiple nodes, partitioning is about organizing data smarter within the same node. Database partitioning is the practice of dividing a database table into smaller, more manageable segments based on defined rules while maintaining the logical appearance of a single table to applications interacting with it.

ANALOGY

Database partitioning is like organizing a clothing store where instead of piling all merchandise (data) into one massive, chaotic section, you thoughtfully arrange men’s, women’s, and children’s clothes into separate departments (partitions). Shoppers can easily find what they need, store employees can efficiently manage inventory and restocking for their specific section, and the store can expand by adding specialized sections without disrupting the existing layout – all while maintaining a seamless shopping experience.

HOW

Just as there are multiple ways to organize a closet, databases offer several partitioning strategies, each with its own strengths:

  • Horizontal Partitioning (Row-Based): Splits table rows across partitions based on ranges of a column value, like dividing customer records by date ranges or ID ranges.
  • Vertical Partitioning (Column-Based): Separates columns of a table into different partitions, typically grouping frequently accessed columns together and rarely used columns in separate partitions.
  • Functional Partitioning: Organizes data based on how it’s used in your application, grouping related tables or functionality together regardless of structural similarities.
  • List Partitioning: Divides data based on specific, discrete values in a column, such as storing customer data in different partitions based on country or region.
  • Hash Partitioning: Distributes rows evenly across partitions using a hash function on the partition key, ideal when natural groupings don’t exist or balanced distribution is critical.
  • Composite Partitioning: Combines multiple partitioning strategies, such as first partitioning by date range, then sub-partitioning each range by region or customer type.
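
To make one of the less familiar flavors concrete, here is a minimal sketch of list partitioning in PostgreSQL; the customers table and region values are invented for illustration:

CREATE TABLE customers (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    region TEXT NOT NULL,
    name TEXT NOT NULL,
    PRIMARY KEY (id, region)
) PARTITION BY LIST (region);

-- One partition per discrete value (or group of values)
CREATE TABLE customers_emea PARTITION OF customers
    FOR VALUES IN ('EU', 'UK', 'MEA');

CREATE TABLE customers_apac PARTITION OF customers
    FOR VALUES IN ('APAC');

-- Catch-all for values not matched above (PostgreSQL 11+)
CREATE TABLE customers_other PARTITION OF customers DEFAULT;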

Now let’s walk through a fuller example using range partitioning, the most common strategy.

Let’s say you have a table called orders, and you want to partition it by order_date, one partition per year.

1. Create the Partitioned Table

CREATE TABLE orders (
    id SERIAL,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    amount NUMERIC,
    -- A primary key on a partitioned table must include the partition column
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

2. Create Yearly Partitions

CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

3. Insert Data (Postgres Routes Automatically)

INSERT INTO orders (customer_id, order_date, amount)
VALUES 
    (101, '2023-03-15', 200.00),
    (102, '2024-07-01', 350.00),
    (103, '2025-01-20', 500.00);

4. Query Normally

SELECT * FROM orders WHERE order_date >= '2024-01-01';

PostgreSQL will automatically prune irrelevant partitions during the query for performance gains.
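
To see pruning in action, ask for the query plan (exact output varies by version and data volume, but only the matching partitions should appear):

EXPLAIN SELECT * FROM orders WHERE order_date >= '2024-01-01';
-- The plan should touch orders_2024 and orders_2025 while skipping orders_2023 entirely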

Now, go fix some bugs!

Sharding is Caring: Distributing the Load for Database Health

Definition of the word

shard /ʃɑːd/ noun

a piece of broken ceramic, metal, glass, or rock, typically having sharp edges: “shards of glass flew in all directions.”

Why

In today’s data-driven world, success often means rapid growth – and with that growth comes increasingly massive datasets. Traditional database setups eventually hit performance walls: queries slow down, hardware costs skyrocket, and your once-nimble application becomes sluggish under the weight of its own data.

Analogy

The Single Database Way (One Platform):
Imagine you have just one train platform at a busy station. Every train — whether it’s going north, south, east, or west — has to use that same platform. Passengers have to wait while a train arrives, unloads, loads, and departs before the next train can come in. It gets crowded fast, delays pile up, and the station staff struggles to manage the constant flow.

The Sharding Way (Multiple Platforms):
Now imagine you build separate platforms for different train lines — one for northbound trains, one for southbound, another for eastbound, and so on. Passengers go directly to the platform for their route, trains arrive and depart simultaneously on different tracks, and everything runs much more smoothly. The workload is divided, delays are reduced, and the station is far more efficient.

How

Database sharding offers a strategic solution by horizontally partitioning your data across multiple separate database instances. Rather than scaling up a single monolithic database (expensive and eventually impossible), sharding lets you scale out by distributing the load across multiple machines.

1. Application-Level Sharding (Manual)

  • How it works: Your app logic decides which database to write/read from based on a sharding key (e.g., user ID).
  • Pros:
    • Simple and flexible.
    • Works with vanilla PostgreSQL.
  • Cons:
    • App becomes tightly coupled to shard logic.
    • Cross-shard queries are hard to manage.

Use case: SaaS apps with clean tenant separation. (The EF Core example at the end of this article implements exactly this pattern.)


2. Foreign Data Wrappers (FDW)

  • Use postgres_fdw to link multiple Postgres servers and query them as if they’re one.
  • Each shard is a separate PostgreSQL instance, and a coordinator node aggregates results.
  • Pros:
    • Works with standard Postgres.
    • Allows federated queries.
  • Cons:
    • Limited optimizer support.
    • Performance penalty for cross-shard joins.
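
As a rough sketch of the coordinator-side setup (the host, database, credentials, and remote table definition below are placeholders):

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Register a remote shard as a foreign server
CREATE SERVER shard2 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard2.example.com', dbname 'appdb', port '5432');

CREATE USER MAPPING FOR CURRENT_USER SERVER shard2
    OPTIONS (user 'app_user', password 'secret');

-- Expose the remote table locally; queries against it run on shard2
CREATE FOREIGN TABLE orders_shard2 (
    id BIGINT,
    customer_id INT,
    order_date DATE,
    amount NUMERIC
) SERVER shard2 OPTIONS (schema_name 'public', table_name 'orders');

SELECT count(*) FROM orders_shard2;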

3. Citus (PostgreSQL Extension)

  • Distributed PostgreSQL: Citus transforms Postgres into a horizontally scalable database.
  • Supports real-time distributed SQL queries.
  • Handles sharding, replication, and distributed transactions.
  • Pros:
    • Automatic sharding and parallelism.
    • Supports complex queries and joins.
  • Cons:
    • Requires Citus installation.
    • Some Postgres features may be limited.

Use case: Real-time analytics, multi-tenant apps.
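
With Citus, you declare how a table is distributed and the extension handles shard placement. A minimal sketch, assuming the extension is installed, worker nodes have already been added, and the orders table exists on the coordinator:

CREATE EXTENSION citus;

-- Split the orders table into shards spread across the worker nodes, keyed by customer_id
SELECT create_distributed_table('orders', 'customer_id');

-- Queries that filter on the distribution column are routed to a single shard
SELECT * FROM orders WHERE customer_id = 101;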

4. Schema-Based Sharding

  • Create multiple schemas in one Postgres instance, each acting like a shard.
  • Good for multi-tenant apps where each tenant gets a schema.
  • Pros:
    • Simple to manage.
    • No extra tooling.
  • Cons:
    • Doesn’t scale across machines.
    • Can hit performance limits at scale.
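
A minimal sketch of the schema-per-tenant layout (tenant names are placeholders):

CREATE SCHEMA tenant_acme;
CREATE SCHEMA tenant_globex;

CREATE TABLE tenant_acme.customers (id BIGINT PRIMARY KEY, name TEXT);
CREATE TABLE tenant_globex.customers (id BIGINT PRIMARY KEY, name TEXT);

-- The application selects its "shard" per connection or per request
SET search_path TO tenant_acme;
SELECT * FROM customers;  -- resolves to tenant_acme.customers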

5. Hash-Based or Range-Based Sharding (Custom)

  • Partition data based on a hash or range of a sharding key (like user_id) and route queries accordingly.
  • Can be implemented with:
    • Partitioned tables (Postgres 10+).
    • Application logic or proxy routing.
  • Pros:
    • Better distribution control.
  • Cons:
    • Requires custom coordination and routing logic.
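
Using native partitioned tables as the routing layer, a hash-sharded users table might look like the sketch below (the key and a modulus of 4 are arbitrary choices; hash partitioning itself requires PostgreSQL 11+). On its own this keeps every partition on one server, so spanning machines still means pairing it with application routing or foreign tables:

CREATE TABLE users (
    user_id BIGINT NOT NULL,
    email TEXT NOT NULL,
    PRIMARY KEY (user_id)
) PARTITION BY HASH (user_id);

-- Rows are spread across four partitions by hashing user_id
CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE users_p3 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 3);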

6. Proxy-Based Sharding

  • Use a proxy layer (like Pgpool-II, Odyssey, or custom reverse proxies) to manage routing and connections to shards.
  • Pros:
    • Can perform load balancing and failover too.
  • Cons:
    • Adds latency and complexity.
    • Doesn’t inherently solve cross-shard consistency.

Simple Example Implementation with EF Core (Application-Level Sharding)
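
The C# sketch below is a bare-bones version of approach #1: a tenant-to-connection-string map decides which PostgreSQL shard each repository call talks to. Host names, credentials, and the hardcoded tenant IDs are placeholders; a real application would load the shard map from configuration.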

using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    public DateTime CreatedAt { get; set; }
}

public class ShardDbContext : DbContext
{
    public ShardDbContext(DbContextOptions options) : base(options)
    {
    }

    public DbSet<Customer> Customers { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Customer>()
            .HasKey(c => c.Id);
    }
}

public class ShardingManager
{
    // Maps each tenant ID to the connection string of the shard that stores that tenant's data
    private readonly Dictionary<int, string> _shardMap = new Dictionary<int, string>
    {
        { 1, "Server=shard1.example.com;Database=Tenant1Db;User Id=user;Password=password;" },
        { 2, "Server=shard2.example.com;Database=Tenant2Db;User Id=user;Password=password;" },
        { 3, "Server=shard3.example.com;Database=Tenant3Db;User Id=user;Password=password;" }
    };

    public string GetConnectionString(int tenantId)
    {
        if (_shardMap.TryGetValue(tenantId, out string connectionString))
        {
            return connectionString;
        }
        
        throw new KeyNotFoundException($"No shard configured for tenant {tenantId}");
    }
}

public class ShardDbContextFactory
{
    private readonly ShardingManager _shardingManager;
    
    public ShardDbContextFactory(ShardingManager shardingManager)
    {
        _shardingManager = shardingManager;
    }
    
    public ShardDbContext CreateDbContext(int tenantId)
    {
        var connectionString = _shardingManager.GetConnectionString(tenantId);
        
        var optionsBuilder = new DbContextOptionsBuilder<ShardDbContext>();
        optionsBuilder.UseNpgsql(connectionString);
        
        return new ShardDbContext(optionsBuilder.Options);
    }
}

public class CustomerRepository
{
    private readonly ShardDbContextFactory _contextFactory;
    
    public CustomerRepository(ShardDbContextFactory contextFactory)
    {
        _contextFactory = contextFactory;
    }
    
    public async Task<Customer> GetCustomerAsync(int tenantId, int customerId)
    {
        using var dbContext = _contextFactory.CreateDbContext(tenantId);
        return await dbContext.Customers.FindAsync(customerId);
    }
    
    public async Task<List<Customer>> GetAllCustomersAsync(int tenantId)
    {
        using var dbContext = _contextFactory.CreateDbContext(tenantId);
        return await dbContext.Customers.ToListAsync();
    }
    
    public async Task AddCustomerAsync(int tenantId, Customer customer)
    {
        using var dbContext = _contextFactory.CreateDbContext(tenantId);
        dbContext.Customers.Add(customer);
        await dbContext.SaveChangesAsync();
    }
    
    // Cross-shard ("scatter-gather") query: each shard is queried separately and the results merged in the app
    public async Task<List<Customer>> GetCustomersFromAllShards()
    {
        var allCustomers = new List<Customer>();
  
        foreach (var tenantId in new[] { 1, 2, 3 }) // Hardcoded for simplicity
        {
            using var dbContext = _contextFactory.CreateDbContext(tenantId);
            var customers = await dbContext.Customers.ToListAsync();
            allCustomers.AddRange(customers);
        }
        
        return allCustomers;
    }
}

public class Program
{
    public static async Task Main(string[] args)
    {
        var services = new ServiceCollection();        
        services.AddSingleton<ShardingManager>();
        services.AddSingleton<ShardDbContextFactory>();
        services.AddScoped<CustomerRepository>();
        
        var serviceProvider = services.BuildServiceProvider();
      
        var customerRepo = serviceProvider.GetRequiredService<CustomerRepository>();
      
        await ExampleUsageAsync(customerRepo);
    }
    
    private static async Task ExampleUsageAsync(CustomerRepository repository)
    {
        int currentTenantId = 1;
        
        await repository.AddCustomerAsync(currentTenantId, new Customer
        {
            Name = "John Doe",
            Email = "john@example.com",
            CreatedAt = DateTime.UtcNow
        });
        
        var customers = await repository.GetAllCustomersAsync(currentTenantId);
        foreach (var customer in customers)
        {
            Console.WriteLine($"Customer: {customer.Name}, Email: {customer.Email}");
        }
        
        var allCustomers = await repository.GetCustomersFromAllShards();
        Console.WriteLine($"Total customers across all shards: {allCustomers.Count}");
    }
}
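
A couple of design notes on this sketch: GetCustomersFromAllShards issues one query per shard and merges the results in memory, which is exactly the cross-shard query cost called out earlier, and nothing here provides transactions that span shards. If either of those becomes a regular need, the Citus or FDW approaches above are usually a better fit.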