Session 4 | Notion

Sharding and replication

Step 1. A sharded, replicated fact table

CREATE TABLE IF NOT EXISTS `ontime_shard` ON CLUSTER `{cluster}` (
 `Year` UInt16,
 `Quarter` UInt8,
 ...
)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/{shard}/{database}/ontime_shard',
'{replica}')
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (FlightDate, `Year`, `Month`, DepDel15)

What does ON CLUSTER do?

ON CLUSTER executes a command over a set of nodes

Step 2. A distributed table to find data

CREATE TABLE IF NOT EXISTS ontime ON CLUSTER '{cluster}'
AS ontime_shard
ENGINE = Distributed(
 '{cluster}', currentDatabase(), ontime_shard, rand())

Step 3. A fully replicated dimension table

CREATE TABLE IF NOT EXISTS airports ON CLUSTER 'all-replicated' (
 AirportID String,
 Name String,
 ...
)
Engine=ReplicatedMergeTree(
 '/clickhouse/{cluster}/tables/all/{database}/airports',
'{replica}')
PARTITION BY tuple()
PRIMARY KEY AirportID
ORDER BY AirportID

Sharding and replication

Querying shards and replicas