Detailed storage layout within a single part /var/lib/clickhouse/data/airline/ontime
8
Adding CPUs boosts parallelized execution
9
Effect on storage is dramatic
10
Materialized views restructure/reduce data
11
Alternative pattern: Tiered storage
12
How do distributed queries work?
13
Pattern: Kafka-based ingestion pipelines
14
Alternative ingest pattern: Kafka engine
15
Pattern: Grafana visualization
16
Pattern: Operation on Kubernetes
Description:
Explore the world of open-source real-time data warehouses in this 51-minute Linux Foundation conference talk. Delve into the unique characteristics of analytic applications and SQL data warehouses, with a focus on ClickHouse and its MergeTree table engine. Discover the intricacies of data layout, storage optimization, and parallelized execution. Learn about materialized views, tiered storage, and distributed queries. Examine patterns for Kafka-based ingestion pipelines, Grafana visualization, and operation on Kubernetes. Gain insights into breaking free from proprietary solutions and leveraging open-source technologies for efficient data warehousing and analytics.
Breaking Out of the Proprietary Cage - Real-time Data Warehouses in Open Source