Intro

Please meet Retriever

Retriever is a special purpose data store

What is Honeycomb?

How Honeycomb works

Honeycomb under the hood

Our requirements

Requirements - summary

Retriever at a glance

Retriever compared to Scuba

Architecture - write path

Architecture - read path

Data model - datasets

Data model - events

Row oriented storage

Column oriented storage

Storage Format - timestamp column

Storage Format - reading

Distributed queries

Distributed reads - calculations

Distributed reads - fanout

Detour - Kafka

Ingestion

Quota management

Fault tolerance

Failure recovery

Bootstrapping new nodes

Description:

Explore the architecture and implementation of Retriever, a custom-built distributed column store database, in this 43-minute Strange Loop Conference talk. Learn how Honeycomb addressed the challenges of understanding complex distributed systems in production by developing a low-latency, schemaless database inspired by Facebook's Scuba. Discover the design decisions behind Retriever, including its use of disk storage, efficient column-oriented storage model, and ability to handle multi-tenancy and cost constraints. Gain insights into the write and read paths, data model, storage format, distributed queries, and fault tolerance mechanisms. Understand how Retriever ingests events from Kafka, manages quotas, and handles failure recovery. Delve into the lessons learned from operating a hand-rolled database at production scale with paying customers, and see how it compares to other solutions for sub-second complex queries over large data volumes in real time.

Why We Built Our Own Distributed Column Store

Strange Loop Conference

#Conference Talks #Strange Loop Conference #Computer Science #Distributed Systems #Software Engineering #Fault Tolerance #Data Science #Data Engineering #Data Ingestion #Programming #Databases #Database Architecture
