Intro

SQL Use Cases @ Facebook

Towards a Unified SQL Experience

Presto and Spark Architecture

Why Presto (or Other MPPs) Doesn't Scale?

Presto Unlimited

Why Presto-on-Spark

Presto-on-Spark Design Principles

Planning

Translating to RDD

Columnar Format to Row Format Conversion

Broadcast Join

Spark DAG

Execution

Threading Model

Classloader Isolation

Current Status

Description:

Explore the architectural tradeoffs between map/reduce and parallel databases in this 25-minute conference talk from Databricks. Dive deep into the architectures of Presto and Apache Spark, focusing on key differentiators like disaggregated shuffle. Learn about the Presto-on-Spark project, a specialized Data Frame application that combines Presto's low-latency evaluation with Spark's robust execution engine. Discover the motivation, design, and current status of this initiative aimed at enabling a unified SQL experience for both interactive and batch use cases. Gain insights into Facebook's experience scaling both Presto and Spark for large-scale batch workloads, and understand the potential for greater collaboration between the Spark and Presto communities.

Presto on Apache Spark - A Tale of Two Computation Engines

Databricks

#Data Science #Big Data #Presto #Programming #Domain-Specific Languages (DSL) #SQL #Apache Spark #Computer Science #High Performance Computing #Parallel Computing #Data Engineering #Software Engineering #Scalability #Data Processing #Batch Processing