Explore the architectural tradeoffs between MapReduce and parallel databases in this 25-minute conference talk from Databricks. Dive deep into the architectures of Presto and Apache Spark, focusing on key differentiators such as disaggregated shuffle. Learn about the Presto-on-Spark project, a specialized DataFrame application that combines Presto's low-latency evaluation with Spark's robust execution engine. Discover the motivation, design, and current status of this initiative, which aims to provide a unified SQL experience for both interactive and batch use cases. Gain insights into Facebook's experience scaling both Presto and Spark for large-scale batch workloads, and understand the potential for greater collaboration between the Spark and Presto communities.
Presto on Apache Spark - A Tale of Two Computation Engines