Intro

Denis Diderot and the Diderot effect

The Diderot effect in data processing systems

The Diderot effect in Spark: Project Tungsten (2015)

The Diderot effect, revised for 2021

What's your oldest Spark application?

Abstractions can leak in performance tuning

Choosing the right partition size is difficult

Adaptive query execution: coalescing

Sidebar: some basics on joins

Adaptive query execution: partition pruning

Enabling adaptive query execution

Accelerating Spark with NVIDIA GPUs

Case study: predicting customer churn

What's next?

Description:

Explore strategies for modernizing Apache Spark applications to take full advantage of Spark 3.0 and later in this 25-minute talk from Databricks. Learn about common sources of technical debt in mature Spark applications and how to address them, when to replace manual configuration with Adaptive Query Execution, and how to optimize queries for columnar processing and GPU execution. The talk draws on a concrete customer churn modeling example, recent experience modernizing Spark applications, and lessons learned from maintaining Spark extensions across multiple versions. Topics include the Diderot effect in data processing systems, Project Tungsten, adaptive query execution techniques, and accelerating Spark with NVIDIA GPUs, along with how to speed up analytics workloads and incorporate accelerated ML training directly into Spark applications.

Modernizing Apache Spark 3.0 Applications - Best Practices and Optimization Techniques

Databricks

#Data Science #Big Data #Apache Spark #Computer Science #Machine Learning #Data Processing #Computer Hardware #GPU Acceleration #Software Engineering #Technical Debt

0:00 / 0:00