1. Intro
2. Spark in Workday Prism Analytics
3. Example: Data Validation
4. About Complex Plans
5. Common Subexpression Elimination (CSE)
6. CSE Benchmark
7. Logging Complex Plans (10s of MBs in Size)
8. Problems with Large Case Expressions
9. Handling Large Case Expressions in Catalyst
10. Large Case Expression Benchmark
11. Example: Generate New Filter
12. Example: Prune Redundant Filter
13. Example: New Filter on Other Side of Join
14. Current Constraint Propagation Algorithm
15. Current Algorithm Takes High Memory
16. Recall: Fix for Large Case Expressions
17. Optimized Constraint Propagation (SPARK-33152)
18. Constraint Propagation Algorithms Comparison
19. Constraint Propagation Benchmark
20. Effect on Customer Pipeline
21. Tuning Tips
22. Future Work
Description:
Explore optimization techniques for complex Apache Spark plans in this 27-minute conference talk from DAIS NA 2021. The talk draws on Workday's experience building analytics products with Spark, where challenges include compiling large-scale DataFrames and handling large case expressions. It covers memory-efficient logging of large plans, common subexpression elimination (CSE) for removing redundant subplans, and a rewrite of Spark's constraint propagation mechanism (SPARK-33152), and includes a data validation example. It also shows how these enhancements improve Catalyst performance on production pipelines, and closes with tuning tips for managing complex Spark plans and a look at future work.

Optimizing Catalyst Optimizer for Complex Spark Plans

Databricks