Explore performance troubleshooting techniques for Apache Spark in this 40-minute conference talk by Luca Canali from CERN. Dive into Spark's extensive metrics and instrumentation, including executor task metrics and the Dropwizard-based metrics system. Learn how CERN's Hadoop and Spark service leverages these metrics to troubleshoot and measure production workloads. Discover how to deploy a performance dashboard for Spark workloads and how to use sparkMeasure, a tool built on the Spark Listener interface. Gain insights into lessons learned and upcoming improvements in Apache Spark 3.0. Topics include data analytics at the Large Hadron Collider, CERN's analytics platform, performance methodologies, and common anti-patterns. Examine the various ways to gather and analyze Spark metrics, including the REST API and event logs, as well as the components of a Spark performance dashboard: memory usage, executor CPU utilization, and user-defined metrics. Understand the importance of combining metrics data with workload context to derive meaningful insights for optimizing Spark-based applications.
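As an illustration of the REST-API approach mentioned above, the sketch below aggregates executor summaries of the kind returned by Spark's monitoring endpoint (`/api/v1/applications/{app-id}/executors`). The sample payload is an abridged, illustrative subset of the real response schema, and the helper function is a hypothetical example, not part of the talk's material:

```python
import json

# Abridged sample of the JSON returned by Spark's REST API endpoint
# /api/v1/applications/{app-id}/executors (illustrative subset of fields).
sample_payload = """
[
  {"id": "driver", "totalCores": 0, "totalDuration": 0, "memoryUsed": 104857600},
  {"id": "1", "totalCores": 4, "totalDuration": 525000, "memoryUsed": 734003200},
  {"id": "2", "totalCores": 4, "totalDuration": 498000, "memoryUsed": 681574400}
]
"""

def summarize_executors(payload: str) -> dict:
    """Aggregate a few executor-level metrics, excluding the driver row."""
    executors = [e for e in json.loads(payload) if e["id"] != "driver"]
    return {
        "num_executors": len(executors),
        "total_cores": sum(e["totalCores"] for e in executors),
        # totalDuration is reported in milliseconds
        "total_task_time_s": sum(e["totalDuration"] for e in executors) / 1000,
        "memory_used_mb": sum(e["memoryUsed"] for e in executors) / (1024 * 1024),
    }

summary = summarize_executors(sample_payload)
print(summary)
```

In a live setting the payload would come from an HTTP GET against the driver's UI port (4040 by default) while the application runs, or from the Spark History Server afterwards; the same fields also feed the kind of dashboard panels (executor CPU, memory usage) the talk describes.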
Performance Troubleshooting Using Apache Spark Metrics - Databricks Talk