Intro

Databricks

Deep Dive into the New Features of Apache Spark 3.0

A Delta Lake 0.7.0 + Spark 3.0 AMA

Spark Catalyst Optimizer

Adaptive Query Execution (AQE)

Apache Spark™ 3.0 AQE Fundamentals

Starting with Broadcast Hash Joins

Dynamically Switching Join Strategies

Dynamically Coalescing Shuffle Partitions

Dynamically Optimizing Skew Joins

TPC-DS performance gains from AQE

Dynamic Partition Pruning: Before Optimization

How to Use Join Hints: Broadcast Hash Join

Extensibility and Ecosystem

Data Source V2

But What Happens with DML Under the Covers? What really happens to the file system when you run DELETE, UPDATE, and MERGE?

Time Travel: The Transaction Log and Additive Files for Data Versioning

Control Table History Retention

Enable DataSourceV2 and Catalog API Integration

Data Quality Framework: Improved SQL DDL and DML Support; ACID Transactions Are Just the Start

Lakehouse Paradigm: Improved Performance and DW-like Capabilities on Low-Cost Cloud Object Stores

Try out Spark 3.0 + Delta Lake now!

Description:

Explore how Apache Spark 3.0 and Delta Lake enhance data lake reliability in this 58-minute webinar from Databricks. Learn about Apache Spark's role in big data processing, the evolution of data lake architectures, and Delta Lake's capabilities for ensuring reliable data. Discover how unifying batch and streaming simplifies architectures. Dive into Spark 3.0's new features, including the Adaptive Query Execution framework for improved query performance, Dynamic Partition Pruning for faster processing of star schema designs, and accelerator-aware scheduling for GPUs. Examine the new pandas UDF types and pandas function APIs, as well as enhanced monitoring capabilities. Gain insights into Delta Lake 0.7.0, the Spark Catalyst Optimizer, and the Lakehouse paradigm. Understand how ACID transactions, schema enforcement, and time travel contribute to data reliability. Learn about data quality frameworks, improved SQL capabilities, and the integration of DataSourceV2 and the Catalog API.

How Apache Spark 3.0 and Delta Lake Enhance Data Lake Reliability

Databricks

#Data Science #Big Data #Apache Spark #Business #Business Intelligence #Data Lakes #Delta Lake