Intro

ANSI SQL Compliance

Fail Earlier for Invalid Data

Forbid Confusing CAST

ANSI Mode GA in Spark 3.2

Unified CREATE TABLE SQL Syntax

CHAR/VARCHAR Support

More ANSI Features Coming in Spark 3.2!

Node Decommissioning

Summary

SQL Performance

Shuffle Hash Join Improvement

Partition Pruning Improvement

Predicate Pushdown Improvement

Reduce Query Compiling Latency (3.2)

Stream-stream Join

State Store for Structured Streaming

RocksDB State Store

Add Type Hints (PEP 484) to PySpark!

Static Error Detection

Python Dependency Management

Visualization and Plotting

Usability Enhancements

New Utility Functions for Unix Time

New Utility Functions for Time Zone

EXPLAIN FORMATTED

Ignore Hints

Documentation and Environments

New Doc for PySpark

Deprecations and Removals

Description:

Explore the latest advancements in Apache Spark 3.1 through this comprehensive 49-minute Databricks video. Dive deep into over 1500 resolved JIRAs, focusing on key improvements that make Spark faster, easier, and smarter. Learn about crucial SQL features for ANSI compliance, innovative streaming capabilities, and Python usability enhancements. Discover performance optimizations and new tuning techniques in the query compiler. Gain insights into upcoming major initiatives and future developments. Through examples and demos, understand important changes such as ANSI SQL mode, unified CREATE TABLE syntax, CHAR/VARCHAR support, node decommissioning, shuffle hash join improvements, partition pruning, predicate pushdown, and reduced query compiling latency. Explore advancements in stream-stream joins, state store for Structured Streaming, PySpark type hints, static error detection, Python dependency management, and new utility functions for Unix time and time zones. Familiarize yourself with usability enhancements, documentation updates, and important deprecations and removals in this essential update for Spark developers and data professionals.

Deep Dive into New Features of Apache Spark 3.1

Databricks

#Data Science #Big Data #Apache Spark #Programming #Domain-Specific Languages (DSL) #SQL #Programming Languages #Python #PySpark #Data Processing #Data Engineering
