Intro

What is Apache Spark?

A Large Community

Apache Spark Users

Original Spark Vision

Motivation: Unification

Motivation: Concise API

How Did the Vision Hold Up?

Libraries Built on Spark

Which Libraries Do People Use?

Top Applications

Main Challenge: Functional API

Which API Call Causes Most Tickets?

Example Problem

Challenge: Data Representation

Why Structure?

DataFrames and Datasets

Execution Steps

DataFrame API

Why DataFrames?

What Structured APIs Enable

Performance

Dataset API Details

Data Sources

Data Source API

Examples

Hardware Trends

Project Tungsten

Tungsten's Compact Encoding

Space Efficiency

Runtime Code Generation

Long-Term Vision

Versioning in Spark

Major Features in 2.0

Background

Structured Streaming: High-Level Streaming API Built on DataFrames/Datasets

Structured Streaming API

Example: Batch Aggregation

Example: Continuous Aggregation

Incrementalized By Spark

Release Timeline

Conclusion

Want to Learn Apache Spark?

Description:

Explore the evolution of Apache Spark's API in this keynote presentation from Scala Days New York 2016. Dive into the upcoming features of Spark 2.0, including more declarative APIs for automatic optimizations and improved links between Scala data types and binary data formats for efficient processing. Learn about Spark's journey as a large-scale Scala project, its functional API, and its impact on distributed programming. Discover the challenges faced in API design, data representation, and performance optimization. Gain insights into DataFrames, Datasets, and Structured Streaming APIs. Understand Project Tungsten's role in improving space efficiency and runtime code generation. Get a glimpse of Spark's long-term vision and versioning strategy, and find resources to further your Apache Spark knowledge.

Spark 2.0

Scala Days Conferences
