1. Intro
2. Machine Learning at Facebook
3. Data Layouts (Tables and Physical Encodings)
4. Background: Apache ORC
5. How is a Feature Map Stored in ORC?
6. Introducing: ORC Flattened Map
7. Feature Reaping
8. Introducing: Aligned Table
9. Query Plan for Aligned Table
10. Reading Aligned Tables
11. End to End Performance
12. Summary
13. Future Work
Description:
Explore a 21-minute conference talk on scaling machine learning feature engineering in Apache Spark at Facebook. It covers the implementation of the Feature Injection and Feature Reaping techniques, including Spark Core and Spark SQL enhancements, indexed and aligned tables, and the new ORC FlatMap encoding. Learn about Catalyst optimizations, new ORC physical encodings for feature maps, and how indexed feature tables are written and committed. Gain insight into how Facebook improves prediction model quality through more efficient data management and processing in Spark.
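The description mentions a flattened ("FlatMap") ORC encoding for feature maps. As a rough, hypothetical illustration only, the sketch below uses plain Python lists as stand-ins for columnar streams. It is not ORC's or Facebook's actual on-disk format, and every name in it is invented for the example. It contrasts a conventional map layout, where every row's keys and values share a single key stream and a single value stream, with a per-key layout where a reader that selects a few features only touches those features' streams.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model: one feature map per row, feature id -> feature value.
FeatureMap = Dict[int, float]


def encode_map(rows: List[FeatureMap]) -> Tuple[List[int], List[int], List[float]]:
    """Conventional map layout: per-row lengths plus shared key and value streams."""
    lengths = [len(r) for r in rows]
    keys = [k for r in rows for k in r]
    values = [v for r in rows for v in r.values()]
    return lengths, keys, values


def encode_flat_map(rows: List[FeatureMap]) -> Dict[int, List[Optional[float]]]:
    """Flattened layout (toy model): one value stream per distinct key, None marks absence."""
    all_keys = sorted({k for r in rows for k in r})
    return {k: [r.get(k) for r in rows] for k in all_keys}


def read_map(encoded, wanted: Set[int]) -> Tuple[List[FeatureMap], int]:
    """Select features from the conventional layout; every stored entry is scanned."""
    lengths, keys, values = encoded
    out: List[FeatureMap] = []
    pos = 0
    for n in lengths:
        out.append({keys[i]: values[i] for i in range(pos, pos + n) if keys[i] in wanted})
        pos += n
    return out, len(keys)  # entries touched: all of them


def read_flat_map(encoded, wanted: Set[int], num_rows: int) -> Tuple[List[FeatureMap], int]:
    """Select features from the flattened layout; only the requested streams are read."""
    streams = {k: encoded[k] for k in sorted(wanted) if k in encoded}
    out = [
        {k: s[i] for k, s in streams.items() if s[i] is not None}
        for i in range(num_rows)
    ]
    return out, sum(len(s) for s in streams.values())  # entries touched


rows = [{1: 0.5, 2: 1.0}, {2: 3.0}, {1: 2.0, 3: 4.0}]
print(read_map(encode_map(rows), {1}))                       # ([{1: 0.5}, {}, {1: 2.0}], 5)
print(read_flat_map(encode_flat_map(rows), {1}, len(rows)))  # ([{1: 0.5}, {}, {1: 2.0}], 3)
```

Both layouts return the same selected features. In the toy model, the conventional layout scans all five stored entries, while the per-key layout reads only the three slots of feature 1's stream. How the real encoding tracks key presence and organizes its streams is not shown here.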

Scaling Machine Learning Feature Engineering in Apache Spark at Facebook

Databricks