1. Introduction
2. Agenda
3. Data Scientist Journey
4. Pandas Origins
5. Spark vs Pandas
6. Koalas
7. Pandas vs Spark
8. Koalas vs Spark
9. Internal Frame
10. Indexing
11. Default Index Types
12. SQL Query
13. Values
14. Data Visualization
15. Class
16. Plotting
17. Analysis
18. Date Range
19. Histogram
20. Outliers
21. Merge
22. Sort Values
23. Plot Koalas Numbers
24. Convert Year Month to DateTime
25. Plot Monthly Totals
26. Plot Monthly Average
27. Paralysis and Model Training
28. Filters
29. MLflow
30. Import MLflow
31. Pandas DataFrame
32. Query Runs
33. Forecast
34. Roadmap
Description:
Explore Koalas, an open-source Python package that implements the pandas API on Apache Spark, in this 58-minute hands-on tutorial. Learn how to scale pandas to big data environments and move from single-machine to distributed computing without learning a new framework. Discover Koalas' latest functionality, including Apache Spark 3.0 integration, and its potential as a standard API for large-scale data science. Get started with Koalas, compare the pandas and Koalas APIs for DataFrame transformation and feature engineering, and understand the differences between single-machine pandas and distributed Koalas environments. The tutorial covers indexing, data visualization, analysis techniques, and machine learning integration with MLflow, moving from basic operations to time series analysis, outlier detection, and forecasting.

Koalas - Scaling Pandas API on Apache Spark

Databricks