
Introduction

Agenda

Data Scientist Journey

Pandas Origins

Spark vs Pandas

Koalas

Pandas vs Spark

Koalas vs Spark

Internal Frame

Indexing

Default Index Types

SQL query

Values

Data Visualization

Class

Plotting

Analysis

Date Range

Histogram

Outliers

Merge

Sort Values

Plot Koalas Numbers

Convert Year Month to DateTime

Plot Monthly Totals

Plot Monthly Average

Analysis and Model Training

Filters

MLflow

Import MLflow

Pandas dataframe

Query runs

Forecast

Roadmap

Description:

Explore Koalas, an open-source Python package that implements the pandas API on Apache Spark, in this 58-minute hands-on tutorial. Learn how to scale pandas workloads to big data environments and move from single-machine to distributed computing without learning a new framework. Discover Koalas' latest functionality, including Apache Spark 3.0 integration, and its potential as a standard API for large-scale data science. Get started with Koalas, compare the pandas and Koalas APIs for DataFrame transformation and feature engineering, and understand the differences between single-machine pandas and distributed Koalas environments. Topics include indexing, data visualization, analysis techniques, and machine learning integration using MLflow. The tutorial moves from basic operations to more advanced material such as time series analysis, outlier detection, and forecasting.

Koalas - Scaling Pandas API on Apache Spark

Databricks


#Data Science #Big Data #Apache Spark #Data Analysis #Data Visualization #Computer Science #Machine Learning #Programming #Programming Languages #Python #pandas #MLFlow
