1. Introduction
2. Dataset options
3. Who is Jovan
4. Demo
5. Expressions
6. Data Science Example
7. Memory Map
8. Missing Values
9. Number of Passengers
10. Trip Distances
11. New York
12. New York City
13. Filter
14. Trip duration
15. Categorizing
16. Group by Standard
17. Density Maps
18. Machine Learning
19. Memory
20. PCA
21. PCA on a subsample
22. Payment type
23. String operations
24. Memory usage
25. Light GBM
26. Predict method
27. Wrappers
28. Virtual columns
29. Testing the notebook
30. Conclusion
31. Questions
Description:
Explore modern data science techniques using Vaex, a powerful DataFrame library, in this 51-minute EuroPython Conference talk. Learn how to efficiently process large datasets on personal computers by leveraging computational graphs, lazy evaluations, memory-mapped storage, and out-of-core algorithms. Discover methods for cleaning, filtering, grouping, and transforming data while visualizing and analyzing correlations. Gain insights into handling datasets with millions or billions of samples without relying on distributed computing. Follow along as the speaker demonstrates practical examples using New York City taxi data, covering topics such as expressions, memory mapping, missing values, filtering, categorizing, group operations, density maps, machine learning, and virtual columns. Understand how Vaex optimizes memory and CPU usage, enabling data scientists to work effectively on laptops or workstations with limited RAM but fast SSD storage.
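The description names four ideas: computational graphs, lazy evaluation, memory-mapped storage, and out-of-core algorithms. The sketch below shows how they fit together. It is a minimal, pure-Python illustration of the concepts only. It does not use Vaex and is not Vaex's API. The classes `Expr` and `Column`, the helpers `write_column` and `mean`, the raw float64 file layout, and the taxi-style numbers are all hypothetical.

In the sketch, arithmetic on columns builds a graph (a "virtual column") without computing anything. Only a reduction then streams the memory-mapped data through that graph one chunk at a time.

```python
import array
import mmap
import operator
import os
import tempfile

# Hypothetical classes illustrating the concepts; NOT Vaex's implementation.


class Expr:
    """A node in a lazy computation graph: building it does no arithmetic."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    # Operators only record the operation; evaluation is deferred.
    def __add__(self, other):
        return Expr(operator.add, self, other)

    def __sub__(self, other):
        return Expr(operator.sub, self, other)

    def __mul__(self, other):
        return Expr(operator.mul, self, other)

    def __truediv__(self, other):
        return Expr(operator.truediv, self, other)

    def __len__(self):
        # Row count comes from the first sub-expression (scalars have none).
        return next(len(a) for a in self.args if isinstance(a, Expr))

    def eval_chunk(self, start, stop):
        # Evaluate only rows [start, stop) of every input, then combine them.
        inputs = [_chunk(a, start, stop) for a in self.args]
        return [self.op(*row) for row in zip(*inputs)]


def _chunk(arg, start, stop):
    if isinstance(arg, Expr):
        return arg.eval_chunk(start, stop)
    return [arg] * (stop - start)  # broadcast a scalar constant


class Column(Expr):
    """Leaf node: a float64 column read through a read-only memory map."""

    ITEM = array.array("d").itemsize  # bytes per value (8 for float64)

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._mm) // self.ITEM

    def eval_chunk(self, start, stop):
        # Only this slice of the file is paged in and copied into memory.
        raw = self._mm[start * self.ITEM : stop * self.ITEM]
        return list(array.array("d", raw))

    def close(self):
        self._mm.close()
        self._file.close()


def write_column(path, values):
    """Write values as raw float64 bytes (a stand-in for an on-disk column format)."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def mean(expr, chunk_size=2):
    """Out-of-core reduction: stream the expression in fixed-size chunks."""
    total, count, n = 0.0, 0, len(expr)
    for start in range(0, n, chunk_size):
        values = expr.eval_chunk(start, min(start + chunk_size, n))
        total += sum(values)
        count += len(values)
    return total / count


with tempfile.TemporaryDirectory() as tmp:
    write_column(os.path.join(tmp, "distance.bin"), [1, 2, 4, 5, 10])
    write_column(os.path.join(tmp, "fare.bin"), [5, 8, 12, 15, 30])
    distance = Column(os.path.join(tmp, "distance.bin"))
    fare = Column(os.path.join(tmp, "fare.bin"))
    fare_per_mile = fare / distance  # lazy "virtual column": no data read yet
    result = mean(fare_per_mile)     # now read and reduced two rows at a time
    distance.close()
    fare.close()

print(result)  # 3.6
```

Peak memory here is bounded by `chunk_size` rather than by the file size, which is the property that lets this style of processing scale past available RAM. Vaex's own expressions, file formats, and parallel chunking are far more sophisticated. Consult its documentation for the real API.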

Modern Data Science with Vaex - A New Approach to DataFrames and Pipelines

EuroPython Conference