1. Intro
2. The Quest for Complete DataFrame Serialization
3. NumPy Enhancement Proposal (NEP) 1
4. Promising Performance of NPZ versus Parquet
5. Overview
6. Components of a DataFrame
7. Block-Consolidation Strategies: Unconsolidated Blocks
8. Block Consolidation & Complexity
9. The NPY Format
10. Converting Contiguous Bytes to an Array
11. NPY & Object Arrays
12. NPY Versions
13. The NPZ Format
14. Encoding a DataFrame as an NPZ
15. JSON Metadata
16. NPY Performance in NumPy
17. Lies, Damned Lies, and Benchmarks
18. Nine DataFrame Fixtures
19. Memory Maps
20. Memory Mapping an Array
21. Memory Mapping a DataFrame
22. Current State
23. Future Work
24. Conclusions
Description:
Explore the potential of NumPy's NPY format as a faster alternative to Parquet for DataFrame storage in this PyCon US talk. Dive into the challenges of serializing DataFrames and learn how a custom NPZ file format with JSON metadata can offer significant performance and compatibility advantages. Examine detailed read/write performance comparisons between Parquet and NPZ across various DataFrame shapes and dtype compositions. Discover techniques for optimizing Python routines for NPY file operations and explore applications for memory-mapping complete DataFrames using their NPY representation. Gain insights into improving data science workflows and reducing compute costs through this approach to DataFrame storage.

Employing NumPy's NPY Format for Faster Than Parquet DataFrame Storage

PyCon US