Employing NumPy's NPY Format for Faster Than Parquet DataFrame Storage
Explore the potential of NumPy's NPY format as a faster alternative to Parquet for DataFrame storage in this PyCon US talk. Dive into the challenges of serializing DataFrames, and learn how a custom NPZ file format with JSON metadata can offer significant performance and compatibility advantages. Examine detailed read/write performance comparisons between Parquet and NPZ across a range of DataFrame shapes and dtype compositions. Discover techniques for optimizing the Python routines that read and write NPY files, and explore applications of memory-mapping complete DataFrames from their NPY representation. Gain insights into improving data science workflows and reducing compute costs with this approach to DataFrame storage.
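The NPZ-with-JSON-metadata idea described in the abstract can be sketched with NumPy and the standard library alone. The example is only an illustration under stated assumptions. The archive layout is hypothetical: a `__meta__.json` member lists the column names, and each column is stored as a `col_<i>.npy` member. The helper names `write_npz_frame` and `read_npz_frame` are also hypothetical. Neither the layout nor the helpers should be taken as the format or API presented in the talk. To keep the sketch dependency-free, a "DataFrame" is reduced to a dict of 1D arrays with string column names, and index handling is left out.

```python
import io
import json
import zipfile

import numpy as np

# Hypothetical layout (an assumption, not the talk's actual format):
#   __meta__.json           -> JSON with a version number and ordered column names
#   col_0.npy, col_1.npy... -> one NPY payload per column, in column order
META_NAME = "__meta__.json"


def write_npz_frame(path_or_file, columns):
    """Write a dict of {column name: 1D ndarray} as an NPZ-style zip archive."""
    names = list(columns)
    meta = {"version": 1, "names": names}
    # ZIP_STORED (no compression) keeps each NPY payload as raw bytes.
    with zipfile.ZipFile(path_or_file, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(META_NAME, json.dumps(meta))
        for i, name in enumerate(names):
            array = np.asarray(columns[name])
            buf = io.BytesIO()
            # allow_pickle=False makes object-dtype columns raise instead of pickling.
            np.lib.format.write_array(buf, array, allow_pickle=False)
            zf.writestr(f"col_{i}.npy", buf.getvalue())


def read_npz_frame(path_or_file):
    """Read an archive written by write_npz_frame back into an ordered dict of arrays."""
    with zipfile.ZipFile(path_or_file, "r") as zf:
        meta = json.loads(zf.read(META_NAME))
        columns = {}
        for i, name in enumerate(meta["names"]):
            with zf.open(f"col_{i}.npy") as f:
                columns[name] = np.lib.format.read_array(f, allow_pickle=False)
    return columns


if __name__ == "__main__":
    frame = {
        "a": np.arange(3, dtype=np.int64),
        "b": np.array([0.5, 1.5, 2.5]),
        "c": np.array(["x", "y", "z"]),
    }
    buffer = io.BytesIO()  # a file path would work the same way
    write_npz_frame(buffer, frame)
    restored = read_npz_frame(buffer)
    print({name: arr.dtype for name, arr in restored.items()})
```

Because each column is stored as its own NPY payload, each dtype round-trips exactly and no type conversion is needed. Column order and names come from the JSON member. Note that `np.load`'s `mmap_mode` applies to standalone `.npy` files and not to members of a zip archive. Memory-mapping a whole frame would therefore presumably need a different on-disk layout, such as a directory of uncompressed `.npy` files.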