
Intro

About Dremio

Data Lake Storage

Data Consumers

Data Warehouse

Apache Arrow

Exponential Growth

Gandiva

Performance Improvements

Cloud Cache

Arrow Flight

Python Example

Spark Example

Demo Overview

Demo

Demo with Python

Description:

Explore the power of Apache Arrow in this 37-minute conference talk from Databricks. Learn how this open-source, columnar, in-memory data representation enables real-time data exchange and processing across analytical systems and data sources. Discover how Arrow simplifies and accelerates data access without physically consolidating data, which addresses challenges in microservices and cloud application environments. Gain insights into Arrow's integration with open-source and commercial technologies, including GPU databases, machine learning libraries, execution engines, and visualization frameworks. See performance improvements such as a 50x speedup in PySpark. Understand how organizations can use Arrow to access and analyze data across disparate sources without a centralized data repository. Dive into topics like Dremio, data lake storage, data consumers, data warehouses, Gandiva, Arrow Flight, and cloud caching. Watch a practical demonstration of Arrow's capabilities with Python and Spark examples.

Data Science Across Data Sources with Apache Arrow - Accelerating Analytics and Interoperability

Databricks


#Data Science #Big Data #Apache Arrow #Programming #Cloud Computing #Microservices #Programming Languages #Python #PySpark #Business #Business Intelligence #Data Lakes #Data Processing