
Intro

The Business Intelligence use case: How do BI tools connect to Databricks?

Data growth

Challenges and opportunities: Breaking down the extract problem

Fetching query results: Result pagination

Importing tables: Using the internal compute engine

Serving results before Arrow: Multiple layers of conversion

Serving results with Arrow: Bringing results to the client faster

Collecting results in Arrow format: Tasks generate Arrow batches

Arrow batch sizing: Fetching Arrow batches

Improvements with Arrow: Speedups of less than 3x

Extract bottlenecks

New data extract architecture: Cloud Fetch system design

Inlining small results: Hybrid results

Data layout: File sizing and pagination

Fetching results from URLs: Parallel file downloads

Cloud Fetch performance: Extracting faster than BI tools can ingest

Cloud Fetch in the wild: Outperforms direct fetch by an order of magnitude

Conclusions: Scaling up extract workloads using cloud storage

DATA+AI SUMMIT 2022

Description:

Explore high-bandwidth connectivity for BI tools through Cloud Fetch in this 20-minute Databricks video. Learn how to overcome the data transfer bottleneck that traditional data warehouses hit when Business Intelligence tools such as Tableau and Microsoft Power BI extract large query results. Discover the new mechanism that fetches data in parallel via cloud storage, such as Amazon S3 and Azure Data Lake Storage, which can speed up extracts by 10x. Dive into the challenges of data growth, the intricacies of result pagination, and the improvements made with Apache Arrow. Understand the new data extract architecture, including hybrid results, data layout considerations, and parallel file downloads. Finally, see how Cloud Fetch performs in real-world use, where it extracts data faster than BI tools can ingest it and scales up extract workloads by using cloud storage.
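The client side of the parallel, cloud-storage-based fetch described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Databricks' implementation or a driver API: `ResultManifest`, `fetch_results`, and `http_get` are hypothetical names, and the sketch assumes the server either returns a small result inline (the "hybrid results" idea) or hands back an ordered list of presigned cloud-storage URLs, one per result file. In practice, this logic lives inside the client driver.

```python
from __future__ import annotations

import urllib.request
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ResultManifest:
    """Hypothetical result descriptor returned by the server (an assumption)."""
    # Small results are returned directly in the response (hybrid results).
    inline_chunks: Optional[List[bytes]] = None
    # Large results are staged as files in cloud storage, listed in result order.
    external_links: List[str] = field(default_factory=list)


def http_get(url: str, timeout: float = 60.0) -> bytes:
    """Download one result file, e.g. from an assumed presigned HTTPS URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_results(
    manifest: ResultManifest,
    fetch: Callable[[str], bytes] = http_get,
    max_workers: int = 8,
) -> List[bytes]:
    """Return all result chunks in their original order (hypothetical helper)."""
    if manifest.inline_chunks is not None:
        # Inlined small result: no round trip to cloud storage is needed.
        return list(manifest.inline_chunks)
    # Large result: download files concurrently. executor.map yields results
    # in input order, so chunk order matches file order even if downloads
    # finish out of order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, manifest.external_links))


if __name__ == "__main__":
    # Stand-in for cloud storage so the example runs without a network.
    fake_store = {
        "https://storage.example.invalid/part-0": b"rows 0-999",
        "https://storage.example.invalid/part-1": b"rows 1000-1999",
    }
    manifest = ResultManifest(external_links=list(fake_store))
    print(fetch_results(manifest, fetch=fake_store.__getitem__))
    # prints [b'rows 0-999', b'rows 1000-1999']
```

Bounding concurrency with `max_workers` keeps many files in flight without opening an unbounded number of connections. Passing `fetch` as a parameter lets the same ordering logic run against a real HTTP client or an in-memory stub.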

Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks

Databricks


#Business #Business Intelligence #Programming #Cloud Computing #Data Warehousing #Data Science #Data Extraction #Cloud Storage #Big Data #Apache Arrow
