1. Intro
2. The Business Intelligence use case: How BI tools connect to Databricks?
3. Data growth
4. Challenges and opportunities: Breaking down the extract problem
5. Fetching query results: Result pagination
6. Importing tables: Use internal compute engine
7. Serving results before Arrow: Multiple layers of conversion
8. Serving results with Arrow: Bring results faster to the client
9. Collecting results in Arrow format: Tasks generate Arrow batches
10. Arrow batch sizing: Fetching Arrow batches
11. Improvements with Arrow: Speedups of less than 3x
12. Extract bottlenecks
13. New data extract architecture: Cloud Fetch system design
14. Inlining small results: Hybrid results
15. Data layout: File sizing and pagination
16. Fetching results from URLs: Parallel file downloads
17. Cloud Fetch performance: Extract faster than BI tools can ingest
18. Cloud Fetch in the wild: Outperforms direct fetch by an order of magnitude
19. Conclusions: Scaled up extract workloads using cloud storage
20. DATA+AI SUMMIT 2022
Description:
Explore high-bandwidth connectivity with BI tools through Cloud Fetch in this 20-minute Databricks video. Learn how to overcome the data transfer bottleneck in traditional data warehouses when extracting large query results using Business Intelligence tools like Tableau and Microsoft Power BI. Discover the new parallel data fetching mechanism via cloud storage, such as AWS S3 and Azure Data Lake Storage, which can result in a 10x speed-up in extract performance. Dive into the challenges of data growth, the intricacies of result pagination, and the improvements made with Apache Arrow. Understand the new data extract architecture, including hybrid results, data layout considerations, and parallel file downloads. Gain insights into Cloud Fetch's real-world performance and its ability to scale up extract workloads using cloud storage, ultimately enabling faster data ingestion for BI tools.

Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks

Databricks