1. Introduction
2. Agenda
3. Background
4. High-level diagram
5. Databricks
6. Use Cases
7. Why we use these tools
8. Partitioning
9. Dynamic Partition Pruning
10. Dynamic File Pruning
11. Delta Log
12. Delta Data Skipping
13. Bounding Box
14. Data Set
15. Z-order Data
16. Z-order by
17. Measuring File Pruning Effectiveness
18. Which columns to use
19. Geohash
20. Gotchas
21. Repartition by Range
22. Review Pipeline
23. Review Results
Description:
Explore techniques for optimizing geospatial queries using dynamic file pruning in a 25-minute presentation by Databricks. Learn how z-ordering and dynamic file pruning can significantly reduce the amount of data retrieved from blob storage and improve query times, potentially by an order of magnitude. Discover specific techniques for handling petabytes of geospatial data, including how to generate the data and how to design SQL queries so that dynamic file pruning is included in the query plan. Examine real-world data examples, understand potential pitfalls and workarounds in the current implementation, and see the query performance achievable when these techniques are applied correctly. Topics include partitioning, dynamic partition pruning, the Delta Log, Delta data skipping, bounding boxes, z-ordering, geohashing, and repartitioning by range. The presentation also covers how to measure file pruning effectiveness and which columns to use for the best results.
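
As a rough illustration of the data-skipping idea mentioned in the description, the sketch below filters a list of files by per-file minimum and maximum latitude and longitude before any data is read. The `FileStats` class, the file names, and the coordinates are hypothetical and are not taken from the presentation; the statistics and pruning logic in Delta are more involved than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max statistics for two coordinate columns."""
    path: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def files_to_read(files, lat_lo, lat_hi, lon_lo, lon_hi):
    """Keep only files whose [min, max] ranges overlap the query bounding box."""
    return [
        f.path
        for f in files
        if f.max_lat >= lat_lo and f.min_lat <= lat_hi
        and f.max_lon >= lon_lo and f.min_lon <= lon_hi
    ]


# Made-up file statistics; tight, non-overlapping ranges are what make skipping effective.
files = [
    FileStats("part-0.parquet", 40.0, 41.0, -75.0, -73.0),
    FileStats("part-1.parquet", 33.0, 35.0, -119.0, -117.0),
    FileStats("part-2.parquet", 40.5, 42.0, -72.0, -70.0),
]

# Query bounding box: latitude 40.6 to 40.9, longitude -74.1 to -73.7.
selected = files_to_read(files, 40.6, 40.9, -74.1, -73.7)
print(selected)
```

Only files whose ranges overlap the query box on both columns need to be read. Clustering the data so that each file covers a narrow range on both columns makes this kind of check far more selective.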

Optimizing Geospatial Queries with Dynamic File Pruning in Databricks Delta

Databricks