1. intro
2. preamble
3. hello
4. Quix
5. Quix Streams
6. Quix Cloud
7. what is a sketch?
8. approximate answers
9. sketch characteristics
10. sketch components
11. why exact == slow
12. distributed processing
13. unique word count
14. massively parallel processing (MPP)
15. shuffling is slow
16. latency numbers every programmer should know
17. why sketches == fast
18. sketch design
19. sublinear data structure growth
20. mergeability
21. non-additive challenges are everywhere
22. unique counts are non-additive
23. non-additive challenges solved
24. types of sketches
25. Count-Min Sketch
26. open source sketches
27. Apache DataSketches (Java, C++, Python)
28. DataSketches extensions
29. thank you
Description:
Explore the world of sketching algorithms for big data analysis in this conference talk from Conf42 Python 2024. Dive into the concept of sketches as approximate data structures, understanding their characteristics, components, and advantages over exact computations. Learn about distributed processing challenges, the importance of sublinear data structure growth, and mergeability in sketch design. Discover various types of sketches, with a focus on the Count-Min Sketch algorithm. Gain insights into open-source sketching libraries like Apache DataSketches and their extensions. Equip yourself with knowledge to tackle non-additive challenges in data processing and understand why sketches offer faster solutions for big data problems.
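To make the ideas of a fixed-size structure and mergeability concrete, here is a minimal Count-Min Sketch in plain Python. It is an illustrative sketch only, not code from the talk and not the Apache DataSketches API: the class name `CountMinSketch`, the `width` and `depth` parameters, and the choice of salted BLAKE2b hashing are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Hypothetical fixed-size frequency table; estimates may overcount, never undercount."""

    def __init__(self, width=1024, depth=4):
        # Memory is width * depth counters, regardless of how many items are added.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived from a per-row salted hash of the item.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest estimate.
        return min(self.table[row][col] for row, col in self._columns(item))

    def merge(self, other):
        # Sketches built with the same shape and hashing merge by adding cells.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Two "workers" each summarise their own partition, then the sketches are merged.
left, right = CountMinSketch(), CountMinSketch()
for word in "the quick brown fox jumps over the lazy dog".split():
    left.add(word)
for word in "the dog barks".split():
    right.add(word)
left.merge(right)
print(left.estimate("the"))  # at least 3; larger only if hashes collide
```

Because merging is cell-wise addition, each partition can be summarised independently and only the small tables need to be combined, rather than the raw data.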

Sketching Algorithms: Making Sense of Big Data in a Single Stroke

Conf42