Intro

PROBLEMS

WHAT TO DO?

OPTIONS?

DESIGN PRINCIPLES

KNOW YOUR DATA.

DON'T REINVENT THE WHEEL.

KEEP IT SIMPLE.

KNOW YOUR USERS.

KNOW YOUR HARDWARE.

DATA INGESTION AND STORAGE

WHAT TO STORE?

HOW TO BE RESILIENT?

HOW TO SCALE?

COMPACTED TOPICS

WINDOWED DATA

DATA MODEL

IN-MEMORY COMPUTING

RAM IS VOLATILE

ALGEBRA OF SETS

BITMAPS

BACK TO MNEMOSYNE

SPARSITY

AGGREGATIONS

WAS IT WORTH IT?

CAN YOU DO IT?

SHOULD YOU DO IT?

THANK YOU!

Description:

Explore the development of Mnemosyne, a distributed indexing layer for big data, in this 52-minute Devoxx conference talk. Dive into the challenges faced by the Brandwatch Audiences product and learn how its team built a system that handles hundreds of millions of social network profiles, billions of posts, and tens of billions of follower graph edges in real time. Discover how succinct data structures, free-text search, and in-memory computing are combined with the JVM, CUDA, and Kafka into a high-performance solution. Gain insights into CAP theorem trade-offs, brute-force approaches versus indexes, data structures for sorting billions of records in milliseconds, problem-solving on GPUs, and the JVM implementation. Understand the key design principles, data ingestion and storage techniques, in-memory computing concepts, and the use of bitmaps and sparse data structures. Evaluate the project's success and consider whether a similar undertaking is feasible and advisable in your own work.
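As a rough illustration of the bitmap and set-algebra ideas named in the outline, here is a minimal sketch that uses only the standard `java.util.BitSet`. The class name, helper methods, profile IDs, and the two example predicates are hypothetical, chosen for illustration. The description does not detail Mnemosyne's actual implementation, so this is not it.

```java
import java.util.BitSet;

// Hypothetical sketch: one bitmap per predicate, one bit per profile ID.
// Set algebra on predicates then becomes bitwise operations on bitmaps.
final class BitmapIndexSketch {

    // Build a bitmap with the bit set for every matching profile ID.
    static BitSet bitmapOf(int... profileIds) {
        BitSet bits = new BitSet();
        for (int id : profileIds) {
            bits.set(id);
        }
        return bits;
    }

    // Intersection: profiles that satisfy both predicates.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // clone so the inputs stay unchanged
        result.and(b);
        return result;
    }

    // Union: profiles that satisfy at least one predicate.
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // Difference: profiles that satisfy a but not b.
    static BitSet andNot(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.andNot(b);
        return result;
    }

    public static void main(String[] args) {
        // Assumed example data: IDs of profiles matching each predicate.
        BitSet followsBrand = bitmapOf(1, 3, 5, 8);
        BitSet livesInLondon = bitmapOf(3, 4, 5, 9);

        BitSet both = and(followsBrand, livesInLondon);
        // cardinality() counts the set bits, i.e. the audience size.
        System.out.println("both: " + both + ", size = " + both.cardinality());
        System.out.println("either size = " + or(followsBrand, livesInLondon).cardinality());
        System.out.println("brand only size = " + andNot(followsBrand, livesInLondon).cardinality());
    }
}
```

A plain `BitSet` allocates space up to its highest set bit, so it wastes memory when IDs are sparse. That is the kind of problem the talk's sections on sparsity and succinct data structures address.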

Mnemosyne - A Distributed Bitmapped Indexing Layer for Big Data

Devoxx

#Conference Talks #Devoxx #Programming #Software Development #CUDA #Computer Science #Distributed Systems #Data Structures #Data Science #Data Engineering #High Performance Computing #Parallel Computing #GPU Computing #Software Engineering #Scalability #CAP Theorem