intro

preamble

about chinmay naik

mongodb to rdbms data migration

student collection mongodb

student table postgresql

student - address and phone relationships

data migration - mongodb to postgresql

how mongodb json data maps to sql

inserts are cool, what about updates and deletes in mongodb?

how do we migrate data?

mongo oplog operation log

what does oplog record look like?

when are we getting to the golang concurrency?

sequential data pipeline

mongo oplog / two oplogs / postgresql

sequential pipeline performance

perf improvement - let's add worker pool

worker pool

worker pools v2.0

worker pool v2.0 performance

can you guess the problem?

worker pools v2.0 - the problem

back to drawing board?

fan-out for each database

concurrent data pipeline

performance comparison

resource utilization

concurrent data pipeline - improvement

16 databases and 128 collections per db

performance comparison

final concurrent data pipeline

key takeaways

keep learning

Description:

Explore a conference talk on leveraging Go concurrency for a gigabyte-scale real-world data pipeline. Dive into the challenges of migrating data from MongoDB to a relational database system, understanding the intricacies of MongoDB's oplog, and the evolution of pipeline designs. Learn how to implement and optimize worker pools, address performance bottlenecks, and utilize fan-out techniques for each database. Discover the power of concurrent data pipelines, compare performance metrics, and gain insights into resource utilization. Examine the final optimized pipeline design handling 16 databases with 128 collections each, and extract key takeaways for building efficient, scalable data migration solutions using Go's concurrency features.

Go Concurrency Powering Gigabyte-Scale Real-World Data Pipeline

Conf42

#Programming #Programming Languages #Go #Databases #NoSQL Databases #MongoDB #Relational Databases #PostgreSQL #Computer Science #Concurrency #Database Management #Database Migration #Data Science #Data Engineering #Data Pipelines