Explore real-time stream processing for machine learning at massive scale in this EuroPython 2020 conference talk. Gain practical insights into building scalable streaming ML pipelines that process large datasets using Python and popular frameworks such as Kafka, spaCy, and Seldon. Follow a case study on automated content moderation of Reddit comments, with the streaming data handled in a Kubernetes cluster. Dive into fundamental stream processing concepts such as windows, watermarking, and checkpointing. Learn to build, deploy, and monitor complex data streaming pipelines that process incoming production data in real time. Discover best practices and tools for monitoring, as well as an overview of the machine learning workflow, including model training, evaluation, and serving with native Kafka integration.
Real Time Stream Processing for Machine Learning at Massive Scale