Tower of Babel: Making Apache Spark, Kubeflow, and Kubernetes Play Nice - Holden Karau, Netflix
Description:
Explore a conference talk on integrating Apache Spark, Kubeflow, and Kubernetes to process very large matrices. Learn how to work with matrices that exceed the memory capacity of any single Kubernetes node. Discover how Apache Spark and Apache Mahout distribute a matrix across multiple pods and nodes, so that the size of the matrix is limited by the cluster's combined capacity rather than by one node's memory. Gain insights into using Kubeflow to make workflows reproducible and easier to run. Examine a real-world case study on denoising DICOM images of COVID patients' lungs, which shows how these technologies combine into a repeatable pipeline. Understand how this approach could assist doctors in resource-limited hospitals and advance research on automated COVID detection.