Explore the intricacies of operating deep learning pipelines using Kubeflow in this conference talk by Jörg Schad and Gilbert Song from Mesosphere. Dive into the process of building a production-grade data science pipeline that integrates Kubeflow with open-source data, streaming, and CI/CD automation tools. Learn about essential components such as data preparation with Apache Spark or Apache Flink, data storage with HDFS and Cassandra, automation via Jenkins, and request streaming with Apache Kafka. Discover how to construct and manage a complete deep learning pipeline for multiple tenants, covering topics like data cleansing, model storage, distributed training, monitoring, and infrastructure management. Gain insights into addressing challenges in data quality, dividing labor between data scientists and software engineers, and applying data science engineering principles. Finally, examine advanced concepts including reproducible builds, MLflow integration, feature catalogs, model libraries, and resource service management to enhance your deep learning pipeline operations.
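The distributed-training component mentioned above is typically expressed on Kubeflow as a TFJob custom resource managed by the training operator. A minimal sketch is shown below; the image name, job name, and replica counts are hypothetical placeholders, not taken from the talk:

```yaml
# Hypothetical TFJob manifest: one parameter server, two workers.
# The container must be named "tensorflow" for the TFJob controller
# to inject TF_CONFIG for distributed training.
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: distributed-training        # placeholder name
spec:
  tfReplicaSpecs:
    PS:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow
              image: my-registry/train:latest   # placeholder image
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow
              image: my-registry/train:latest   # placeholder image
```

Applying a manifest like this with `kubectl apply -f` lets the Kubeflow training operator schedule the replicas and wire up distributed TensorFlow, which is one way the talk's multi-tenant training workloads can be run on shared cluster infrastructure.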
Operating Deep Learning Pipelines Anywhere Using Kubeflow