Intro

Outline

Big Data History Cont.

Big Data Stack

Big Data Trend

Benefit of Containerization

Kubernetes Architecture

Challenges

CSI (Container Storage Interface)

CSI Core Services

CSI Advanced Features

Volume Lifecycle

Controller and Node Services

Kubernetes Storage

Kubernetes CSI Support

PV, PVC and Storage Class

Package and Deployment Suggestion

Hadoop HDFS

HDFS Cluster Scale

Apache Ozone

HDFS/Ozone as PV

HDFS Characteristics as PV

HDFS NFS Gateway CSI

Ozone CSI

Resources

Description:

Explore the integration of Kubernetes with on-premises big data clusters through this conference talk. Learn about the HDFS CSI Plugin design and architecture, addressing the challenge of consuming HDFS data with Kubernetes. Discover best practices for running Spark workloads on Kubernetes with HDFS access using the CSI plugin. Examine performance comparisons between Spark on Kubernetes with HDFS and Spark on YARN with HDFS using the TPC-DS benchmark suite. Gain insights into big data history, containerization benefits, Kubernetes architecture, CSI core services, volume lifecycle management, and Hadoop HDFS characteristics as persistent volumes. Understand the potential of Kubernetes as an alternative to Hadoop YARN for resource scheduling in on-premises big data environments.

HDFS CSI Plugin: Speeding Up Kubernetes in On-Premises Big Data Clusters

Linux Foundation

#Computer Science #DevOps #Kubernetes #Data Science #Big Data #Containerization #Hadoop #HDFS #Persistent Volumes #Apache Ozone