Explore data locality in cloud-native environments for big data workloads in this 45-minute conference talk. Learn how Kubernetes schedules workloads based on CPU and memory resources, and discover why that approach falls short for stateful big data workloads. Compare data locality support across mainstream container-attached storage solutions for Kubernetes. Dive into the network topology support provided by Apache Hadoop Ozone and its use as locality-aware container-attached storage via the Ozone CSI plugin. Watch a demonstration of Spark on Kubernetes that showcases the advantages of data locality-aware scheduling with Apache Hadoop Ozone. Gain insights into the evolution of big data, locality concepts in big data processing, Hadoop HDFS locality, and the implementation of HDFS and Ozone in Kubernetes environments.
Embracing Big Data Workloads in Cloud-Native Environments with Data Locality