Chapters:
1. Introduction
2. Industry Trends
3. AI by Enterprises
4. Storage and Compute
5. Ecosystem
6. Training and Deployment
7. Network Interfaces
8. Middleware Stack
9. Software
10. Preprocessing
11. Summary
12. Questions
13. What is GDR
14. Dual Approach
15. GPU Direct RDMA
16. Storage Needs
17. Training Methods
18. Network Usage
19. Collective Operations
20. Optane Persistent Memory
Description:
Learn about training deep learning models in cloud environments through this 56-minute webcast presented by experts from Habana (Intel) and IBM. Explore industry predictions showing deep learning's dominance in future cloud workloads, with a focus on foundation models trained with billions of parameters. Gain insights into AI adoption benefits across industries, infrastructure selection considerations for both on-premises and cloud deployments, and solution approaches for enterprise AI implementation. Discover how organizations leverage cloud-native AI software stacks like Kubernetes to manage complexity with evolving frameworks like TensorFlow and PyTorch. Examine critical aspects of operationalizing deep learning infrastructure, including scaling solutions, cost optimization, training time reduction, data storage capacity, bandwidth requirements, and additional key infrastructure selection criteria. Dive into technical topics like GPU Direct RDMA, storage needs, training methods, network usage, collective operations, and Optane Persistent Memory. Master the essentials of deep learning infrastructure design while understanding the tradeoffs between cost, performance, and flexibility in modern AI deployments.
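The "Collective Operations" chapter covers the communication primitives used to synchronize gradients across accelerators during distributed training, of which all-reduce is the most common. As a rough illustration only (not taken from the webcast), the sketch below simulates the ring all-reduce pattern in pure Python: each of P workers contributes a vector, and after a reduce-scatter phase followed by an all-gather phase every worker holds the elementwise sum while each link in the ring carries only about 2(P-1)/P of the data.

```python
def ring_allreduce(vectors):
    """Simulate ring all-reduce over P in-process 'workers'.

    vectors: list of P equal-length lists of numbers; length must be
    divisible by P (one chunk per worker). Returns P result vectors,
    all equal to the elementwise sum of the inputs.
    """
    p = len(vectors)
    n = len(vectors[0])
    assert n % p == 0, "vector length must split into one chunk per worker"
    chunk = n // p
    # Each worker's buffer, split into P chunks.
    bufs = [[list(v[i * chunk:(i + 1) * chunk]) for i in range(p)]
            for v in vectors]

    # Phase 1: reduce-scatter. At step s, worker r sends chunk (r - s) mod p
    # to its ring neighbor (r + 1) mod p, which accumulates it. After p - 1
    # steps, worker r owns the fully reduced chunk (r + 1) mod p.
    for s in range(p - 1):
        for r in range(p):
            c = (r - s) % p
            dst = (r + 1) % p
            bufs[dst][c] = [a + b for a, b in zip(bufs[dst][c], bufs[r][c])]

    # Phase 2: all-gather. Each completed chunk circulates around the ring,
    # overwriting the stale copy at each hop, until every worker has all of
    # the reduced chunks.
    for s in range(p - 1):
        for r in range(p):
            c = (r + 1 - s) % p
            dst = (r + 1) % p
            bufs[dst][c] = list(bufs[r][c])

    # Flatten each worker's chunks back into one vector.
    return [[x for ch in b for x in ch] for b in bufs]


# Two workers, vector length 4: both end up with the elementwise sum.
print(ring_allreduce([[1, 2, 3, 4], [5, 6, 7, 8]]))
# → [[6, 8, 10, 12], [6, 8, 10, 12]]
```

In practice this is what libraries such as NCCL implement on real network links; frameworks like PyTorch and TensorFlow invoke it through their distributed backends rather than exposing the ring directly.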

Training Deep Learning Models in the Cloud - Infrastructure Considerations and Best Practices

SNIA