1. Introduction
2. Dennis Abts
3. Hardware Software Interface
4. Pipeline
5. Core Architecture
6. Superlane Architecture
7. Domain-Specific Architecture
8. Data Types
9. Communication and Computation
10. Energy Difference
11. Functional Control Units
12. Superlane
13. Vector Processor
14. Memory System
15. Switch Execution Module
16. System Architecture
17. Topology
18. Packaging
19. Network
20. Normal RDMA
21. Communication Model
22. Synchronous Communication
Description:
Explore the convergence of AI and high-performance computing (HPC) through the lens of dataflow architecture in this Stanford seminar. The talk presents the Groq architecture and its Tensor Streaming Processor (TSP), which combines traditional dataflow elements with a stream programming model. Discover how more than 400,000 arithmetic units, arranged in a SIMD spatial microarchitecture, efficiently exploit the dataflow locality found in deep learning models. In the stream programming model, on-chip functional units consume tensors as inputs and produce tensors as outputs, chaining the output of one unit into the next to minimize memory access. Deterministic execution and dataflow locality simplify the abstraction the compiler works with, enabling it to orchestrate data and instructions efficiently. The seminar then extends this model to distributed scale-out systems, creating the illusion of a single synchronous parallel computer, and shows how the Groq parallelizing compiler uses this programming model to auto-scale across TSPs, supporting robust numerical computation in production environments.

Stanford Seminar - Dataflow for Convergence of AI and HPC - GroqChip

Stanford University