CUDA Crash Course: Cache Tiled Matrix Multiplication
5
CUDA Crash Course: Why Coalescing Matters
6
CUDA Crash Course: cuBLAS Vector Add
7
CUDA Crash Course: cuBLAS Matrix Multiplication
8
CUDA Crash Course: Sum Reduction Part 1
9
CUDA Crash Course: Sum Reduction Part 2
10
CUDA Crash Course: Sum Reduction Part 3
11
CUDA Crash Course: Sum Reduction Part 4
12
CUDA Crash Course: Sum Reduction Part 5
13
CUDA Crash Course: Visual Studio 2017 Environment Setup
14
CUDA Crash Course: Programming in Linux
15
CUDA Crash Course: Video Corrections
16
CUDA Crash Course: Sum Reduction Part 6
17
CUDA Crash Course: Naive 1-D Convolution
18
CUDA Crash Course: 1-D Convolution with Constant Memory
19
CUDA Crash Course: Tiled 1-D Convolution
20
CUDA Crash Course: 1-D Convolution Cache Simplification
21
CUDA Crash Course: 2-D Convolution
22
CUDA Crash Course: Thinking Spatially
23
CUDA Crash Course: Optimizing Histogram Kernels
24
CUDA Crash Course: Comparing Matrix Multiplication Implementations
25
CUDA Crash Course: Comparing Sum Reduction Implementations
26
CUDA Crash Course: Handling Non-Perfect Input Sizes
27
CUDA Crash Course: OpenACC Matrix Multiplication
28
CUDA Crash Course: Device Properties
29
CUDA Crash Course: Profiling with clock()
30
CUDA Crash Course: GPU Performance Optimizations Part 1
Description:
Dive into a comprehensive 7-hour crash course on CUDA programming, covering topics from basic vector addition to advanced GPU performance optimizations. Learn to implement and optimize algorithms including matrix multiplication, sum reduction, and convolution using CUDA. Explore unified memory, cache tiling, memory coalescing, and libraries such as cuBLAS. Gain practical experience through hands-on exercises in both Windows and Linux environments, and learn key concepts such as thinking spatially and handling non-perfect input sizes. Learn profiling techniques and how to maximize GPU performance through in-depth lessons and worked examples.