Contents:
1. Introduction
2. Neural Network vs NLP
3. Language Model
4. Memory
5. Neural Network
6. Word Embedding
7. Neural Network Size
8. General Approach
9. Pruning
10. Quantization-Based Approaches
11. Fixed-Point Quantization
12. Product Quantization
13. Speech Recognition Performance
14. Binarization
15. Embedding Matrix
16. Full Precision Model
17. Two Methods
18. Results
19. Conclusion
20. Question
21. Sponsors
Description:
Explore a 32-minute conference talk from tinyML Asia 2020 focusing on structured quantization techniques for neural network language model compression. Delve into the challenges of large memory consumption in resource-constrained scenarios and discover how advanced structured quantization methods can achieve high compression ratios of 70-100 without compromising performance. Learn about various compression approaches, including pruning, fixed-point quantization, product quantization, and binarization. Examine the impact on speech recognition performance and compare results with full precision models. Gain insights into the application of these techniques to word embeddings and neural network architectures in the context of natural language processing and speech recognition.
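The talk's core idea is structured (product) quantization of the embedding matrix. As a rough illustration of how that style of compression works, below is a minimal NumPy sketch; it is not taken from the talk, and the vocabulary size, embedding dimension, codebook size, and helper names are assumptions chosen for demonstration. Each embedding vector is split into sub-vectors, each group of sub-vectors is clustered with k-means, and only one-byte codes plus small codebooks are stored.

```python
# Illustrative sketch of product quantization for an embedding table (assumed
# shapes and hyperparameters; not the talk's actual implementation).
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((x ** 2).sum(1, keepdims=True)
                 - 2.0 * x @ centroids.T
                 + (centroids ** 2).sum(1))
        assign = dists.argmin(1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids, assign

def product_quantize(emb, num_subvectors=4, codebook_size=256):
    """Split each embedding into sub-vectors and quantize each group independently.
    Returns per-group codebooks and uint8 codes (one byte per sub-vector)."""
    vocab, dim = emb.shape
    assert dim % num_subvectors == 0
    sub_dim = dim // num_subvectors
    codebooks, codes = [], []
    for g in range(num_subvectors):
        block = emb[:, g * sub_dim:(g + 1) * sub_dim]
        centroids, assign = kmeans(block, codebook_size)
        codebooks.append(centroids)
        codes.append(assign.astype(np.uint8))
    return np.stack(codebooks), np.stack(codes, axis=1)

def reconstruct(codebooks, codes):
    """Rebuild an approximate embedding matrix from codebooks and codes."""
    return np.concatenate(
        [codebooks[g][codes[:, g]] for g in range(len(codebooks))], axis=1)

if __name__ == "__main__":
    emb = np.random.randn(10000, 128).astype(np.float32)  # toy 10k x 128 table
    codebooks, codes = product_quantize(emb)
    approx = reconstruct(codebooks, codes)
    original_bytes = emb.size * 4                           # float32 storage
    compressed_bytes = codes.size + codebooks.size * 4      # codes + codebooks
    print("compression ratio ~", original_bytes / compressed_bytes)
    print("mean squared error ~", float(((emb - approx) ** 2).mean()))
```

With these toy settings the ratio comes out around 30x; the 70-100 range quoted in the description would correspond to more aggressive settings (more sub-vectors, smaller codebooks, or additional structured-quantization refinements) than this sketch uses.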

Structured Quantization for Neural Network Language Model Compression

tinyML