Explore on-device speech model optimization and deployment in this tinyML Summit 2022 presentation. Dive into the challenges of real-time execution on mobile hardware, focusing on latency and memory-footprint constraints. Learn about streaming-aware model design using the TensorFlow functional and subclassing APIs, and discover quantization techniques including post-training quantization and quantization-aware training. Compare the pros and cons of the different approaches and understand how to choose among them based on the specific ML problem. Examine benchmarks of popular speech processing model topologies, including residual convolutional and transformer neural networks, as demonstrated on mobile devices. Gain insights into local self-attention, multi-head self-attention, and real-world model implementations to deepen your understanding of efficient on-device speech processing.
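To make the quantization discussion concrete, here is a minimal sketch of the arithmetic behind symmetric int8 post-training quantization, the simplest of the techniques the talk covers. This is an illustrative pure-Python example, not code from the presentation; the function names `quantize_int8` and `dequantize` are hypothetical, and a real deployment would use a toolchain such as the TensorFlow Lite converter rather than hand-rolled loops.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 post-training quantization (sketch).

    Maps float weights to integers in [-127, 127] using a single scale
    factor derived from the largest absolute weight value.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding bounds the per-weight reconstruction error by scale / 2.
err = max(abs(a - b) for a, b in zip(weights, restored))
```

This halves nothing about the model's structure: it only shrinks each weight from 32 bits to 8, which is why post-training quantization cuts memory footprint roughly 4x with no retraining, while quantization-aware training additionally simulates this rounding during training to recover accuracy.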
On-Device Speech Models Optimization and Deployment for Mobile Hardware
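The streaming-aware design mentioned above hinges on one idea: a layer keeps just enough internal state that processing audio frame by frame yields the same output as processing the whole utterance at once. The following is a minimal pure-Python sketch of that pattern for a causal 1-D convolution; the class `StreamingConv1D` is hypothetical and stands in for the stateful Keras layers (functional or subclassed) that a real streaming model would use.

```python
class StreamingConv1D:
    """Streaming causal 1-D convolution (illustrative sketch).

    Keeps the last (kernel_size - 1) input samples as internal state,
    so successive calls on small frames produce the same output as one
    call on the full signal (with implicit zero left-padding).
    """

    def __init__(self, kernel):
        self.kernel = kernel
        # State holds the tail of the previous frame; zeros at start.
        self.state = [0.0] * (len(kernel) - 1)

    def process_frame(self, frame):
        # Prepend the saved state so windows can span frame boundaries.
        buf = self.state + list(frame)
        out = []
        for i in range(len(frame)):
            window = buf[i:i + len(self.kernel)]
            out.append(sum(w * x for w, x in zip(self.kernel, window)))
        # Save the last (kernel_size - 1) samples for the next frame.
        self.state = buf[len(frame):]
        return out

# Frame-by-frame processing matches one-shot processing of the signal.
streamed = StreamingConv1D([1.0, 1.0])
chunked = streamed.process_frame([1.0, 2.0]) + streamed.process_frame([3.0, 4.0])
one_shot = StreamingConv1D([1.0, 1.0]).process_frame([1.0, 2.0, 3.0, 4.0])
```

Carrying state this way is what keeps per-frame latency flat regardless of utterance length, which is the core real-time constraint the talk addresses; local self-attention applies the same idea by restricting each query to a bounded window of cached keys and values.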