This 40-minute conference talk covers speech separation and noise removal using deep neural networks and TensorFlow. It addresses the cocktail party effect: the human ability to focus on a single voice amid multiple speakers and background noise. Topics include preparing, transforming, and augmenting data for speech separation, building and optimizing several neural network architectures, training smaller and faster networks, and assembling real-time speech separation pipelines that run on small embedded platforms, with future smart AirPods, headsets, and hearing aids in mind. The presentation also covers mixed sounds, masking techniques, feature engineering, model parameters, and evaluation methods, along with related use cases such as "Looking to Listen" and speech-to-text. It discusses the latest advances, current limitations, open challenges, and future directions for speech separation on embedded devices.
Listening at the Cocktail Party - Deep Neural Networks for Speech Separation