Schedule as of Oct 11, 2022 - subject to change

Default Time Zone is EDT - Eastern Daylight Time

Thursday, October 20 • 11:30am - 12:30pm
Teaching AI to hear like we do: psychoacoustics in machine learning


The AES Technical Committee on Machine Learning and Artificial Intelligence [TC-MLAI] has recognized that perceptually motivated audio loss functions are not widely understood or used.

In audio and speech coding, perceptually weighted error functions are commonly used: in audio coding, a perceptual model controls the quantization step size of subband signals; in speech coding, a perceptual error weighting filter is applied. In deep learning for audio signals, the same principle can be applied to the loss function used to train a network. This workshop will give an overview of existing approaches and an outlook on possible future directions.

A three-part panel discussion to inform the AES community about this topic and where and how to use these loss functions, giving audio examples of the results and effects, offering an outlook on future developments, and soliciting feedback from the audience:
• Time domain vs. frequency domain metrics and loss functions
• Psychoacoustic pre- and post-filters as loss functions
• Perceptually weighted loss functions
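To make the third topic concrete, below is a minimal sketch of a perceptually weighted spectral loss, using an approximate A-weighting curve as a simple stand-in for a full psychoacoustic model. The function names and the NumPy-based framing are illustrative assumptions, not taken from the workshop itself.

```python
import numpy as np

def a_weighting(freqs_hz):
    """Approximate A-weighting magnitude response (after IEC 61672),
    used here as a crude perceptual frequency weight."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return num / np.maximum(den, 1e-20)  # avoid divide-by-zero at DC

def weighted_spectral_loss(reference, estimate, sr=16000, n_fft=512):
    """Mean A-weighted squared error between frame-wise magnitude spectra."""
    hop = n_fft // 2
    w = a_weighting(np.fft.rfftfreq(n_fft, d=1.0 / sr))
    err, count = 0.0, 0
    for start in range(0, len(reference) - n_fft + 1, hop):
        # Magnitude spectra of the reference and the network output frame.
        R = np.abs(np.fft.rfft(reference[start:start + n_fft]))
        E = np.abs(np.fft.rfft(estimate[start:start + n_fft]))
        err += np.mean(w * (R - E) ** 2)
        count += 1
    return err / max(count, 1)
```

In practice such a loss would be implemented with a differentiable STFT in a deep learning framework so gradients can flow to the network; the NumPy version above only illustrates the weighting idea.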


Gerard Schuller

Professor, Ilmenau University of Technology, Germany
Deep Learning for Audio Processing

Renato Profeta

Institut für Medientechnik, TU Ilmenau
Renato Profeta is a Ph.D. Candidate in Audio Signal Processing at the Ilmenau University of Technology. He received a Master of Engineering degree in Electrical Engineering from Kempten University of Applied Sciences and a Bachelor of Engineering in Electrical Engineering from Riga...

Bernd Edler

Audiolabs Erlangen

Martin Strauss

Audiolabs Erlangen

Stefan Goetze

University of Sheffield

Gordon Wichern

Principal Research Scientist - Speech and Audio Team, MERL
Audio signal processing and machine learning researcher

George Close

PhD Student, University of Sheffield
