Speech Enhancement and Denoising

Technologies: Python, PyTorch

For our final project in EE 381v: Spoken Language Technologies, we experimented with removing distortion from audio using AI.

Existing speech enhancement models perform well in noisy environments. However, we noticed that they do not handle other forms of distortion, such as reverb. We explored several ways to address this: developing our own model, adjusting the architecture of Meta's Demucs model, and pretraining Meta's Demucs model on a dataset we created that contains noisy audio with reverb.
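As a rough illustration of how one noisy, reverberant training example could be synthesized, here is a minimal sketch using NumPy. It is not the pipeline we used for our dataset. The function names, the synthetic impulse response (exponentially decaying white noise), and the parameter values are all assumptions made for the example. A real dataset would more likely use recorded speech, recorded noise, and measured or simulated room impulse responses.

```python
import numpy as np


def synthetic_rir(sample_rate, rt60=0.4, seed=0):
    """Hypothetical room impulse response: exponentially decaying white noise."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    # The envelope falls by 60 dB (a factor of 1e-3) at t = rt60.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def add_reverb_and_noise(clean, noise, rir, snr_db):
    """Convolve clean audio with rir, then add noise scaled to snr_db."""
    # Reverb: convolve with the impulse response and trim to the input length.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    # Loop or trim the noise so it matches the speech length.
    noise = np.resize(noise, len(clean))
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that speech_power / scaled_noise_power hits the target SNR.
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return reverberant + scale * noise


# Usage with stand-in signals (a sine tone instead of speech, white noise
# instead of a lawnmower or coffee shop recording).
sr = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
noise = np.random.default_rng(1).standard_normal(sr // 2)
noisy = add_reverb_and_noise(clean, noise, synthetic_rir(sr), snr_db=5.0)
```

In this sketch the clean, non-reverberant signal would serve as the training target and `noisy` as the model input, so the model has to undo both the reverb and the additive noise.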

Although our efforts were not very successful, we learned a lot from the experience, and the project was fun to work on. One takeaway from our experiments was that when we tuned the model to be better at removing distortion, it got worse at removing additive noise, such as a lawnmower running in the background or chatter in a coffee shop. We did not have enough time to explore this trade-off further in this project, but we think it is worth looking into.

Learning Deep Semantics for Test Completion