Last month, an algorithm—named Benjamin, no less—wrote a screenplay. Now, researchers from MIT's Computer Science and Artificial Intelligence Laboratory have written an algorithm that can effectively "sound design" a film.

The robot, which its creators claim passes "the Turing Test for Sound," is able to crawl silent video footage and insert appropriate sound effects, such as a rustling paper bag or a finger tapping a wine glass. 


The researchers drew upon a sophisticated AI method called deep learning, which attempts to replicate human pattern-recognition processes in the neocortex, the part of the brain in which 80 percent of cognition occurs. After being fed nearly 50,000 disparate sounds and thousands of videos, the algorithm was able to deconstruct sound properties, such as pitch and loudness, and learn to associate certain waveforms with certain visuals.
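To make "pitch and loudness" a little more concrete, here is a minimal Python sketch of how those two properties can be measured from a raw waveform. It is not the MIT team's method, which relies on a trained neural network. It is a textbook illustration using root-mean-square amplitude for loudness and autocorrelation for pitch. The function names, the 8,000 Hz sample rate, and the synthetic test tone are all assumptions made up for this example.

```python
import math


def make_tone(freq_hz, amplitude, sample_rate=8000, duration_s=0.25):
    """Generate a synthetic sine wave (a stand-in for a recorded sound)."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def rms_loudness(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate=8000, min_hz=50, max_hz=1000):
    """Estimate pitch by finding the lag at which the signal best matches itself."""
    min_lag = sample_rate // max_hz
    max_lag = sample_rate // min_hz
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag: large when the wave repeats every `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    tone = make_tone(freq_hz=200.0, amplitude=0.5)  # hypothetical test signal
    print("loudness (RMS):", rms_loudness(tone))
    print("pitch (Hz):", estimate_pitch(tone))
```

A learning system would work with far richer features than these two numbers, but the idea is the same: reduce a waveform to measurable properties that can then be matched against what is happening on screen.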

The MIT team tested the efficacy of the robot by conducting an online study in which subjects saw two videos of collisions, one with the actual recorded sound and one with the robot's inserted sound design, and were tasked with identifying which was real. Subjects picked the robot-generated sound over the real one twice as often as they did for a baseline algorithm.

 

Image credit: MIT

Of course, Foley artists won't become redundant any time soon. The algorithm is far from perfect; it's particularly inept when it comes to erratic sounds that occur in rapid succession, such as the irregular beat of a drum, and will often miss or "hallucinate" a hit. But most importantly, the robot can only sound design "visually indicated sounds"—or in film terms, diegetic sound—which precludes the most creative aspects of the sound design process.

"From the gentle blowing of the wind to the buzzing of laptops, at any given moment there are so many ambient sounds that aren’t related to what we’re actually looking at," said PhD student Andrew Owens, a researcher on the project. "What would be really exciting is to somehow simulate sound that is less directly associated to the visuals."

Top photo credit:  / Shutterstock