August 5, 2014

Researchers Have Developed a Way to Extract Audio From Silent Videos

This is one of those times when words like amazing and mind-blowing are not hyperbole. Researchers at MIT, Microsoft, and Adobe recently joined forces to do something that seems completely impossible: they've been able to extract audio from visual information alone -- meaning they have recovered sound from videos that have no audio whatsoever.

Here's the video showing their process and how they achieved their incredible results:

And here's a breakdown from the MIT article on the subject:

“When sound hits an object, it causes the object to vibrate,” says Abe Davis, a graduate student in electrical engineering and computer science at MIT and first author on the new paper. “The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye. People didn’t realize that this information was there.”

Reconstructing audio from video requires that the frequency of the video samples — the number of frames of video captured per second — be higher than the frequency of the audio signal. In some of their experiments, the researchers used a high-speed camera that captured 2,000 to 6,000 frames per second. That’s much faster than the 60 frames per second possible with some smartphones, but well below the frame rates of the best commercial high-speed cameras, which can top 100,000 frames per second.

In other experiments, however, they used an ordinary digital camera. Because of a quirk in the design of most cameras’ sensors, the researchers were able to infer information about high-frequency vibrations even from video recorded at a standard 60 frames per second. While this audio reconstruction wasn’t as faithful as it was with the high-speed camera, it may still be good enough to identify the gender of a speaker in a room; the number of speakers; and even, given accurate enough information about the acoustic properties of speakers’ voices, their identities.
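To put rough numbers on the frame rates mentioned in the excerpt, here is a minimal Python sketch. It assumes the standard Nyquist sampling criterion (a signal must be sampled at more than twice its highest frequency), which is stricter than the excerpt's "higher than" wording, and it treats each video frame as a single sample. The function name is made up for illustration; this is not the researchers' code.

```python
def nyquist_limit_hz(frames_per_second: float) -> float:
    """Upper bound on recoverable frequency when each frame counts as one sample.

    Assumption: plain Nyquist sampling, ignoring rolling shutter effects.
    """
    return frames_per_second / 2.0


# Frame rates mentioned in the excerpt: smartphone, high-speed camera range,
# and the best commercial high-speed cameras.
for fps in (60, 2_000, 6_000, 100_000):
    print(f"{fps:>7} fps -> roughly {nyquist_limit_hz(fps):,.0f} Hz")
```

Under that assumption, plain 60fps video tops out around 30 Hz, which is why the sensor quirk described in the excerpt matters so much for ordinary cameras.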

Obviously, needing a high-speed camera for the better-quality sound is kind of a deal-breaker for everyday usage, but the fact that they extracted audio from a regular DSLR -- albeit recording at 60fps -- is absolutely insane. There is some serious math going on in their algorithm, because the vibrations from the sound are only moving the object a tenth of a micrometer (0.0001 millimeters), which is imperceptible to the naked eye. We're probably quite a ways off from having software like this as a plugin in Premiere, but being able to extract audio from footage that has none could be very useful, especially if you're recording slow motion in-camera, which typically would not have any audio.

Even if we don't find useful ways to use this technique on a daily basis in our work, it's very possible it will have significant uses in other applications. To read more about the research and how this technique could be used, check out the MIT post.

Link: Extracting audio from visual information -- MIT

[via PetaPixel & Gizmodo]


48 Comments

Very interesting technology.
Unfortunately military and surveillance are going to have, or already have, their hands all over this.

August 5, 2014 at 4:22PM, Edited September 4, 11:56AM

0
Reply
Jonathan

Actually, the US government has already been using similar technology, I don't know if it is/was CIA, NSA, FBI, ABC, XYZ, WTF or OMG, but some agency bounced lasers off of the windows of some building, maybe an embassy or consulate, and they were able to record audio from the modulated reflection. In fact, one agency, the NSA I believe, pumps sound into the space between the panes (music I'm guessing) of their own double-paned windows to help muddle the sound. I think I remember reading that they also fill that space with nitrogen or something like that but I may be wrong on that part. There's also a fine wire mesh in there to help maintain the "Faraday cage" electromagnetic isolation.

Knock knock knock...

August 7, 2014 at 7:44PM, Edited September 4, 11:56AM

0
Reply

The air disturbances of alleged perpetrators hiding under boat cabin covers.

August 5, 2014 at 4:41PM, Edited September 4, 11:56AM

1
Reply
John Bean

Would be funny if they would extract sound from famous silent movies :)

August 5, 2014 at 4:41PM, Edited September 4, 11:56AM

0
Reply
Laurel

That's an awesome idea.

August 5, 2014 at 5:31PM, Edited September 4, 11:56AM

0
Reply

Fun idea, but I doubt that any movie old enough to be silent would have picture with sufficient temporal and/or spatial resolution for this to work.

August 5, 2014 at 7:00PM, Edited September 4, 11:56AM

7
Reply
Mike

Think of this though. You put on The Shining, and get the sound of the film set. That's a whole new dimension man.

August 5, 2014 at 7:57PM, Edited September 4, 11:56AM

0
Reply
Boracuda

The Shining wasn't a silent film.

August 5, 2014 at 8:41PM, Edited September 4, 11:56AM

8
Reply
Alexander

He is talking about recovering the film set sounds, not the diegetic sounds. Like, maybe, Kubrick giving an indication.

August 5, 2014 at 10:40PM, Edited September 4, 11:56AM

5
Reply
San

Okay now remember Jack, because this film is really about my stressful time faking the moon landing I really want to see that frustration in your performance. Got it? Good.. and action!

August 6, 2014 at 1:39PM, Edited September 4, 11:56AM

1
Reply
Snail

/facepalm

August 6, 2014 at 7:36PM, Edited September 4, 11:56AM

0
Reply
Christian Anderson

It is a great idea: anything involving Kubrick is a great idea. Do you agree, Joe Marino ?

August 6, 2014 at 5:40PM, Edited September 4, 11:56AM

0
Reply
FabDex

This won't work, as production films use professional cameras without the rolling shutter effect (and are heavily post-processed, which could lead to the destruction of that data anyway).

August 6, 2014 at 11:19AM, Edited September 4, 11:56AM

6
Reply
Fre

And yes, old (analogue) cameras don't have the rolling shutter effect either.

August 6, 2014 at 11:25AM, Edited September 4, 11:56AM

0
Reply
Fre

OMG that's the first thing I thought of when I saw this... so while the technology is truly amazing, if or when they can get audio off the old silent films, THEN I'll be even MORE impressed!!

Can you imagine the thrill & excitement of hearing, say, the off screen directions being given by the director? and the extraneous sounds that would surely have been going on in the background.. The mind boggles!!!

we can only hope!!!!!!!!!!!!!!!

August 7, 2014 at 9:24AM, Edited September 4, 11:56AM

0
Reply

There you are... A reason to want rolling shutter :D This is probably one of the most impressive things I've seen.

August 5, 2014 at 4:50PM, Edited September 4, 11:56AM

6
Reply

Literally has nothing to do with rolling shutter, if anything rolling shutter would make it harder to get a good reading.

August 6, 2014 at 3:04AM, Edited September 4, 11:56AM

0
Reply
Shaun Fontaine

Did you not watch the video?

August 6, 2014 at 10:32AM, Edited September 4, 11:56AM

6
Reply

Why do people comment on articles without reading or watching them...

August 6, 2014 at 2:28PM, Edited September 4, 11:56AM

3
Reply

Fascinating. I have read about a long-standing effort to recover sound from old artifacts such as the grooves in a piece of pottery made on a wheel, or the brush-strokes of a painting made in the presence of sound. It seems like the technique presented here might help that effort.

August 5, 2014 at 5:18PM, Edited September 4, 11:56AM

1
Reply

Then surely it would be easier to analyze video of American politicians/Presidents/Halliburton execs (with or without sound...usually irrelevant) to get a measure of the "bullshit factor"...before another military mis-adventure...

August 5, 2014 at 5:29PM, Edited September 4, 11:56AM

0
Reply
Vorrik

This is so sci-fi! Very exciting!

August 5, 2014 at 5:49PM, Edited September 4, 11:56AM

0
Reply
Dave

Wow, and here I thought recovering an optical track from a film scan was a next-to-impossible trick (thanks AEO-Light app). But this?... fuuuuck.

August 5, 2014 at 6:02PM, Edited September 4, 11:56AM

0
Reply
Natt

This is interesting but you all do realize what sound below 60hz (which is the best you could get from 60fps video apparently) or even 120hz sounds like don't you? Unless your politicians and silent film stars just happened to be recorded at more like a minimum of 1000fps I don't think you're going to be too fascinated by what you hear.

August 5, 2014 at 6:20PM, Edited September 4, 11:56AM

2
Reply
Ken

Did you watch the whole video? They can recover higher frequencies if the camera has a rolling shutter.

August 5, 2014 at 6:56PM, Edited September 4, 11:56AM

4
Reply
Gabe

OK - watched the last part and yes, the rolling shutter artifact can help bring up the usable upper frequency limit to somewhere around 300Hz or more which is still pretty limited. I just thought it humorous that some people were making comments as if this was going to reveal a lot of hidden sound in existing video which was more likely recorded on equipment which would severely limit the usefulness of any audio which could be extracted. I could definitely see a use for this technology in forensics - especially where there is a need for evidence from a surveillance camera that doesn't have audio. Even the lowest cut-off frequencies might be useful in an investigation.

August 5, 2014 at 7:15PM, Edited September 4, 11:56AM

0
Reply
Ken

Where do you get the 300Hz from?
If it's a 60fps video shot in 1080p, the upper limit would be 57600Hz (60*1920/2), or more likely 32400Hz (60*1080/2), depending on which direction the camera's rolling shutter scans.

August 6, 2014 at 11:23AM, Edited September 4, 11:56AM

2
Reply
Fre
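The arithmetic in the comment above can be reproduced with a short Python sketch. It is only the commenter's idealized estimate, not a figure from the researchers: it assumes every scan line of a rolling-shutter sensor is an evenly spaced sample and that readout fills the whole frame interval with no gap between frames. The function name is hypothetical.

```python
def rolling_shutter_limit_hz(fps: float, lines_per_frame: int) -> float:
    """Idealized upper frequency bound if each scan line is one sample.

    Assumptions (for illustration only): lines are read out at a constant
    rate and there is no idle time between consecutive frames.
    """
    line_rate_hz = fps * lines_per_frame  # scan lines read per second
    return line_rate_hz / 2.0             # Nyquist: half the sampling rate


# 1080p at 60fps, scanning along either sensor dimension.
print(rolling_shutter_limit_hz(60, 1920))  # shutter sweeps across 1920 lines
print(rolling_shutter_limit_hz(60, 1080))  # shutter sweeps across 1080 lines
```

Real sensors usually pause between frames and blur each line over its exposure time, so the usable limit would be expected to land well below these figures.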

yeah, if I was filming a head of state from a distance with a high speed camera. I could get his/her conversation?
Or how about paparazzi?

August 6, 2014 at 4:45AM, Edited September 4, 11:56AM

4
Reply
steve

So, soon they can hear you speak from outerspace? ;-)

August 5, 2014 at 7:21PM, Edited September 4, 11:56AM

0
Reply

Wow. It seems that the footage needs to contain a surface that can vibrate quickly enough to articulate a useful range of frequencies. The crisp bag seems to be a pretty good transducer. Would be interesting to see what other surfaces are capable of capturing. The next time I meet with my double agent, I'll be sure not to do it over a bag of Wotsits.

August 5, 2014 at 7:25PM, Edited September 4, 11:56AM

4
Reply
Ben

Seriously, we need to have a "bragging rights" contest just amongst NFS readers to see who can be the first one to put this scientific concept into a movie!! I figure the easiest take on it would be to use it with a "CSI" theme but the possibilities are endless.

August 5, 2014 at 9:07PM, Edited September 4, 11:56AM

2
Reply
Jakartaguy

Harry Caul from The Conversation would be so into this.

August 5, 2014 at 9:46PM, Edited September 4, 11:56AM

0
Reply

Time to extract audio from original silent films and hear what they really sounded like :)

August 6, 2014 at 6:37AM, Edited September 4, 11:56AM

11
Reply

Somehow I think the film grain would cause problems.

August 6, 2014 at 10:33AM, Edited September 4, 11:56AM

0
Reply

This is one of those situations where the word "can" is silly. Sure...they "can" do this, but only in the right situations that make it a useless utility.

If 1 in 1000 times, a device "can" do something, the marketing assholes will say "let's put that on the box and sell it to customers!"

August 6, 2014 at 11:22AM, Edited September 4, 11:56AM

0
Reply
Jason

"There is some serious math going on in their algorithm, because the vibrations from the sound are only moving the object a tenth of a micrometer (0.001 millimeters)"

(1) Micrometre = 10^-6m

(2) Micrometre / 10 = 10^-6m * 10^-1 = 10^-7m

(3) Millimetre = 10^-3m

(4) 0.001 millimetres = 10^-3 * 10^-3m = 10^-6m

So: Micrometre / 10 is not equal to 0.001 * millimetre

I.e. "moving the object a tenth of a micrometer (0.001 millimeters)" = false.

August 6, 2014 at 12:30PM, Edited September 4, 11:56AM

0
Reply
Myra Buttal
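The unit check in the comment above can be verified in a few lines of Python. This sketch works in whole nanometres so the comparison is exact rather than subject to floating-point rounding; the variable names are made up for illustration.

```python
# Express both lengths in nanometres (integers) for an exact comparison.
NM_PER_MICROMETRE = 1_000
NM_PER_MILLIMETRE = 1_000_000

tenth_of_micrometre_nm = NM_PER_MICROMETRE // 10          # 10^-7 m
thousandth_of_millimetre_nm = NM_PER_MILLIMETRE // 1_000  # 10^-6 m

# The two quantities differ by a factor of ten.
print(tenth_of_micrometre_nm == thousandth_of_millimetre_nm)
print(thousandth_of_millimetre_nm // tenth_of_micrometre_nm)

# A tenth of a micrometre expressed in millimetres.
tenth_of_micrometre_mm = tenth_of_micrometre_nm / NM_PER_MILLIMETRE
print(tenth_of_micrometre_mm)
```

So a tenth of a micrometre works out to one ten-thousandth of a millimetre, ten times smaller than 0.001 millimetres.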

I wonder if it would be possible to capture an image from an audio sound wave generated in a controlled environment.

August 6, 2014 at 2:25PM, Edited September 4, 11:56AM

5
Reply
Gropius

Could this potentially extract anything from the Zapruder film?

August 6, 2014 at 3:31PM, Edited September 4, 11:56AM

0
Reply
Joe

See the sound in slow motion

https://www.youtube.com/watch?v=LGzU6agLWCQ

August 6, 2014 at 5:32PM, Edited September 4, 11:56AM

1
Reply

I didn't get this one!

August 7, 2014 at 4:54AM, Edited September 4, 11:56AM

5
Reply

I know lasers on glass have been used to detect minute vibrations. This is totally next level! Bravo!

August 6, 2014 at 7:38PM, Edited September 4, 11:56AM

0
Reply
Christian Anderson

So, have they pulled any audio off of silent film? any results yet?

August 6, 2014 at 9:00PM, Edited September 4, 11:56AM

0
Reply
J

Further evidence that this reality is all one giant matrix simulation

August 7, 2014 at 8:04PM, Edited September 4, 11:56AM

0
Reply
Jackson

Two words: Zapruder Film.

Very interesting, and this does demonstrate what many have argued, that our senses isolate and separate elements of the whole. People tripping on LSD always loved to talk about "seeing sounds" and "hearing colors." These experiments demonstrate some truth to those ideas; the implications and possibilities are endless.

Enjoy your future Nobel Prize, kids!

August 7, 2014 at 8:18PM, Edited September 4, 11:56AM

1
Reply

If normal scenes from silent films aren't possible to extract sound from, how about slow motion shots? Anything from a silent film like 'Man With a Movie Camera' to the elevator-blood-scene from The Shining.

August 11, 2014 at 11:48PM, Edited September 4, 11:56AM

0
Reply

In old silent films the film tended to rattle in the gate, which would tend to keep any true info from being recorded, and once the film went through the developer and washing most of the info would be gone.
Also, you would need the original negative, as the one you see in the cinema is a positive copy.
When talkies began, they always recorded the sound on set, as they do today, because they need the sound from the actors as a guide to sync the sound, which is nearly always replaced in Hollywood movies. The guide tracks are always kept with the original negative but never released to the public, so if you went back to listen to them you could hear famous directors shouting directions to famous actors and them having hissy fits.

August 13, 2014 at 2:22PM, Edited September 4, 11:56AM

0
Reply
Jean

Personally I just want to see any/all law enforcement agencies given the software and then using it to search through child abuse videos to catch the sickos in these videos

August 17, 2014 at 12:43PM, Edited September 4, 11:56AM

0
Reply
stu

The same analysis should also mean we can make an HD version of, say, the Zapruder film?
The pixel analysis should mean we can improve resolution. Also, many archive films will be scrutinized, like the Apollo moon films, to see if they were filmed in a studio or on the moon.

September 1, 2014 at 11:29AM, Edited September 4, 11:56AM

4
Reply
Björn from Sweden