Is Adobe's Project VoCo the Photoshop for Audio?
Adobe Project VoCo replicates voices to add dialogue you forgot to record.
Over the past week, Adobe has shown off a slew of new technology at its annual San Diego mega-conference, Adobe MAX. In addition to highlighting upcoming releases, the convention serves as a training center for existing software and hosts dozens of panels over the course of the week.
One of the most unusual reveals Adobe has treated its attendees to is a project under development as part of a collaboration with Princeton University. Adobe Developer Zeyu Jin took the stage to introduce Project VoCo, a prototype he described as having the potential to do for audio what Photoshop does for photography.
Essentially, the software will allow you to add words to your audio recording that were never recorded. If one of your actors gives a reading that proves to be just a little off, you may now be able to tweak it, adding or replacing a word that doesn’t originally appear in the audio file.
It may sound like some sort of strange voodoo, but to prove Adobe's vision, Jin did a live demonstration of the software. He was able to add text to a recording in exactly the same voice by simply selecting a clip of speech, opening up an edit box, and typing in new text. In the words of one attendee, he "redubbed what the speaker had actually said."
To pull off this technical wizardry, the algorithm needs only around 20 minutes of recorded speech to work from. It analyzes the speech, breaks it down into phonemes, transcribes it, and creates the voice model.
The tech blog TechCrunch says the project isn't "based on traditional speech synthesis technology, but on what Adobe calls 'voice conversion.'" It goes on to report that "there’s almost no manual intervention necessary. You can always correct the auto-generated transcript to improve the synthesis, but there’s no need to set timestamps, for example. The algorithms can figure that out themselves."
An official statement from Adobe released earlier today details the purpose of the prototype: "When recording voiceovers, dialog, and narration, people would often like to change or insert a word or a few words due to either a mistake they made or simply because they would like to change part of the narrative. We have developed a technology called Project VoCo in which you can simply type in the word or words that you would like to change or insert into the voiceover. The algorithm does the rest and makes it sound like the original speaker said those words."
To be sure, many of Adobe's prototypes have never seen the light of day. But given the popularity of podcasting, the persistent problems with capturing clean audio, and the monotony of ADR, Project VoCo looks like it could be a real game-changer for filmmakers. This one may indeed hit the market relatively soon.
VoCo does have some frightening, dystopian potential (you could replicate someone's voice for any number of nefarious deeds), but as avid podcasters and filmmakers ourselves, we're excited by what we see.