In 2019, Google released a synthetic speech database with a very specific goal: stopping audio deepfakes.
“Malicious actors may synthesize speech to try to fool voice authentication systems,” the Google News Initiative blog reported at the time. “Perhaps equally concerning, public awareness of ‘deep fakes’ (audio or video clips generated by deep learning models) can be exploited to manipulate trust in media.”
Ironically, also in 2019, Google introduced its Translatotron artificial intelligence (AI) system, which translates speech directly into another language. By 2021, it was clear that deepfake voice manipulation was a serious issue for anyone relying on AI to mimic speech, and Google designed Translatotron 2 to prevent voice spoofing.
Google and other tech giants face a dilemma. Voice AI brought us Alexa and Siri; it lets users interact with their smartphones by voice and helps businesses streamline customer service.
However, many of these same companies also launched, or planned to launch, projects that made AI a little too lifelike. These tools can be used for harm as easily as for good, so big tech mostly sidestepped them. The companies agreed the products were too dangerous, no matter how useful.
But smaller companies are just as innovative as big tech. Now that AI and machine learning are somewhat democratized, smaller tech companies are willing to take on the risks and ethical concerns of voice tech. Like it or not, the vocal deepfake is here, easy to use and poised to create serious problems.
Ethics (or lack thereof) in voice AI
Some of the largest tech companies are trying to slam the brakes on AI that can mimic live people.
“There are opportunities and harms, and our job is to maximize opportunities and minimize harms,” Tracy Pizzo Frey, ethics committee member at Google, told Reuters.
It’s a tough call to make. Voice AI can be life-changing: it enabled a Rollins College valedictorian, a non-speaking autistic student, to deliver her commencement speech. It can also simply make daily life easier. You might screen phone calls with an AI voice assistant, and businesses rely on AI to handle customer service calls so seamlessly that the customer may never know they are talking to a machine, not a person.
Add in attackers
But threat actors can use this, too. In 2019, thieves used “voice-mimicking software to imitate a company executive’s speech” and tricked an employee into transferring nearly a quarter-million dollars to a secret bank account in Hungary, the Washington Post reported. Despite finding the request “rather strange,” the director found the voice to be very lifelike, the article reported.
It’s a familiar lament from those duped by scam artists. Spoofed email addresses and phone numbers have conned thousands of employees; the scammers have simply moved on to newer tools.
Then there are the ethics around voice cloning. Is it right to use a voice, especially of someone who has died, for commercial purposes? Who owns the rights to a voice? Is it the person themselves? The family or the estate? Or is a voice up for grabs because it isn’t intellectual property? As the law stands, a voice cannot be copyrighted or trademarked. Therefore, no one owns it, not even your own voice.
Because voices lack the protections granted to copyrighted and trademarked works, businesses can easily exploit them for commercial gain. That same lack of protection makes voice cloning easy and profitable for threat actors building deepfakes.
Cybersecurity threats around vocal deepfakes
A threat actor doesn’t need much to create voice deepfakes. The technology is readily available, and a few minutes of recorded audio of someone’s voice is enough to build a rudimentary deepfake. The more audio available, the more realistic the deepfake becomes. Executives are a particularly attractive target: recordings from webinars, videos on corporate websites and appearances on television or at conferences are widely available.
Attackers often use deepfakes in multilayered business email compromises. They send a phishing email or text message to an employee, paired with a deepfake voice message left on the recipient’s voice mailbox. Most often, the goal is to get the victim to send money: the email spells out how much to send and where, while the deepfake voice message provides the authorization to complete the transaction.
Threat actors are also using voice deepfakes more often to bypass voice-activated multifactor authentication. The use of voice biometrics is expected to grow 23% by 2026, driven by increased adoption in the financial industry. As voice biometrics grow, however, threat actors are taking voice fakes to new levels: attackers have faked messages from banks asking for account numbers, BankInfoSecurity reported.
How to avoid a vocal deepfake
As with all cybersecurity defenses, avoiding a voice deepfake requires a multilayered approach. The first step is to limit the amount of recorded audio from your organization that is readily available online. Webinars and other recordings should be restricted to authenticated visitors only. Discourage high-level executives from posting video and voice recordings on social media. The less audio available, the harder it is to create a near-flawless deepfake.
Encourage employees to apply a zero trust mindset to anything that doesn’t follow normal procedures. Question everything: if a boss doesn’t normally leave a voice message to follow up on an email, verify it before taking action. If the voice sounds a little off, again, verify the message through another channel.
Reconsider using voice as a stand-alone biometric authentication. Instead, pair it with other authentication factors that are harder to spoof.
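One way to pair voice with a harder-to-spoof factor is to require a time-based one-time password (TOTP) alongside the voice match. The sketch below is illustrative, not any vendor’s implementation: the `voice_score` input and the 0.85 threshold are hypothetical assumptions, while the one-time-password math follows RFC 4226 (HOTP) and RFC 6238 (TOTP) using only the Python standard library.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP keyed to the current 30-second time window."""
    return hotp(secret, unix_time // step, digits)


def authenticate(voice_score: float, submitted_code: str, secret: bytes,
                 unix_time: int, voice_threshold: float = 0.85) -> bool:
    """Hypothetical gate: a cloned voice alone is not enough; the TOTP must also match."""
    return voice_score >= voice_threshold and hmac.compare_digest(
        submitted_code, totp(secret, unix_time))
```

With the RFC 6238 test secret `b"12345678901234567890"` at Unix time 59, `totp` yields `"287082"`, so `authenticate(0.92, "287082", secret, 59)` passes while a spoofed voice without the code fails. The design point is that even a near-perfect voice clone is rejected unless the attacker also holds the victim’s token.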
Finally, use technology to fight technology. If threat actors are using AI to create voice deepfakes, businesses should use AI and machine learning to better detect fake vocal messages.
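To make the last point concrete, a detector of this kind is at heart a classifier trained on acoustic features from genuine and synthetic clips. The sketch below is a deliberately minimal stand-in for the AI/ML detectors described above: the feature vectors are simulated with NumPy rather than extracted from real audio (a production system would compute features such as MFCCs from recordings and use a far stronger model), and the nearest-centroid rule is chosen only for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: each row is a 13-dimensional feature vector
# (standing in for averaged MFCCs) from one clip. Genuine and synthetic
# clips are simulated as two Gaussian clusters.
genuine = rng.normal(loc=0.0, scale=1.0, size=(200, 13))
synthetic = rng.normal(loc=0.8, scale=1.0, size=(200, 13))

# "Training" a nearest-centroid classifier: one mean vector per class.
c_genuine = genuine.mean(axis=0)
c_synthetic = synthetic.mean(axis=0)


def classify(features: np.ndarray) -> str:
    """Label a clip by whichever class centroid its features are closer to."""
    d_genuine = np.linalg.norm(features - c_genuine)
    d_synthetic = np.linalg.norm(features - c_synthetic)
    return "synthetic" if d_synthetic < d_genuine else "genuine"
```

A clip whose features sit near the synthetic cluster (for example, a vector of all 0.8s) is flagged `"synthetic"`, while one near the genuine cluster is passed as `"genuine"`. The broader principle holds regardless of model choice: if attackers use AI to generate voices, defenders can train models on the statistical fingerprints that synthesis leaves behind.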