In 2019, Google released a synthetic speech database with a very specific goal: stopping audio deepfakes.

“Malicious actors may synthesize speech to try to fool voice authentication systems,” the Google News Initiative blog reported at the time. “Perhaps equally concerning, public awareness of ‘deep fakes’ (audio or video clips generated by deep learning models) can be exploited to manipulate trust in media.”

Ironically, also in 2019, Google introduced Translatotron, an artificial intelligence (AI) system that translates speech directly into another language. By 2021, it was clear that deepfake voice manipulation was a serious issue for anyone relying on AI to mimic speech, and Google designed Translatotron 2 to prevent voice spoofing.

Two-Edged Sword

Google and other tech giants face a dilemma. AI voice brought us Alexa and Siri; it lets users interact with their smartphones by voice and lets businesses streamline customer service.

However, many of these same companies also launched — or planned to launch — projects that made AI a little too lifelike. These tools can be used for harm as easily as for good. Big tech, then, mostly sidestepped such products; the companies agreed they were too dangerous, no matter how useful.

But smaller companies are just as innovative as big tech. Now that AI and machine learning are somewhat democratized, smaller tech companies are willing to take on the risks and ethical concerns of voice tech. Like it or not, the vocal deepfake is here: it is easy to use and poised to create serious problems.

Ethics (or Lack Thereof) in Voice AI

Some of the largest tech companies are trying to slam the brakes on AI that can mimic live people.

“There are opportunities and harms, and our job is to maximize opportunities and minimize harms,” Tracy Pizzo Frey, ethics committee member at Google, told Reuters.

It’s a tough call to make. Voice AI can be life-changing for many people: it enabled a Rollins College valedictorian, a non-speaking autistic student, to deliver her commencement speech. It can also simply make everyday life easier; you might screen phone calls with an AI voice assistant. Businesses rely on AI to handle customer service calls so seamlessly that the customer may never know they are talking to a machine, not a person.

Add in Attackers

But threat actors can use this technology, too. In 2019, thieves used “voice-mimicking software to imitate a company executive’s speech” and tricked an employee into transferring nearly a quarter-million dollars to a secret bank account in Hungary, the Washington Post reported. Although the request struck the director as “rather strange,” the voice was lifelike enough to be convincing, the article noted.

It’s a familiar lament from those duped by scam artists. Spoofed email addresses and phone numbers have conned thousands of employees; the scammers have simply moved on to newer tools.

Then there are the ethics of voice cloning. Is it right to use a voice — especially that of someone who has died — for commercial purposes? Who owns the rights to a voice? The person themselves? The family or the estate? Or is a voice up for grabs because it isn’t intellectual property? The answer is that a voice cannot be copyrighted or trademarked, so no one owns it (you don’t even own your own voice).

Voice cloning lacks the protections that cover copyrighted and trademarked works, making it easy for businesses to exploit voices for financial gain. That same lack of protection makes voice cloning easy and profitable for threat actors building deepfakes.

Cybersecurity Threats Around Vocal Deepfakes

A threat actor doesn’t need much to create a voice deepfake. The technology is readily available, and a few minutes of recorded audio of someone’s voice is enough to produce a rudimentary fake; the more audio available, the more realistic the deepfake becomes. Executives are a particularly attractive target: recordings from webinars, videos on corporate websites and appearances on television or at conferences are all widely available.

Attackers often use deepfakes in multilayered business email compromise (BEC) attacks. They send a phishing email or text message to an employee and leave a deepfake voice message in the recipient’s voice mailbox. Most often, the goal is to get the victim to send money: the email spells out how much money to send and where, while the deepfake voice message provides the authorization to complete the transaction.

Threat actors are also increasingly using voice deepfakes to bypass voice-based multifactor authentication. The use of voice biometrics is expected to grow 23% by 2026, thanks to increased adoption in the financial industry. But as voice biometrics grow, threat actors are taking voice fakes to new levels: attackers are faking messages from banks that ask for account numbers, BankInfoSecurity reported.

How to Avoid a Vocal Deepfake

As with any cybersecurity defense, avoiding voice deepfakes takes a multilayered approach. The first step is to limit the amount of recorded audio from your organization that is readily available online. Webinars and other recordings should be restricted to authenticated visitors only. Discourage high-level executives from posting video and voice recordings on social media. The less audio available, the more difficult it is to create a near-flawless deepfake.
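One way to enforce that restriction is to gate recordings behind an authenticated endpoint rather than a public URL. Below is a minimal sketch assuming a Flask application with session-based logins; the route, the “user_id” session key and the recordings/ directory are hypothetical placeholders, not any particular product’s API.

```python
# Minimal sketch: serve recorded webinars only to authenticated visitors.
from flask import Flask, abort, send_file, session
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"  # required for sessions

@app.route("/webinars/<recording_id>")
def webinar(recording_id: str):
    # No session, no audio: unauthenticated visitors get a 401.
    if not session.get("user_id"):  # hypothetical session key
        abort(401)
    # secure_filename() blocks path traversal in the user-supplied ID.
    return send_file(f"recordings/{secure_filename(recording_id)}.mp3")
```

In practice you would also check that the logged-in user is entitled to that specific recording, but the point stands: the less audio an anonymous visitor can scrape, the less raw material an attacker has.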

Encourage employees to apply a zero trust mindset to anything that doesn’t follow normal procedures. Question everything: if a boss doesn’t normally leave a voice message to follow up on an email, verify it before taking action. If the voice seems a little off, again, verify the message before acting.

Reconsider using voice as a stand-alone biometric authentication factor. Instead, pair it with other authentication measures that are harder to spoof.
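As an illustration, here is a minimal sketch of that pairing: access requires both a voice match and a time-based one-time password (TOTP). The voice_match_score() function is a hypothetical stand-in for whatever biometric engine is in use, and the threshold is an assumed value to tune per engine; the TOTP check uses the pyotp library.

```python
# Minimal sketch: voice as one factor among several, never on its own.
import pyotp  # pip install pyotp

VOICE_THRESHOLD = 0.90  # assumed cutoff; tune for your biometric engine

def voice_match_score(sample: bytes, enrolled_profile: bytes) -> float:
    """Hypothetical stand-in: return a 0..1 similarity score from your vendor."""
    raise NotImplementedError

def authenticate(sample: bytes, profile: bytes,
                 totp_secret: str, code: str) -> bool:
    # Both factors must pass: a spoofed voice alone is not enough,
    # and a stolen one-time code alone is not enough either.
    voice_ok = voice_match_score(sample, profile) >= VOICE_THRESHOLD
    totp_ok = pyotp.TOTP(totp_secret).verify(code)
    return voice_ok and totp_ok
```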

Finally, use technology to fight technology. If threat actors are using AI to create voice deepfakes, businesses should use AI and machine learning to better detect fake vocal messages.
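To make that concrete, the sketch below trains a simple classifier to separate genuine from synthetic speech. It is a baseline under stated assumptions, not a production detector: it presumes a labeled corpus of short WAV clips in hypothetical real/ and fake/ directories, and it uses averaged MFCC features with a random forest via the librosa and scikit-learn libraries.

```python
# Minimal sketch: a baseline real-vs-synthetic speech classifier.
import glob

import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def mfcc_features(path: str) -> np.ndarray:
    # Average MFCCs over time to get one fixed-length vector per clip.
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical corpus layout: real/*.wav (label 0) and fake/*.wav (label 1).
real_files = glob.glob("real/*.wav")
fake_files = glob.glob("fake/*.wav")
X = np.array([mfcc_features(p) for p in real_files + fake_files])
y = np.array([0] * len(real_files) + [1] * len(fake_files))

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)
clf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

Dedicated anti-spoofing models go much further (inspecting the artifacts that vocoders leave behind, for example), but even a baseline like this illustrates the principle of using machine learning to flag suspect audio before a human acts on it.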
