February 9, 2018 By Paul Gillin

In the three-plus years since Amazon introduced the Echo, the popularity of speech recognition systems has exploded. One reason is that the accuracy of voice recognition technology is now reported to be roughly on par with that of human listeners. An estimated 27 million Echo and Google Home devices have been sold, according to Consumer Intelligence Research Partners (CIRP), and the Consumer Technology Association expected another 4.4 million to be sold during this past holiday season.

This surge has made speech recognition a tempting new target for cybercriminals. Thanks to encryption and tunneling, voice-activated devices are believed to be reasonably secure against compromise at the software level, but what about the commands they accept? Recent research has shown that voice recognition itself can be compromised with unsettling ease.

Subverting the Human Ear

Last summer, a group of researchers at Zhejiang University published a paper describing how popular speech recognition systems, such as Apple’s Siri and Google Now, can be activated using ultrasonic frequencies that are inaudible to humans but can be picked up by electronic microphones. This technique, which the researchers dubbed DolphinAttack, works even if the microphones are designed to ignore high-frequency audio, because nonlinearities in the microphone hardware produce harmonics that recreate the command at frequencies the device does listen to.

By boosting the power of those harmonics, researchers were able to command voice-activated assistants to do things such as visit a malicious website, initiate phone calls, send fake text messages and disable wireless communications. Their brief but unsettling demonstration video shows how this is possible.
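The effect described above can be illustrated with a toy signal model. The sketch below is not the researchers’ code: the sample rate, the 30 kHz carrier, the single 1 kHz tone standing in for a spoken command and the `toy_microphone` function with its small squared term are all assumptions chosen for illustration. It shows that a tone modulated onto an inaudible carrier has no audible component in the air, yet one appears once the signal passes through a slightly nonlinear microphone.

```python
import math

SAMPLE_RATE = 192_000   # Hz; assumed, high enough to represent ultrasound
CARRIER_HZ = 30_000     # assumed inaudible carrier frequency
VOICE_HZ = 1_000        # single audible tone standing in for a voice command
NUM_SAMPLES = 1_920     # 10 ms, a whole number of cycles for every tone used


def modulate(n):
    """Sample n of the tone amplitude-modulated onto the ultrasonic carrier."""
    t = n / SAMPLE_RATE
    voice = 0.5 * math.cos(2 * math.pi * VOICE_HZ * t)
    return (1.0 + voice) * math.cos(2 * math.pi * CARRIER_HZ * t)


def toy_microphone(sample, linear=1.0, quadratic=0.1):
    """Hypothetical microphone response with a small squared (nonlinear) term."""
    return linear * sample + quadratic * sample ** 2


def tone_amplitude(signal, freq_hz):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


transmitted = [modulate(n) for n in range(NUM_SAMPLES)]
received = [toy_microphone(x) for x in transmitted]

print(f"1 kHz amplitude in the air:       {tone_amplitude(transmitted, VOICE_HZ):.4f}")
print(f"1 kHz amplitude after microphone: {tone_amplitude(received, VOICE_HZ):.4f}")
```

In this model the transmitted signal contains energy only near 30 kHz, so a human listener would hear nothing, while the squared term in the microphone shifts a copy of the tone back down to 1 kHz. Real microphones and amplifiers behave in more complicated ways, so the numbers here are only indicative.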

Hijacking Speech Recognition With Hidden Commands

More recently, two researchers at the University of California, Berkeley published a report detailing how they were able to embed hidden commands in any kind of audio so that Mozilla’s DeepSpeech speech-to-text software transcribes the commands instead of what a person hears. The authors claimed that, given any audio waveform, they could produce another that is more than 99.9 percent similar but is transcribed as any phrase they chose, at a rate of up to 50 characters per second of audio, with a 100 percent success rate.

The Berkeley researchers posted samples of these “audio adversarial” clips to demonstrate how they embedded the hidden phrase, “OK Google, browse to evil.com,” in the spoken passage “Without the dataset the article is useless.” It’s nearly impossible to hear the difference between the original clip and the altered one.

They did it with music too. The samples include a four-second clip from Verdi’s “Requiem” that masks the same command. The only difference between the two clips is a series of subtle chirps that the passive listener probably wouldn’t even notice.

The technique works because of the complex way machine learning algorithms translate speech to text, which is considerably more difficult than interpreting handwriting or images. Because of the many different ways people pronounce the same sounds, speech recognition algorithms use connectionist temporal classification (CTC) to make an educated guess about how each sound translates to a letter. By making slight changes to the input that are nearly undetectable to the human ear, the researchers were able to create an audio waveform that the machine transcribed as the phrase they wanted. In essence, they were able to cancel out the sound the machine was supposed to hear in favor of the audio they wanted it to hear.
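To make the CTC step more concrete, here is a minimal sketch of the collapsing rule it relies on: the model emits one label per short slice of audio, consecutive repeats are merged and a special blank label is dropped. The `BLANK` symbol, the `ctc_collapse` name and the two per-frame sequences are hypothetical, and production systems typically use a more elaborate search over candidate transcripts rather than this greedy rule.

```python
from itertools import groupby

BLANK = "_"  # assumed symbol for the CTC "no letter" label


def ctc_collapse(frame_labels):
    """Merge consecutive repeated labels, then drop blanks (greedy CTC rule)."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


# Two hypothetical per-frame outputs: the same word spoken slowly and quickly.
slow = list("hh_eee_ll_lloo")
fast = list("h_el_lo")

print(ctc_collapse(slow))
print(ctc_collapse(fast))
```

Both sequences collapse to the same word even though their frame-by-frame labels differ. That many-to-one mapping is part of what an attacker can exploit: small changes to the audio that shift the per-frame labels can change the final transcript without noticeably changing what a person hears.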

Don’t Panic, But Use Caution

This doesn’t mean you should go home and unplug your Alexa. Both proofs of concept have significant limitations. In the case of DolphinAttack, the audio source had to be within six feet of the target device. It’s also reasonably easy for device owners to defend against hijacks by changing their wake phrases or restricting access to critical apps.

The Berkeley researchers only tested their technique on DeepSpeech, which isn’t used by any of the major voice recognition products. They had detailed knowledge of how DeepSpeech works and the benefit of a highly controlled laboratory environment. There was also quite a bit of computational power involved in refining the audio to embed the hidden commands.

Nevertheless, these academic experiments point to ways malicious actors could eventually make these techniques work in the wild. The Berkeley researchers admitted as much, noting in their report that “further work will be able to produce audio adversarial examples that are effective over the air.”

These discoveries are unsettling because voice recognition is on its way to becoming ubiquitous, not just on smartphones, but also in appliances, control devices, sensors and other Internet of Things (IoT) devices. You can imagine the chaos that an attacker could cause by broadcasting hidden commands over a public address system or hijacked TV signal, or even from a boombox in a crowded subway car.

“South Park” and Burger King have already provided real-world examples of how this technique could disrupt both consumers and businesses. Their stunts were in good fun, but you can bet that cybercriminals are already thinking of ways to apply them to their own malicious schemes.
