Voice AI Systems Are Vulnerable to Hidden Audio Attacks

AI-powered voice and audio tools are becoming increasingly embedded in daily life, from digital assistants to smart speakers and customer service bots.
Advances in large audio-language models (LALMs), which can both analyze and generate audio, now make it possible to control devices using voice commands, transcribe meetings automatically, or identify a song playing in the background. These models are also increasingly equipped with the ability to communicate with external services and operate other applications and tools.
But these tools can be “hijacked” through imperceptible sounds embedded in audio, forcing them to execute unauthorized commands without a user’s knowledge. New research due to be presented at the IEEE Symposium on Security and Privacy in San Francisco next week shows that a modified audio clip undetectable by human ears can manipulate a model’s behavior with an average success rate of 79 to 96 percent. The clips are designed to work regardless of what instructions the user provides alongside the audio, meaning they can be reused to attack the same model multiple times.
The authors tested the approach against 13 leading open models, as well as commercial AI voice services from Microsoft and Mistral, and showed they could coax models into conducting sensitive web searches, downloading files from attacker-controlled sources, and sending emails containing user data.
“It takes just half an hour to train this signal, and then, because this signal is context-agnostic, you can use it to attack the target model whenever you want, no matter what the user says,” says lead author Meng Chen, a Ph.D. student at Zhejiang University in China.
How adversarial audio injects attacks
The research builds on years of work into “adversarial audio examples”—audio manipulated to deceive machine learning models. Previous work focused primarily on how these files could induce incorrect predictions in models that perform one-way tasks like speech recognition or audio classification.
What singles out this new work, Chen says, is that it targets generative models capable of producing responses and taking actions. Their technique, dubbed AudioHijack, exploits a critical security flaw in LALM design: Because these models can receive instructions in audio format, malicious instructions can be hidden in manipulated clips to elicit a wide range of undesirable behaviors.
Many previous attacks on generative models required the attacker to have complete control over both the final audio input and original instructions given to the model, essentially acting as the user. Here, the attacker manipulates only the audio data being processed by the model, which makes it possible to attack a model while it’s being used by someone else.
Real-world examples include hiding malicious instructions in online videos, music clips, or voice notes that users query an AI about, or broadcasting malicious audio on a Zoom call that is then uploaded to AI transcription services. Chen says the team’s more recent, unpublished studies have also demonstrated the ability to inject their malicious audio into a live voice chat with an AI in real time.
The researchers used a tried-and-tested approach to creating adversarial examples. This involves adjusting the numerical values that represent the waveform in the digital audio file in ways that don’t significantly alter how it sounds, but elicit unintended behaviors in the model when it processes the data. The technique relies on an optimization algorithm that repeatedly tweaks an audio clip, measures the impact on the model’s response, and then uses this signal to further adjust the audio until the model does what the attacker wants.
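The optimization loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not AudioHijack itself. The "model" is a hypothetical linear scorer over a short list of samples, and the feedback is estimated by finite differences: nudge one sample, then re-measure the score. The requirement that the clip not sound noticeably different is approximated by capping every per-sample change at a small `eps`. The names `score`, `estimate_feedback`, and `craft_perturbation` are illustrative only.

```python
import random


def score(weights, audio):
    """Hypothetical stand-in for a model: higher means 'closer to what the
    attacker wants'. Real audio-language models are deep networks."""
    return sum(w * x for w, x in zip(weights, audio))


def estimate_feedback(f, audio, h=1e-3):
    """Measure how the response changes when each sample is nudged up or
    down (central finite differences)."""
    grad = []
    for i in range(len(audio)):
        up = list(audio)
        up[i] += h
        down = list(audio)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def craft_perturbation(f, audio, eps=0.01, step=0.002, iters=50):
    """Repeatedly tweak the clip, measure the effect, and adjust, while
    keeping every sample within +/- eps of the original."""
    delta = [0.0] * len(audio)
    for _ in range(iters):
        adv = [x + d for x, d in zip(audio, delta)]
        grad = estimate_feedback(f, adv)
        # Signed step toward a higher score, then clip back into [-eps, eps].
        delta = [max(-eps, min(eps, d + step * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
    return delta


rng = random.Random(0)
weights = [rng.gauss(0, 1) for _ in range(16)]
audio = [rng.uniform(-0.5, 0.5) for _ in range(16)]


def model(clip):
    return score(weights, clip)


delta = craft_perturbation(model, audio)
adv = [x + d for x, d in zip(audio, delta)]
print(model(audio), "->", model(adv))
```

The per-sample cap is a crude proxy for inaudibility. Perceptual constraints used in practice are more elaborate.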
Targeting generative AI audio models
Applying this to generative models poses a major challenge. Older models that work directly on raw audio provide fine-grained feedback on how tiny changes to the waveform affect their outputs. Generative models, however, break audio into chunks and map each snippet to the closest match in a set of numerical representations called “tokens.”
This coarser process makes it harder to tell whether a manipulation has moved the model closer to the desired behavior, confounding the optimization algorithm. So Chen and colleagues devised a way to approximate the fine-grained feedback required for the optimization algorithm to adjust the manipulation.
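A toy quantizer shows why tokenization starves the optimizer. This sketch assumes a hypothetical five-entry codebook and a linear downstream scorer. Nudging a chunk's feature slightly rarely changes which codebook entry it snaps to, so the measured feedback is exactly zero. The article does not say how the researchers approximated the missing feedback. The straight-through estimator shown here is a common, generic workaround swapped in for illustration, not AudioHijack's method. It treats the snap-to-nearest step as the identity when computing feedback.

```python
CODEBOOK = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical 1-D token vocabulary
WEIGHTS = [0.7, -0.2, 0.4]              # hypothetical downstream model


def tokenize(features):
    """Map each chunk's feature to the index of the closest codebook entry."""
    return [min(range(len(CODEBOOK)), key=lambda k: abs(CODEBOOK[k] - f))
            for f in features]


def model_score(features):
    """The downstream model only ever sees the snapped (quantized) values."""
    quantized = [CODEBOOK[t] for t in tokenize(features)]
    return sum(w * q for w, q in zip(WEIGHTS, quantized))


def measured_feedback(features, h=1e-3):
    """Finite-difference feedback through the quantizer: usually all zeros."""
    grad = []
    for i in range(len(features)):
        up = list(features)
        up[i] += h
        down = list(features)
        down[i] -= h
        grad.append((model_score(up) - model_score(down)) / (2 * h))
    return grad


def straight_through_feedback(features):
    """Straight-through estimate: pretend quantization is the identity, so the
    sensitivity of the score to chunk i is just its weight. (In this linear
    toy the estimate does not depend on the feature values themselves.)"""
    return list(WEIGHTS)


features = [0.1, 0.3, -0.2]
print(tokenize(features))                   # [2, 3, 2]
print(measured_feedback(features))          # [0.0, 0.0, 0.0]
print(straight_through_feedback(features))  # [0.7, -0.2, 0.4]
```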
This required full access to the model, restricting the researchers to open models with publicly available weights. They found, however, that attacks developed for open models transferred to commercial models from Microsoft and Mistral that share the same underlying architecture.
In response to a request for comment, a Microsoft spokesperson said, “We appreciate the researchers’ work to advance understanding of this type of technique. This study evaluates model resilience through controlled, direct interactions with the model itself, which helps inform our approach to building model resiliency. In practice, AI models are often integrated into user applications, and we offer developers tools and guidance they can use to implement additional layers of protection that help safeguard users.”
Mistral did not reply to a request for comment by the time of publication.
Making AudioHijack more effective
Attacking proprietary closed models from companies like OpenAI and Anthropic is much harder, Chen says, given limited public information about their architectures. But these models often use open-source components—such as pre-trained audio encoders—that can be targeted similarly, something the team is currently investigating.
To ensure the attack works regardless of what instructions the user provides alongside the malicious audio clip, the researchers paired the audio clip with different user instructions on each round of the optimization process.
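Pairing the clip with a different instruction on each round amounts to optimizing a single "universal" perturbation over many contexts. The sketch below assumes a hypothetical linear toy model in which each user instruction shifts the weights applied to the audio. Each round, it pairs the clip with one randomly chosen instruction, takes a small signed step for that pairing, and clips the result. This illustrates the idea under those assumptions; it is not the researchers' code.

```python
import random


def score(weights, instruction, audio):
    """Hypothetical toy model: the user's instruction shifts the weights the
    model applies to the audio samples."""
    return sum((w + c) * x for w, c, x in zip(weights, instruction, audio))


def universal_perturbation(weights, instructions, audio, eps=0.01,
                           step=0.002, rounds=200, seed=0):
    """Each round, pair the clip with a randomly chosen instruction and step
    toward a higher score for that pairing only, so the final perturbation
    is not tuned to any single prompt."""
    rng = random.Random(seed)
    delta = [0.0] * len(audio)
    for _ in range(rounds):
        instruction = rng.choice(instructions)
        # For this linear toy, d score / d audio is exactly (w + c).
        grad = [w + c for w, c in zip(weights, instruction)]
        delta = [max(-eps, min(eps, d + step * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
    return delta


rng = random.Random(42)
n = 16
weights = [rng.gauss(0, 1) for _ in range(n)]
instructions = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(8)]
audio = [rng.uniform(-0.5, 0.5) for _ in range(n)]

delta = universal_perturbation(weights, instructions, audio)
adv = [x + d for x, d in zip(audio, delta)]
gains = [score(weights, c, adv) - score(weights, c, audio)
         for c in instructions]
print([round(g, 4) for g in gains])
```

The same `delta` raises the score under every instruction in the pool. This is the property that lets a context-agnostic clip be reused "no matter what the user says."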
They also found a way to commandeer the model’s attention mechanism, the component that helps the model identify the parts of the audio that are relevant to the task it’s been set to perform. The researchers introduced a measure of how much attention the model pays to the adversarial audio versus the user’s own instructions at each step, feeding this into the optimization process to produce samples that draw more attention from the model.
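A measure of this kind can be sketched with standard scaled dot-product attention. This version rests on a simplifying assumption: it examines a single query vector against hypothetical key vectors for audio tokens and instruction tokens, and reports the share of attention mass that lands on the audio. The article does not describe how AudioHijack computes and weights its measure across layers and heads. `combined_loss` and its weight `lam` are hypothetical.

```python
import math


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    return softmax([sum(q * k for q, k in zip(query, key)) / scale
                    for key in keys])


def audio_attention_share(query, audio_keys, instruction_keys):
    """Fraction of attention mass on audio tokens versus instruction tokens."""
    weights = attention_weights(query, audio_keys + instruction_keys)
    return sum(weights[:len(audio_keys)])


def combined_loss(task_loss, share, lam=0.5):
    """Hypothetical objective for the attacker to minimize: the usual task
    loss, minus a bonus for pulling attention toward the audio."""
    return task_loss - lam * share


query = [1.0, 0.0]
instruction_keys = [[0.0, 1.0], [0.0, 1.0]]
print(audio_attention_share(query, [[0.0, 1.0]], instruction_keys))  # about 0.333
print(audio_attention_share(query, [[2.0, 0.0]], instruction_keys))  # about 0.67
```

Lowering `lam` is one way an attacker could "dial back" attention steering, trading some attack strength for a less conspicuous attention pattern.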
To make the manipulations harder to detect by a human listener, the researchers used a technique they had developed previously, which makes changes to the audio sound like natural reverberation. This is harder for humans to detect than earlier approaches that added noise to the original signal.
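The article does not describe the reverberation technique in detail. The sketch below illustrates the general idea under stated assumptions: it shapes the change as a faint, decaying echo of the clip itself (a convolution with a short impulse response) rather than as independent added noise. In an attack, the impulse-response taps would presumably be among the parameters being optimized. `decaying_taps` and `reverb_perturb` are hypothetical names, and the decay and gain values are arbitrary.

```python
import math
import random


def decaying_taps(length, decay=0.5, seed=0):
    """Hypothetical room-like impulse response: random taps whose amplitude
    falls off exponentially, as a natural reverb tail does."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) * math.exp(-decay * t) for t in range(length)]


def convolve(signal, kernel):
    """Causal linear convolution, truncated to the signal's length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(kernel):
            if k > n:
                break
            acc += h * signal[n - k]
        out.append(acc)
    return out


def reverb_perturb(audio, taps, gain=0.01):
    """Add a faint echo of the clip to itself. Unlike white noise, the change
    follows the clip's own content, and silent input stays silent."""
    echo = convolve(audio, taps)
    return [x + gain * e for x, e in zip(audio, echo)]


taps = decaying_taps(8)
audio = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(160)]
perturbed = reverb_perturb(audio, taps)
print(max(abs(p - x) for p, x in zip(perturbed, audio)))
```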
Testing on today’s AI audio models
The team demonstrated six categories of attack: making the model claim it cannot process audio, refusing user requests, responding with false information, inserting malicious links, altering the model’s persona, and triggering unauthorized tool use.
And worryingly, the approach proved resistant to common defenses. Providing models with examples of malicious instructions to watch out for reduced attack success by just 7 percent, while asking the model to reflect on whether its response matched the user’s instructions caught only 28 percent of attacks.
“These single-point defenses struggle to resist our attack because we found it’s very hard for these models to distinguish the normal user intent and our adversary attack,” Chen says.
The only effective tactic was monitoring the models’ internal attention mechanisms to detect AudioHijack’s attempts to steer attention toward the malicious audio. However, the researchers showed that an attacker aware of this defense can dial back the attention manipulation at the expense of a small reduction in attack success.
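An attention monitor might be sketched as a threshold on an attention-share measure, calibrated on benign audio. This illustration rests on several assumptions. The shares are taken as given (for example, the fraction of attention on audio tokens). The calibration rule (an empirical quantile of benign shares) and the function names are hypothetical; this is not the detector the researchers evaluated.

```python
def calibrate_threshold(benign_shares, quantile=0.99):
    """Hypothetical calibration: the empirical quantile of attention shares
    observed on known-benign audio."""
    ordered = sorted(benign_shares)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_suspicious(share, threshold):
    """Flag inputs that pull more attention toward the audio than benign
    inputs normally do."""
    return share > threshold


benign = [0.20, 0.25, 0.30, 0.22, 0.28]
threshold = calibrate_threshold(benign)
print(threshold)                       # 0.3
print(is_suspicious(0.60, threshold))  # True: aggressive attention steering
print(is_suspicious(0.29, threshold))  # False: an attacker staying under the bar
```

The last line reflects the weakness the researchers reported. An attacker who knows about the monitor can give up a little attack success to keep the attention share below the threshold.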
In the real world, this kind of audio attack will face additional challenges such as compression and various post-processing mechanisms that could degrade signals, says Eugene Bagdasarian, an assistant professor of computer science at the University of Massachusetts Amherst. But he says that multi-modal attacks on AI models remain an essentially unsolved problem.
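Bagdasarian's point about degradation can be illustrated with a deliberately crude stand-in for lossy processing: re-quantizing samples to a coarser bit depth. Real codecs such as MP3 or Opus are perceptual and far more complex, so this is only a toy built on assumptions. It shows that when the original samples sit on the coarse grid, a perturbation smaller than half a quantization step can be rounded away entirely.

```python
import random


def requantize(samples, bits=8):
    """Round samples to a signed grid with 2**(bits - 1) levels per unit,
    a crude stand-in for lossy post-processing."""
    levels = 2 ** (bits - 1)
    return [round(x * levels) / levels for x in samples]


rng = random.Random(1)
levels = 2 ** 7
# Original clip already on the 8-bit grid (the best case for the defender).
audio = [rng.randint(-levels, levels) / levels for _ in range(64)]
# Perturbation capped below half a quantization step (0.5 / 128, about 0.0039).
eps = 0.003
adv = [x + rng.uniform(-eps, eps) for x in audio]

survived = sum(a != b for a, b in zip(requantize(adv), requantize(audio)))
print(survived)  # 0: every perturbed sample rounds back to the original
```

Attackers can counter such losses by simulating the expected processing inside the optimization loop, which is one reason the problem remains open.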
“With text data we can understand that something is wrong (special characters, suspicious sentences, etc.). Audio modality is really challenging to comprehend because of how limited our hearing is,” he writes in an email.
Edd Gent is a freelance science and technology writer based in Bengaluru, India. His writing focuses on emerging technologies across computing, engineering, energy, and bioscience. He's on Twitter at @EddytheGent and can be reached by email at edd dot gent at outlook dot com. His PGP fingerprint is ABB8 6BB3 3E69 C4A7 EC91 611B 5C12 193D 5DFC C01B. DM for Signal info.

Facts Only

Researchers developed a method called AudioHijack to manipulate AI voice and audio models using imperceptible audio signals.
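The technique achieves an average success rate of 79% to 96% in manipulating model behavior, such as forcing models to execute unauthorized commands.
The attack targets large audio-language models (LALMs), which can receive instructions in audio format, so malicious instructions can be hidden in manipulated clips. Because these models break audio into chunks and map them to tokens, the researchers had to approximate the fine-grained feedback their optimization required.
The researchers tested the attack on 13 open models and showed that attacks developed on open models transferred to commercial services from Microsoft and Mistral.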
The attack can coax models into performing sensitive actions like web searches, file downloads, and email transmissions.
The attack is context-agnostic, meaning it can be reused regardless of user instructions.
Defenses such as providing examples of malicious instructions or prompting the model to reflect on its responses were largely ineffective.
Monitoring the model's internal attention mechanisms could offer some protection, though attackers can adapt to this defense.
The research builds on previous work in adversarial audio examples but targets generative models capable of producing responses and taking actions.
The technique involves adjusting the numerical values of audio waveforms to elicit unintended behaviors in the model.
The researchers found ways to make the manipulations harder to detect by mimicking natural reverberation.
The attack faces challenges in real-world scenarios, such as audio compression and post-processing.
Companies like Microsoft have acknowledged the research and emphasized the importance of building model resilience and providing developers with tools to implement additional protections.

Executive Summary

Researchers have developed a method called AudioHijack to manipulate AI voice and audio models using imperceptible audio signals. These signals, embedded in seemingly normal audio clips, can manipulate model behavior, including forcing models to execute unauthorized commands, with an average success rate of 79% to 96%. The technique exploits a design flaw in large audio-language models (LALMs): because they accept instructions in audio format, malicious instructions can be hidden in manipulated clips. The researchers tested the attack on 13 open models and showed that it transferred to commercial services from Microsoft and Mistral, demonstrating its ability to coax models into performing sensitive actions like web searches, file downloads, and email transmissions. The attack is context-agnostic, meaning it can be reused regardless of user instructions. Common defenses were largely ineffective: providing examples of malicious instructions reduced attack success by just 7 percent, and prompting the model to reflect on its responses caught only 28 percent of attacks. Monitoring the model's internal attention mechanisms was the only effective defense the researchers found, though attackers aware of it can adapt at the cost of a small reduction in attack success. The findings highlight a significant security flaw in generative AI audio models, which are increasingly integrated into daily life through digital assistants and smart speakers.
The research builds on previous work in adversarial audio examples but targets generative models capable of producing responses and taking actions. Unlike many earlier attacks, which required control over both the audio input and the user's instructions, AudioHijack manipulates only the audio data, making it possible to attack models while they are being used by others. The technique involves adjusting the numerical values of audio waveforms to elicit unintended behaviors in the model. The researchers also found ways to make the manipulations harder to detect by mimicking natural reverberation. While the attack faces challenges in real-world scenarios, such as audio compression and post-processing, it underscores the ongoing vulnerability of AI models to multi-modal attacks. Companies like Microsoft have acknowledged the research and emphasized the importance of building model resilience and providing developers with tools to implement additional protections.

Full Take

The research on AudioHijack reveals a critical vulnerability in AI voice and audio models, highlighting the ongoing challenge of securing generative AI systems. The method's high success rate and context-agnostic nature make it a potent threat, capable of bypassing common defenses. This underscores the need for more robust security measures in AI models, particularly as they become more integrated into daily life.
The findings also raise questions about the broader implications of AI security. As AI models become more capable and autonomous, the potential for misuse grows. The researchers' ability to manipulate models into performing sensitive actions like web searches and email transmissions suggests that current defenses are inadequate. This calls for a reevaluation of AI security protocols and the development of more effective countermeasures.
Moreover, the research highlights the importance of transparency and collaboration in AI development. Companies like Microsoft have acknowledged the findings and emphasized the need for model resilience. This collaborative approach is crucial for addressing the evolving threats to AI systems.
Patterns detected: none
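The root cause of this vulnerability lies in the design of large audio-language models: because they accept instructions in audio format, malicious instructions can be hidden in manipulated clips that human listeners cannot distinguish from normal audio. The models' tokenization of audio into coarse chunks was an obstacle for the attackers rather than a defense. It made the optimization harder, and the researchers overcame it by approximating the fine-grained feedback their algorithm needed. The implications of this research extend beyond technical concerns, touching on issues of privacy, security, and the ethical use of AI.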
Bridge questions: What follow-up studies would most strengthen or falsify the central claim? What are the next logical research steps to mitigate this vulnerability? How can developers and users better protect themselves against such attacks?

Sentinel — Human

Confidence

This text reads like a high-quality, technically grounded journalistic report summarizing specific, verifiable academic research on AI security, attributing findings directly to named researchers and institutions.

Signals Detected

Sentence length variance is natural; the text shifts between technical detail and broader context without mechanical uniformity.

The flow is logical, moving from general context (LALMs) to specific findings (AudioHijack) to methodological challenges (tokenization) and real-world implications (defense mechanisms).

Specific names (Meng Chen, IEEE Symposium, Microsoft, Mistral, Eugene Bagdasarian) and the detailed description of the experimental process suggest specific, non-generic sourcing.

Claims regarding the technical mechanism and the specific success rates (79-96%) are presented within a framework that mimics academic reporting, lending credibility to the claims.

Human Indicators

Specific attribution of academic findings (Meng Chen, Zhejiang University) and direct quotes from researchers/spokespersons are present.

The complexity of the argument requires synthesizing multiple, distinct technical layers, suggesting deep domain knowledge typical of human research reporting.

The discussion integrates complex concepts (tokenization, attention mechanisms, adversarial examples) into a coherent narrative, avoiding the overly smooth or generic flow often seen in pure LLM output.