How to spot audio deepfakes (2024)

It took a New Orleans street magician only 20 minutes and $1 to create audio that sounded like President Joe Biden discouraging Democrats from voting in the New Hampshire primary.

The robocall was found to be a product of artificial intelligence, and soon after, the Federal Communications Commission banned AI-generated voices in robocalls.

But identifying AI-generated audio in the wild is easier said than done.

We tested four free online tools that claim to determine whether an audio clip is AI-generated. Only one of them signaled that the Biden-like robocall was likely AI-generated. Experts told us that AI audio detection tools lack accuracy and shouldn’t be trusted on their own. Still, people can use other techniques to spot potential misinformation.

"Audio deepfakes may be more challenging to detect than image or video deepfakes," said Manjeet Rege, director of the Center for Applied Artificial Intelligence at the University of St. Thomas. "Audio lacks the full context and visual cues of video, so believable audio alone is easier to synthesize convincingly."

The challenge of identifying audio deepfakes

Many people have called a business or government agency and heard an automatic response from a synthetic voice.

But only recently did people start using the technology to create deepfakes, said Siwei Lyu, a computer science and engineering professor at the University at Buffalo.

Lyu said audio deepfakes typically fall under two types: text-to-speech and voice conversion. The biggest difference between the two, he said, is the input. Text-to-speech systems enable computers to convert text into what sounds like a spoken voice. Voice conversion, by contrast, will take a person’s voice and manipulate it so that it sounds like another person’s voice, retaining the emotion and inhalation patterns of the original speech.
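The distinction Lyu draws comes down to the input each system accepts: text in one case, audio in the other. The toy sketch below illustrates those two contracts only; the "voice bank" of tones and the amplitude-scaling "conversion" are stand-ins invented for this example, not how any real system synthesizes speech (real systems use neural models trained on large speech datasets).

```python
import math

SAMPLE_RATE = 16_000

def tone(freq: float, seconds: float) -> list[float]:
    """Generate a sine-wave snippet standing in for a recorded speech unit."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]

# Text-to-speech: the input is TEXT. This toy "concatenative" synthesizer
# maps each word to a stored snippet and joins them.
VOICE_BANK = {"hello": tone(220, 0.3), "world": tone(330, 0.3)}

def text_to_speech(text: str) -> list[float]:
    samples: list[float] = []
    for word in text.lower().split():
        samples.extend(VOICE_BANK.get(word, tone(110, 0.2)))  # fallback snippet
    return samples

# Voice conversion: the input is AUDIO. This toy converter rescales amplitude
# (a crude stand-in for timbre transfer) while keeping the original timing,
# mirroring how real converters preserve the source's pacing and inhalations.
def voice_conversion(samples: list[float], gain: float = 0.5) -> list[float]:
    return [s * gain for s in samples]

speech = text_to_speech("hello world")
converted = voice_conversion(speech)
assert len(converted) == len(speech)  # timing of the source is preserved
```

The point of the sketch is the interface, not the synthesis: a text-to-speech system can say anything you type, while a voice converter is constrained by, and inherits the delivery of, the recording you feed it.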

Creating deepfake audio is an attractive alternative to creating deepfake video because audio is easier and cheaper to produce.

For example, the online startup ElevenLabs offers a free plan for converting text to speech. The startup has gained industry prominence and was recently valued at $1.1 billion after raising $80 million in venture capital funding. Its paid plans start at $1 per month, and its products also include voice cloning, which lets users create synthetic copies of their own voices, and a tool meant to help people classify whether audio is AI-generated.

"The rise of audio deepfakes opens up disturbing possibilities for spreading misinformation," Rege said. Aside from making it possible for machines to impersonate public figures, politicians and celebrities, falsified audio could also be used to trick security systems that make use of voice authentication, he said. For example, a Vice reporter in February 2023 demonstrated how he was able to trick his bank’s authentication system by calling its service line and playing clips of his AI-powered cloned voice.

Rege warned that fake audio could also have implications for court cases, intelligence operations and politics.

Rege and Lyu said synthetic audio is created using "deep learning" technology that trains AI models to learn characteristics of speech based on a large dataset of diverse speakers, voices and conversations. With this information, the technology can recreate speech.

PolitiFact identified TikTok and YouTube accounts that uploaded videos that perpetuated false narratives about the 2024 election using audio that expert analysis showed was generated by AI.

Because audio is one-dimensional and more ephemeral than images and videos, Lyu said, it’s harder to determine when it’s not real, which makes it more effective at misleading people. You can pause a video or inspect an image you encounter online, but if you pick up a call, you might not realize you’re listening to AI-generated audio, and you may not get a chance to record it. Without a digital copy, the audio is hard to analyze.

Detection tools fall short

With audio deepfake technology evolving quickly, the tools designed to detect such audio are struggling to keep up.

"Detecting audio deepfakes is an active research area, meaning that it is currently treated as an unsolved problem," said Jennifer Williams, a lecturer at the University of Southampton who specializes in audio AI safety.

Many online tools that claim to detect AI-generated voices are available only with a paid subscription or upon demo request. Others ask customers to send the audio file to an email address.

We looked for free options.

V.S. Subrahmanian, a Northwestern University computer science professor, launched his own AI audio detection experiment at the Northwestern Security & AI Lab, which he leads. The group tested 14 off-the-shelf or free, publicly available audio deepfake detection tools, he said. The research is not yet publicly available, but he said the results were discouraging.

"You cannot rely on audio deepfake detectors today and I cannot recommend one for use," Subrahmanian said.

We persevered anyway and found three free tools: ElevenLabs’ Speech Classifier, AI or Not and PlayHT. We also tested the DeepFake-O-Meter, developed by the University at Buffalo Media Forensic Lab, which Lyu leads.

For our experiment, we obtained a copy of the fake Biden robocall from the New Hampshire attorney general’s office and ran it through the four tools.

In the robocall that circulated before the Jan. 23 New Hampshire primary, a Biden-like voice told Democratic voters that voting in the primary "only enables the Republicans in their quest to elect Donald Trump again." It encouraged people not to vote until November. Soon after, security software company Pindrop said it found a 99% likelihood that the audio was created using ElevenLabs — a finding that magician Paul Carpenter later confirmed, telling NBC News that it took him less than 20 minutes and $1 to make.

ElevenLabs has its own AI speech classifier — released in 2023, before the robocall circulated — that measures the likelihood of an audio clip being created using its system. We uploaded the audio clip we obtained from the New Hampshire attorney general’s office to ElevenLabs’ speech classifier. The result? It found a 2% probability — "very unlikely" — that the audio was created with ElevenLabs.


(Screenshot of ElevenLabs page)

It’s unclear why ElevenLabs returned such a low result. Pindrop also experimented using ElevenLabs’ tool and said it returned an 84% probability score that the audio was created using ElevenLabs. Lyu said audio file compression and other factors can destroy signatures or features that detectors use to detect AI generation. And we do not know whether Pindrop used the same audio file. (Pindrop has its own audio deepfake detection system, available upon demo request.)

We ran the same test using AI or Not, a tool developed by the American tech company Optic. It said the audio clip was "likely human."

Anatoly Kvitnitsky, AI or Not’s chief executive, told PolitiFact that the audio sample "had a lot of noise associated with it," making its AI origins harder to detect without linguistic experts. "AI was only confirmed when the creator of the recording admitted to it being AI," Kvitnitsky said.


(Screenshot of AI or Not page)

We also tested the audio using PlayHT, but it displayed an error message every time we uploaded the fake Biden audio.


(Screenshot of PlayHT page)

We contacted ElevenLabs and PlayHT for comment but did not hear back.

Lyu said fewer services are available for detecting deepfake audio than for detecting deepfake images and videos. He said this is partly because deepfake images and videos were developed earlier.

Rege said although researchers have also developed open-source tools, the tools’ accuracy remains to be seen.

"I would say no single tool is considered fully reliable yet for the general public to detect deepfake audio," Rege said. "A combined approach using multiple detection methods is what I will advise at this stage."

The University at Buffalo Media Forensic Lab’s DeepFake-O-Meter has not officially launched, Lyu said, but people need only create an account to use it for free. The DeepFake-O-Meter returned a 69.7% likelihood that the Biden audio was AI-generated — the most accurate result among the tools we tested.


(Screenshot of DeepFake-O-Meter page)

What to listen for in potential audio deepfakes

Subrahmanian said audio calls requesting money, personal information, passwords, bank codes or two-factor authentication codes "should be treated with extreme caution" and warned people never to give such information over the phone.

"Urgency is a key giveaway," he said. "Scammers want you (to) do things immediately, before you have time to consult others or think more deeply about a request. Don't fall for it."

Lyu and Rege said people should listen for signs of AI-generated audio, including irregular or absent breathing sounds, unnatural pauses and intonation, and inconsistent room acoustics.
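One of those cues — breathing and natural pauses — can be made concrete with a simple energy measurement. The sketch below is a toy illustration of the idea, not a detector anyone should rely on (the frame size and silence threshold are arbitrary values chosen for this example): it measures what fraction of an audio stream is near-silent, since natural speech contains irregular low-energy gaps while some synthetic audio runs on without them.

```python
import math

def rms_frames(samples: list[float], frame: int = 400) -> list[float]:
    """Root-mean-square energy for each non-overlapping frame of samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame]) / frame)
        for i in range(0, len(samples) - frame + 1, frame)
    ]

def quiet_frame_ratio(samples: list[float], frame: int = 400,
                      quiet: float = 0.05) -> float:
    """Fraction of frames that are near-silent (candidate breaths/pauses).

    Natural speech shows irregular low-energy gaps; a long stretch of
    "speech" with almost no quiet frames is one possible red flag.
    """
    energies = rms_frames(samples, frame)
    if not energies:
        return 0.0
    return sum(1 for e in energies if e < quiet) / len(energies)

# Synthetic demo: a voiced tone with a silent gap vs. the same tone gapless.
rate = 16_000
voiced = [0.3 * math.sin(2 * math.pi * 200 * i / rate) for i in range(rate)]
gap = [0.0] * (rate // 4)          # a quarter-second pause
with_pauses = voiced + gap + voiced

assert quiet_frame_ratio(voiced) == 0.0          # no pauses at all
assert quiet_frame_ratio(with_pauses) > 0.1      # the gap is detected
```

Real forensic systems analyze far subtler features than raw energy, which is part of why, as the experts above note, no single signal or tool is reliable on its own.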

They also said users should seek to verify the audio’s sources and cross-check the facts.

"Be skeptical of unsolicited audio messages or recordings, especially those claiming to be from authority figures, celebrities, or people you know," Rege said.

Use common-sense approaches and ask questions such as who the call came from, where it came from and whether it is supported by independent, unrelated sources, Lyu said.

In an interview with Scientific American, Hany Farid, a University of California, Berkeley, computer science professor, also stressed provenance — or basic, trustworthy facts about a piece’s origins — when analyzing audio recordings: "Where was it recorded? When was it recorded? Who recorded it? Who leaked it to the site that originally posted it?"

When it comes to legal matters, financial transactions, or important events, Rege said, people can protect themselves by insisting on verifying identities through other secure channels beyond audio or voice.

"Healthy skepticism is warranted given how realistic this emerging technology has become," Rege said.

