Glossary

Voice Synthesis Fraud

AI-synthesised audio that convincingly replicates a target person's voice characteristics, used for impersonation in fraud, disinformation, and identity-based attacks.

Also known as: AI voice fraud, synthetic voice fraud, voice conversion fraud

Last reviewed: 1 June 2026

Voice synthesis fraud (sometimes called AI voice fraud or synthetic voice fraud) refers to audio generated by deep learning models, typically voice conversion or text-to-speech neural networks, that mimics the acoustic properties of a real person's voice. The technology was originally developed for legitimate applications, including accessibility tools, dubbing, and interactive entertainment, but has been rapidly adopted by fraudsters and disinformation actors.

Unlike a deepfake video, which typically requires substantial data and processing power, a convincing synthesised voice can be produced in minutes using commercially available services and only a short voice sample. Voice conversion software can also transform a caller's speech in real time during a live phone call, so the impersonation is interactive rather than limited to a pre-recorded clip.

Voice synthesis fraud differs from a targeted deepfake voice clone mainly in scope: a clone implies the targeted replication of a specific, known individual, while voice synthesis is the broader technology category that enables it. Both carry the same fraud risks.

Audio forensics is an emerging discipline that attempts to detect AI-generated audio by analysing spectral features and checking for the breathing patterns and micro-timing variations that human speakers produce naturally. However, forensic detection lags behind generative model quality, so behavioural and procedural defences are more reliable for now.

Examples

A fraudster uses a real-time voice conversion tool during a call to an elderly victim, speaking as the victim's adult child and requesting emergency funds be transferred to an unfamiliar account.
