Voice Synthesis Fraud
AI-synthesised audio that convincingly replicates a target person's voice characteristics, used for impersonation in fraud, disinformation, and identity-based attacks.
Also known as: AI voice fraud, synthetic voice fraud, voice conversion fraud
Last reviewed: 1 June 2026
Voice synthesis fraud (sometimes called AI voice fraud or synthetic voice fraud) refers to the use of audio generated by deep learning models, typically voice conversion or neural text-to-speech (TTS) systems, that mimics the acoustic properties of a real person's voice. The underlying technology was originally developed for legitimate applications, including accessibility tools, dubbing, and interactive entertainment, but it has since been rapidly adopted by fraudsters and disinformation actors.
Unlike deepfake video, which typically requires substantial training data and processing power, a convincing synthesised voice can be produced in minutes using commercially available services and only a short voice sample. Real-time voice conversion software can also transform a fraudster's own speech during a live phone call, so the attacker can respond interactively instead of relying on pre-recorded clips.
Voice synthesis fraud overlaps with deepfake voice cloning, but the terms differ in emphasis: "clone" implies targeted replication of a specific, known individual, whereas "voice synthesis" names the broader technology category that makes such cloning possible. Both carry the same fraud risks.
Audio forensics is an emerging discipline that attempts to detect AI-generated audio by analysing spectral features and by looking for deviations from the breathing patterns and micro-timing that human speakers produce naturally. However, forensic detection currently lags behind the quality of generative models, so behavioural and procedural defences remain more reliable for now.
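As a rough illustration of what "analysis of spectral features" can involve, the following Python sketch computes spectral flatness (the ratio of the geometric mean to the arithmetic mean of a frame's power spectrum) over overlapping frames of a signal. It is not a detector: spectral flatness is a generic signal-processing measure chosen here as an assumption, not a feature that forensic tools are known to rely on, and on its own it cannot distinguish synthetic speech from human speech. The function names, frame length, and hop size are hypothetical, and a naive DFT is used so the example needs only the standard library.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum of one real-valued frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum.

    Higher for noise-like frames, close to 0 for strongly tonal frames.
    eps avoids log(0) for empty bins.
    """
    power = [p + eps for p in power_spectrum(frame)]
    geo_mean = math.exp(sum(math.log(p) for p in power) / len(power))
    arith_mean = sum(power) / len(power)
    return geo_mean / arith_mean


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames (hypothetical helper)."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def flatness_profile(samples, frame_len=256, hop=128):
    """Per-frame spectral flatness: one illustrative feature, not a verdict."""
    return [spectral_flatness(f) for f in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    rate = 16_000  # assumed sample rate in Hz
    # A pure 1 kHz tone (strongly tonal) versus Gaussian noise (noise-like).
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(1024)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
    print("tone frames: ", [round(v, 4) for v in flatness_profile(tone)])   # values near 0
    print("noise frames:", [round(v, 4) for v in flatness_profile(noise)])  # noticeably higher
```

A practical forensic pipeline would more likely use an FFT library, extract many features per frame, and pass them to a classifier trained on labelled genuine and synthetic recordings; the sketch covers only the feature-extraction step.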
Examples
- A fraudster uses a real-time voice conversion tool during a call to an elderly victim, speaking as the victim's adult child and requesting that emergency funds be transferred to an unfamiliar account.