How does an AI voice cloning scam work?
AI voice cloning scams synthesise a convincing imitation of a known person's voice from publicly available audio, then use it to make emergency requests or authorise transactions in calls to family members or colleagues.
Last reviewed: 10 June 2026
Explanation
Modern voice synthesis tools can generate a realistic voice model from a few seconds of audio: a clip from a social media video, a voicemail message, or a public speech. The resulting synthetic voice can be made to say anything, in real time or as a recording, and is often indistinguishable from the real person, even to someone who knows them well.
In the family emergency variant, the cloned voice of a child or young adult calls a parent or grandparent claiming to be stranded, hurt, or in trouble. The caller sounds exactly like the person, may reference real personal details scraped from social media, and requests urgent cash, a wire transfer, or gift card codes. A second voice — the 'police officer' or 'lawyer' — confirms the story.
In the business variant, a cloned executive's voice calls a finance team member directly, or is used in a video call, to authorise a wire transfer. This is the deepfake CEO fraud described separately, carried out with synthetic voice rather than synthetic video.
The technology is improving faster than detection. The reliable defence is not to try to identify the synthetic voice but to apply pre-agreed out-of-band verification: hang up and call back on a number you already hold independently of the call. A family 'code word' known only to members is a simple and effective safeguard.
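The callback and code-word checks described above can be written down as a simple decision rule, which may help a household or finance team agree the procedure in advance. The Python below is a minimal sketch, not a definitive implementation: the contact list, the placeholder phone numbers, and the function names (`number_to_call_back`, `code_word_matches`, `decide`) are all hypothetical, and the policy of refusing unless the person confirms on a call you placed yourself is an assumption made for illustration.

```python
from typing import Optional

# Hypothetical contact list, recorded before any suspicious call arrives.
# The numbers are placeholders, not real contacts.
TRUSTED_NUMBERS = {
    "alex": "+44 20 7946 0001",
    "finance director": "+44 20 7946 0002",
}


def number_to_call_back(claimed_identity: str) -> Optional[str]:
    """Look up the number you already hold; never reuse the inbound caller ID."""
    return TRUSTED_NUMBERS.get(claimed_identity.strip().lower())


def code_word_matches(spoken: str, agreed: str) -> bool:
    """Compare the spoken code word with the agreed one, ignoring case and spacing."""
    def normalise(text: str) -> str:
        return " ".join(text.lower().split())

    return normalise(spoken) == normalise(agreed)


def decide(claimed_identity: str, confirmed_on_callback: bool) -> str:
    """Return 'proceed' only after the person confirms on a call you placed yourself."""
    if number_to_call_back(claimed_identity) is None:
        # Assumed policy: with no independently held number, nothing can be verified.
        return "refuse: no independently held number to verify against"
    if not confirmed_on_callback:
        return "refuse: request not confirmed on callback"
    return "proceed"
```

Note that the decision never takes the inbound caller ID or the sound of the voice as an input: in this sketch, neither can move a request towards approval.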
Common red flags
- A voice call from a known person makes an urgent request out of character with how they normally communicate
- The request involves money and discourages verifying through any other channel
- The voice sounds familiar but slightly flat, lacks natural fillers, or responds with brief answers
- A third party quickly joins the call to add authority and discourage questioning
- The call comes from an unexpected number or is a voice message rather than an interactive call
What to do now
- Hang up and call the person back yourself on a number you already hold for them
- Establish a family or workplace 'code word' to verify identity in emergency calls
- Do not rely on caller ID or voice alone to authorise any financial request
- Report the call to your national cybercrime authority
- If money was sent, contact your bank immediately and report the fraud to the relevant authorities
- Limit publicly available audio of family members on social media — especially children
Frequently asked questions
How much audio does an AI need to clone a voice?
Commercially available tools can generate convincing results from as little as three to ten seconds of audio. Longer samples improve quality but are not required.
Can voice detection software identify a synthetic voice?
Detection tools exist but are imperfect, particularly for real-time calls. Procedural verification — calling back independently — is more reliable than detection technology at this stage.
Should I remove voice content from social media?
Reducing the volume of publicly available audio reduces risk. However, voice data from past interactions (voicemails, public events) may already be accessible. The code-word protocol addresses this for family emergencies.