How does an AI voice cloning scam work?

AI voice cloning scams synthesise a convincing imitation of a known person's voice from publicly available audio, then use it to make emergency requests or authorise transactions in calls to family members or colleagues.

Last reviewed: 10 June 2026

Explanation

Modern voice synthesis tools can generate a realistic voice model from a few seconds of audio, such as a clip from a social media video, a voicemail message, or a public speech. The resulting synthetic voice can be made to say anything, in real time or as a recording, and even people who know the real person well often cannot tell the difference.

In the family emergency variant, the cloned voice of a child or young adult calls a parent or grandparent claiming to be stranded, hurt, or in trouble. The caller sounds exactly like the person, may reference real personal details scraped from social media, and requests urgent cash, a wire transfer, or gift card codes. A second voice — the 'police officer' or 'lawyer' — confirms the story.

In the business variant, a cloned executive voice calls a finance team member directly, or is used on a video call, to authorise a wire transfer. This is a form of the deepfake CEO fraud described separately, relying on synthesised voice rather than synthesised video.

The technology is improving faster than detection. The reliable defence is not to try to identify the synthetic voice but to apply pre-agreed out-of-band verification: hang up and call back on a number you already hold from an independent source. A family 'code word' known only to family members is a simple and effective safeguard.
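For a workplace that wants to write the call-back and code-word rule down as a procedure, the logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `InboundRequest`, `TRUSTED_NUMBERS`, `callback_number`, `code_word_matches` and `may_proceed`, the sample phone numbers, and the code word are all hypothetical and do not come from any real system or standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InboundRequest:
    claimed_identity: str  # who the caller says they are
    caller_number: str     # number shown by caller ID (not trusted)
    amount: float          # sum of money being requested


# Hypothetical directory of numbers collected independently, before any call.
TRUSTED_NUMBERS = {
    "alex": "+1-555-0100",
}


def callback_number(request: InboundRequest) -> Optional[str]:
    """Return the number to call back, taken only from the trusted directory.

    The number shown on the inbound call is deliberately ignored.
    """
    return TRUSTED_NUMBERS.get(request.claimed_identity.strip().lower())


def code_word_matches(spoken: str, agreed: str) -> bool:
    """Compare the spoken code word with the pre-agreed one, ignoring case and spacing."""
    return spoken.strip().lower() == agreed.strip().lower()


def may_proceed(
    request: InboundRequest,
    confirmed_on_callback: bool,
    spoken_code_word: str,
    agreed_code_word: str,
) -> bool:
    """Allow the request only after out-of-band verification succeeds.

    Assumed policy for this sketch: an independently held number must exist,
    the person must confirm the request when called back on it, and the
    code word must match. The sound of the voice plays no part.
    """
    if callback_number(request) is None:
        return False
    return confirmed_on_callback and code_word_matches(spoken_code_word, agreed_code_word)


request = InboundRequest("Alex", "+1-555-0199", 5000.0)
print(callback_number(request))  # number to dial, independent of caller ID
print(may_proceed(request, True, "bluebird", "bluebird"))
```

The point of the design is that nothing supplied by the inbound call (the voice, the displayed number, or a number the caller offers) is used as evidence of identity.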

Common red flags

A voice call from a known person makes an urgent request out of character with how they normally communicate
The request involves money and discourages verifying through any other channel
The voice sounds familiar but slightly flat, lacks natural fillers, or responds with brief answers
A third party quickly joins the call to add authority and discourage questioning
The call comes from an unexpected number or is a voice message rather than an interactive call

What to do now

Hang up and call the person back yourself on a number you already know to be theirs
Establish a family or workplace 'code word' to verify identity in emergency calls
Do not rely on caller ID or voice alone to authorise any financial request
Report the call to your national cybercrime authority
If money was sent, contact your bank immediately and report it to fraud authorities
Limit publicly available audio of family members on social media — especially children

Frequently asked questions

How much audio does an AI need to clone a voice?

Commercially available tools can generate convincing results from as little as three to ten seconds of audio. Longer samples improve quality but are not required.

Can voice detection software identify a synthetic voice?

Detection tools exist but are imperfect, particularly for real-time calls. Procedural verification — calling back independently — is more reliable than detection technology at this stage.

Should I remove voice content from social media?

Reducing the volume of publicly available audio reduces risk. However, voice data from past interactions (voicemails, public events) may already be accessible. The code-word protocol addresses this for family emergencies.
