Evidence
The tech is good enough to fool your mom, so where's the chaos?
The consensus prediction from 2023 was pretty straightforward: voice cloning would be weaponized immediately, mass-scale phishing would explode, and we'd see grandmother scams go fully synthetic. It was the kind of thing that made for good panel discussions at ed-tech conferences where people wrung their hands about digital literacy. And look, the tech *is* there now. A 30-second audio sample gets you most of the way.
But the attacks haven't materialized at the scale or sophistication people feared. I think the answer is boring: social engineering attacks were already working fine. The actual constraint was never "can we synthesize a voice"; it was always "can we trick someone into doing something dumb," and humans are plenty exploitable without the audio deepfake layer. A text message claiming your kid's in trouble or your account is frozen still works. Voice synthesis adds complexity and hardware costs for marginal gain. The grandmother who falls for a voice call from her grandson was probably going to fall for a text or a live call where someone just claims to be him. The voice cloning is theater.
There's also a boring infrastructure reason: the people who go after the easiest targets (older people, isolated folks) aren't the ones using open-source voice synthesis tools, and they mostly don't have the technical literacy to deploy them at scale. And the people running actual fraud operations? They've got working playbooks. They're not early adopters. So we've got a solved problem (social manipulation) looking for a more elegant technical solution, but elegance doesn't matter when crude works. I expected to be wrong about this by now. Still waiting.