Note: The views and opinions expressed in blog/editorial posts are those of the author. They do not reflect the views or opinions of Misbar.
People are drawn to love and feelings. Most of us have seen the film Her, which tells the story of the heartbroken Theodore Twombly, who falls in love with Samantha, an insightful and sensitive female voice. According to Theodore, she sounds like "the girl next door." Her voice was "a lifeline to the rest of the world." However, Samantha is only an advanced operating system, and her voice is AI-generated. In reality, she does not exist.
Artificial intelligence can now imitate human voices, not only facial expressions. Technology companies have recently focused on using artificial intelligence to generate unique voices that mimic human speech, with an emphasis on conveying feelings.
These AI-generated voices can seem more trustworthy than real human voices. We all remember the viral deepfake video of Obama cursing President Donald Trump. Its creator revealed that the clip was fabricated using emerging video-editing technology. The video drew attention and sparked a serious discussion about the dangers of a controversial video-generating technology that is considered "the future of fake news."
The video, originally published by BuzzFeed, went on to reveal that Obama did not say any of the words heard in the clip. The voice actually belonged to Get Out director Jordan Peele, whose speech and mouth movements were digitally inserted into an original video of the former president.
Recently, AI software has reached another level by generating audio deepfakes that could “tease or flirt with someone.” Deepfakes are no longer limited to visual material. According to online reports, an AI voice startup called Sonantic has created a synthetic voice that can express teasing and flirtation. To generate these voices, the startup has worked on incorporating non-speech sounds such as breaths, tiny scoffs, and half-hidden chuckles.
The startup aims to imitate the voice of a real-life person through a software interface that lets users type out the speech they want to synthesize.
Users also need to specify the mood of the delivery and select a suitable voice from a list of AI voices, most of which are based on real human actors. The software also allows the user to determine “emotional choices for delivery,” including anger, fear, happiness, and other emotions, to attract the audience’s attention and make the voice trustworthy.
AI voice synthesis can help in producing entertainment, but it can also be used to spread fake news. Artificial intelligence allows the manipulation of sounds and vocals. It can generate the voices of nonexistent people and give them emotional effects, and this is what makes them persuasive.
Audio used to be rarely discussed in conversations about deepfakes. It served mainly as a tool for validating the authenticity of a video: we could often tell that footage was fake or generated if the audio did not match the lip movements or the facial expressions. But today, the authenticity of the audio itself is also in question.
Audio deepfakes seem to be a new way to spread fake news, especially given the number of voice clips in circulation, which most fact-checkers could find difficult to debunk. NGOs and media outlets have often discussed the perils of deepfakes in politics. With the rise of AI, deepfakes are now invading people’s emotional lives with generated voices that can carry all sorts of feelings people are chasing. Audio deepfakes can be described as “the future of fake news” because there are no developed and available tools to identify AI-generated voices.