Microsoft recently introduced an AI tool called "VASA-1" that combines a single static face image with a speech audio clip to generate a lifelike video in which the face appears to speak.
Microsoft indicated that the tool can process a wide range of image and audio inputs and generate hyper-realistic talking-face video in real time, with precise lip-audio synchronization, lifelike facial behavior, and naturalistic head movements.
Microsoft's researchers noted that the showcased examples use AI-generated portrait images, akin to those produced by DALL·E 3. This highlights the tool's ability to create videos featuring entirely fictional virtual characters whose images were first created with AI generation tools.
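To make the described inputs and outputs concrete, here is a minimal sketch of the workflow as Microsoft describes it. Since no product or API has been released, every class, field, and function name below is a hypothetical illustration rather than Microsoft's interface.

```python
# Purely illustrative sketch, not Microsoft's code: VASA-1 has no public
# product or API, so every name below is hypothetical. It models only the
# inputs and outputs described above: one static face image plus one speech
# audio clip in, one lip-synced talking-head video out.
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TalkingFaceRequest:
    face_image: Path              # a single static portrait, e.g. "portrait.png"
    speech_audio: Path            # a speech clip, e.g. "speech.wav"
    head_motion: str = "natural"  # the demo reports naturalistic head movements


@dataclass
class TalkingFaceResult:
    video: Path                   # the generated talking-head video
    realtime: bool                # the demo claims real-time generation


def describe(request: TalkingFaceRequest) -> str:
    """Summarize what such a system would consume and produce."""
    return (
        f"Animate {request.face_image.name} with {request.speech_audio.name} "
        f"(head motion: {request.head_motion}) into a lip-synced video."
    )


if __name__ == "__main__":
    print(describe(TalkingFaceRequest(Path("portrait.png"), Path("speech.wav"))))
```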
Concerns Regarding the Tool’s Potential To Fuel the Proliferation of Misinformation
While Microsoft has clarified that the showcased models are part of a research demonstration with no planned product or API release, the announcement of the tool's capabilities has sparked significant concern. Critics worry that its ability to create highly detailed videos could fuel the production of fabricated content and the spread of misinformation online, particularly during this year's elections around the world.
The company reiterated its commitment to developing responsible AI aimed at advancing human well-being and emphasized that the tool is not intended to facilitate the creation of misleading or harmful content. However, it acknowledged the potential for misuse, such as malicious impersonation and deception, and highlighted the importance of using such technology responsibly.
Microsoft's failure to mention mechanisms for preventing misuse of the tool or safeguards against misinformation, such as prominent watermarks in the generated videos, has intensified apprehension about the new technology. This is particularly concerning given previously reported safety and privacy lapses in Microsoft's DALL·E 3-powered image generator; The Guardian reported that "the company’s AI image generator lacks basic safeguards against creating violent and sexualized images."
Potential Risks Associated With the Use of the “VASA-1” Tool
Microsoft's new tool opens the door to misuse, particularly impersonation. It could be used to animate people's images without their consent, enabling deceptive practices online; for instance, it could generate fake videos that convincingly mimic trusted individuals and mislead viewers with a false reality.
The tool could also be leveraged to create videos that promote specific narratives or spread misleading information, especially during sensitive periods such as elections.
During a Senate Judiciary Subcommittee hearing on April 16, U.S. Senator Richard Blumenthal, chair of the Subcommittee on Privacy, Technology, and the Law, underscored the grave threats posed by AI, particularly through deepfake technologies. He expressed significant concern about AI's potential impact on elections, stating, "Deepfake images and videos are disturbingly easy for anyone to create," and emphasized these tools' potential to undermine the integrity of elections.
The hearing is significant in light of recent events, including the use of a deepfake to impersonate U.S. President Joe Biden during the New Hampshire primary in January of this year.
To demonstrate the realism of AI-generated deepfakes, Blumenthal ran an experiment during the hearing, asking attendees to identify which recording of his voice was AI-generated.
Blumenthal emphasized that generative AI poses a significant threat to democracy and stressed the urgent need for the U.S. Congress to act to address and mitigate the spread of disinformation and deception facilitated by these technologies.
Challenges Posed by the ‘VASA-1’ Tool for Fact-Checkers
Fact-checkers can often detect AI-generated images and videos, particularly those involving well-known individuals, by comparing them to authentic footage for discrepancies in lip movements, facial expressions, and body gestures, and by scrutinizing the voice. Tools like "VASA-1," however, present a new challenge: the tool's capability to create videos with remarkably accurate and naturalistic facial expressions complicates detection, whether the fake videos involve public figures, employees, activists, or ordinary citizens.
An illustrative example of the "VASA-1" tool's precision is its manipulation of the Mona Lisa: Microsoft showcased a demo in which the iconic painting appears to sing a portion of a rap song, demonstrating the tool's ability to generate highly realistic and nuanced movements.
In response to the potential impact of tools like "VASA-1," software engineer and writer Gergely Orosz objected to their release, expressing concern about how such tools could facilitate the creation of misleading or harmful content about real people.
Generative AI in the 2024 Elections
Misinformation researchers are increasingly concerned about the potential misuse of AI-powered tools that generate images, audio clips, and videos to create misleading content in a year marked by elections in multiple countries around the world. Philip Mai, a senior researcher at the Social Media Lab at Toronto Metropolitan University's Ted Rogers School of Management and co-founder of the International Conference on Social Media & Society, commented that with just one photo and one piece of audio, Microsoft's new AI model VASA-1 can create a human deepfake.
He added, “It seems a bit risky to be releasing an AI tool this powerful in this year of elections with half the world going to the polls.”
As the 2024 U.S. presidential election draws near, generative AI emerges as a significant threat to the integrity of elections. This concern was heightened after political consultant Steve Kramer, who worked for a Democratic rival of Biden, admitted to orchestrating an impersonation of Joe Biden through a fabricated AI-generated robocall that was distributed to voters with the intention of misleading them.
In statements to U.S. media outlets, Kramer disclosed that he was able to clone Biden's voice for as little as $150, using readily available online AI software to generate an audio file of the cloned voice reading a scripted text.
Microsoft's demonstration materials vividly illustrate the tool's capabilities, showing AI animating faces to sing and speak in various languages. Moreover, the tool handles a wide variety of image inputs and audio recordings. The resulting videos are remarkably realistic and difficult to distinguish from authentic content without prior knowledge that they are fake.
The proliferation and diverse capabilities of AI tools present a challenge to fact-checkers, as their combination and potential misuse contribute to the spread of misinformation, complicating efforts to debunk false claims and ultimately undermining the integrity of the information space.
Additionally, DALL·E 3, developed by OpenAI and integrated into Microsoft's products, stands out as one of the AI tools posing a significant challenge for fact-checkers in their battle against misinformation and in distinguishing real from fake images. It can generate distinct images based solely on descriptive text specifying the subject, style, framing, and characteristics of the image.
OpenAI's Voice Engine, unveiled last March, poses another challenge for fact-checkers. The tool can clone a voice from just a 15-second audio sample, accurately reproducing the voice's tone for any given text input, whether in the speaker's language or a different one entirely.
The tool generates natural-sounding speech that closely mirrors the reference voice and can vary its emotional tone, enhancing both its versatility and its potential for misuse in creating deceptive content.
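As a rough illustration of the workflow described above, the following sketch assumes a hypothetical interface; Voice Engine itself is not publicly available, so none of these names correspond to a real API.

```python
# Purely illustrative sketch, not OpenAI's code: Voice Engine is not publicly
# available, so every name here is hypothetical. It models only the workflow
# described above: a roughly 15-second reference sample plus a text script,
# producing cloned speech in a chosen output language.
import wave
from pathlib import Path

MIN_REFERENCE_SECONDS = 15.0  # the reported length of the reference sample


def reference_duration(sample: Path) -> float:
    """Return the duration in seconds of an uncompressed WAV reference sample."""
    with wave.open(str(sample), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def plan_clone(sample: Path, script: str, language: str = "en") -> str:
    """Describe the cloning job a Voice Engine-like system would perform."""
    if reference_duration(sample) < MIN_REFERENCE_SECONDS:
        raise ValueError("Reference sample is shorter than the reported 15 seconds.")
    return (
        f"Read a {len(script)}-character script in '{language}' "
        f"using the voice cloned from {sample.name}."
    )
```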
Sora AI Video Generator
In a previous report, Misbar explained how the Sora tool works and its role in spreading misinformation. OpenAI officially announced Sora on February 17, and the tool's ability to create visuals that bolster misleading claims risks amplifying the spread of fake news online and undermining efforts to contain misinformation.