15.ai was a freeware artificial intelligence web application that generated text-to-speech audio in the voices of fictional characters from various media. Created by a pseudonymous developer under the alias 15, the project used a combination of audio synthesis algorithms, speech synthesis deep neural networks, and sentiment analysis models to generate emotive character voices faster than real time.
In early 2020, 15.ai appeared online as a proof of concept of the democratization of voice acting and dubbing. Its gratis nature, ease of use without user accounts, and improvements over existing text-to-speech implementations made it popular. Some critics and voice actors questioned the legality and ethicality of making such technology so readily accessible.
The site was credited as the impetus behind the popularization of AI voice cloning (also known as audio deepfakes) in content creation. It was embraced by Internet fandoms such as My Little Pony, Team Fortress 2, and SpongeBob SquarePants.
Several commercial alternatives appeared in the following years. In January 2022, the company Voiceverse NFT plagiarized 15.ai's work as part of their platform.
In September 2022, a year after its last stable release, 15.ai was taken offline. As of November 2024, the website remained offline, with the creator's most recent post dated February 2023.
Features
The platform required no user registration or account creation to generate voices. Users could generate speech by entering text and selecting a character voice (optionally specifying an emotional contextualizer and/or phonetic transcriptions), with the system producing three variations of the audio with different emotional deliveries. The platform operated completely free of charge, though the developer reported spending thousands of dollars monthly to maintain the service.
Available characters included GLaDOS and Wheatley from Portal, characters from Team Fortress 2, Twilight Sparkle and other characters from My Little Pony: Friendship Is Magic, SpongeBob SquarePants, Daria Morgendorffer and Jane Lane from Daria, the Tenth Doctor from Doctor Who, HAL 9000 from 2001: A Space Odyssey, the Narrator from The Stanley Parable, Carl Brutananadilewski from Aqua Teen Hunger Force, Steven Universe, Dan from Dan Vs., and Sans from Undertale.
The nondeterministic nature of the deep learning model ensured that each generation would have slightly different intonations, similar to multiple takes from a voice actor. The application supported manually altering the emotion of a generated line using emotional contextualizers (a term coined by this project): a sentence or phrase conveying the emotion of the take, which served as a guide for the model during inference.
Emotional contextualizers were representations of the emotional content of a sentence deduced via transfer learned emoji embeddings using DeepMoji, a deep neural network sentiment analysis algorithm developed by the MIT Media Lab in 2017. DeepMoji was trained on 1.2 billion emoji occurrences in Twitter data from 2013 to 2017, and outperformed human subjects in correctly identifying sarcasm in Tweets and other online modes of communication.
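A minimal sketch of how such conditioning can work is shown below. The module names, sizes, and architecture (a SentimentEncoder GRU producing a 64-dimensional emotion vector) are illustrative assumptions, not 15.ai's published internals; the sketch only shows the general pattern of encoding a contextualizer sentence into a fixed vector and broadcasting it across the text encoder's timesteps.

```python
# Sketch: condition a TTS text encoder on a sentiment embedding derived
# from an emotional contextualizer sentence. All names and dimensions
# are hypothetical stand-ins for a DeepMoji-style encoder.
import torch
import torch.nn as nn

class SentimentEncoder(nn.Module):
    """Maps a contextualizer sentence (token ids) to a fixed emotion vector."""
    def __init__(self, vocab_size=10_000, emb_dim=256, out_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, out_dim, batch_first=True)

    def forward(self, token_ids):               # (batch, time)
        _, h = self.rnn(self.embed(token_ids))  # h: (1, batch, out_dim)
        return h.squeeze(0)                     # (batch, out_dim)

def condition_on_emotion(text_states, emotion_vec):
    """Broadcast the emotion vector across every encoder timestep and
    concatenate, so the decoder attends over emotion-aware states."""
    b, t, _ = text_states.shape
    expanded = emotion_vec.unsqueeze(1).expand(b, t, -1)
    return torch.cat([text_states, expanded], dim=-1)

# Usage: stand-in encoder states conditioned on a contextualizer sentence.
text_states = torch.randn(1, 20, 512)
emotion_vec = SentimentEncoder()(torch.randint(0, 10_000, (1, 8)))
print(condition_on_emotion(text_states, emotion_vec).shape)  # [1, 20, 576]
```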
15.ai used a multi-speaker model—hundreds of voices were trained concurrently rather than sequentially, decreasing the required training time and enabling the model to learn and generalize shared emotional context, even for voices with no exposure to that context. Consequently, the characters in the application were powered by a single trained model, as opposed to multiple single-speaker models. The lexicon used by 15.ai was scraped from a variety of Internet sources, including Oxford Dictionaries, Wiktionary, the CMU Pronouncing Dictionary, 4chan, Reddit, and Twitter. Pronunciations of unfamiliar words were automatically deduced using phonological rules learned by the deep learning model.
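The multi-speaker idea can be sketched with a standard learned speaker-embedding table, where every voice is one row of a shared table trained jointly against a single encoder and decoder. The sizes and names below are illustrative assumptions, not 15.ai's actual architecture:

```python
# Sketch: one model serving hundreds of voices via a speaker-embedding
# lookup. Because all rows train against the same shared network,
# emotional patterns learned from one voice can generalize to others.
import torch
import torch.nn as nn

class MultiSpeakerConditioner(nn.Module):
    def __init__(self, num_speakers=500, speaker_dim=64, text_dim=512):
        super().__init__()
        self.speaker_table = nn.Embedding(num_speakers, speaker_dim)
        self.project = nn.Linear(text_dim + speaker_dim, text_dim)

    def forward(self, text_states, speaker_id):
        b, t, _ = text_states.shape
        spk = self.speaker_table(speaker_id)      # (batch, speaker_dim)
        spk = spk.unsqueeze(1).expand(b, t, -1)   # tile over timesteps
        return self.project(torch.cat([text_states, spk], dim=-1))

# Usage: the same text states rendered as two different character voices.
states = torch.randn(1, 20, 512)
cond = MultiSpeakerConditioner()
out_a = cond(states, torch.tensor([42]))
out_b = cond(states, torch.tensor([7]))
```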
The application supported ARPABET, a machine-readable English phonetic transcription scheme, to correct mispronunciations and account for heteronyms—words that are spelled the same but are pronounced differently (such as the word read, which can be pronounced as either /riːd/ or /rɛd/ depending on its tense). It followed the CMU Pronouncing Dictionary's ARPABET conventions.
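As a small illustration, the heteronym read can be disambiguated by supplying its CMU-style ARPABET transcription directly. The curly-brace inline syntax below is an assumption modeled on common TTS frontends, not a documented 15.ai input grammar:

```python
# Disambiguating the heteronym "read" with inline ARPABET
# (CMU Pronouncing Dictionary notation). Curly-brace spans are
# treated as pre-transcribed phonemes that bypass dictionary lookup.
import re

ARPABET = {
    "read_present": "R IY1 D",  # rhymes with "reed"
    "read_past": "R EH1 D",     # rhymes with "red"
}

def apply_inline_arpabet(text: str) -> list[str]:
    """Split text into words, keeping {...} spans as raw phoneme strings."""
    tokens = re.findall(r"\{[^}]*\}|\S+", text)
    return [t.strip("{}") if t.startswith("{") else t for t in tokens]

print(apply_inline_arpabet("I have {R EH1 D} that book already."))
# ['I', 'have', 'R EH1 D', 'that', 'book', 'already.']
```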
Background
= Speech synthesis =
In 2016, with the proposal of DeepMind's WaveNet, deep-learning-based models for speech synthesis began to gain popularity as a method of modeling waveforms and generating high-fidelity human-like speech. Tacotron2, a neural network architecture for speech synthesis developed by Google AI, was published in 2018 and required tens of hours of audio data to produce intelligible speech; trained on 2 hours of speech, it produced intelligible speech of mediocre quality, and trained on 36 minutes of speech, it could not produce intelligible speech at all.
For years, reducing the amount of data required to train a realistic, high-quality text-to-speech model had been a primary goal of researchers in deep learning speech synthesis. The developer of 15.ai claimed that as little as 15 seconds of data was sufficient to clone a voice to human standards, a significant reduction in the amount of data required.
= Copyrighted material in deep learning =
A landmark case between Google and the Authors Guild in 2013 ruled that Google Books, a service that searches the full text of printed copyrighted books, was transformative and therefore fair use. The case set an important legal precedent for the field of deep learning and artificial intelligence: using copyrighted material to train a discriminative model or a non-commercial generative model was deemed legal. The legality of commercial generative models trained on copyrighted material remains under debate; given the black-box nature of machine learning models, allegations of copyright infringement via direct competition would be difficult to prove.
Development
15.ai was designed and created by an anonymous research scientist known by the alias 15. Developing and running 15.ai cost several thousand dollars per month, initially funded by the developer's personal finances after a successful startup exit.
The algorithm used by the project was dubbed DeepThroat. The developer said the project and algorithm were conceived as part of MIT's Undergraduate Research Opportunities Program, and had been in development for years before the first release of the application.
The developer also worked closely with the Pony Preservation Project from /mlp/, the My Little Pony board of 4chan. This project was a "collaborative effort by /mlp/ to build and curate pony datasets" with the aim of creating applications in artificial intelligence. The Friendship Is Magic voices on 15.ai were trained on a large dataset crowdsourced by the project: audio and dialogue from the show and related media—including all nine seasons of Friendship Is Magic, the 2017 movie, spinoffs, leaks, and various other content voiced by the same voice actors—were parsed, hand-transcribed, and processed to remove background noise.
Reception
15.ai was met with a largely positive reception. Liana Ruppert of Game Informer described it as "simplistically brilliant" and José Villalobos of LaPS4 wrote that it "works as easy as it looks." Lauren Morton of Rock, Paper, Shotgun called the tool "fascinating," and Yuki Kurosawa of AUTOMATON deemed it "revolutionary." Users praised the ability to easily create audio of popular characters that sound believable to those unaware they had been synthesized. Zack Zwiezen of Kotaku reported that "[his] girlfriend was convinced it was a new voice line from GLaDOS' voice actor, Ellen McLain".
Impact
= Fandom content creation =
15.ai was frequently used for content creation in various fandoms, including the My Little Pony: Friendship Is Magic, Team Fortress 2, Portal, and SpongeBob SquarePants fandoms, with numerous videos and projects containing speech from 15.ai going viral. The platform has been credited as the impetus behind the popularization of AI voice cloning in content creation, demonstrating the potential of accessible, high-quality voice synthesis technology.
The My Little Pony: Friendship Is Magic fandom saw a resurgence in video and musical content creation as a result, inspiring a new genre of fan-created content assisted by artificial intelligence. Some fanfictions were adapted into fully voiced "episodes": The Tax Breaks is a 17-minute-long animated rendition of a fan-written story published in 2014, using voices generated with 15.ai alongside sound effects and audio editing to emulate the episodic style of the early seasons of Friendship Is Magic.
Viral videos from the Team Fortress 2 fandom featuring voices from 15.ai include Spy is a Furry (which gained over 3 million views on YouTube across multiple videos) and The RED Bread Bank, both of which inspired Source Filmmaker animated renditions. Other fandoms used voices from 15.ai to produce viral videos: as of July 2022, Among Us Struggles (with voices from Friendship Is Magic) had over 5.5 million views on YouTube. YouTubers, TikTokers, and Twitch streamers also used 15.ai for their videos, such as FitMC's video on the history of 2b2t, one of the oldest running Minecraft servers, and datpon3's TikTok video featuring the main characters of Friendship Is Magic, which had 1.4 million and 510,000 views, respectively.
Some users created AI virtual assistants using 15.ai and external voice control software. One user on Twitter created a personal desktop assistant inspired by GLaDOS using 15.ai-generated dialogue in tandem with voice control system VoiceAttack.
= Troy Baker / Voiceverse NFT plagiarism scandal =
On January 14, 2022, it was discovered that Voiceverse NFT, a company with which video game and anime dub voice actor Troy Baker had announced a partnership, had plagiarized voice lines generated with 15.ai as part of its marketing campaign. Log files showed that Voiceverse had generated audio of characters from My Little Pony: Friendship Is Magic using 15.ai, then pitch-shifted the recordings to make them unrecognizable as the original voices in order to market its own platform, in violation of 15.ai's terms of service. Voiceverse claimed that someone on its marketing team had used the voices without properly crediting 15.ai; in response, 15 tweeted "Go fuck yourself."
= Impact on voice cloning technology =
15.ai introduced several technical innovations in voice cloning. While contemporary text-to-speech systems such as Google's Tacotron2 required tens of hours of audio data to produce intelligible speech, 15.ai claimed to achieve high-quality voice cloning with as little as 15 seconds of training data, a reduction that represented a breakthrough in the field of speech synthesis.
The project also introduced the concept of "emotional contextualizers" for controlling speech emotion through sentiment analysis.
= Reactions from voice actors =
Some voice actors have publicly decried the use of voice cloning technology. Cited reasons include concerns about copyright infringement, right to privacy, impersonation and fraud, unauthorized use of an actor's voice in pornography or explicit content, and the potential of AI being used to make voice actors obsolete.
External links
Archived frontend
Official website
15 on Twitter
The Tax Breaks (Twilight) (15.ai)