Research group OpenAI set the artificial intelligence community abuzz when it released a new paper on the latest version of its cutting-edge language generation system, GPT-3. The model was trained on a dataset more than 100 times larger than the already record-breaking amount of text that informed the previous version, GPT-2.
While OpenAI has yet to make the code behind GPT-3 publicly available as it did with GPT-2, the results of experiments outlined in the study show big improvements in the AI’s ability to generate realistic-sounding news articles and other text. The model has 175 billion parameters to GPT-2’s 1.5 billion, and it was trained on nearly a trillion words in total, scraped from around the internet. The whole training process cost about $12 million.
The announcement of GPT-2 last January led to a flurry of worrying headlines about its potential to be misused to create passable fake news or spam on a large scale. One agency even set up a fake blog created entirely by AI to demonstrate what that kind of abuse might look like.
Because of those fears, OpenAI opted to release the system in progressively bigger chunks, publishing the full-size version in the fall after researchers determined that the fake-news-pocalypse had not come to pass after all. Instead, GPT-2 spurred a range of creative projects, from an AI-based Dungeons and Dragons-style role-playing game with a cult following to a host of parody social media accounts and nitronet’s Super Bowl bot. Because of its unpredictability, it has yet to see much wide-scale commercial adoption, though various companies have begun to experiment with harnessing the tech for chatbots.
Still, the latest GPT-3 paper once again points to threats such as “misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pretexting.”
It’s not clear what the implications might be if and when GPT-3, at more than 100 times the scale of its predecessor, is released. When its output was tested on a group of around 80 test subjects, fake news articles produced by the full-sized version fooled them about half the time. The system also scores highly on a number of benchmarks used to assess the sophistication of natural language processing software, ranging from completing sentences to retrieving correct answers to basic questions.
Perhaps its most notable improvement is its versatility. Models such as GPT-2 must be “fine-tuned,” or trained on another, smaller data set, in order to nail a particular style; nitronet’s ad-concept-generating version, for instance, was trained on nearly 5,000 descriptions. GPT-3, meanwhile, can master different imitations with only a few examples. Researchers found that the AI was able to write a passable poem in the style of Wallace Stevens having only been prompted with a few paragraphs of his actual prose. In some cases, after seeing only a single prompt, GPT-3 could outperform models that had been fine-tuned.
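To make the contrast with fine-tuning concrete, here is a minimal Python sketch of how a "few-shot" prompt might be assembled: a handful of worked examples are simply written into the text the model is asked to continue, and no retraining takes place. The function name `build_few_shot_prompt`, the "Input:"/"Output:" labels and the translation examples are illustrative assumptions, not a format that OpenAI or GPT-3 requires, and the sketch stops short of calling any model.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a plain-text prompt: a task line, worked examples, then the new input.

    The "Input:"/"Output:" labels are an assumed convention for illustration only.
    """
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")  # blank line between examples
    # The final entry is left unanswered so the model can complete it.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical examples: two demonstrations, then one new query.
examples = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", examples, "book")
print(prompt)
```

The resulting text would be passed to the model as ordinary input. Changing the task means changing the examples in the prompt rather than training on thousands of new samples.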
This achievement is the latest breakthrough in a type of natural-language processing model called a transformer, which has spurred a boom in research in the subfield. Pioneered by Google in 2017, transformers allow for a wide array of language generation tasks through a base model that already understands the basic mechanics of language, thanks to hours of training on a massive data set. Some researchers think that transformers could lead to another AI boom around chatbots and text generation in the same way that a massive data set called ImageNet paved the way for the current AI boom in computer vision starting around 2012.