From the abstract:

Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
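A minimal sketch of what that few-shot, no-gradient-updates setting amounts to, using the word-unscrambling task the abstract mentions; the prompt format and helper function here are illustrative assumptions, not taken from the paper:

    # Rough illustration of "few-shot" prompting: the task and a handful of
    # demonstrations are written out as plain text, and the frozen model is
    # simply asked to continue the text -- no fine-tuning involved.

    demonstrations = [
        ("olleh", "hello"),
        ("dlrow", "world"),
        ("nrael", "learn"),
    ]

    def build_few_shot_prompt(demos, query):
        """Concatenate an instruction, the demonstrations, and the new query."""
        parts = ["Unscramble the letters into a word."]
        for scrambled, answer in demos:
            parts.append(f"scrambled: {scrambled}\nword: {answer}")
        parts.append(f"scrambled: {query}\nword:")
        return "\n\n".join(parts)

    print(build_few_shot_prompt(demonstrations, "ledom"))
    # The model is expected to complete the final line with "model".
    # Switching tasks means changing this text, not the model's weights.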
This paper is pretty dry for non-specialists; lots of benchmarks, but few examples. Apparently they will be releasing more examples soon.
...I mean, if you want examples, there are a ridiculous number of them here
Thanks. But that's a lot of unformatted text and I'm lazy so I'll wait for someone else to go through it and look for something interesting.
Formatted text: this website has the samples formatted one per page, so it should work on mobile: https://read-the-samples.netlify.app/