8 votes

Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer

1 comment

  1. skybrian
    From the blog post: [...] [...]

    From the blog post:

    In “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”, we present a large-scale empirical survey to determine which transfer learning techniques work best and apply these insights at scale to create a new model that we call the Text-To-Text Transfer Transformer (T5). We also introduce a new open-source pre-training dataset, called the Colossal Clean Crawled Corpus (C4). The T5 model, pre-trained on C4, achieves state-of-the-art results on many NLP benchmarks while being flexible enough to be fine-tuned to a variety of important downstream tasks. In order for our results to be extended and reproduced, we provide the code and pre-trained models, along with an easy-to-use Colab Notebook to help get started.


    In our Colab demo and follow-up paper, we trained T5 to answer trivia questions in a more difficult "closed-book" setting, without access to any external knowledge. In other words, in order to answer a question T5 can only use knowledge stored in its parameters that it picked up during unsupervised pre-training. This can be considered a constrained form of open-domain question answering.


    T5 is surprisingly good at this task. The full 11-billion parameter model produces the exact text of the answer 50.1%, 37.4%, and 34.5% of the time on TriviaQA, WebQuestions, and Natural Questions, respectively. To put these results in perspective, the T5 team went head-to-head with the model in a pub trivia challenge and lost!