Looking forward to seeing these new architectures in PyTorch. Hard to keep up, as the Transformer-XL examples released by NVIDIA were only made available ~3 months ago.
From the article:
To support growing interest in long-range sequence models, we are releasing a new language modelling benchmark, PG-19, which is derived from books in the Project Gutenberg online library.
Books provide a rich context for the development of long-range memory models. We selected a subset of approximately 28,000 books published before 1919 from the Project Gutenberg library. Unlike prior language-modelling dataset releases, we apply very little pre-processing to the text. For example, we do not limit the vocabulary size of the data or censor numbers, which avoids filtering out useful information.
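Since the books ship as lightly processed plain text, poking at the data is straightforward. Here is a minimal sketch, assuming the corpus is mirrored on the Hugging Face hub under the identifier "deepmind/pg19" and exposes a per-book "text" field (both are assumptions; check the dataset card, as the official release is a set of plain-text files):

```python
from datasets import load_dataset

# Stream the corpus rather than downloading every book up front.
pg19 = load_dataset("deepmind/pg19", split="train", streaming=True)

for book in pg19.take(1):
    # Field names below ("short_book_title", "text") are assumptions taken from
    # the dataset card; verify them before building a pipeline on top.
    print(book.get("short_book_title", "<unknown title>"))
    print(book["text"][:500])  # first 500 characters of the raw book text
```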
[...]
We find the Compressive Transformer produces the largest performance gain in modelling long-context book text from the PG-19 benchmark. The model’s conditional samples can be used to write book-like extracts. [...]
The Compressive Transformer is able to produce narrative in a variety of styles, from multi-character dialogue and first-person diary entries to third-person prose. Although the model does not have an understanding of language that is grounded in the real world, or of the events that take place in it, we see more coherent text emerge as it captures longer-range correlations.
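Until an official PyTorch version appears, the core trick is simple enough to sketch by hand: rather than discarding activations that fall out of the Transformer-XL memory window, compress them into a shorter secondary memory and attend over both. The snippet below is my own reading of the idea, not released code; the module name, default sizes, and the strided-convolution compression function are illustrative choices.

```python
import torch
import torch.nn as nn

class CompressiveMemory(nn.Module):
    """Keeps a recent memory plus a coarser, compressed memory of older activations."""

    def __init__(self, d_model: int, mem_len: int = 512,
                 cmem_len: int = 512, compression_rate: int = 3):
        super().__init__()
        self.mem_len = mem_len        # slots kept at full resolution (as in Transformer-XL)
        self.cmem_len = cmem_len      # slots kept in compressed form
        self.rate = compression_rate  # old slots mapped onto each compressed slot
        # A strided 1-D convolution is one possible compression function;
        # simpler choices such as pooling would also fit this interface.
        self.compress = nn.Conv1d(d_model, d_model,
                                  kernel_size=compression_rate, stride=compression_rate)

    def update(self, mem, cmem, new_hidden):
        """mem, cmem, new_hidden: (batch, time, d_model). Returns updated (mem, cmem)."""
        mem = torch.cat([mem, new_hidden.detach()], dim=1)
        overflow = mem.size(1) - self.mem_len
        if overflow > 0:
            # Evict the oldest slots and keep only the most recent `mem_len`.
            old, mem = mem[:, :overflow], mem[:, overflow:]
            # Compress the evicted activations; Conv1d expects (batch, d_model, time).
            # (Assumes `overflow` is at least `compression_rate`, i.e. segment and
            # memory sizes are chosen so evictions are a multiple of the rate.)
            compressed = self.compress(old.transpose(1, 2)).transpose(1, 2)
            cmem = torch.cat([cmem, compressed], dim=1)[:, -self.cmem_len:]
        return mem, cmem
```

In the full model this update would run per layer per segment, and the paper also trains the compression network with an auxiliary reconstruction loss; neither of those pieces is shown here.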