7 votes

A new model and dataset for long-range memory

2 comments

  1. udia
    Looking forward to seeing these new architectures in PyTorch. Hard to keep up, as the Transformer-XL examples released by NVIDIA were only made available ~3 months ago.
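
    If I've read the blog post right, the core idea is that instead of discarding old activations the way Transformer-XL does, the Compressive Transformer squashes them into a smaller "compressed memory" using a simple compression function (the paper tries pooling and strided convolutions, among others). A rough PyTorch sketch of just that step, assuming mean pooling as the compression function and made-up tensor sizes:

        import torch
        import torch.nn.functional as F

        def compress_memory(old_mem: torch.Tensor, rate: int = 3) -> torch.Tensor:
            # old_mem: [seq, batch, dim] block of activations about to be evicted.
            # Mean-pool along the sequence axis so `rate` old slots collapse into
            # one compressed slot (one of several compression functions explored).
            x = old_mem.permute(1, 2, 0)                      # [batch, dim, seq]
            x = F.avg_pool1d(x, kernel_size=rate, stride=rate)
            return x.permute(2, 0, 1)                         # [seq // rate, batch, dim]

        evicted = torch.randn(512, 1, 1024)    # hypothetical oldest memories
        compressed = compress_memory(evicted)  # [170, 1, 1024], kept rather than dropped

    The real model wraps this in the usual attention machinery and, as far as I can tell, trains the compression function with an auxiliary reconstruction loss, so treat the above as a toy illustration only.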

    2 votes
  2. skybrian
    From the article:

    To support growing interest in long-range sequence models, we are releasing a new language modelling benchmark, PG-19, which is derived from books in the Project Gutenberg online library.

    Books provide a rich context for the development of long-range memory models. We selected a subset of approximately 28,000 books from Project Gutenberg published before 1919. Unlike prior language modeling dataset releases, we apply very little pre-processing to the text. For example, we do not limit the vocabulary size of the data or censor numbers, to avoid the filtering of useful information.

    [...]

    We find the Compressive Transformer produces the largest performance gain in modelling long-context book text from the PG-19 benchmark. The model’s conditional samples can be used to write book-like extracts. [...]

    The Compressive Transformer is able to produce narrative in a variety of styles, from multi-character dialogue, first-person diary entries, or third-person prose. Although the model does not have an understanding of language that’s grounded in the real world, or the events that take place in it – by capturing longer-range correlations, we see the emergence of more coherent text.
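
    Since the release is described as barely pre-processed plain text, poking at it should be straightforward once the split directories are downloaded. A small sketch, assuming the release is one text file per book under train/validation/test directories (local paths here are hypothetical):

        from pathlib import Path

        def iter_books(split_dir: str):
            # Yield (filename, full text) for every book file in a split directory.
            for path in sorted(Path(split_dir).glob("*.txt")):
                yield path.name, path.read_text(encoding="utf-8")

        # e.g. rough per-book word counts via whitespace splitting
        for name, text in iter_books("pg19/train"):
            print(name, len(text.split()))
            break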

    1 vote