Looking forward to seeing these new architectures in PyTorch. Hard to keep up, as the Transformer-XL examples released by NVIDIA were only made available ~3 months ago.
From the article:
To support growing interest in long-range sequence models, we are releasing a new language modelling benchmark, PG-19, which is derived from books in the Project Gutenberg online library.
Books provide a rich context for the development of long-range memory models. We selected a subset of approximately 28,000 books published before 1919 from the Project Gutenberg library. Unlike prior language-modelling dataset releases, we apply very little pre-processing to the text. For example, we do not limit the vocabulary size of the data or censor numbers, which avoids filtering out useful information.
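Since the books ship as lightly processed plain text, poking at the data is straightforward. Here is a minimal sketch, assuming the corpus is mirrored on the Hugging Face hub under the identifier "deepmind/pg19" and exposes a per-book "text" field (both are assumptions; check the dataset card, as the official release is a set of plain-text files):

```python
from datasets import load_dataset

# Stream the corpus rather than downloading every book up front.
pg19 = load_dataset("deepmind/pg19", split="train", streaming=True)

for book in pg19.take(1):
    # Field names below ("short_book_title", "text") are assumptions taken from
    # the dataset card; verify them before building a pipeline on top.
    print(book.get("short_book_title", "<unknown title>"))
    print(book["text"][:500])  # first 500 characters of the raw book text
```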
[...]
We find the Compressive Transformer produces the largest performance gain in modelling long-context book text from the PG-19 benchmark. The model’s conditional samples can be used to write book-like extracts. [...]
The Compressive Transformer is able to produce narrative in a variety of styles, from multi-character dialogue and first-person diary entries to third-person prose. Although the model does not have an understanding of language that is grounded in the real world, or of the events that take place in it, we see more coherent text emerge as it captures longer-range correlations.
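Until an official PyTorch version appears, the core trick is simple enough to sketch by hand: rather than discarding activations that fall out of the Transformer-XL memory window, compress them into a shorter secondary memory and attend over both. The snippet below is my own reading of the idea, not released code; the module name, default sizes, and the strided-convolution compression function are illustrative choices.

```python
import torch
import torch.nn as nn

class CompressiveMemory(nn.Module):
    """Keeps a recent memory plus a coarser, compressed memory of older activations."""

    def __init__(self, d_model: int, mem_len: int = 512,
                 cmem_len: int = 512, compression_rate: int = 3):
        super().__init__()
        self.mem_len = mem_len        # slots kept at full resolution (as in Transformer-XL)
        self.cmem_len = cmem_len      # slots kept in compressed form
        self.rate = compression_rate  # old slots mapped onto each compressed slot
        # A strided 1-D convolution is one possible compression function;
        # simpler choices such as pooling would also fit this interface.
        self.compress = nn.Conv1d(d_model, d_model,
                                  kernel_size=compression_rate, stride=compression_rate)

    def update(self, mem, cmem, new_hidden):
        """mem, cmem, new_hidden: (batch, time, d_model). Returns updated (mem, cmem)."""
        mem = torch.cat([mem, new_hidden.detach()], dim=1)
        overflow = mem.size(1) - self.mem_len
        if overflow > 0:
            # Evict the oldest slots and keep only the most recent `mem_len`.
            old, mem = mem[:, :overflow], mem[:, overflow:]
            # Compress the evicted activations; Conv1d expects (batch, d_model, time).
            # (Assumes `overflow` is at least `compression_rate`, i.e. segment and
            # memory sizes are chosen so evictions are a multiple of the rate.)
            compressed = self.compress(old.transpose(1, 2)).transpose(1, 2)
            cmem = torch.cat([cmem, compressed], dim=1)[:, -self.cmem_len:]
        return mem, cmem
```

In the full model this update would run per layer per segment, and the paper also trains the compression network with an auxiliary reconstruction loss; neither of those pieces is shown here.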