A fairly random set of papers that I found interesting. Updated weekly. Filter by subject; click an entry for details.
Generative DL
NL Generation
Misc
2017
September
Language as a Latent Variable: Discrete Generative Models for Sentence Compression
link
Yishu Miao¹, Phil Blunsom²
¹University of Oxford, ²Google DeepMind
Hypothesis
- A variational autoencoder can be used for inference in a generative model whose latent variable is language itself.
Interesting methods
- Generative auto-encoding sentence compression (ASC) model
- Other generative methods use continuous latent variables; this work uses discrete latent variables (words).
- For autoencoding, rather than embedding inputs as points in a vector space, they represent them as explicit natural-language sentences.
- A discrete variational autoencoder is a natural fit for sentence compression.
- Because the latent variable is discrete, the reparameterization trick cannot be applied; gradients are instead estimated with the REINFORCE algorithm, which requires extra care to control the high variance of sampling-based variational inference (see the toy sketch after this list).
- In the early stages of training it is difficult to generate reasonable compression samples (there are too many possible words to sample from).
- To combat this, they use a pointer network to construct the variational distribution, which limits the latent space to words appearing in the source sentence -- the softmax is over the words of the input sentence instead of over |V| (see the pointer/FSC sketch after this list).
- Forced attention sentence compression (FSC)
- The FSC model shares the pointer network with the ASC model and combines it with a softmax output layer over the whole vocabulary, so it can switch between copying words from the source sentence and generating words from the background (vocabulary) distribution.
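
A toy numpy sketch of the score-function (REINFORCE) estimator mentioned above, with a running-average baseline for variance reduction. This is my own illustration, not the paper's code; the fixed `rewards` vector is a stand-in for the terms of the variational objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

K = 5                          # toy latent space: 5 candidate "compressions"
phi = np.zeros(K)              # parameters of the variational distribution q_phi
rewards = np.array([0.1, 0.2, 1.0, 0.3, 0.1])   # stand-in for the objective terms

baseline, lr = 0.0, 0.5
for step in range(200):
    q = softmax(phi)
    c = rng.choice(K, p=q)                 # sample a discrete latent variable
    r = rewards[c]
    # Score-function gradient of E_q[r] w.r.t. phi:
    # (r - baseline) * d/dphi log q_phi(c), with d log q_c / d phi = onehot(c) - q
    grad_log_q = -q
    grad_log_q[c] += 1.0
    phi += lr * (r - baseline) * grad_log_q
    baseline = 0.9 * baseline + 0.1 * r    # running-average baseline

print(softmax(phi).round(3))               # mass should concentrate on index 2
```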
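
And a minimal numpy sketch of the two output layers: a pointer-style softmax restricted to source-sentence positions, and an FSC-style gate mixing the copy distribution with a background vocabulary distribution. All names and values are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

vocab_size = 50_000                      # |V|: size of the full output softmax
source_len = 8                           # number of words in the input sentence

# Pointer network: scores only over source positions, so the softmax has
# `source_len` entries instead of |V|.
pointer_scores = rng.standard_normal(source_len)
copy_dist = softmax(pointer_scores)      # distribution over source words

# FSC-style output: a gate decides whether to copy from the source sentence
# or to generate from the background (full-vocabulary) distribution.
vocab_scores = rng.standard_normal(vocab_size)
gen_dist = softmax(vocab_scores)         # distribution over the whole vocabulary
gate = 0.7                               # illustrative p(copy); learned in the paper

# Mix the two by scattering the copy probabilities onto their vocabulary ids.
source_token_ids = rng.integers(0, vocab_size, size=source_len)
mixed = (1.0 - gate) * gen_dist
np.add.at(mixed, source_token_ids, gate * copy_dist)

print(copy_dist.shape, mixed.shape, round(mixed.sum(), 6))   # (8,) (50000,) ~1.0
```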
Details
- ASC consists of four recurrent networks: encoder, compressor, decoder, and language model:
- Compression model: inference network
- Reconstruction model: a generative network that reconstructs the source sentence s from the latent compression c
- A language model p(c) is used as the prior distribution to regularize the latent compressions (see the lower bound below)
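
Putting the three components together, the training objective is a standard variational lower bound (my reconstruction from the bullets above; φ denotes the inference-network parameters and θ the generative ones):

```latex
% Variational lower bound for the ASC model: reconstruction term plus a KL
% penalty pulling the compression distribution towards the language-model prior.
\mathcal{L}(\theta,\phi)
  = \mathbb{E}_{q_\phi(c\mid s)}\big[\log p_\theta(s\mid c)\big]
  - D_{\mathrm{KL}}\big(q_\phi(c\mid s)\,\|\,p(c)\big)
  \;\le\; \log p(s)
```

The KL term is what lets the language-model prior keep sampled compressions fluent.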
Results
- Gigaword (sentence compression/summarization task)
- Around a 1-point improvement in ROUGE-1 and ROUGE-L over Nallapati et al. (2016); ROUGE-2 is comparable.