A fairly random set of papers that I found interesting. Updated weekly. Filter by subject; click an entry for details.
Generative DL
NL Generation
Misc
2017
September
Language as a Latent Variable: Discrete Generative Models for Sentence Compression
link
Yishu Miao¹, Phil Blunsom²
¹University of Oxford, ²Google DeepMind
Hypothesis
- A variational autoencoder can be used for inference in a generative model whose latent variable is language itself.
Interesting methods
- Generative auto-encoding sentence compression (ASC) model
- Other generative methods use continuous latent variables; this work uses discrete latent variables (words).
- For autoencoding, rather than embedding inputs as points in a vector space, they represent them as explicit natural-language sentences.
- A discrete variational autoencoder is a natural fit for sentence compression.
- Because the latent variable is discrete, the reparameterization trick cannot be applied; gradients are instead estimated with the REINFORCE algorithm, which requires extra care to control the high variance of sampling-based variational inference (see the toy sketch after this list).
- In the early stages of training it is difficult to generate reasonable compression samples (there are too many possible words to sample from).
- To combat this, they use a pointer network to construct the variational distribution, which limits the latent space to words appearing in the source sentence -- the softmax is over the words of the input sentence instead of over |V| (see the pointer/FSC sketch after this list).
- Forced attention sentence compression (FSC)
- The FSC model shares the pointer network with the ASC model and combines it with a softmax output layer over the whole vocabulary, so it can switch between copying words from the source sentence and generating words from the background (vocabulary) distribution.
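
A toy numpy sketch of the score-function (REINFORCE) estimator mentioned above, with a running-average baseline for variance reduction. This is my own illustration, not the paper's code; the fixed `rewards` vector is a stand-in for the terms of the variational objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

K = 5                          # toy latent space: 5 candidate "compressions"
phi = np.zeros(K)              # parameters of the variational distribution q_phi
rewards = np.array([0.1, 0.2, 1.0, 0.3, 0.1])   # stand-in for the objective terms

baseline, lr = 0.0, 0.5
for step in range(200):
    q = softmax(phi)
    c = rng.choice(K, p=q)                 # sample a discrete latent variable
    r = rewards[c]
    # Score-function gradient of E_q[r] w.r.t. phi:
    # (r - baseline) * d/dphi log q_phi(c), with d log q_c / d phi = onehot(c) - q
    grad_log_q = -q
    grad_log_q[c] += 1.0
    phi += lr * (r - baseline) * grad_log_q
    baseline = 0.9 * baseline + 0.1 * r    # running-average baseline

print(softmax(phi).round(3))               # mass should concentrate on index 2
```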
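
And a minimal numpy sketch of the two output layers: a pointer-style softmax restricted to source-sentence positions, and an FSC-style gate mixing the copy distribution with a background vocabulary distribution. All names and values are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

vocab_size = 50_000                      # |V|: size of the full output softmax
source_len = 8                           # number of words in the input sentence

# Pointer network: scores only over source positions, so the softmax has
# `source_len` entries instead of |V|.
pointer_scores = rng.standard_normal(source_len)
copy_dist = softmax(pointer_scores)      # distribution over source words

# FSC-style output: a gate decides whether to copy from the source sentence
# or to generate from the background (full-vocabulary) distribution.
vocab_scores = rng.standard_normal(vocab_size)
gen_dist = softmax(vocab_scores)         # distribution over the whole vocabulary
gate = 0.7                               # illustrative p(copy); learned in the paper

# Mix the two by scattering the copy probabilities onto their vocabulary ids.
source_token_ids = rng.integers(0, vocab_size, size=source_len)
mixed = (1.0 - gate) * gen_dist
np.add.at(mixed, source_token_ids, gate * copy_dist)

print(copy_dist.shape, mixed.shape, round(mixed.sum(), 6))   # (8,) (50000,) ~1.0
```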
Details
- ASC consists of four recurrent networks: encoder, compressor, decoder, and language model:
- Compression model: inference network
- Reconstruction model: a generative network that reconstructs the source sentence s from the latent compression c
- A language model p(c) is used as the prior distribution to regularize the latent compressions (see the lower bound below)
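
Putting the three components together, the training objective is a standard variational lower bound (my reconstruction from the bullets above; φ denotes the inference-network parameters and θ the generative ones):

```latex
% Variational lower bound for the ASC model: reconstruction term plus a KL
% penalty pulling the compression distribution towards the language-model prior.
\mathcal{L}(\theta,\phi)
  = \mathbb{E}_{q_\phi(c\mid s)}\big[\log p_\theta(s\mid c)\big]
  - D_{\mathrm{KL}}\big(q_\phi(c\mid s)\,\|\,p(c)\big)
  \;\le\; \log p(s)
```

The KL term is what lets the language-model prior keep sampled compressions fluent.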
Results
- Gigaword (sentence compression/summarization task)
- Around a 1-point improvement in ROUGE-1 and ROUGE-L over Nallapati et al. (2016); ROUGE-2 is comparable.