The following pages and posts are tagged with
Title | Type | Excerpt
---|---|---
(ALBERT) ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Page | ALBERT introduces changes to BERT to lower the number of parameters. |
(BERT) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Page | BERT pre-trains deep bidirectional Transformer representations with masked language modeling and next sentence prediction, then fine-tunes them for downstream tasks.
(ELMo) Deep Contextualized Word Representations | Page | Contextualized word embeddings produced by two layers of bidirectional LSTMs.
(The Transformer) Attention is All You Need | Page | The Transformer is an encoder-decoder architecture built from stacked self-attention and feed-forward layers, with no recurrence or convolution.
(XLNET) XLNet: Generalized Autoregressive Pretraining for Language Understanding | Page | Paper finds that Next Sentence Prediction (NSP) from [BERT] does not necessarily improve performance.