The following pages and posts are tagged with:

| Title | Type | Excerpt |
|-------|------|---------|
| (ALBERT) ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Page | ALBERT modifies BERT to reduce the number of parameters. |
| (BERT) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Page | BERT pre-trains deep bidirectional Transformer representations using masked language modeling and next sentence prediction. |
| (ELMo) Deep Contextualized Word Representations | Page | Contextualized word representations produced by a two-layer bidirectional LSTM language model. |
| (The Transformer) Attention Is All You Need | Page | The Transformer is an encoder-decoder architecture built entirely from stacked attention layers, with no recurrence or convolutions. |
| (XLNET) XLNet: Generalized Autoregressive Pretraining for Language Understanding | Page | The paper finds that Next Sentence Prediction (NSP) from [BERT] does not necessarily improve performance. |