The following pages and posts are tagged with:

| Title | Type | Excerpt |
|-------|------|---------|
| (ALBERT) ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Page | ALBERT modifies BERT to reduce the number of parameters. |
| (BERT) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Page | BERT pre-trains deep bidirectional Transformer representations using masked language modeling and next sentence prediction. |
| (ELMo) Deep Contextualized Word Representations | Page | Contextualized word representations produced by a two-layer bidirectional LSTM language model. |
| (The Transformer) Attention Is All You Need | Page | The Transformer is an encoder-decoder architecture built entirely from stacked attention layers, with no recurrence or convolutions. |
| (XLNET) XLNet: Generalized Autoregressive Pretraining for Language Understanding | Page | The paper finds that Next Sentence Prediction (NSP) from [BERT] does not necessarily improve performance. |