Chaloner’s Transformer Wiki

Tracking Transformers and BERTology.

In Progress

What you will find here:

  • Summaries of Transformer-based models
  • Summaries of methods used in Transformer models
  • Summaries of ablation studies
  • Organised lists of model computation costs
  • Organised lists of model sizes
  • Comparisons of model performance versus cost

This wiki complements Papers with Code: we focus specifically on Transformer-based models and methods, and dig deeper into their ablation findings and performance-cost trade-offs.
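To give a sense of what a performance-cost comparison could look like, here is a minimal Python sketch. The `ModelEntry` fields and the `score_per_pflop` / `score_per_million_params` helpers are hypothetical names chosen for illustration, not part of the wiki's actual data model:

```python
from dataclasses import dataclass

# Minimal sketch of the kind of performance-vs-cost record this wiki
# aims to tabulate. All names and fields here are hypothetical placeholders.

@dataclass
class ModelEntry:
    name: str                # model name, e.g. "RoBERTa-base"
    pretrain_pflops: float   # total pretraining compute, in petaFLOPs
    params_millions: float   # parameter count, in millions
    benchmark_score: float   # headline benchmark score, e.g. a GLUE average

def score_per_pflop(entry: ModelEntry) -> float:
    """Benchmark points per petaFLOP of pretraining compute."""
    return entry.benchmark_score / entry.pretrain_pflops

def score_per_million_params(entry: ModelEntry) -> float:
    """Benchmark points per million parameters."""
    return entry.benchmark_score / entry.params_millions
```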

Transformer Models

Complete:

  • (none yet)

In Progress:

  • (none yet)

Planned:

  • DistilBERT
  • RoBERTa
  • DeBERTa
  • T5
  • GPT-1
  • GPT-2
  • GPT-3
  • BART

Overview Pages

Planned:

  • Pretraining FLOPs vs. fine-tuning FLOPs vs. model size vs. benchmark performance
  • Attention Mechanisms
  • Pretraining Methods
  • Datasets
  • Tokenization Methods
  • Positional Encoding Methods
  • Conflicting findings from papers
  • Timeline