Chaloner’s Transformer Wiki
Tracking Transformers and BERTology.
In Progress
What you will find here:
- Summaries of Transformer-based models
- Summaries of methods used in Transformer models
- Summaries of ablation tests
- Organised lists of model computation costs
- Organised lists of model sizes
- Model performance vs. cost (a first sketch appears below)
This complements Papers with Code: we focus specifically on Transformer-based models and methods, and delve into their ablation findings and performance-cost trade-offs.
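As a taste of the cost bookkeeping those lists will rest on, here is a minimal sketch of the standard back-of-the-envelope parameter count for a vanilla Transformer stack (roughly 12 · n_layers · d_model² non-embedding parameters when d_ff = 4 · d_model). The function name and defaults are illustrative, not taken from any particular paper's appendix:

```python
def transformer_params(n_layers: int, d_model: int, vocab_size: int,
                       d_ff: int | None = None) -> int:
    """Rough parameter count for a vanilla Transformer stack.

    Per layer: 4 * d_model^2 for the Q/K/V/output projections,
    plus 2 * d_model * d_ff for the feed-forward block.
    Biases, layer norms, and position embeddings are ignored.
    """
    d_ff = d_ff if d_ff is not None else 4 * d_model  # the common default width
    attn = 4 * d_model * d_model        # W_Q, W_K, W_V, W_O
    ffn = 2 * d_model * d_ff            # up- and down-projections
    emb = vocab_size * d_model          # token embedding table
    return n_layers * (attn + ffn) + emb

# BERT-base-like settings: 12 layers, d_model = 768, ~30k WordPiece vocab
print(f"{transformer_params(12, 768, 30_522):,}")  # 108,375,552 -- near the reported 110M
```

The gap to the published 110M figure is exactly the pieces the sketch ignores: position and segment embeddings, biases, layer norms, and the pooler.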
Transformer Models
Complete: (none yet)
In Progress: (none yet)
Planned:
- DistilBERT
- RoBERTa
- DeBERTa
- T5
- GPT-1
- GPT-2
- GPT-3
- BART
Overview Pages
Planned:
- Pretraining FLOPs vs. finetuning FLOPs vs. model size vs. benchmark performance (see the sketch after this list)
- Attention Mechanisms
- Pretraining Methods
- Datasets
- Tokenization Methods
- Positional Encoding Methods
- Conflicting findings from papers
- Timeline
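For the FLOPs page in particular, the likely workhorse is the C ≈ 6 · N · D approximation from the scaling-laws literature (Kaplan et al., 2020): about 6 floating-point operations per parameter per training token. A minimal sketch with illustrative, not measured, numbers:

```python
def pretraining_flops(n_params: float, n_tokens: float) -> float:
    """C ~= 6 * N * D (Kaplan et al., 2020): roughly 2 FLOPs per
    parameter per token for the forward pass and 4 for the backward
    pass, ignoring the attention term (small when d_model is large
    relative to context length)."""
    return 6.0 * n_params * n_tokens

# GPT-3-scale illustration: 175B parameters, 300B training tokens
print(f"{pretraining_flops(175e9, 300e9):.2e}")  # 3.15e+23
```

That lands within a percent of the ~3.14e23 total training FLOPs (3,640 petaflop/s-days) reported in the GPT-3 paper, which is what makes the rule of thumb serviceable for cross-model comparisons.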