Chaloner’s Transformer Wiki

Tracking Transformers and BERTology.

In Progress

What you will find here:

  • Summaries of Transformer-based models
  • Summaries of methods used in Transformer models
  • Summaries of ablation studies
  • Organised lists of model computation costs
  • Organised lists of model sizes
  • Comparisons of model performance versus cost

This wiki complements Papers with Code: we focus specifically on Transformer-based models and methods, and dig deeper into their ablation findings and performance-cost trade-offs.
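To give a sense of what a performance-cost comparison could look like, here is a minimal Python sketch. The `ModelEntry` fields and the `score_per_pflop` / `score_per_million_params` helpers are hypothetical names chosen for illustration, not part of the wiki's actual data model:

```python
from dataclasses import dataclass

# Minimal sketch of the kind of performance-vs-cost record this wiki
# aims to tabulate. All names and fields here are hypothetical placeholders.

@dataclass
class ModelEntry:
    name: str                # model name, e.g. "RoBERTa-base"
    pretrain_pflops: float   # total pretraining compute, in petaFLOPs
    params_millions: float   # parameter count, in millions
    benchmark_score: float   # headline benchmark score, e.g. a GLUE average

def score_per_pflop(entry: ModelEntry) -> float:
    """Benchmark points per petaFLOP of pretraining compute."""
    return entry.benchmark_score / entry.pretrain_pflops

def score_per_million_params(entry: ModelEntry) -> float:
    """Benchmark points per million parameters."""
    return entry.benchmark_score / entry.params_millions
```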

Transformer Models

Complete:

  • (none yet)

In Progress:

  • (none yet)

Planned:

  • DistilBERT
  • RoBERTa
  • DeBERTa
  • T5
  • GPT-1
  • GPT-2
  • GPT-3
  • BART

Overview Pages

Planned:

  • Pretraining FLOPs vs. fine-tuning FLOPs vs. model size vs. benchmark performance
  • Attention Mechanisms
  • Pretraining Methods
  • Datasets
  • Tokenization Methods
  • Positional Encoding Methods
  • Conflicting findings from papers
  • Timeline