Calculating FLOPs in Transformer model training

See Appendix A of this paper.
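A rough rule of thumb that is widely used for dense Transformers is about 6 FLOPs per parameter per training token. The forward pass costs about 2 FLOPs per parameter per token (one multiply and one add), and the backward pass costs about twice that. This gives roughly 6 * N * D for N parameters and D tokens. The sketch below only illustrates that approximation and is not necessarily the derivation in Appendix A. The function name `training_flops` and the example model size are hypothetical. The approximation ignores attention-score terms that grow with context length, as well as embedding costs and activation recomputation.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense Transformer (hypothetical helper).

    Assumption: ~2 FLOPs per parameter per token for the forward pass
    (a multiply and an add) and ~2x that for the backward pass, i.e.
    ~6 FLOPs per parameter per token overall. Attention-score terms that
    scale with context length, embeddings, and recomputation are ignored.
    """
    forward = 2 * n_params * n_tokens
    backward = 2 * forward  # backward pass is roughly twice the forward cost
    return forward + backward


if __name__ == "__main__":
    # Hypothetical example: a 1B-parameter model trained on 20B tokens.
    flops = training_flops(1e9, 20e9)
    print(f"{flops:.2e}")  # 1.20e+20
```

For long contexts or small models, the attention terms that this sketch drops can become a noticeable fraction of the total, so a per-layer count such as the one in the referenced appendix is more accurate.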
