Since the publication of the seminal paper Attention is All You Need, the transformer architecture has become one of the most important building blocks in the design of neural network architectures. From NLP to Vision, and more recently Audio and Speech, you find transformers everywhere. But what are they? How do they work?
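Before diving into the resources below, it can help to see the core operation they all explain: scaled dot-product attention, where each token's output is a weighted combination of all token values, with weights derived from query–key similarity. A minimal NumPy sketch (illustrative only, not taken from any of the references below) might look like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as defined in "Attention is All You Need".

    Q, K: arrays of shape (seq_len, d_k); V: array of shape (seq_len, d_v).
    Returns, for each query, a weighted combination of the rows of V.
    """
    d_k = Q.shape[-1]
    # Similarity between every query and every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)            # shape: (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                          # shape: (seq_len, d_v)

# Toy self-attention example: 3 tokens with 4-dimensional embeddings,
# where queries, keys, and values all come from the same input.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

A full transformer layer adds learned projections for Q, K, and V, multiple attention heads, residual connections, and feed-forward blocks; the tutorials below build all of that up step by step.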
There is a large amount of material covering the transformer architecture. Personally, as a starting point, I really like the resources that follow. The first two give a comprehensive but easy-to-understand overview of all the concepts from scratch (both share the same title but are by different authors):
- Transformers from Scratch, by Peter Bloem
- Transformers from Scratch, by Brandon Rohrer
These two great tutorials cover everything one needs to know about the transformer architecture. There is also a nice survey paper:
It is naturally written in a different style, but it serves as a complement to the previous tutorials. For a more application-oriented perspective, in NLP or Vision, there are also specific surveys, depending on the area one would like to dive into:
- The NLP Cookbook: Modern Recipes for Transformer-based Deep Learning Architectures
- Pretrained Transformers for Text Ranking: BERT and Beyond
- Transformers in Vision: A Survey
There is much more to explore, but these resources are a good starting point for learning and understanding the transformer architecture and some of its applications.