Since the publication of the seminal paper Attention is All You Need, the transformer architecture has become one of the most important building blocks in the design of neural network architectures. From NLP to Vision, and more recently Audio and Speech, you find transformers everywhere. But what are they? How do they work?
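Before diving into the resources below, it can help to see the core operation they all explain: scaled dot-product attention, where each token's output is a weighted combination of all token values, with weights derived from query–key similarity. A minimal NumPy sketch (illustrative only, not taken from any of the references below) might look like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as defined in "Attention is All You Need".

    Q, K: arrays of shape (seq_len, d_k); V: array of shape (seq_len, d_v).
    Returns, for each query, a weighted combination of the rows of V.
    """
    d_k = Q.shape[-1]
    # Similarity between every query and every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)            # shape: (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                          # shape: (seq_len, d_v)

# Toy self-attention example: 3 tokens with 4-dimensional embeddings,
# where queries, keys, and values all come from the same input.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

A full transformer layer adds learned projections for Q, K, and V, multiple attention heads, residual connections, and feed-forward blocks; the tutorials below build all of that up step by step.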
There is a large amount of material covering the transformer architecture. Personally, as a starting point, I really like the resources that follow. The first two give a comprehensive but easy-to-understand overview of all the concepts from scratch (both share the same title but are by different authors):
- Transformers from Scratch, by Peter Bloem
- Transformers from Scratch, by Brandon Rohrer
These two great tutorials cover everything one needs to know about the transformer architecture. There is also a nice survey paper:
It is naturally written in a different style, but it serves as a complement to the previous tutorials. For a more application-oriented perspective, in NLP or Vision, there are also specific surveys, depending on the area one would like to dive into:
- The NLP Cookbook: Modern Recipes for Transformer-based Deep Learning Architectures
- Pretrained Transformers for Text Ranking: BERT and Beyond
- Transformers in Vision: A Survey
There is much more to explore, but these resources are a good starting point for learning and understanding the transformer architecture and some of its applications.