Transformers 101

Since the publication of the seminal paper Attention Is All You Need, the transformer has become one of the most important building blocks in the design of neural network architectures. From NLP to vision, and more recently audio and speech, you find them everywhere. But what are Transformers? How do they work?

There is also a lot of material covering the transformer architecture. Personally, as a starting point, I really like the resources that follow. The first two give a very comprehensive but easy-to-understand introduction to all the concepts from scratch (both share the same title but are by different authors):

These are two great tutorials and contain everything you need to know about the transformer architecture. There is also a nice survey paper:

It is naturally written in a different style, but it is a good follow-up that complements the previous tutorials. For a more application-oriented perspective, in NLP or vision, there are also more specific surveys, depending on the area you would like to dive into:

There is much more to explore, but these are a good starting point to learn and understand the transformer architecture and some of its applications.
