The Transformer: The Model Behind the Modern AI Revolution
We begin by examining the limitations of the models introduced in the previous chapter, and how a new paradigm, first attention mechanisms and then the transformer, emerged to overcome them. Understanding this shift will show us how these models are trained, why they are so powerful, and why they made it possible to solve natural language processing (NLP) tasks that were previously out of reach. We will then see the capabilities of these models in practical applications.
By the end of this chapter, it will be clear why contemporary LLMs are built on the transformer architecture.
In this chapter, we’ll be covering the following topics:
- Exploring attention and self-attention
- Introducing the transformer model
- Training a transformer
- Exploring masked language modeling
- Visualizing internal mechanisms
- Applying a transformer ...