Discovering the evolution of LLMs
An LLM is a transformer (although different architectures are beginning to emerge today). In general, an LLM is defined as a model with more than 10 billion parameters. Although this threshold may seem arbitrary, certain capabilities emerge only at scale. These models are designed to understand and generate human language, and over time they have also acquired the ability to generate code and more. Parameter count alone is not enough to achieve this: the models are also trained on huge amounts of data. Almost all of today's LLMs are trained on next-word prediction (autoregressive language modeling): given the tokens seen so far, the model learns to predict the token that follows.
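To make that training objective concrete, here is a minimal sketch of next-word prediction. It assumes the Hugging Face transformers library and the small GPT-2 checkpoint, neither of which is mentioned in the text; any autoregressive model would behave analogously.

```python
# Minimal sketch of autoregressive (next-word) language modeling.
# GPT-2 via Hugging Face is an illustrative choice, not one prescribed
# by the text; any causal LM works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Large language models are trained to predict the next"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing the input ids as labels makes the model report the
    # training objective: cross-entropy between each position's
    # predicted next-token distribution and the token that follows it.
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"average next-token loss: {outputs.loss.item():.3f}")

# At inference time, the logits at the last position give the model's
# distribution over the vocabulary for the *next* token.
next_token_id = int(outputs.logits[0, -1].argmax())
print(f"most likely next token: {tokenizer.decode(next_token_id)!r}")
```

A causal attention mask ensures each position can only attend to earlier tokens, so the model never sees the token it is asked to predict.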
Parameter growth in the transformer field has been motivated by several factors:
- Learnability: According to scaling laws, more parameters should lead to greater capabilities and a better understanding of nuances and complexities in the data (a commonly cited empirical form of this relationship appears after this list)
- Expressiveness: The model can express more complex functions, thus increasing its ability to generalize and reducing the risk of underfitting
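The scaling-law claim in the first bullet can be stated more precisely. One widely cited empirical fit, from Kaplan et al. (2020), relates test loss to the number of non-embedding parameters $N$; the constants below are their measured values, not something asserted in this text:

$$
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}
$$

Because the exponent is small, loss falls only slowly with scale: large multiplicative increases in $N$ buy modest but predictable reductions in loss, which is part of what drove the race toward ever-larger models.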