Combining quantization with other optimization techniques
Quantization can be combined with other optimization techniques, such as pruning and knowledge distillation, to produce highly efficient models suited to resource-constrained devices. Stacking these methods reduces model size far more than any single technique alone, with little or no loss in accuracy. This is especially useful when deploying LLMs on edge devices or mobile platforms, where compute and memory are limited.
Pruning and quantization
One of the most effective combinations is pruning followed by quantization. First, pruning removes redundant weights from the model, reducing the number of non-zero parameters. Quantization then lowers the precision of the remaining weights, which further shrinks the model and speeds up inference. Here’s an example:
import torch
import torch.nn.utils.prune as prune
import torch.quantization as quant

# Step 1: Prune the...
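To make the two steps concrete, here is a minimal sketch of prune-then-quantize in PyTorch. It assumes a small stand-in model built from nn.Linear layers, prunes 30% of each layer's weights by L1 magnitude, and then applies dynamic int8 quantization; the model definition, the pruning ratio, and the choice of dynamic quantization are illustrative assumptions, not fixed requirements.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
import torch.quantization as quant

# Stand-in model (assumption for illustration): any nn.Module with Linear layers
# works the same way.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Step 1: Prune 30% of the weights in every Linear layer by L1 magnitude,
# then make the pruning permanent by removing the re-parametrization.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")

# Step 2: Apply dynamic quantization so the remaining weights are stored as int8.
model.eval()
quantized_model = quant.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# The pruned, quantized model is smaller on disk and faster for CPU inference.
print(quantized_model)

Dynamic quantization is used here because it is the simplest entry point and needs no calibration data; static or quantization-aware approaches can be swapped in at the same step when higher accuracy is needed.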