Scaling annotation processes for massive language datasets

For massive datasets, consider the following strategies:

  • Distributed processing: Use libraries such as Dask or PySpark to parallelize annotation workflows across multiple cores, or across clusters of machines, significantly speeding up processing for massive datasets. With Dask, you can scale existing Python-based annotation scripts to run on distributed systems, while PySpark offers robust data processing within the Apache Spark ecosystem. Both libraries expose familiar APIs, easing the transition from local annotation pipelines to distributed ones and letting annotation teams process datasets that are too large for a single machine (see the sketch after this list).
  • Active learning...
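
The following is a minimal sketch of the distributed-processing idea using Dask bags; the annotate() function, the input glob, and the output path are hypothetical placeholders rather than examples from the book:

```python
import json
import dask.bag as db

def annotate(record: str) -> dict:
    # Toy rule-based annotator; a real pipeline might call a model
    # or an external labeling service here instead.
    label = "positive" if "good" in record.lower() else "other"
    return {"text": record, "label": label}

# Read many text files as one bag of lines and annotate them in
# parallel across all local cores (or a cluster, if a distributed
# Client has been started).
records = db.read_text("data/raw/*.txt")  # hypothetical input path
annotated = records.map(annotate)

# Write one JSON-lines part file per partition; to_textfiles
# triggers the computation.
annotated.map(json.dumps).to_textfiles("data/annotated/part-*.jsonl")
```

Because Dask bags mirror the map/filter style of plain Python iterables, the same annotate() function can be developed and tested locally on a small list of strings before being scaled out to the full dataset.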