SlideShare a Scribd company logo
Building an 

Multimodal Knowledge
Assistant
Jerry Liu

September 23, 2024
LlamaIndex:

Build Production LLM Apps
over Enterprise Data
LlamaIndex
LlamaIndex helps any developer
build context-augmented LLM
apps from prototype to
production.
Open-Source: Leading developer toolkit for
building production LLM apps over data.

Docs: https://docs.llamaindex.ai/

Repo: https://github.com/run-llama/llama_index

LlamaCloud: A centralized knowledge interface
for your production LLM application.


Link: https://cloud.llamaindex.ai/
Evals Agents
Vectors Semantic
Search
LLMs Chat
Raw Data Q&A
Embedding Structured
Extraction
Building a Knowledge Assistant
LlamaIndex
Goal: Build an interface that can take in any
task as input and give back an output.


Input forms: simple questions, complex
questions, research tasks


Output forms: short answer, structured
output, research report

Human:
<Question>
Answer:

Sources:...
Knowledge

Base
Human:...
Human:...
Agent:...
Agent:...
Knowledge Assistant with Basic RAG
DataProcessingandIndexing BasicRetrievalandPrompting
LlamaIndex
⚠️Naive data processing, primitive retrieval
interface

⚠️Poor query understanding/planning

⚠️No function calling or tool use

⚠️Stateless, no memory
BasicTextSplitting Top-K=5

SimpleQAPrompt
Index Response
Data
Can we do more?
LlamaIndex
There’s many questions/tasks that naive
RAG can’t give an answer to
Hallucinations

Limited time savings

Limited decision-making enhancement
How do we aim to build
a production-ready
knowledge assistant?
A Better Knowledge Assistant
LlamaIndex
High-quality Multimodal RA
Complex output generatio
Agentic reasoning over complex input
Towards a scalable, full-stack application
Advanced Data and Retrieval
Data Index
Agent Response
Data Processing
A Better Knowledge Assistant
LlamaIndex
High-quality Multimodal RA
Complex output generatio
Agentic reasoning over complex input
Towards a scalable, full-stack application
Action-Taking
Report
Generation
Data Analysis
A Better Knowledge Assistant
LlamaIndex
High-quality Multimodal RA
Complex output generatio
Agentic reasoning over complex input
Towards a scalable, full-stack application
Tool Use
Query Planning
Memory
Reflection
Tool x
Tool x
Tool ds
Tool ds
Response
Agent
Other Tools
Other Tools
Advanced RAG and
Retrieval Tool
A Better Knowledge Assistant
LlamaIndex
High-quality Multimodal RA
Complex output generatio
Agentic reasoning over complex input
Towards a scalable, full-stack application
User Message Queue
Agent 1
Agent 2
Agent 3
Orchestrator
Control Plane
Human-in-the-loop
Service
Metadata
Decides what happens
next
Setting up Multimodal RAG
Any LLM App is only as
Good as your Data
LlamaIndex
Garbage in = garbage out


Good data quality is a necessary
component of any production LLM app.
RawData DataProcessing CleanData
ProductionLLMApps
Structured Extraction
Semantic Search
Chat
Agents
Q&A
ETL for LLM
Parsin
Chunkin
Indexing
Case Study: Complex Documents
LlamaIndex
A lot of documents can be classified
as complex:
Embedded Tables, Charts, Image
Irregular Layout
Headers/Footers


Users want to ask research questions
over this data
Simple pointed question
Multi-document comparison
Research tasks


Building a production-ready
knowledge assistants over this
complex data is challenging.
Knowledge-IntensiveLLMApplications
Data FoundationModels
Developers
Sales Dev Legal Finance
An LLM-Native Document Parser
LlamaIndex
An ideal GenAI-native parser can structure complex
document data for any downstream use case.



Requirement
Parse tables accurately into text and semi-
structured representation
Parse text into semantically coherent chunk
Extract visual elements (images/diagrams/charts)
into structured formats and return image chunks
Automated metadata extraction


Non-Requirement
Extract detailed JSONs for every elemen
Extract bounding boxes
PDF
Node Node
TextChunk Tables TextChunk Diagrams
Node Node
LlamaParse
“As an AI Applied Data Scientist who was granted one of the
first ML patents in the U.S., and who is building cutting-edge AI
capabilities at one of the world’s largest Private Equity Funds, I
can confidently say that LlamaParse from LlamaIndex is
currently the best technology I have seen for parsing complex
document structures for Enterprise RAG pipelines. Its ability to
preserve nested tables, extract challenging spatial layouts, and
images is key to maintaining data integrity in advanced RAG and
agentic model building.”
Dean Barr, Applied AI Lead at Carlyle
LlamaParse

LlamaIndex
Advanced document parser specifically for
reducing LLM hallucinations
20k+
unique users
25M+
pages processed
Use Cases
LlamaIndex
Multimodal RAG Annual Reports (Tables) Excel Sheets Forms
Advanced Parsing + Advanced Indexing
LlamaIndex
You can combine parsing with hierarchical
indexing and retrieval to model
heterogeneous unstructured/tabular/
multimodal data within a document
Parse documents into elements: text
chunks, tables, images, and more
For each element, extract one or more
text representations that can be
indexed
Do recursive retrieval
PDF
Node Node
Text Chunk Tables Text Chunk Diagrams
Node Node
Multimodal RAG Pipeline
LlamaIndex
Indexin
Parse document into text and image
chunks with LlamaPars
Link each text chunk to image chunk
through metadat
Embed and index text chunks


A true multimodal RAG pipeline stores both text and image chunks for use within a multi-modal LLM
Retrieva
Retrieve text chunks by text
embedding
Feed in both text and image to
multimodal LLM during synthesis.
Multimodal RAG Pipeline
LlamaIndex
Let’s run through a demo example of
building multimodal RAG over a complex
slide deck! 


The end result is you’re able to ask
questions over visual data in the
document.


https://github.com/run-llama/
llama_parse/blob/main/examples/
multimodal/
multimodal_rag_slide_deck.ipynb
LlamaCloud: An Enterprise RAG Platform
LlamaIndex
A production-ready RAG platform that allows developers to
easily connect their unstructured data sources to LLM agent
systems.


Instant Time-to-Value for building knowledge assistant
Out-of-the-box advanced RAG capabilitie
Free up developer time to rapidly iterate on higher-
level agent use cases

State-of-the-Performance leads to increased
satisfaction and reduced compliance risk

Reduced maintenance cost once application is deployed

Enterprise-ready security like access controls



Signup: https://cloud.llamaindex.ai/ Data
Ingestion Indexing Retrieval
LlamaCloud
E2E Multimodal RAG Capabilities
LlamaIndex
Setup multimodal indexing and retrieval in minutes


Signup here: https://cloud.llamaindex.ai/
LlamaIndex
Report Generation
Automating Decision Making
LlamaIndex
Action-Taking
Agent
Report
Generation
Data Analysis
Agents should have the capability to not only generate chatbot responses,
but als
Produce knowledge wor
Take actions


Action-taking and Output Generation potentially lead to much greater ROI
in terms of time savings and capability improvement


Solution : Structured Outputs and Function Calling
Multimodal Report Generation
LlamaIndex
Generate interleaving text-and-image responses with the help
of structured outputs.


https://github.com/run-llama/llama_parse/blob/main/examples/
multimodal/multimodal_report_generation.ipynb
Output Schema
class TextBlock(BaseModel):

text: str


class ImageBlock(BaseModel):

file_path: str 


class ReportOutput(BaseModel):

blocks: ListBlock | ImageBlock]
Agentic Reasoning over Complex Inputs
Complex Inputs
LlamaIndex
Naive RAG works well for pointed questions, but fails on more complex tasks.


Summarization Questions: “Give me a summary of the entire <company> 10K annual report” 

Comparison Questions: “Compare the open-source contributions of candidate A and candidate B”

Multi-part Questions: “Tell me about the pro-X arguments in article A, and tell me about the pro-Y
arguments in article B, make a table based on our internal style guide, then generate your own
conclusion based on these facts.”

Research Tasks: “I want to create a research survey on current supervised fine-tuning techniques.
Can you help?”
Tool Use
Routing
One-Shot Query
Planning
ReAct
Conversation
Memory
Dynamic Planning +
Execution
From Simple to Advanced Agents
LlamaIndex
Simple

Lower Cost

Lower Latency
Advanced

Higher Cost

Higher Latency
Agent Ingredients Agent Ingredients
Agentic RAG
LlamaIndex
Every data interface is a tool


Use agent reasoning loops (sequential,
DAG, tree) to tackle complex tasks


End Result: Build personalized QA systems
capable of handling complex questions!
Tool Use
Query Planning
Memory
Reflection
Tool x
Tool x
Tool ds
Tool ds
Response
Agent
Other Tools
Other Tools
Advanced RAG and
Retrieval Tool
Unconstrainedvs.ConstrainedFlows
LlamaIndex
MoreConstrainedFlows

More Reliable

Less Expressive
Router Response
RAGTool
Reflection
Task
SQLTool
Unconstrainedvs.ConstrainedFlows
LlamaIndex
UnconstrainedFlows

Less Reliable

More Expressive
Task Agent
Orchestrator RAGTool
SQLTool
WebTool
Response
Agentic Orchestration Foundations
LlamaIndex
Router Response
RAG Tool
Reflection
Task
SQL Tool
LlamaIndex Workflows
We believe an agent orchestration framework should have the following properties

Event-Driven: Model each step as listening to input events and emitting output events

Composable: Piece together granular workflows into higher-level workflows

Flexible: Write logic through LLM calls or through plain Python 

Code-first: Express orchestration logic through code. Easy to read and easy to extend.

Debuggable and Observable: Step through and observe states 

Easily Deployable to Production: Translate notebook code into services that run in production.
def generate_response(context, query):

prompt = f"Question: {query}nnContext: {context}nnAnswer:"

response = llm.complete(prompt)

return response.text


# Define the pipeline

pipeline = QueryPipeline()

pipeline.add_modules({

"input": InputComponent(),

"retriever": retriever,

"reranker": reranker,

"response_generator": FnComponent(fn=generate_response)

})


# Define the flow

pipeline.add_link("input", "retriever")

pipeline.add_link("retriever", "reranker")

pipeline.add_link("input", "response_generator", dest_key="query")

pipeline.add_link(

"reranker", "response_generator", dest_key="context"

)


# Run the pipeline

response = pipeline.run("What is the capital of France?")

print(response)
Compared to Graph-based Approaches
Graph-based approaches (e.g. our deprecated Query Pipelines) can be cumbersome and non-Pythonic for complex agentic workflows.
Orchestration logic baked into edge
More lines of code, less readabl
Cumbersome to dynamically generate workflows based on runtime
conditions
def generate_response(context, query):

prompt = f"Question: {query}nnContext: {context}nnAnswer:"

response = llm.complete(prompt)

return response.text


# Define the pipeline

pipeline = QueryPipeline()

pipeline.add_modules({

"input": InputComponent(),

"retriever": retriever,

"reranker": reranker,

"response_generator": FnComponent(fn=generate_response)

})


# Define the flow

pipeline.add_link("input", "retriever")

pipeline.add_link("retriever", "reranker")

pipeline.add_link("input", "response_generator", dest_key="query")

pipeline.add_link(

"reranker", "response_generator", dest_key="context"

)


# Run the pipeline

response = pipeline.run("What is the capital of France?")

print(response)
Compared to Graph-based Approaches
Graph-based approaches (e.g. our deprecated Query Pipelines) can be cumbersome and non-Pythonic for complex agentic workflows.

Compared to query pipelines, our workflows are more readable, and easier to maintain/scale.
class RAGWorkflow(Workflow):

def __init__(self):

...


@step

async def retrieve(self, query: str):

return self.retriever.retrieve(query)


@step

async def rerank(self, retrieved_nodes):

return self.reranker.postprocess_nodes(retrieved_nodes)


@step

async def generate_response(self, query: str, context):

prompt = f"Question: {query}nnContext:
{context}nnAnswer:"

response = await self.llm.complete(prompt)

return response.text


@step

async def run_workflow(self, query: str):

retrieved_nodes = await self.retrieve(query)

reranked_nodes = await self.rerank(retrieved_nodes)

response = await self.generate_response(query,
[node.get_content() for node in reranked_nodes])

return response
Benefits and Risks

Action-taking and Output Generation
potentially lead to much greater ROI in terms
of time savings and capability improvement




⚠️LLMs need to achieve a greater degree of
reliability

⚠️Action-taking requires ample human-in-the-
loop to build trust.
LlamaIndex
Multimodal Report Generation
LlamaIndex
Generate interleaving text-and-image responses with the help of
structured outputs.
Example architecture: research and writer step
The researcher retrieves relevant chunks and documents, and
puts them into a data cache
The writer uses the data cache to generate a structured output
of interleaving text and image blocks.
Multimodal Report Generation
LlamaIndex
Generate interleaving text-and-image responses with the help of
structured outputs.
https://github.com/run-llama/llama_parse/blob/main/examples/
multimodal/multimodal_report_generation_agent.ipynb
Towards a Scalable, Full-Stack Application
P
Running Agents in Production
LlamaIndex
You need the right architecture and infra components to serve
complex, agentic workflows to end-users as a production application.


Requirements
Encapsulation and re-us
Standardized communication interfaces between agents and with
the client.
Scalability in number of users and number of agent
Human-in-the-loop for the end-use
Debugging and observability tools for the developer
User Production
Agent1
Agent2
Agent3
llama-deploy
LlamaIndex
Deploy agentic workflows as microservices.
Model every agent workflow as a service AP
All agent communication occurs via a central message queu
Distributed tool-executio
Human-in-the-loop as a servic
Easy deployment with docker-compose and Kubernetes
User Message Queue
Agent 1
Agent 2
Agent 3
Orchestrator
Control Plane
Service
Metadata
Decides what happens
next
https://github.com/run-llama/llama_deploy
Thank you!
LlamaIndex

September 23, 2024

More Related Content

Similar to Multimodal Knowledge Assistance - Berkeley LLM AI Agents MOOC (20)

PPTX
The Semantic Knowledge Graph
Trey Grainger
 
PDF
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
 
PDF
Using_python_webdevolopment_datascience.pdf
Sudipta Bhattacharya
 
PDF
Lambda Architecture and open source technology stack for real time big data
Trieu Nguyen
 
PDF
Lambda architecture for real time big data
Trieu Nguyen
 
PPTX
Microsoft Fabric Introduction
James Serra
 
PDF
Confluent & MongoDB APAC Lunch & Learn
confluent
 
PDF
Off-Label Data Mesh: A Prescription for Healthier Data
HostedbyConfluent
 
PDF
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Paco Nathan
 
PPTX
LlamaIndex_HassGeek_Workshop_for_AI.pptx
kitedav183
 
PDF
Tech leaders guide to effective building of machine learning products
Gianmario Spacagna
 
PDF
Big Data Analytics (ML, DL, AI) hands-on
Dony Riyanto
 
PDF
Building API Powered Chatbot & Application using AI SDK.pdf
diliphembram121
 
PDF
Building API Powered Chatbot & Application using AI SDK (1).pdf
diliphembram121
 
PDF
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
PPTX
Serverless machine learning architectures at Helixa
Data Science Milan
 
PDF
Composable Parallel Processing in Apache Spark and Weld
Databricks
 
PDF
Final paper
Samuel-Hunter Berndt
 
PDF
mlflow: Accelerating the End-to-End ML lifecycle
Databricks
 
PPTX
Graph RAG Varieties and Their Enterprise Applications
Ontotext
 
The Semantic Knowledge Graph
Trey Grainger
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
 
Using_python_webdevolopment_datascience.pdf
Sudipta Bhattacharya
 
Lambda Architecture and open source technology stack for real time big data
Trieu Nguyen
 
Lambda architecture for real time big data
Trieu Nguyen
 
Microsoft Fabric Introduction
James Serra
 
Confluent & MongoDB APAC Lunch & Learn
confluent
 
Off-Label Data Mesh: A Prescription for Healthier Data
HostedbyConfluent
 
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Paco Nathan
 
LlamaIndex_HassGeek_Workshop_for_AI.pptx
kitedav183
 
Tech leaders guide to effective building of machine learning products
Gianmario Spacagna
 
Big Data Analytics (ML, DL, AI) hands-on
Dony Riyanto
 
Building API Powered Chatbot & Application using AI SDK.pdf
diliphembram121
 
Building API Powered Chatbot & Application using AI SDK (1).pdf
diliphembram121
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
Serverless machine learning architectures at Helixa
Data Science Milan
 
Composable Parallel Processing in Apache Spark and Weld
Databricks
 
mlflow: Accelerating the End-to-End ML lifecycle
Databricks
 
Graph RAG Varieties and Their Enterprise Applications
Ontotext
 

More from VincentLui15 (8)

PDF
Sequoias2025CompensationandEquityReport-SneakPeekpdf.pdf
VincentLui15
 
PPT
Key Findings of China Solar Energy Market Fact Book
VincentLui15
 
PDF
Toward unified framework and symbolic decision making - Berkeley LLM AI Agent...
VincentLui15
 
PDF
Agents for Enterprise Workflows - Berkeley LLM AI Agents MOOC
VincentLui15
 
PDF
Agents for SW development - Berkeley LLM AI Agents MOOC
VincentLui15
 
PDF
Enterprise Trends for Gen AI - Berkeley LLM AI Agents MOOC
VincentLui15
 
PDF
Brief History and Overview of LLM Agents
VincentLui15
 
PDF
LLM Reasoning - Key Ideas and Limitations
VincentLui15
 
Sequoias2025CompensationandEquityReport-SneakPeekpdf.pdf
VincentLui15
 
Key Findings of China Solar Energy Market Fact Book
VincentLui15
 
Toward unified framework and symbolic decision making - Berkeley LLM AI Agent...
VincentLui15
 
Agents for Enterprise Workflows - Berkeley LLM AI Agents MOOC
VincentLui15
 
Agents for SW development - Berkeley LLM AI Agents MOOC
VincentLui15
 
Enterprise Trends for Gen AI - Berkeley LLM AI Agents MOOC
VincentLui15
 
Brief History and Overview of LLM Agents
VincentLui15
 
LLM Reasoning - Key Ideas and Limitations
VincentLui15
 
Ad

Recently uploaded (20)

PPTX
Agentic AI in Healthcare Driving the Next Wave of Digital Transformation
danielle hunter
 
PDF
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
PPTX
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
PDF
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
PPTX
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
PDF
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PPTX
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
PPTX
Simple and concise overview about Quantum computing..pptx
mughal641
 
PPTX
AI Code Generation Risks (Ramkumar Dilli, CIO, Myridius)
Priyanka Aash
 
PDF
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
PDF
Brief History of Internet - Early Days of Internet
sutharharshit158
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PDF
introduction to computer hardware and sofeware
chauhanshraddha2007
 
PDF
Build with AI and GDG Cloud Bydgoszcz- ADK .pdf
jaroslawgajewski1
 
PPTX
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 
PDF
TrustArc Webinar - Navigating Data Privacy in LATAM: Laws, Trends, and Compli...
TrustArc
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PPTX
Introduction to Flutter by Ayush Desai.pptx
ayushdesai204
 
PDF
Market Insight : ETH Dominance Returns
CIFDAQ
 
Agentic AI in Healthcare Driving the Next Wave of Digital Transformation
danielle hunter
 
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
Simple and concise overview about Quantum computing..pptx
mughal641
 
AI Code Generation Risks (Ramkumar Dilli, CIO, Myridius)
Priyanka Aash
 
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
Brief History of Internet - Early Days of Internet
sutharharshit158
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
introduction to computer hardware and sofeware
chauhanshraddha2007
 
Build with AI and GDG Cloud Bydgoszcz- ADK .pdf
jaroslawgajewski1
 
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 
TrustArc Webinar - Navigating Data Privacy in LATAM: Laws, Trends, and Compli...
TrustArc
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
Introduction to Flutter by Ayush Desai.pptx
ayushdesai204
 
Market Insight : ETH Dominance Returns
CIFDAQ
 
Ad

Multimodal Knowledge Assistance - Berkeley LLM AI Agents MOOC

  • 1. Building an Multimodal Knowledge Assistant Jerry Liu September 23, 2024
  • 2. LlamaIndex: Build Production LLM Apps over Enterprise Data LlamaIndex LlamaIndex helps any developer build context-augmented LLM apps from prototype to production. Open-Source: Leading developer toolkit for building production LLM apps over data. Docs: https://docs.llamaindex.ai/ Repo: https://github.com/run-llama/llama_index LlamaCloud: A centralized knowledge interface for your production LLM application. Link: https://cloud.llamaindex.ai/ Evals Agents Vectors Semantic Search LLMs Chat Raw Data Q&A Embedding Structured Extraction
  • 3. Building a Knowledge Assistant LlamaIndex Goal: Build an interface that can take in any task as input and give back an output. Input forms: simple questions, complex questions, research tasks Output forms: short answer, structured output, research report Human: <Question> Answer: Sources:... Knowledge Base Human:... Human:... Agent:... Agent:...
  • 4. Knowledge Assistant with Basic RAG DataProcessingandIndexing BasicRetrievalandPrompting LlamaIndex ⚠️Naive data processing, primitive retrieval interface ⚠️Poor query understanding/planning ⚠️No function calling or tool use ⚠️Stateless, no memory BasicTextSplitting Top-K=5 SimpleQAPrompt Index Response Data
  • 5. Can we do more? LlamaIndex There’s many questions/tasks that naive RAG can’t give an answer to Hallucinations Limited time savings Limited decision-making enhancement How do we aim to build a production-ready knowledge assistant?
  • 6. A Better Knowledge Assistant LlamaIndex High-quality Multimodal RA Complex output generatio Agentic reasoning over complex input Towards a scalable, full-stack application Advanced Data and Retrieval Data Index Agent Response Data Processing
  • 7. A Better Knowledge Assistant LlamaIndex High-quality Multimodal RA Complex output generatio Agentic reasoning over complex input Towards a scalable, full-stack application Action-Taking Report Generation Data Analysis
  • 8. A Better Knowledge Assistant LlamaIndex High-quality Multimodal RA Complex output generatio Agentic reasoning over complex input Towards a scalable, full-stack application Tool Use Query Planning Memory Reflection Tool x Tool x Tool ds Tool ds Response Agent Other Tools Other Tools Advanced RAG and Retrieval Tool
  • 9. A Better Knowledge Assistant LlamaIndex High-quality Multimodal RA Complex output generatio Agentic reasoning over complex input Towards a scalable, full-stack application User Message Queue Agent 1 Agent 2 Agent 3 Orchestrator Control Plane Human-in-the-loop Service Metadata Decides what happens next
  • 11. Any LLM App is only as Good as your Data LlamaIndex Garbage in = garbage out Good data quality is a necessary component of any production LLM app. RawData DataProcessing CleanData ProductionLLMApps Structured Extraction Semantic Search Chat Agents Q&A ETL for LLM Parsin Chunkin Indexing
  • 12. Case Study: Complex Documents LlamaIndex A lot of documents can be classified as complex: Embedded Tables, Charts, Image Irregular Layout Headers/Footers Users want to ask research questions over this data Simple pointed question Multi-document comparison Research tasks Building a production-ready knowledge assistants over this complex data is challenging. Knowledge-IntensiveLLMApplications Data FoundationModels Developers Sales Dev Legal Finance
  • 13. An LLM-Native Document Parser LlamaIndex An ideal GenAI-native parser can structure complex document data for any downstream use case. Requirement Parse tables accurately into text and semi- structured representation Parse text into semantically coherent chunk Extract visual elements (images/diagrams/charts) into structured formats and return image chunks Automated metadata extraction Non-Requirement Extract detailed JSONs for every elemen Extract bounding boxes PDF Node Node TextChunk Tables TextChunk Diagrams Node Node
  • 14. LlamaParse “As an AI Applied Data Scientist who was granted one of the first ML patents in the U.S., and who is building cutting-edge AI capabilities at one of the world’s largest Private Equity Funds, I can confidently say that LlamaParse from LlamaIndex is currently the best technology I have seen for parsing complex document structures for Enterprise RAG pipelines. Its ability to preserve nested tables, extract challenging spatial layouts, and images is key to maintaining data integrity in advanced RAG and agentic model building.” Dean Barr, Applied AI Lead at Carlyle LlamaParse LlamaIndex Advanced document parser specifically for reducing LLM hallucinations 20k+ unique users 25M+ pages processed
  • 15. Use Cases LlamaIndex Multimodal RAG Annual Reports (Tables) Excel Sheets Forms
  • 16. Advanced Parsing + Advanced Indexing LlamaIndex You can combine parsing with hierarchical indexing and retrieval to model heterogeneous unstructured/tabular/ multimodal data within a document Parse documents into elements: text chunks, tables, images, and more For each element, extract one or more text representations that can be indexed Do recursive retrieval PDF Node Node Text Chunk Tables Text Chunk Diagrams Node Node
  • 17. Multimodal RAG Pipeline LlamaIndex Indexin Parse document into text and image chunks with LlamaPars Link each text chunk to image chunk through metadat Embed and index text chunks A true multimodal RAG pipeline stores both text and image chunks for use within a multi-modal LLM Retrieva Retrieve text chunks by text embedding Feed in both text and image to multimodal LLM during synthesis.
  • 18. Multimodal RAG Pipeline LlamaIndex Let’s run through a demo example of building multimodal RAG over a complex slide deck! The end result is you’re able to ask questions over visual data in the document. https://github.com/run-llama/ llama_parse/blob/main/examples/ multimodal/ multimodal_rag_slide_deck.ipynb
  • 19. LlamaCloud: An Enterprise RAG Platform LlamaIndex A production-ready RAG platform that allows developers to easily connect their unstructured data sources to LLM agent systems. Instant Time-to-Value for building knowledge assistant Out-of-the-box advanced RAG capabilitie Free up developer time to rapidly iterate on higher- level agent use cases State-of-the-Performance leads to increased satisfaction and reduced compliance risk Reduced maintenance cost once application is deployed Enterprise-ready security like access controls Signup: https://cloud.llamaindex.ai/ Data Ingestion Indexing Retrieval LlamaCloud
  • 20. E2E Multimodal RAG Capabilities LlamaIndex Setup multimodal indexing and retrieval in minutes Signup here: https://cloud.llamaindex.ai/
  • 23. Automating Decision Making LlamaIndex Action-Taking Agent Report Generation Data Analysis Agents should have the capability to not only generate chatbot responses, but als Produce knowledge wor Take actions Action-taking and Output Generation potentially lead to much greater ROI in terms of time savings and capability improvement Solution : Structured Outputs and Function Calling
  • 24. Multimodal Report Generation LlamaIndex Generate interleaving text-and-image responses with the help of structured outputs. https://github.com/run-llama/llama_parse/blob/main/examples/ multimodal/multimodal_report_generation.ipynb Output Schema class TextBlock(BaseModel): text: str class ImageBlock(BaseModel): file_path: str class ReportOutput(BaseModel): blocks: ListBlock | ImageBlock]
  • 25. Agentic Reasoning over Complex Inputs
  • 26. Complex Inputs LlamaIndex Naive RAG works well for pointed questions, but fails on more complex tasks. Summarization Questions: “Give me a summary of the entire <company> 10K annual report”  Comparison Questions: “Compare the open-source contributions of candidate A and candidate B” Multi-part Questions: “Tell me about the pro-X arguments in article A, and tell me about the pro-Y arguments in article B, make a table based on our internal style guide, then generate your own conclusion based on these facts.” Research Tasks: “I want to create a research survey on current supervised fine-tuning techniques. Can you help?”
  • 27. Tool Use Routing One-Shot Query Planning ReAct Conversation Memory Dynamic Planning + Execution From Simple to Advanced Agents LlamaIndex Simple Lower Cost Lower Latency Advanced Higher Cost
 Higher Latency Agent Ingredients Agent Ingredients
  • 28. Agentic RAG LlamaIndex Every data interface is a tool Use agent reasoning loops (sequential, DAG, tree) to tackle complex tasks End Result: Build personalized QA systems capable of handling complex questions! Tool Use Query Planning Memory Reflection Tool x Tool x Tool ds Tool ds Response Agent Other Tools Other Tools Advanced RAG and Retrieval Tool
  • 31. Agentic Orchestration Foundations LlamaIndex Router Response RAG Tool Reflection Task SQL Tool LlamaIndex Workflows We believe an agent orchestration framework should have the following properties Event-Driven: Model each step as listening to input events and emitting output events Composable: Piece together granular workflows into higher-level workflows Flexible: Write logic through LLM calls or through plain Python Code-first: Express orchestration logic through code. Easy to read and easy to extend. Debuggable and Observable: Step through and observe states Easily Deployable to Production: Translate notebook code into services that run in production.
  • 32. def generate_response(context, query):
 prompt = f"Question: {query}nnContext: {context}nnAnswer:"
 response = llm.complete(prompt)
 return response.text
 # Define the pipeline pipeline = QueryPipeline() pipeline.add_modules({
 "input": InputComponent(),
 "retriever": retriever,
 "reranker": reranker, "response_generator": FnComponent(fn=generate_response) })
 # Define the flow pipeline.add_link("input", "retriever") pipeline.add_link("retriever", "reranker") pipeline.add_link("input", "response_generator", dest_key="query") pipeline.add_link( "reranker", "response_generator", dest_key="context" ) # Run the pipeline response = pipeline.run("What is the capital of France?") print(response) Compared to Graph-based Approaches Graph-based approaches (e.g. our deprecated Query Pipelines) can be cumbersome and non-Pythonic for complex agentic workflows. Orchestration logic baked into edge More lines of code, less readabl Cumbersome to dynamically generate workflows based on runtime conditions
  • 33. def generate_response(context, query):
 prompt = f"Question: {query}nnContext: {context}nnAnswer:"
 response = llm.complete(prompt)
 return response.text
 # Define the pipeline pipeline = QueryPipeline() pipeline.add_modules({
 "input": InputComponent(),
 "retriever": retriever,
 "reranker": reranker, "response_generator": FnComponent(fn=generate_response) })
 # Define the flow pipeline.add_link("input", "retriever") pipeline.add_link("retriever", "reranker") pipeline.add_link("input", "response_generator", dest_key="query") pipeline.add_link( "reranker", "response_generator", dest_key="context" ) # Run the pipeline response = pipeline.run("What is the capital of France?") print(response) Compared to Graph-based Approaches Graph-based approaches (e.g. our deprecated Query Pipelines) can be cumbersome and non-Pythonic for complex agentic workflows. Compared to query pipelines, our workflows are more readable, and easier to maintain/scale. class RAGWorkflow(Workflow):
 def __init__(self): ...

 @step
 async def retrieve(self, query: str):
 return self.retriever.retrieve(query)

 @step
 async def rerank(self, retrieved_nodes):
 return self.reranker.postprocess_nodes(retrieved_nodes)

 @step
 async def generate_response(self, query: str, context):
 prompt = f"Question: {query}nnContext: {context}nnAnswer:"
 response = await self.llm.complete(prompt)
 return response.text

 @step
 async def run_workflow(self, query: str):
 retrieved_nodes = await self.retrieve(query)
 reranked_nodes = await self.rerank(retrieved_nodes)
 response = await self.generate_response(query, [node.get_content() for node in reranked_nodes])
 return response
  • 34. Benefits and Risks Action-taking and Output Generation potentially lead to much greater ROI in terms of time savings and capability improvement ⚠️LLMs need to achieve a greater degree of reliability ⚠️Action-taking requires ample human-in-the- loop to build trust. LlamaIndex
  • 35. Multimodal Report Generation LlamaIndex Generate interleaving text-and-image responses with the help of structured outputs. Example architecture: research and writer step The researcher retrieves relevant chunks and documents, and puts them into a data cache The writer uses the data cache to generate a structured output of interleaving text and image blocks.
  • 36. Multimodal Report Generation LlamaIndex Generate interleaving text-and-image responses with the help of structured outputs. https://github.com/run-llama/llama_parse/blob/main/examples/ multimodal/multimodal_report_generation_agent.ipynb
  • 37. Towards a Scalable, Full-Stack Application
  • 38. P Running Agents in Production LlamaIndex You need the right architecture and infra components to serve complex, agentic workflows to end-users as a production application. Requirements Encapsulation and re-us Standardized communication interfaces between agents and with the client. Scalability in number of users and number of agent Human-in-the-loop for the end-use Debugging and observability tools for the developer User Production Agent1 Agent2 Agent3
  • 39. llama-deploy LlamaIndex Deploy agentic workflows as microservices. Model every agent workflow as a service AP All agent communication occurs via a central message queu Distributed tool-executio Human-in-the-loop as a servic Easy deployment with docker-compose and Kubernetes User Message Queue Agent 1 Agent 2 Agent 3 Orchestrator Control Plane Service Metadata Decides what happens next https://github.com/run-llama/llama_deploy