MELBOURNE EDITION
2025 SPONSORS
API MANAGEMENT IN THE AI ERA
WHO AM I?
NAME: Nilesh Gule
WORK: Avanade
SOCIALS: @nileshgule
PASSIONS: Photography, Cricket
Code with Passion, Strive for Excellence
API Management in the AI Era session GAB Melbourne
API Management - GenAI Gateway
Azure-Samples/AI-Gateway: APIM
Challenges in managing GenAI APIs
• Track token usage across multiple applications
• Ensure a single app doesn’t consume the whole tokens-per-minute (TPM) quota
• Secure API keys across multiple applications
• Distribute load across multiple endpoints
• Ensure committed capacity in PTUs is exhausted before falling back to a pay-as-you-go (PAYG) instance
Provisioned Throughput Units (PTU)
• Lets you specify the amount of throughput required for a model deployment
• Granted to a subscription as quota
• Quota is specific to a region and defines the maximum number of PTUs that can be assigned to deployments in that subscription and region
• PTU provides:
  • Predictable performance
  • Allocated processing capacity
  • Cost savings
Understanding costs associated with provisioned throughput units (PTU)
Token Metrics Emitting
• Sends token usage metrics to Application Insights
• Provides an overview of Azure OpenAI model utilization across multiple applications or API consumers
GenAI Gateway Capabilities in Azure API Management
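To make the idea of per-consumer token metrics concrete, here is a minimal Python sketch. It is not the API Management policy itself: `TokenMetricsCollector` and the consumer IDs are hypothetical names, and an in-memory dictionary stands in for Application Insights. The only thing taken from the service is the `usage` object (`prompt_tokens`, `completion_tokens`) that chat completion responses carry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TokenTotals:
    """Running token counts for one API consumer."""
    prompt: int = 0
    completion: int = 0

    @property
    def total(self) -> int:
        return self.prompt + self.completion


class TokenMetricsCollector:
    """Hypothetical in-memory stand-in for a metrics sink such as Application Insights."""

    def __init__(self):
        # consumer_id -> TokenTotals, created on first use
        self._by_consumer = defaultdict(TokenTotals)

    def record(self, consumer_id: str, response: dict) -> None:
        # Read the "usage" object of a completion response; missing fields count as 0.
        usage = response.get("usage", {})
        totals = self._by_consumer[consumer_id]
        totals.prompt += usage.get("prompt_tokens", 0)
        totals.completion += usage.get("completion_tokens", 0)

    def totals_for(self, consumer_id: str) -> TokenTotals:
        return self._by_consumer[consumer_id]
```

Calling `record("app-a", response)` after every backend call and then reading `totals_for("app-a").total` gives the kind of per-application overview the slide describes.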
Token Rate Limiting
• Manage and enforce limits per API consumer based on token usage
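A rough Python sketch of per-consumer token limiting follows. It assumes a simple fixed 60-second window, which is an illustrative choice and not a statement of how the API Management token limit policy counts internally. `TokenRateLimiter`, `try_consume` and the injectable `clock` are hypothetical names.

```python
import time


class TokenRateLimiter:
    """Fixed-window tokens-per-minute limiter, tracked separately per consumer."""

    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.tokens_per_minute = tokens_per_minute
        self._clock = clock      # injectable so the window can be tested without sleeping
        self._windows = {}       # consumer_id -> (window_start, tokens_used)

    def try_consume(self, consumer_id: str, tokens: int) -> bool:
        now = self._clock()
        start, used = self._windows.get(consumer_id, (now, 0))
        if now - start >= 60:
            # The previous window has expired: start a fresh one.
            start, used = now, 0
        if used + tokens > self.tokens_per_minute:
            # Over the limit: keep the window unchanged and reject the call.
            self._windows[consumer_id] = (start, used)
            return False
        self._windows[consumer_id] = (start, used + tokens)
        return True
```

A gateway built this way would answer a rejected call with HTTP 429, so one busy application runs into its own limit instead of draining the shared TPM quota.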
Load Balancer and Circuit Breaker
• Spreads load across multiple Azure OpenAI endpoints
• Round-robin, weighted, or priority-based load distribution strategies
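The sketch below shows, in Python, how priority-based and weighted selection can be combined with a basic circuit breaker. It illustrates the concept only and is not how API Management implements backend pools. `Backend`, `BackendPool` and the 30-second cooldown are assumptions. Giving a PTU deployment priority 1 and a PAYG deployment priority 2 would use the committed capacity first and fall back only while the PTU circuit is open.

```python
import random
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    name: str
    priority: int               # lower value is preferred
    weight: int = 1             # relative share within the same priority group
    tripped_until: float = 0.0  # circuit stays open until this clock value


class BackendPool:
    def __init__(self, backends, cooldown_seconds: float = 30.0, clock=time.monotonic):
        self._backends = list(backends)
        self._cooldown = cooldown_seconds
        self._clock = clock

    def pick(self) -> Optional[Backend]:
        now = self._clock()
        healthy = [b for b in self._backends if b.tripped_until <= now]
        if not healthy:
            return None  # every circuit is open
        best = min(b.priority for b in healthy)
        group = [b for b in healthy if b.priority == best]
        # Weighted random choice among backends that share the best priority.
        return random.choices(group, weights=[b.weight for b in group], k=1)[0]

    def report_failure(self, backend: Backend) -> None:
        # Trip the circuit, for example after an HTTP 429 or 5xx from the backend.
        backend.tripped_until = self._clock() + self._cooldown
```

This version trips on the first reported failure; a fuller breaker would count failures over an interval before opening the circuit.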
Semantic Caching
• Optimize Token usage by leveraging semantic caching
• Stores completions for prompts with similar meanings
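A minimal Python sketch of the semantic caching idea: completions are stored next to an embedding of their prompt, and a new prompt gets a cached completion when its embedding is close enough by cosine similarity. `SemanticCache`, the `embed` callable and the 0.9 threshold are assumptions for illustration. A real setup would call an embeddings model and keep the vectors in an external cache.

```python
import math
from typing import Callable, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed: Callable[[str], Sequence[float]], threshold: float = 0.9):
        self._embed = embed          # hypothetical embedding function supplied by the caller
        self._threshold = threshold  # minimum similarity that counts as a cache hit
        self._entries = []           # list of (embedding, completion) pairs

    def store(self, prompt: str, completion: str) -> None:
        self._entries.append((self._embed(prompt), completion))

    def lookup(self, prompt: str) -> Optional[str]:
        query = self._embed(prompt)
        best_score, best_completion = 0.0, None
        # Linear scan for the most similar stored prompt.
        for embedding, completion in self._entries:
            score = cosine_similarity(query, embedding)
            if score > best_score:
                best_score, best_completion = score, completion
        return best_completion if best_score >= self._threshold else None
```

On a hit the gateway can return the stored completion without calling the model, which is where the token savings come from. The threshold trades hit rate against the risk of returning an answer to a prompt that only looks similar.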
Summary
• Track token usage across multiple applications
  • Emit Token Metrics policy
• Ensure a single app doesn’t consume the whole TPM quota
  • Token Limit policy
• Secure API keys across multiple applications
  • Subscription keys
• Distribute load across multiple endpoints
  • Backend pool load balancing and circuit breaker
Resources
• Azure OpenAI Gateway topologies
• Azure OpenAI Token Limit Policy
• LLM Token Limit Policy
• Azure OpenAI Emit Token Metric Policy
• LLM Emit Token Metric Policy
• Houssem Dellai YouTube videos
• GenAI Labs
• Designing and implementing GenAI gateway solution
Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and Strive for Excellence”
nileshgule
@nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com
https://www.youtube.com/@nilesh-gule
Source Code & slide deck
Nilesh Gule fork - GenAI Labs
https://github.com/NileshGule/AI-Gateway
GenAI Labs
https://aka.ms/apim/genai/labs
https://speakerdeck.com/nileshgule/
https://www.slideshare.net/nileshgule/
Q&A
