7. Challenges in managing GenAI APIs
• Track token usage across multiple applications
• Ensure a single app doesn’t consume the whole TPM (tokens-per-minute) quota
• Secure API keys across multiple applications
• Distribute load across multiple endpoints
• Ensure committed capacity in PTUs is exhausted before falling back to a pay-as-you-go (PAYG) instance
8. Provisioned Throughput Units (PTU)
• Lets you specify the amount of throughput required in a model deployment
• Granted to subscription as quota
• Quota is specific to a region and defines the maximum number of PTUs that can be assigned to deployments in that subscription and region
• PTUs provide
• Predictable performance
• Allocated processing capacity
• Cost savings
Understanding costs associated with provisioned throughput units (PTU)
9. Token Metrics Emitting
• Sends token metrics to Application Insights
• Provides an overview of Azure OpenAI model utilization across multiple applications or API consumers
GenAI Gateway Capabilities in Azure API Management
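API Management ships an `azure-openai-emit-token-metric` policy for this. A minimal inbound sketch (the metric namespace and dimension choices are illustrative):

```xml
<policies>
    <inbound>
        <base />
        <!-- Send prompt, completion, and total token counts to the
             Application Insights instance attached to the APIM service -->
        <azure-openai-emit-token-metric namespace="openai-usage">
            <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
            <dimension name="API ID" value="@(context.Api.Id)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```

The dimensions let you slice token consumption per consumer or per API in Application Insights.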
10. Token Rate Limiting
• Manage and enforce limits per API consumer based on their token usage
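This maps to the `azure-openai-token-limit` policy. A sketch keyed on the APIM subscription, with an illustrative 5,000 TPM cap:

```xml
<policies>
    <inbound>
        <base />
        <!-- Reject requests once a subscription exceeds its
             tokens-per-minute allowance (limit value is illustrative) -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="5000"
            estimate-prompt-tokens="true"
            remaining-tokens-header-name="x-remaining-tokens" />
    </inbound>
</policies>
```

Using `context.Subscription.Id` as the counter key gives each consuming app its own quota, which addresses the "single app consuming the whole TPM quota" challenge directly.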
11. Load Balancer and Circuit Breaker
• Helps spread load across multiple Azure OpenAI endpoints
• Round-robin, weighted, or priority-based load distribution strategies
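Load balancing is configured on APIM backends rather than in policy: a backend of type `Pool` groups several Azure OpenAI backends, each with a priority and weight. A Bicep sketch with illustrative names (`apimService`, `backendEastUs`, `backendWestUs` are assumed to be defined elsewhere):

```bicep
// Pool backend spreading traffic across two Azure OpenAI backends.
// Equal priority + equal weight approximates round-robin; different
// priorities give primary/fallback behavior (e.g. PTU before PAYG).
resource backendPool 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  name: 'openai-backend-pool'
  parent: apimService
  properties: {
    type: 'Pool'
    pool: {
      services: [
        { id: backendEastUs.id, priority: 1, weight: 50 }
        { id: backendWestUs.id, priority: 1, weight: 50 }
      ]
    }
  }
}
```

Circuit-breaker rules are set on the individual member backends, so a throttled endpoint is taken out of rotation until it recovers.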
12. Semantic Caching
• Optimize token usage by leveraging semantic caching
• Stores completions for semantically similar prompts
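This uses the `azure-openai-semantic-cache-lookup` and `azure-openai-semantic-cache-store` policy pair. A sketch, assuming an embeddings deployment is registered as the backend `embeddings-backend` (threshold and duration values are illustrative):

```xml
<policies>
    <inbound>
        <base />
        <!-- Serve a cached completion when a prior prompt is close enough
             in embedding space (lower score-threshold = stricter match) -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <base />
        <!-- Cache the completion for 60 seconds -->
        <azure-openai-semantic-cache-store duration="60" />
    </outbound>
</policies>
```

Repeated or near-duplicate prompts are then answered from the cache without spending completion tokens.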
13. Summary
• Track token usage across multiple applications
• Emit Token Metric policy
• Ensure a single app doesn’t consume the whole TPM quota
• Token Limit policy
• Secure API keys across multiple applications
• Subscription keys
• Distribute load across multiple endpoints
• Backend pool load balancing and circuit breaker
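The capabilities above compose in a single API policy. A sketch combining them (backend id and limit values are illustrative):

```xml
<policies>
    <inbound>
        <base />
        <!-- Per-subscription token quota -->
        <azure-openai-token-limit counter-key="@(context.Subscription.Id)"
            tokens-per-minute="5000" estimate-prompt-tokens="true" />
        <!-- Usage telemetry to Application Insights -->
        <azure-openai-emit-token-metric namespace="openai-usage" />
        <!-- Route to the load-balanced backend pool -->
        <set-backend-service backend-id="openai-backend-pool" />
    </inbound>
</policies>
```

Subscription keys on the API handle the remaining challenge: each application authenticates to the gateway with its own key, so the Azure OpenAI API keys never leave API Management.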