Indexing and searching NuGet.org with Azure Functions and Search
Maarten Balliauw
@maartenballiauw
“Find this type on NuGet.org”
In ReSharper and Rider
Search for namespaces & types that are not yet referenced
“Find this type on NuGet.org”
Idea in 2013, introduced in ReSharper 9
(2015 - https://www.jetbrains.com/resharper/whatsnew/whatsnew_9.html)
Consists of
ReSharper functionality
A service that indexes packages and powers search
Azure Cloud Service (Web and Worker role)
Indexer uses NuGet OData feed
https://www.nuget.org/api/v2/Packages?$select=Id,Version,NormalizedVersion,LastEdited,Published&$orderby=LastEdited%20desc&$filter=LastEdited%20gt%20datetime%272012-01-01%27
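As a rough illustration of how the old indexer's OData query is put together, the sketch below rebuilds the URL from its three query options. The function name `build_query` and its `since` parameter are made up for this example; only the feed URL and the option values come from the slide.

```python
from urllib.parse import quote, urlencode

# V2 OData feed endpoint, as shown on the slide.
FEED = "https://www.nuget.org/api/v2/Packages"


def build_query(since: str) -> str:
    """Build the query for packages edited after `since` (YYYY-MM-DD).

    `build_query` is a hypothetical helper, not part of any NuGet client.
    """
    params = {
        "$select": "Id,Version,NormalizedVersion,LastEdited,Published",
        "$orderby": "LastEdited desc",
        "$filter": f"LastEdited gt datetime'{since}'",
    }
    # Keep '$' and ',' literal; spaces and quotes get percent-encoded.
    return FEED + "?" + urlencode(params, quote_via=quote, safe="$,")
```

Calling `build_query("2012-01-01")` should give the same query string as the slide; moving the date forward is how such an indexer would page through newer edits.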
NuGet over time...
https://twitter.com/controlflow/status/1067724815958777856
NuGet over time...
https://github.com/NuGet/Announcements/issues/37
NuGet server-side API
V3 Protocol
JSON based
A “resource provider” of various endpoints per purpose
Catalog (NuGet.org only) – append-only event log
Registrations – materialization of newest state of a package
Flat container – .NET Core package restore (and VS autocompletion)
Report abuse URL template
Statistics
…
https://api.nuget.org/v3/index.json
(code in https://github.com/NuGet/NuGet.Services.Metadata)
Catalog seems interesting!
Append-only stream of mutations on NuGet.org
Updates (add/update) and Deletes
Chronological
Can continue where left off (uses a timestamp cursor)
Can restore NuGet.org to a given point in time
Structure
Root https://api.nuget.org/v3/catalog0/index.json
+ Page https://api.nuget.org/v3/catalog0/page0.json
+ Leaf https://api.nuget.org/v3/catalog0/data/2015.02.01.06.22.45/adam.jsgenerator.1.1.0.json
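The root, page and leaf structure with a timestamp cursor could be walked roughly as in the sketch below. It is not the NuGet.Protocol.Catalog implementation: `parse_ts`, `new_leaves` and the injected `fetch` callable are hypothetical names, and the sketch assumes every catalog timestamp is UTC and ends in `Z`. The `items`, `@id` and `commitTimeStamp` fields are the ones the catalog documents expose.

```python
from datetime import datetime, timezone
from typing import Callable, Iterator


def parse_ts(value: str) -> datetime:
    """Parse a catalog timestamp such as 2015-02-01T06:22:45.8488496Z.

    Assumption: timestamps are UTC with a trailing 'Z'; fractions longer
    than microseconds are truncated.
    """
    head, _, frac = value.rstrip("Z").partition(".")
    micros = int((frac + "000000")[:6])
    parsed = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def new_leaves(fetch: Callable[[str], dict], index_url: str,
               cursor: datetime) -> Iterator[dict]:
    """Yield leaf entries committed after `cursor`, oldest first.

    `fetch` is any callable that returns the parsed JSON for a URL.
    """
    index = fetch(index_url)
    pages = sorted(index["items"], key=lambda p: parse_ts(p["commitTimeStamp"]))
    for page_ref in pages:
        # A page's timestamp is its newest commit, so older pages can be skipped.
        if parse_ts(page_ref["commitTimeStamp"]) <= cursor:
            continue
        page = fetch(page_ref["@id"])
        leaves = sorted(page["items"], key=lambda l: parse_ts(l["commitTimeStamp"]))
        for leaf in leaves:
            if parse_ts(leaf["commitTimeStamp"]) > cursor:
                yield leaf
```

After processing the yielded leaves, a caller would persist the newest `commitTimeStamp` it saw as the next cursor, which is what lets a collector continue where it left off.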
NuGet.org catalog
demo
“Find this type on NuGet.org”
Refactor from using OData to using V3?
Mostly done, one thing missing: download counts (using search now)
https://github.com/NuGet/NuGetGallery/issues/3532
Build a new version?
Welcome to this talk
Building a new version
What do we need?
Watch the NuGet.org catalog for package changes
For every package change
Scan all assemblies
Store relation between package id+version and namespace+type
API compatible with all ReSharper and Rider versions
What do we need?
Watch the NuGet.org catalog for package changes → periodic check
For every package change → based on a queue
Scan all assemblies
Store relation between package id+version and namespace+type
API compatible with all ReSharper and Rider versions → always up, flexible scale
Sounds like functions!
[Architecture diagram. Components: NuGet.org catalog, Watch catalog, Index command, Download command, Index package, Download package, Raw .nupkg, Index as JSON, Search index, Find type API, Find namespace API]
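The first stage of this flow, where a catalog watcher hands each package change to an index command queue and a download command queue, might look like the sketch below. It uses in-process `queue.Queue` objects as a stand-in for storage queues. `PackageOperation` is named in the speaker notes, but its fields here, and the `watch_catalog` and `drain` helpers, are assumptions for illustration.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PackageOperation:
    """One catalog change; the field names are assumed for this sketch."""
    action: str       # "add" or "delete"
    package_id: str
    version: str


def watch_catalog(changes: Iterable[PackageOperation],
                  index_q: queue.Queue, download_q: queue.Queue) -> None:
    """Fan each change out to both command queues and do nothing else."""
    for op in changes:
        index_q.put(op)
        download_q.put(op)


def drain(q: queue.Queue, handler: Callable[[PackageOperation], None]) -> int:
    """Handle every message currently queued; return how many were handled."""
    handled = 0
    while True:
        try:
            op = q.get_nowait()
        except queue.Empty:
            return handled
        handler(op)
        handled += 1
```

Because the watcher only enqueues, the indexing and downloading handlers can be scaled or can fail without touching it, which is the decoupling the queue-based design is after.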
Collecting from catalog
demo
Functions best practices
@PaulDJohnston https://medium.com/@PaulDJohnston/serverless-best-practices-b3c97d551535
Each function should do only one thing
Easier error handling & scaling
Learn to use messages and queues
Asynchronous means of communicating, helps scale and avoid direct coupling
...
Bindings
Help a function do only one thing
Trigger, provide input/output
Function code bridges those
Build your own!*
SQL Server binding
Dropbox binding
...
NuGet Catalog
*Custom triggers are not officially supported (yet?)
                   Trigger  Input  Output
Timer                 ✔
HTTP                  ✔               ✔
Blob                  ✔       ✔       ✔
Queue                 ✔               ✔
Table                         ✔       ✔
Service Bus           ✔               ✔
EventHub              ✔               ✔
EventGrid             ✔
CosmosDB              ✔       ✔       ✔
IoT Hub               ✔
SendGrid, Twilio                      ✔
...
Creating a trigger binding
demo
We’re making progress!
Downloading packages
demo
Next up: indexing
Indexing
Opening up the .nupkg and reflecting on assemblies
System.Reflection.Metadata
Does not load the assembly being reflected into application process
Provides access to Portable Executable (PE) metadata in assembly
Store relation between package id+version and namespace+type
Azure Search? A database? Redis? Other?
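Whatever store ends up holding it, the relation between package id+version and namespace+type is essentially an inverted index. The sketch below is an in-memory stand-in for that relation, not the Azure Search index the talk builds; the `TypeIndex` class and its methods are invented for this example, and the package data in the usage note is sample input only.

```python
from collections import defaultdict


class TypeIndex:
    """Hypothetical in-memory map between packages and the types they expose."""

    def __init__(self):
        self._by_type = defaultdict(set)      # "namespace.type" -> {(id, version)}
        self._by_package = defaultdict(set)   # (id, version) -> {"namespace.type"}

    def add(self, package_id, version, type_names):
        """Record every fully qualified type name found in one package version."""
        key = (package_id, version)
        for name in type_names:
            lowered = name.lower()
            self._by_type[lowered].add(key)
            self._by_package[key].add(lowered)

    def remove(self, package_id, version):
        """Forget a package version, for example after a catalog delete."""
        key = (package_id, version)
        for name in self._by_package.pop(key, set()):
            self._by_type[name].discard(key)
            if not self._by_type[name]:
                del self._by_type[name]

    def find_type(self, simple_name):
        """Return sorted (id, version) pairs exposing a type with this simple name."""
        needle = simple_name.lower()
        hits = set()
        for full_name, packages in self._by_type.items():
            if full_name.rsplit(".", 1)[-1] == needle:
                hits |= packages
        return sorted(hits)
```

For instance, after `add("Newtonsoft.Json", "12.0.3", ["Newtonsoft.Json.JsonConvert"])`, a lookup with `find_type("JsonConvert")` returns that package version. A real deployment would swap the dictionaries for a search service or database so lookups survive restarts and scale past one process.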
Indexing packages
demo
“Do one thing well”
Our function shouldn’t care about creating a search index.
Better: return index operations, have something else handle those
Custom output binding?
Blog post with full story and implementation
https://bit.ly/2lzba32
👀
Almost there…
Making search work with ReSharper and Rider
demo
We’re done!
Functions
Collect changes from NuGet catalog
Download binaries
Index binaries using PE Header
Make search index available in API
Trigger, input and output bindings
Each function should do only one thing
We’re done!
All our functions can scale (and fail) independently
Full index in May 2019 took ~12h on 2 B1 instances
~1.7 million packages (according to the NuGet.org homepage)
~2.1 million packages (according to the catalog)
~8,400 catalog pages with ~4,200,000 catalog leaves
(hint: repo signing)
January 2020: ~2.6 million packages / 3.5 TB
Closing thoughts…
Would deploy in separate function apps for cost
Trigger binding collects all the time so needs dedicated capacity (and thus, cost)
Others can scale within bounds/consumption (think of $$$)
Would deploy in separate function apps for failure boundaries
Trigger, indexing, downloading should not affect health of API
Are bindings portable...?
Avoid them if (framework) lock-in matters to you
They are nice in terms of programming model…
Thank you!
https://blog.maartenballiauw.be
@maartenballiauw
Blog post with full story and implementation
https://bit.ly/2lzba32
👀

Indexing and searching NuGet.org with Azure Functions and Search - .NET fwdays'20 online conference


Editor's Notes

  • #2: https://pixabay.com
  • #5: Show feature in action in Visual Studio (and show you can see basic metadata etc.)
  • #6: Copied in 2017 in VS: https://www.hanselman.com/blog/VisualStudio2017CanAutomaticallyRecommendNuGetPackagesForUnknownTypes.aspx
    Demo the feed quickly?
  • #7: More and more packages. The OData feed is slow to query and scheduled for deprecation. Is there a better way?
  • #8: More and more packages. The OData feed is slow to query and scheduled for deprecation. Is there a better way?
  • #10: Demo: click around in the API to show some base things
  • #12: Raw API: click around in the API to show some basics, and explain how a cursor could go over it.
    Root https://api.nuget.org/v3/catalog0/index.json
    Page https://api.nuget.org/v3/catalog0/page0.json
    Leaf https://api.nuget.org/v3/catalog0/data/2015.02.01.06.22.45/adam.jsgenerator.1.1.0.json
    Explain CatalogDump.
    NuGet.Protocol.Catalog comes from GitHub. Its CatalogProcessor fetches all pages between a min and max timestamp.
    My implementation, BatchCatalogProcessor, fetches multiple pages at the same time and builds a “latest state”. It fetches leaves, and for every leaf calls into a simple method.
    Much faster, and easy to pause (keep track of min/max timestamp).
  • #17: Will use storage queues in the demos to be able to run things locally. Ideally use Service Bus topics or Event Grid (transactional).
  • #18: Create a new TimerTrigger function.
    We will need a function to index things from NuGet.
    The timer triggers every X amount of time.
    The timer provides the last timestamp and the next timestamp, so we can run our collector for that period.
    Snippet: demo-timertrigger
    Mention that HttpClient is not used correctly: it is not disposed, so it will starve TCP connections at some point.
    Go over the code example and run it:

        var httpClient = new HttpClient();
        var cursor = new InMemoryCursor(timer.ScheduleStatus?.Last ?? DateTimeOffset.UtcNow);

        var processor = new CatalogProcessor(
            cursor,
            new CatalogClient(httpClient, new NullLogger<CatalogClient>()),
            new DelegatingCatalogLeafProcessor(
                added =>
                {
                    log.LogInformation("[ADDED] " + added.PackageId + "@" + added.PackageVersion);
                    return Task.FromResult(true);
                },
                deleted =>
                {
                    log.LogInformation("[DELETED] " + deleted.PackageId + "@" + deleted.PackageVersion);
                    return Task.FromResult(true);
                }),
            new CatalogProcessorSettings
            {
                MinCommitTimestamp = timer.ScheduleStatus?.Last ?? DateTimeOffset.UtcNow,
                MaxCommitTimestamp = timer.ScheduleStatus?.Next ?? DateTimeOffset.UtcNow,
                ServiceIndexUrl = "https://api.nuget.org/v3/index.json"
            },
            new NullLogger<CatalogProcessor>());

        await processor.ProcessAsync(CancellationToken.None);
  • #19: Each function should only do one thing! We are violating this.
  • #20: https://github.com/thinktecture/azure-functions-extensibility
  • #21: Go over the Approach2 code.
    Show that this is MUCH simpler: a trigger binding that provides input, and a queue output binding that writes that input to a queue.
    Let’s go over what it takes to build a trigger binding.
    NuGetCatalogTriggerAttribute: the data needed for the trigger to work. Go over properties and attributes.
    Hooking it up requires a binding configuration: NuGetCatalogTriggerExtensionConfigProvider. It says: if you see this specific binding, register it as a trigger that maps to some provider.
    So we need that provider: NuGetCatalogTriggerAttributeBindingProvider. The provider is there to create an object that provides data. In our case we need to store the NuGet catalog timestamp cursor, so we do that on storage, and then return the actual binding: NuGetCatalogTriggerBinding.
    In NuGetCatalogTriggerBinding, we have to specify how data can be bound. What if I use a different type of object than PackageOperation? What if someone used a Node.js or Python function instead of .NET? We need to define the shape of the data our trigger provides.
    PackageOperationValueProvider is also interesting: it provides the data shown in the portal diagnostics.
    CreateListenerAsync is where the actual trigger code is created: NuGetCatalogListener.
    NuGetCatalogListener uses the BatchCatalogProcessor we had previously, and when a package is added or deleted it calls into the injected ITriggeredFunctionExecutor.
    ITriggeredFunctionExecutor is specific to the Azure Functions framework, but it is the glue that calls into our function with the data we provide.
    Note StartAsync/StopAsync, where you can add startup/shutdown code.
    ONE THING LEFT THAT IS NOT DOCUMENTED: Startup.cs to register the binding.
    And since we are in a different class library, we also need Microsoft.Azure.WebJobs.Extensions referenced to generate \bin\Debug\netcoreapp2.1\bin\extensions.json.
    As a result our code is now MUCH cleaner. Show it again, and maybe also show it in action.
    Mention [Singleton(Mode = SingletonMode.Listener)]: we need to ensure this binding only runs single-instance (cursor clashes otherwise). This is due to how the catalog works; parallel processing is harder to do. But we can fix that by scaling the Indexer later on.
    Show Approach3 PopulateQueueAndTable: same code, but a bit more production-worthy.
    Sending data to two queues (indexing and downloading).
    Storing data in a table (and yes, violating “do one thing” again, but I call it architectural freedom).
  • #22: Next up will be downloading and indexing. Let’s start with downloading. Grab a copy of the .nupkg from NuGet and store it in a blob. Redundancy: no need to re-download/stress NuGet on a re-index.
  • #23: Go over the Approach3 code.
    DownloadToStorage uses a QueueTrigger to run whenever a message appears in the queue.
    Note there is no singleton: we can scale this across multiple instances/multiple servers.
    It uses a Blob input binding that provides access to a blob.
    Note the parameters: the name of the blob is resolved based on data from other inputs, which is pretty nifty.
    Our code checks whether it’s an add or a delete, and either downloads and uploads to the blob reference, or deletes the blob reference.
  • #24: Next up will be indexing itself. There are a couple of things here…
  • #26: Go over the Approach3 code.
    PackageIndexer uses a QueueTrigger to run whenever a message appears in the queue.
    It uses a Blob input binding that provides access to a blob where we can write our indexed entity (will show this later).
    Based on the package operation, we will add to or delete from the index.
    RunAddPackageAsync has some plumbing, probably too much, to download the .nupkg file and store it on disk.
    Note: we store it on disk as we need a seekable stream. So why not a memory stream? Some NuGet packages are HUGE.
    Find the PEReader usage and show how it indexes a given package’s public types and namespaces. All goes into a typeNames collection.
    Now: how do we add this info to the index? Show the PackageDocument class, which has MANY properties.
    First important thing: the Identifier property has [Key] applied. Azure Search needs a key for the document so we can retrieve it by key, which could be useful when updating existing content, or to find a specific document and delete it from the index.
    Second important thing: TypeNames is searchable. Also mention “simpleanalyzer”: “Divides text at non-letters and converts them to lower case.” Other analyzers remove stopwords and do other things; this one should be as searchable as possible.
    Other fields are sometimes searchable, sometimes facetable: a bit of leftover from me thinking about search use cases. The R# API only searches on type name, so we could make everything else just retrievable as well.
    Of course, the index is not there by default, so we need to create it. We do this when our function is instantiated (static constructor, so only once per launch of our functions).
    Is this good? Yes, because it happens only once per server instance our function runs on. No, because we do it at one point in time: what if the index is deleted in between and needs to be recreated? Edge case, but a retry strategy could be a good idea...
    Next, we create our package document, and at one point we add it to a list of index actions, and to blob storage:
        indexActions.Add(IndexAction.MergeOrUpload(packageToIndex));
        JsonSerializer.Serialize(jsonWriter, packagesToIndex);
    Writing to the index uses a batch:
        var indexBatch = IndexBatch.New(actions);
    This is leftover code from earlier. A batch makes no sense for one document, but in case you want to do multiple in one go this is the way. Do beware that a batch can only be several MB in size; for this NuGet indexing I can only do ~25 in a batch before the payload is too large.
    That’s… it! Run Approach3 (for the last hour) and see functions being hit / packages added to the index.
    Go to the Azure Search portal as well, and show how the importer would work in case of fire.
  • #28: Now we need to make ReSharper talk to our search. We have the index, so that should be a breeze, right?
  • #29: Go over the Web code.
    RunFindTypeApiAsync and RunFindNamespaceAsync both use “name” as their query parameter to search for.
    RunInternalAsync does the heavy lifting: it grabs the other parameters, runs the search, and collects several pages of results.
    Why is this ForEachAsync there? The search index has multiple versions for every package id, yet ReSharper expects only the latest version matching all parameters. Azure Search has no group by / distinct by, so we need to do this in memory. Doing it here by fetching a maximum number of results and doing the grouping manually.
    Use the collected data to build the result. Add matching type names etc.
    Example requests:
    http://localhost:7071/api/v1/find-type?name=JsonConvert
    http://localhost:7071/api/v1/find-type?name=CamoServer&allowPrerelease=true&latestVersion=false
    https://nugettypesearch.azurewebsites.net/api/v1/find-type?name=JsonConvert
    In ReSharper: run devenv /ReSharper.Internal, go to the NuGet tool window, and set the base URL to https://nugettypesearch.azurewebsites.net/api/v1/
    Write some code that uses JsonConvert / JObject and try it out.
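
The in-memory grouping mentioned in the note for slide 29 (keep only the latest version per package id, because the search service cannot group) could be sketched as follows. This is not the talk's C# code: `version_key` and `latest_per_package` are hypothetical helpers, the hit dictionaries with `id` and `version` keys are an assumed shape, and the version ordering is deliberately naive (numeric dotted core, any prerelease sorting below its stable release, no build metadata).

```python
def version_key(version):
    """Naive ordering key; assumes a numeric dotted core such as 1.2.0[-beta]."""
    core, _, prerelease = version.partition("-")
    numbers = tuple(int(part) for part in core.split("."))
    # A stable release (empty prerelease tag) sorts above its prereleases.
    return (numbers, prerelease == "", prerelease)


def latest_per_package(hits, allow_prerelease=False):
    """Keep the highest version per package id, in first-seen order."""
    latest = {}
    for hit in hits:
        if not allow_prerelease and "-" in hit["version"]:
            continue
        current = latest.get(hit["id"])
        if current is None or version_key(hit["version"]) > version_key(current["version"]):
            latest[hit["id"]] = hit
    return list(latest.values())
```

The trade-off is the one the note describes: the caller has to fetch enough search results up front for the grouping to see every relevant version, since the reduction happens after the query rather than inside it.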