Nikos Katirtzis
Software Engineer @ Hotels.com
Improving your team’s source code
searching capabilities
Who am I?
Educational Background
• MEng in Electrical and Computer Engineering (Aristotle University of Thessaloniki)
• MSc in Computer Science (University of Edinburgh)
Working Experience
• Software Engineer at Hotels.com (Expedia Group)
  • Part of the team responsible for user authentication and identification (~2 years).
  • Recently joined a team that explores and evaluates new technologies.
Projects/Interests
• Developed Mantissa, a TDD code search engine, and CLAMS, an approach for mining API usage examples from client source code.
• Particularly interested in source code searching/mining.
Presentation structure
Part 1 – Searching for source code
• Why you need a source code search engine
• Overview and comparison of the most popular code search engines
• Recommendations and what you need to consider
• Recent advances
Part 2 – Searching for API usage examples
• HApiDoc: a service that mines API usage examples from client source code
• CLAMS, or behind the scenes of HApiDoc
PART 1
Searching for source code
Monoliths are dead, long live microservices!
A monolithic application puts all its functionality into a single process...
... and scales by replicating the monolith on multiple servers.
A microservices architecture puts each element of functionality into a separate service...
... and scales by distributing these services across servers, replicating as needed.
Source: https://martinfowler.com/articles/microservices.html
Monoliths are dead, long live microservices?
Monoliths are dead, long live microservices.
“I can’t find the code I’m looking for!”
“We need to buy him a code search engine.”
Why do you need a source code search engine?
How many searches does the average developer perform on an internal code search engine on a typical weekday?
A. 0
B. 1-2
C. 5-10
D. >10
Source: Sadowski, Caitlin, Kathryn T. Stolee, and Sebastian Elbaum. "How developers search for code: a case study." Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM, 2015. Available at https://research.google.com/pubs/pub43835.html.
Why do you need a source code search engine?
“Software engineering is more about reading code than writing it, and part of this process is finding the code that you should read.” (Han-Wen Nienhuys, author of Zoekt)
• Understand code dependencies in order to avoid breaking changes.
• Fix production issues faster by locating the root cause.
• Find references to hosts/code that will be deprecated.
• Avoid duplicating existing code.
• Share coding solutions and styles.
• Locate security problems (e.g. hardcoded keys/passwords).
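As an illustration of the last use case, a regex-capable search makes it possible to sweep a codebase for likely hardcoded credentials. The following is a minimal sketch in Python; the patterns, the `find_hardcoded_secrets` helper and the sample files are assumptions made up for this example, not part of any of the engines discussed here.

```python
import re

# Hypothetical patterns; real scanners rely on far larger, tuned rule sets.
SECRET_PATTERNS = [
    # e.g.  password = "..."   or   api_key: '...'
    re.compile(r"""(?i)\b(password|passwd|secret|api_?key|token)\b\s*[:=]\s*["']([^"']{4,})["']"""),
    # Shape of an AWS access key ID: "AKIA" followed by 16 uppercase letters/digits.
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]


def find_hardcoded_secrets(files):
    """Return (path, line_number, line) for every line matching a pattern.

    `files` maps a file path to its text content.
    """
    hits = []
    for path, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in SECRET_PATTERNS):
                hits.append((path, number, line.strip()))
    return hits


# Made-up sample input for illustration.
sample = {
    "auth/Config.java": 'String password = "hunter2!";\nint retries = 3;',
    "deploy/env.py": 'AWS_KEY = "AKIAABCDEFGHIJKLMNOP"',
}

for path, number, line in find_hardcoded_secrets(sample):
    print(f"{path}:{number}: {line}")
```

With a code search engine that supports regex, the same patterns could be typed straight into the search box instead of being run as a script.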
Would you go to a souvlaki shop for fish?
Can’t I use my Git hosting service’s search?
• No partial/substring matching.
• Special characters are removed before indexing and are not allowed when searching.
• Case-sensitive search is not possible.
• No regex.
• Cannot be configured or extended, since the search is built into GitHub Enterprise.
• Best match is based on naive techniques (tf-idf).
• Too much unused space in the results page!
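The first two limitations follow naturally from a token-based index. The sketch below is a simplified assumption about how such an index behaves, not the implementation of any particular Git hosting service: text is split into alphanumeric tokens, so a query only hits whole tokens and special characters never reach the index.

```python
import re
from collections import defaultdict


def build_token_index(docs):
    """Map each lowercased alphanumeric token to the documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in re.findall(r"[A-Za-z0-9]+", text.lower()):
            index[token].add(name)
    return index


def token_search(index, query):
    """Whole-token lookup: only documents holding that exact token match."""
    return sorted(index.get(query.lower(), set()))


def substring_search(docs, query):
    """Plain substring scan, the behaviour expected from a code search engine."""
    return sorted(name for name, text in docs.items() if query in text)


# Made-up sample files for illustration.
docs = {
    "Schedule.java": 'String day = "Monday"; // see https://example.com/api?v=1',
    "Util.java": "int days = 7;",
}
index = build_token_index(docs)

print(token_search(index, "day"))       # only the file with the exact token "day"
print(substring_search(docs, "day"))    # also finds "Monday" and "days"
print(token_search(index, "?v=1"))      # special characters were never indexed
print(substring_search(docs, "?v=1"))   # a substring scan still finds them
```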
Popular CSEs
Hound
Zoekt
• Searchcode Server is a powerful code search engine with a sleek web user interface.
• Uses Lucene to index code and provides rich additional features.
• Developed by Ben Boyter.
• Originally a public source code search engine (searchcode.com); the developer later created Searchcode Server, which is the enterprise version.
Searchcode Server screenshot
Searchcode Server pros/cons
Pros:
✔ Consistent speed regardless of search
✔ User-friendly filtering by repo/user/language
✔ Rich UI
✔ Relatively easy to set up, maintain and monitor
✔ APIs
Cons:
✘ Limited partial/substring matching
✘ Does not support case-sensitive matching
✘ Special characters removed
✘ Does not fully support regex
✘ Inconsistent search results
Hound
• Hound is an open-source source code search engine which uses a static React frontend that talks to a Go backend.
• Uses n-grams for indexing and matching.
• Created at Etsy by Kelly Norton and Jonathan Klein.
• Its core is based on Russ Cox’s article “Regular Expression Matching with a Trigram Index or How Google Code Search Worked” and its accompanying code.
Source: Russ Cox. “Regular Expression Matching with a Trigram Index or How Google Code Search Worked.” 2012. Available at https://swtch.com/~rsc/regexp/regexp4.html.
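The trigram idea can be sketched in a few lines of Python. This is a simplified illustration for literal queries only, with made-up class and method names; it is not Hound's code, and the regex-to-trigram query translation described in the article is left out. The index narrows the search to candidate files, and a real match is then confirmed on the file content.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> names of files containing it
        self.files = {}                   # file name -> content

    def add(self, name, content):
        self.files[name] = content
        for gram in trigrams(content):
            self.postings[gram].add(name)

    def search(self, literal):
        """Find files containing `literal` (at least 3 characters long)."""
        grams = trigrams(literal)
        if not grams:
            raise ValueError("query must be at least 3 characters long")
        # Step 1: candidates are the files that contain every trigram of the query.
        candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Step 2: sharing all trigrams is necessary but not sufficient, so verify.
        pattern = re.compile(re.escape(literal))
        return sorted(n for n in candidates if pattern.search(self.files[n]))


index = TrigramIndex()
index.add("a.txt", "xx abcd yy")
index.add("b.txt", "abcXbcd")  # has trigrams "abc" and "bcd" but never "abcd"
print(index.search("abcd"))
```

The second file shows why the verification step matters: it survives the trigram filter but does not contain the query.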
Hound screenshot
Hound pros/cons
Pros:
✔ Supports regex and substring searches
✔ Case-sensitive and case-insensitive matching
✔ File path and repo filtering
✔ APIs
✔ Open-source
Cons:
✘ Response time heavily depends on query
✘ Scalability issues
✘ Limited searching options
✘ Limited monitoring capabilities
✘ Limited additional features
Zoekt
• Zoekt is a fast, open-source, trigram-based source code search engine developed by Google.
• Uses positional n-grams for indexing and matching.
• Developed at Google by Han-Wen Nienhuys.
• 10x faster than Hound, with rich support for filtering.
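To illustrate what positions add to an n-gram index, here is a minimal sketch. It assumes a simplified scheme in which only the first and last trigram of a literal query are looked up and their offsets compared; the class and method names are invented for the example, and the real Zoekt index format and trigram selection are more involved.

```python
from collections import defaultdict


class PositionalTrigramIndex:
    def __init__(self):
        # trigram -> {file name -> offsets at which the trigram starts}
        self.postings = defaultdict(lambda: defaultdict(list))
        self.files = {}

    def add(self, name, content):
        self.files[name] = content
        for offset in range(len(content) - 2):
            self.postings[content[offset:offset + 3]][name].append(offset)

    def search(self, literal):
        """Return {file name: [match offsets]} for a literal of 3+ characters."""
        if len(literal) < 3:
            raise ValueError("query must be at least 3 characters long")
        first, last = literal[:3], literal[-3:]
        distance = len(literal) - 3  # gap between the two trigram start offsets
        first_postings = self.postings.get(first, {})
        last_postings = self.postings.get(last, {})
        results = {}
        for name, starts in first_postings.items():
            ends = set(last_postings.get(name, ()))
            # Keep offsets where the last trigram sits at exactly the right
            # distance, then confirm the characters in between.
            hits = [s for s in starts
                    if s + distance in ends
                    and self.files[name].startswith(literal, s)]
            if hits:
                results[name] = hits
        return results


index = PositionalTrigramIndex()
index.add("main.go", "func main() { fmt.Println(x) }")
index.add("noise.txt", "abcXbcd")
print(index.search("Println"))
print(index.search("abcd"))  # ruled out by the offsets alone
```

Compared with a plain trigram index, the offset check discards most false candidates before any file content has to be read, which is one plausible reason for the consistent query speed.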
Zoekt design principles
• Coverage
• Speed
• Approximate queries
• Filtering
• Ranking
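The filtering principle, narrowing a query with extra atoms and excluding terms with a minus sign, can be sketched as a toy query evaluator. The `repo:` and `file:` fields and all names below are assumptions for illustration; this is not Zoekt's actual parser or its full query syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    repo: str
    path: str
    content: str


def parse_query(query):
    """Split a query into (field, value, negated) atoms.

    Toy syntax: `repo:X`, `file:X`, or a bare content term.
    A leading `-` negates the atom.
    """
    atoms = []
    for raw in query.split():
        negated = raw.startswith("-") and len(raw) > 1
        body = raw[1:] if negated else raw
        field, sep, value = body.partition(":")
        if sep and field in ("repo", "file") and value:
            atoms.append((field, value, negated))
        else:
            atoms.append(("content", body, negated))
    return atoms


def matches(doc, atoms):
    """True when every positive atom holds and no negated atom does."""
    haystacks = {"repo": doc.repo, "file": doc.path, "content": doc.content}
    return all((value in haystacks[field]) != negated
               for field, value, negated in atoms)


def search(docs, query):
    atoms = parse_query(query)
    return [doc.path for doc in docs if matches(doc, atoms)]


# Made-up sample corpus for illustration.
docs = [
    Document("auth-service", "src/Login.java", "class Login { void authenticate() {} }"),
    Document("auth-service", "test/LoginTest.java", "class LoginTest { void authenticate() {} }"),
    Document("booking", "src/Book.java", "class Book { void authenticate() {} }"),
]
print(search(docs, "authenticate repo:auth -file:test"))
```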
Zoekt screenshot
Zoekt pros/cons
Pros:
✔ Super fast search, consistent speed regardless of search
✔ Sophisticated design and approach for indexing using positional trigrams
✔ Rich searching options, fully supports regex and substring searches
✔ Easy to build features on top of it
✔ Open-source
Cons:
✘ Poor UI
✘ Limited monitoring
✘ Limited automation around running and deploying the service
✘ Limited additional features
✘ No APIs
• Sourcegraph is a fast, solid, full-featured code navigation engine with code intelligence features.
• It leverages git grep to find code and uses Zoekt for indexed searches.
• Its language servers implement the Language Server Protocol (LSP) to provide Code Intelligence features.
• Developed by Sourcegraph, a company often referred to as the “Google for Code”.
Did you know?
• The open-source Sourcegraph browser extension adds code intelligence to files and diffs on GitHub, GitHub Enterprise, Phabricator, and Bitbucket Server for free!
Sourcegraph demo
Sourcegraph pros/cons
Pros:
✔ Rich features incl. cross-reference and semantic search
✔ Excellent support (a company dedicated to it)
✔ Excellent documentation around setting it up and running it
✔ Numerous plugins (e.g. for text editors, IDEs, browsers)
✔ Core version free
Cons:
✘ Requires many resources to run properly
✘ Constantly changing price
✘ Sourcegraph Enterprise is priced per user
Comparison
                     Searchcode Server   Hound   Zoekt   Sourcegraph
Speed                                                *
Scalability                                          *
Searching options                                    
Additional features                                  
Maintainability                                      
Support                                              
License/Price                                        **
* Unless deployed to a cluster, the service is relatively slow when compared to its alternatives.
** The basic version of the software is free, but companies would need the Enterprise version, for which there’s a cost per user. The basic version lacks indexed search and cluster deployments.
Recommendations
Use:
• Searchcode Server: if you’re looking for a more generic search engine that can be easily maintained and monitored.
• Hound: no reason to use Hound instead of Zoekt.
• Zoekt: if you focus on speed, scalability, and search options.
• Sourcegraph: if you want to invest in a source code search engine and need additional features such as code intelligence and integrations.
Things to consider
✍ Use a local SSD (instance store volume), since these services constantly hit the disk.
✍ Use permanent storage for the cloned repos. This allows you to achieve near-zero downtime in case the instance goes down.
✍ Make sure you understand the configuration options and experiment with them, e.g. in many services you’ll need to set limits for max file size, max lines, etc.
✍ Monitor logs for errors.
✍ Remember, bad documents can always slow down your service!
[Diagram: EC2 with instance store (index); EBS (repos, logs)]
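The kind of limits mentioned above (max file size, max lines) can be pictured as a pre-indexing filter. The sketch below is an assumption about what such a check might look like; the `IndexLimits` fields and their default values are hypothetical and do not correspond to the configuration keys of any specific engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexLimits:
    # Hypothetical defaults; tune them by experimenting on your own repos.
    max_file_bytes: int = 1_000_000
    max_lines: int = 20_000
    max_line_length: int = 5_000


def should_index(content: bytes, limits: IndexLimits = IndexLimits()) -> bool:
    """Return True if a file looks safe to hand to the indexer."""
    if len(content) > limits.max_file_bytes:
        return False
    if b"\x00" in content:  # NUL bytes are a cheap signal for binary files
        return False
    lines = content.split(b"\n")
    if len(lines) > limits.max_lines:
        return False
    # Minified or generated files often consist of a few extremely long lines.
    return all(len(line) <= limits.max_line_length for line in lines)


print(should_index(b"int x = 1;\n"))
print(should_index(b"a" * 6_000))  # one very long line, e.g. minified JavaScript
```

Logging every rejected file is a cheap way to spot the "bad documents" early when monitoring the service.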
Recent advances
✌ GitHub is experimenting with semantic code search [1, 2].
✌ Microsoft offers semantic code search for Azure Repos in Azure DevOps Services and TFS [3].
✌ Google offers fast code search for its Cloud Source Repositories. Its search options are quite similar to Zoekt’s [4]. Plus Google’s Bazel code search.
✌ Sourcegraph is becoming more and more popular by adding more languages to its Code Intelligence feature (thanks to Microsoft’s Language Server Protocol) and by providing more integrations and open-source browser extensions.
[1] “Towards Natural Language Semantic Code Search”: https://githubengineering.com/towards-natural-language-semantic-code-search/
[2] “GitHub experiments; semantic code search.”: https://experiments.github.com/semantic-code-search
[3] “Search across all your code and work items”: https://docs.microsoft.com/en-us/azure/devops/project/search/overview?view=vsts&tabs=new-nav
[4] “Searching for code”: https://cloud.google.com/source-repositories/docs/searching-code
PART 2
Searching for API usage examples
The Problem
• Lack of proper documentation for the APIs
• How can I use this API/API method?
• Creating API usage examples is time-consuming
The Concept
✌ What if we mine examples from client source code?
✌ Would be nice to cluster results
✌ And show the most indicative example(s) of each cluster
✌ And provide a summarised version of the most indicative example(s)
CLAMS
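The concept above (cluster the mined usages, then show the most indicative one per cluster) can be sketched as follows. The slides do not spell out the algorithm CLAMS uses, so this is only a stand-in: it assumes each client method is reduced to its sequence of API calls, compares sequences with a longest-common-subsequence similarity, and uses a simple greedy threshold clustering. All function names and the threshold are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two call sequences."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def similarity(a, b):
    """LCS-based similarity in [0, 1]; 1.0 means identical sequences."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def cluster(sequences, threshold=0.6):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for seq in sequences:
        for members in clusters:
            if similarity(seq, members[0]) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def most_indicative(members):
    """Pick the member with the highest total similarity to its cluster (a medoid)."""
    return max(members, key=lambda m: sum(similarity(m, other) for other in members))


# Made-up API call sequences extracted from four client methods.
usages = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["connect", "send", "disconnect"],
    ["open", "read", "close"],
]
for members in cluster(usages):
    print(len(members), most_indicative(members))
```

The size of each cluster gives a rough notion of how many client methods use the API in a similar way.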
HApiDoc Architecture
HApiDoc UI
Behind the scenes - CLAMS
Closing Notes
✌ HApiDoc is getting open-sourced! Looking for contributors!
✌ Take a look at our GitHub space: https://github.com/HotelsDotCom
✌ Presentation material: https://github.com/nikos912000/voxxed-thes-material
✌ Using any other code search engines? Let us know!
PS: We’re hiring!
Thank you!
nikos912000@notmail.com @nikos912000
www.linkedin.com/in/nkatirtzis nikos912000
1 Icon made by Gregor Cresnar from www.flaticon.com.
2 Icon made by Freepik from www.flaticon.com.
3 Icon made by Freepik from www.flaticon.com.
4 Icon made by Pixel Perfect from www.flaticon.com.

Improving your team's source code searching capabilities - Voxxed Thessaloniki 2018


Editor's Notes

  • #6: Monoliths vs microservices. Hotels.com’s website. The shift brings advantages and challenges.
  • #9: Survey at Google, 2015
  • #10: Dependencies: hundreds of microservices. Prod issues: a bridge between Splunk and BitBucket/GitHub. Hosts: make sure your services don’t rely on servers you aim to kill.
  • #11: Git hosting services don’t specialize in source code searching.
  • #12: Substring matching: day-Monday. Special characters: URLs, XMLs. The search results page is non-optimal in terms of content fitting.
  • #17: Hound uses 3-grams. N-grams are a powerful concept for approximate matching: contiguous sequences of n items from a given sample of text.
  • #21: Coverage: the code that is of interest to you should be available for searching. Speed: achieved through sophisticated techniques and an index based on positional n-grams. Approximate queries: performs substring or partial matching case-insensitively, but also gives you the option of case-sensitive searches. Filtering: allows you to narrow queries by adding extra atoms and to filter out terms with the minus symbol (-). Ranking: uses ctags to find declarations such as class or method definitions and variable declarations, which are then boosted in the search ranking.
  • #26: Cannot be fairly compared to any existing solutions. You’ll need the Enterprise version for Enterprise environments. Charges per user. Prices keep changing.
  • #30: Until last year it was hard to convince someone that they should invest in source code searching. There was limited interest and progress in that area.
  • #33: HApiDoc orchestrates any required tasks and acts as an end-to-end solution for mining usage examples using CLAMS. It automates the filtering step, which is the most time-consuming task for systems like CLAMS, stores the results in a MongoDB instance and shows them in a neat web service.
  • #34: Type the fully qualified name of the method. Returns summarized snippets along with information such as: the repository name where the method is called; support, i.e. the number of other client methods that call your API method in a similar way; additional calls to the same API.