Algorithms 101 for Data Scientists
Presented by Chris Conlan and Janice McMahon
Bethesda Data Science Meetup
Disclaimer
• This is a short presentation.
Story Time
Definitions
The computational complexity, or simply complexity, of
an algorithm describes the amount of resources required to run it
relative to its inputs.
The term complexity, when standing alone, typically refers to time
complexity, or the relationship between the runtime of an algorithm
and its inputs.
Big Oh Notation in Algorithms
When we say 𝑓(𝑥) = 𝑂(𝑔(𝑥)),
it is because 𝑓(𝑥) ≤ 𝑀𝑔(𝑥) as 𝑥 → ∞
for some positive constant 𝑀.
𝑔(𝑥) is essentially the growth behavior of 𝑓(𝑥).
See Wikipedia for a mathematically rigorous definition.
This is a simplification for use with algorithms.
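As a concrete instance of the definition above, take a hypothetical runtime function f(x) = 3x² + 5x + 2 (chosen here for illustration, not taken from the talk). With g(x) = x² and M = 4, the inequality f(x) ≤ M·g(x) holds for every x ≥ 6, so f(x) = O(x²). A quick numerical check, as a sketch:

```python
# Hypothetical runtime function, chosen only to illustrate the definition.
def f(x):
    return 3 * x**2 + 5 * x + 2

# Candidate growth function.
def g(x):
    return x**2

M = 4  # assumed constant; any larger value would also work

# The bound fails for small inputs but holds from x = 6 onward.
holds_from_six = all(f(x) <= M * g(x) for x in range(6, 10_000))
fails_at_five = f(5) > M * g(5)

print(holds_from_six, fails_at_five)
```

The bound only needs to hold for large inputs, which is why the failure at x = 5 does not matter.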
Big Oh Notation in Algorithms
If 𝑓(𝑥) describes the runtime of an algorithm for input of size 𝑥, then
𝑂(𝑔(𝑥)) describes the growth behavior of the runtime.
So, given an input of size 𝑛, we might say things like …
• My algorithm here is 𝑂(1)
• Your algorithm there is 𝑂(𝑛)
• My neighbor’s algorithm is 𝑂(log(𝑛))
• My dog’s algorithm is 𝑂(𝑛²)
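To make these four growth rates tangible, the sketch below counts abstract "steps" rather than measuring time. The function names are hypothetical and exist only for this illustration.

```python
def constant_steps(n):
    # O(1): the work does not depend on n.
    return 1

def logarithmic_steps(n):
    # O(log(n)): halve the problem until nothing is left to halve.
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps

def linear_steps(n):
    # O(n): touch every element once.
    steps = 0
    for _ in range(n):
        steps += 1
    return steps

def quadratic_steps(n):
    # O(n^2): touch every pair of elements.
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps

for n in (10, 100, 1000):
    print(n, constant_steps(n), logarithmic_steps(n),
          linear_steps(n), quadratic_steps(n))
```

Multiplying n by 10 leaves the constant count unchanged, adds a few steps to the logarithmic count, multiplies the linear count by 10, and multiplies the quadratic count by 100.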
Common Computational Complexities
Algorithm | Input dimensions | Complexity
Adding two numbers together | 2 | 𝑂(1)
Summing a vector of numbers | 𝑛 | 𝑂(𝑛)
Sorting a list (fastest known method) | 𝑛 | 𝑂(𝑛 log(𝑛))
Finding an item in an unsorted list | 𝑛 | 𝑂(𝑛)
Finding an item in a sorted list (binary search) | 𝑛 | 𝑂(log(𝑛))
Finding an item in a hash table | 𝑛 | 𝑂(1)
Matrix multiplication (fastest known method) | two 𝑛 × 𝑛 matrices | 𝑂(𝑛^2.373)
Matrix inversion (fastest known method) | 𝑛 × 𝑛 matrix | 𝑂(𝑛^2.373)
Common Computational Complexities
Algorithm | Python examples
Adding two numbers together | x + y
Summing a vector of numbers | sum(x)
Sorting a list | v.sort()
Finding an item in an unsorted list | x in v, v.index(x)
Finding an item in a sorted list | Complicated. See bisect library.
Finding an item in a hash table | d[x], x in d
Matrix multiplication | A.dot(B)
Matrix inversion | np.linalg.inv(A)
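The sorted-list case is the only row above without a one-line example. A minimal sketch using the standard-library `bisect` module follows; the helper name `sorted_contains` is hypothetical, and the list is assumed to be sorted in ascending order already.

```python
from bisect import bisect_left

def sorted_contains(sorted_values, target):
    """Return True if target is in sorted_values, using binary search."""
    # bisect_left returns the leftmost index where target could be inserted
    # while keeping the list sorted; this takes O(log(n)) comparisons.
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target

v = [2, 3, 5, 7, 11, 13]
print(sorted_contains(v, 7))
print(sorted_contains(v, 8))
```

If the list is not sorted, the result is meaningless, and sorting it first costs 𝑂(𝑛 log(𝑛)), so this only pays off when many lookups share one sorted list.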
Hidden Complexity
• Computational complexity is very well-studied.
• There are many optimal algorithms seamlessly integrated into
modern programming languages.
• It is still easy to accidentally write suboptimal code.
Compute the cumulative sum: 𝑂(𝑛²)
Compute the cumulative sum: 𝑂(𝑛)
Don’t do this
Instead, do this
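The contrast between the two approaches can be sketched as follows. The function names are borrowed from the benchmark table in this deck, but the bodies are assumptions written for illustration and are not copied from the linked repository.

```python
from itertools import accumulate

def slow_cumulative_sum(values):
    # O(n^2): each output element re-sums the whole prefix from scratch.
    return [sum(values[: i + 1]) for i in range(len(values))]

def fast_cumulative_sum(values):
    # O(n): carry a running total forward instead of re-summing.
    result = []
    running_total = 0
    for v in values:
        running_total += v
        result.append(running_total)
    return result

values = [1, 2, 3, 4, 5]
print(slow_cumulative_sum(values))
print(fast_cumulative_sum(values))
# The standard library offers the same O(n) behavior.
print(list(accumulate(values)))
```

Both functions return the same list; only the amount of repeated work differs. A vectorized routine such as pandas' `Series.cumsum` is presumably what `pandas_fast_cumulative_sum` wraps, though that is an assumption.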
function n_values t_milliseconds values_per_ms
0 slow_cumulative_sum 10 0.040 252
1 slow_cumulative_sum 100 0.763 131
2 slow_cumulative_sum 1000 58.234 17
3 slow_cumulative_sum 10000 4846.866 2
4 slow_cumulative_sum_expanded 10 0.028 361
5 slow_cumulative_sum_expanded 100 0.991 101
6 slow_cumulative_sum_expanded 1000 93.545 11
7 slow_cumulative_sum_expanded 10000 9242.796 1
8 fast_cumulative_sum 10 0.016 610
9 fast_cumulative_sum 100 0.042 2358
10 fast_cumulative_sum 1000 0.311 3219
11 fast_cumulative_sum 10000 2.243 4459
12 fast_cumulative_sum 100000 18.902 5290
13 fast_cumulative_sum 1000000 197.117 5073
14 fast_cumulative_sum 10000000 1981.922 5046
15 fast_cumulative_sum 100000000 20699.219 4831
16 pandas_fast_cumulative_sum 10 0.409 24
17 pandas_fast_cumulative_sum 100 0.346 289
18 pandas_fast_cumulative_sum 1000 0.349 2863
19 pandas_fast_cumulative_sum 10000 0.336 29727
20 pandas_fast_cumulative_sum 100000 1.665 60049
21 pandas_fast_cumulative_sum 1000000 14.201 70420
22 pandas_fast_cumulative_sum 10000000 123.686 80850
23 pandas_fast_cumulative_sum 100000000 1031.409 96955
https://github.com/chrisconlan/complexity-studies/blob/master/cumulative_sum.py
There are countless examples of hidden complexity
Count the frequency of words in a book
function n_values t_milliseconds values_per_ms
0 slow_count_occurences 10 0.010 1031
1 slow_count_occurences 100 0.675 148
2 slow_count_occurences 1000 39.999 25
3 slow_count_occurences 10000 3169.300 3
4 fast_count_occurences 10 0.011 893
5 fast_count_occurences 100 0.042 2387
6 fast_count_occurences 1000 0.218 4591
7 fast_count_occurences 10000 1.988 5031
8 fast_count_occurences 100000 26.933 3713
9 fast_count_occurences 1000000 373.540 2677
10 fast_count_occurences 10000000 4816.169 2076
11 pandas_fast_count_occurences 10 1.372 7
12 pandas_fast_count_occurences 100 0.954 105
13 pandas_fast_count_occurences 1000 1.530 654
14 pandas_fast_count_occurences 10000 4.976 2010
15 pandas_fast_count_occurences 100000 74.877 1336
16 pandas_fast_count_occurences 1000000 1123.875 890
17 pandas_fast_count_occurences 10000000 12356.014 809
https://github.com/chrisconlan/complexity-studies/blob/master/count_occurences.py
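One plausible way the slow and fast word counts in the benchmark above could differ is sketched here. The function names (including their spelling) follow the table; the bodies are assumptions for illustration, not the code from the linked file.

```python
from collections import Counter

def slow_count_occurences(words):
    # O(n^2): list.count scans the entire list once for every word.
    return {w: words.count(w) for w in words}

def fast_count_occurences(words):
    # O(n): one pass, with an average O(1) dictionary update per word.
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

words = "the cat and the dog and the bird".split()
print(slow_count_occurences(words))
print(fast_count_occurences(words))
# collections.Counter performs the same single-pass count.
print(Counter(words))
```

The hidden cost in the slow version is that `words.count(w)` looks like a single operation but is itself a full scan of the list.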
Find the words in a first list that are in a second list
function n_values t_milliseconds values_per_ms
0 slow_match_within 10 0.006 1639
1 slow_match_within 100 0.192 520
2 slow_match_within 1000 16.793 60
3 slow_match_within 10000 1577.224 6
4 fast_match_within 10 0.010 1010
5 fast_match_within 100 0.027 3690
6 fast_match_within 1000 0.196 5110
7 fast_match_within 10000 1.931 5180
8 fast_match_within 100000 37.496 2667
9 fast_match_within 1000000 533.951 1873
10 fast_match_within 10000000 6079.538 1645
11 fast_intersection 10 0.013 769
12 fast_intersection 100 0.026 3774
13 fast_intersection 1000 0.159 6277
14 fast_intersection 10000 1.967 5083
15 fast_intersection 100000 27.306 3662
16 fast_intersection 1000000 408.962 2445
17 fast_intersection 10000000 4790.723 2087
https://github.com/chrisconlan/complexity-studies/blob/master/match_within.py
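The three functions benchmarked above could look roughly like the sketch below. The names come from the table; the bodies are assumptions for illustration and may differ from the linked file.

```python
def slow_match_within(first, second):
    # O(n * m): "w in second" scans the second list for every word.
    return [w for w in first if w in second]

def fast_match_within(first, second):
    # O(n + m): build a hash set once, then each lookup is O(1) on average.
    second_set = set(second)
    return [w for w in first if w in second_set]

def fast_intersection(first, second):
    # O(n + m): set intersection; drops duplicates and ordering.
    return set(first) & set(second)

first = ["apple", "pear", "fig", "apple"]
second = ["fig", "apple", "kiwi"]
print(slow_match_within(first, second))
print(fast_match_within(first, second))
print(fast_intersection(first, second))
```

The two list-returning versions preserve order and duplicates from the first list, while the set intersection returns each matching word once, so they are interchangeable only when that difference does not matter.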
