<Vincent Tatan>
vintatan@google.com
Intro to
Machine Learning
Wait, who am I?
Tak kenal maka tak sayang: if you don't know someone, you can't care about them.
If you don't care, you don't ask.
If you don't ask, you don't know.
-- Vincent Tatan --
Proprietary + Confidential
Meet Vincent
Safe Browsing Analyst (Machine Learning)
Google Trust & Safety
Medium: towardsdatascience.com/@vincentkernn
LinkedIn: linkedin.com/in/vincenttatan/
Data Podcast: https://datacast.simplecast.com/
Path to Google
Lazada Group: Data Scientist Intern (Dec 16 - Apr 17)
B.Sc., Management Information Systems and Services (Aug 13 - July 17)
Visa: Data & Architecture Engineer (Jun 17 - Aug 19)
Google: Data Analyst, Machine Learning (Aug 19 - Present)
Google: Trust and Safety
To prevent phishing at scale with data analytics and ML
So, what do I do at Google?
Trust and Safety
Protecting more than 3 billion devices worldwide
1. Google notifies your browser to protect you from phishing and malware.
2. Using machine learning-based detection, we contribute to 99.9% accuracy in spam detection.
3. So if you see this, beware!
What is machine learning?
Machine Learning
Computational methods that use experience to improve performance
Machine Learning
Using computers and data to achieve an objective
● Computer → Algorithms, complexity analysis, theoretical guarantees
● Data analysis → Statistics, probability
● Achieve objective → Understanding the problem, simulation, evaluation, etc.
Supervised vs Unsupervised
Unsupervised Learning: No labeled data, only unlabeled training data. The goal is to find patterns and insights.
Supervised Learning: The most common learning scenario. The model learns from labeled training data and is then applied to unseen test data.
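To make the contrast concrete, here is a minimal sketch in plain Python. The toy data, the nearest-centroid classifier, and the tiny 1-D k-means are illustrative assumptions and do not come from the slides. The supervised part uses the labels to learn one centroid per class; the unsupervised part only sees the raw points and recovers cluster centres without names.

```python
from statistics import mean

# Toy 1-D data, made up for illustration: a feature value and a label.
labeled = [(1.0, "ham"), (1.5, "ham"), (2.0, "ham"),
           (8.0, "spam"), (9.0, "spam"), (10.0, "spam")]
unlabeled = [x for x, _ in labeled]  # same points, labels thrown away


def fit_centroids(examples):
    """Supervised: use the labels to compute one centroid per class."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Assign an unseen point to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def kmeans_1d(points, k=2, iterations=10):
    """Unsupervised: find k cluster centres without any labels."""
    centres = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(centres[i] - p))
            clusters[nearest].append(p)
        centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
    return sorted(centres)


centroids = fit_centroids(labeled)
print(predict(centroids, 7.2))   # a label for an unseen test point
print(kmeans_1d(unlabeled))      # cluster centres, but no names for them
```

Note that the unsupervised result is just a set of centres: deciding that one cluster means "spam" still requires a human or some labels.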
Semi-Supervised Learning: Uses both labeled and unlabeled training data, then predicts on unseen test data.
Why? The labeled and unlabeled training data are assumed to come from the same distribution, so the unlabeled data still carries useful information.
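One common way to exploit unlabeled data is self-training (pseudo-labeling). The slides do not name a specific method, so the sketch below is an assumption chosen for illustration, with made-up data: a nearest-centroid model is fit on two labeled points, used to label the unlabeled points, and then refit on everything.

```python
from statistics import mean


def centroids_of(examples):
    """One centroid per class from (value, label) pairs."""
    groups = {}
    for x, y in examples:
        groups.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in groups.items()}


def nearest_label(centroids, x):
    """Label of the centroid closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def self_train(labeled, unlabeled, rounds=3):
    """Pseudo-label the unlabeled points, then refit on everything."""
    centroids = centroids_of(labeled)
    for _ in range(rounds):
        pseudo = [(x, nearest_label(centroids, x)) for x in unlabeled]
        centroids = centroids_of(labeled + pseudo)
    return centroids


# Hypothetical data: only two labeled points, several unlabeled ones.
labeled = [(2.0, "ham"), (6.0, "spam")]
unlabeled = [0.5, 1.0, 1.5, 8.5, 9.0, 9.5]

before = centroids_of(labeled)
after = self_train(labeled, unlabeled)
print(before)                     # centroids from the two labels alone
print(after)                      # centroids pulled toward the unlabeled data
print(nearest_label(before, 4.5), nearest_label(after, 4.5))
```

The decision for the point 4.5 changes once the unlabeled data is used, which only helps if the unlabeled points really follow the same distribution as the labeled ones.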
Real World Impact
What are the applications of AI and ML?
Important Resources (Teachable Machine)
https://teachablemachine.withgoogle.com/train/image
Street View
[Image: street scene annotated with street name, street number, signs, business facade, business name, traffic light, and traffic sign]
Google Translate
What do I do mostly?
My Life at Google...
Focus Work: The Cycle of a Data Project
Policy Making
● Generate insights from escalations
● Conduct EDA (exploratory data analysis)
● Create preliminary supervised/unsupervised models
Escalation
● Take action on phishing/SE (social engineering) attacks
● Analyse reports and detect causes
● Create data dashboards to understand impact
Automation (ML, DNN)
● Create deep machine learning models
● Research and analyse effectiveness
● Deployment & governance
ML Pipeline
Frame the Problem
What is your goal?
Who are your stakeholders?
How do you add value to them?
ML Pipeline
1. Data Collection + Preprocessing
2. Model Training and Evaluation
3. Machine Learning Operations (MLOps)
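The three pipeline stages can be sketched as three small functions that feed into each other. Everything here is a hypothetical illustration: the function names, the threshold "model", the quality bar, and the data are assumptions, not the pipeline used at Google.

```python
def collect_and_preprocess(raw_rows):
    """Stage 1: drop incomplete rows and scale the feature to [0, 1]."""
    rows = [(x, y) for x, y in raw_rows if x is not None and y is not None]
    top = max(x for x, _ in rows)
    return [(x / top, y) for x, y in rows]


def train_and_evaluate(rows):
    """Stage 2: pick the threshold with the best accuracy.

    For brevity this scores on the training rows themselves; a real
    project would evaluate on held-out data.
    """
    def accuracy(threshold):
        return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)

    best = max(sorted({x for x, _ in rows}), key=accuracy)
    return best, accuracy(best)


def deploy(threshold, accuracy, min_accuracy=0.9):
    """Stage 3 (MLOps): only serve the model if it clears a quality bar."""
    if accuracy < min_accuracy:
        raise ValueError("model below quality bar, not deploying")
    return lambda x: int(x >= threshold)


# Made-up raw data: (feature, label), with one incomplete row.
raw = [(10, 0), (20, 0), (None, 1), (80, 1), (100, 1)]
rows = collect_and_preprocess(raw)
threshold, acc = train_and_evaluate(rows)
model = deploy(threshold, acc)
print(threshold, acc, model(0.9), model(0.3))
```

Keeping the stages as separate functions mirrors the slide: each stage can be tested, replaced, or monitored on its own.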
Data Collection
More data beats smarter algorithms
1. But collecting more data is not always practical
2. Data is expensive: labels cost money and time to collect
3. Big data might be overkill
Model Training
Choose the model based on the use case
1. Regression: n-dim polynomial?
2. Classification: decision tree, logistic regression, SVM
3. Each algorithm has its own characteristics, such as:
a. Susceptibility to outliers
b. Explainability
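As a small illustration of "susceptible to outliers", the sketch below fits a straight line with ordinary least squares, once on clean data and once with a single outlier. Least squares and the toy numbers are assumptions picked for the example; the slides do not single out an algorithm.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


xs = [0, 1, 2, 3, 4]
clean = [1, 3, 5, 7, 9]     # exactly y = 2x + 1
noisy = [1, 3, 5, 7, 29]    # same data with one outlier at x = 4

print(fit_line(xs, clean))  # recovers slope 2 and intercept 1
print(fit_line(xs, noisy))  # the single outlier drags the whole line
```

A model like this is easy to explain (two numbers), but one bad point changes it a lot, which is the kind of trade-off the list above refers to.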
Model Evaluation
Is it useful?
1. Regression: Root Mean Squared Error (RMSE)
2. Classification: confusion matrix, AUC, precision, recall, F1
3. Complexity, explainability, latency (time and space)
4. Eager vs. lazy learners
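The metrics named above can be computed in a few lines. This is a minimal sketch in plain Python with made-up labels and predictions; AUC is left out because it needs ranked scores rather than hard predictions.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error for regression."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 derived from the confusion counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels: 1 = phishing, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

print(rmse([3.0, 5.0], [2.0, 7.0]))
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

Precision answers "of everything flagged, how much was really phishing?", while recall answers "of all phishing, how much was caught?". Which one matters more depends on the cost of each kind of mistake.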
ML Ops
Operating real ML for real use cases
1. Model Push
2. Model Validation
3. Monitoring/Anomaly Detection
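Two of these steps can be sketched very simply: a validation gate that decides whether a candidate model may be pushed, and a monitoring check that flags unusual behaviour in production. The function names, thresholds, and numbers below are hypothetical; the z-score rule is one simple option among many anomaly detection methods.

```python
from statistics import mean, stdev


def validate_before_push(candidate_f1, production_f1, min_gain=0.0):
    """Model validation gate: only push a candidate that is not worse."""
    return candidate_f1 >= production_f1 + min_gain


def is_anomalous(history, today, threshold=3.0):
    """Flag today's value if it lies more than `threshold` standard
    deviations from the historical mean (a simple z-score rule)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


# Hypothetical daily fraction of URLs that the model flags.
flag_rate_history = [0.020, 0.021, 0.019, 0.022, 0.020, 0.018, 0.021]

print(validate_before_push(candidate_f1=0.91, production_f1=0.89))
print(is_anomalous(flag_rate_history, today=0.021))  # a normal day
print(is_anomalous(flag_rate_history, today=0.060))  # a sudden spike
```

A spike like the last one does not say what went wrong (an attack wave, a data bug, or a bad model push), only that someone should look.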
ML Tech Stack
Development + IDE
Language + Library
Data + ML Ops
Analytics / ML Trend
How does analytics seep into our lives?
Analytics Development in Indonesia
2014: 4G technology arrives in Indonesia
4G launches in Indonesia for the first time. High smartphone adoption meets a large digital market.
2015: Large growth of "unicorn" tech from here on
Gojek launches Android and iOS apps for 4 services: transportation, courier, and shopping.
Traveloka becomes the leading choice for online flight and hotel bookings.
2016: UI opens the first big data curriculum
The first analytics curriculum is opened by tertiary institutions.
Jakarta Smart City expands, supported by Ahok, for government and startup collaboration.
2018-2019: Strong political support for Indonesia's tech-education movement
Nadiem is appointed Minister of Education and Culture.
Merdeka Hackathon and Merdeka Belajar are held.
Sandiaga Uno and Anies greatly support these initiatives.
2020 (now): Coronavirus
The coronavirus affects livelihoods while making the data science / IT sector more attractive.
2021/next: Data is the new electricity
Political movements, businesses, startups, capital, and labour unite to boost the data analytics movement in Indonesia.
5G arrives in Indonesia.
Phases: Infrastructure & Business Growth, Education & Political Support, Crisis & Innovation Movement
How can you excel in ML?
I’m super excited! What’s next!
https://bit.ly/2Pc2axe
https://bit.ly/2rBUP0Y
Contribute more!
Data science keeps expanding, so share your knowledge
1. Read more: Keep experimenting with your learning styles (kinesthetic, auditory, visual)
2. Write more: Write articles and share them!
3. Speak more: Teach your peers or speak at conferences!
Generalize and Specialize
1. Strength: Use your biggest strengths
2. Communicate: Communicate your strengths and impact more
3. Learn in a T shape: SQL and Python/R are the breadth; domain knowledge is your depth
Read Deep Work
Follow it to the T
Smile!
Data Science is Fun
1. Play: Data science is tough, so have fun.
2. Hack: Use Saturdays to learn with friends.
3. Celebrate impact: Data science is about building impact. Start small and celebrate!
Contribute, Prepare, and Smile
This is the key to excelling
Questions?


Intro to ml lesson vincent

  • 2. Wait, who am I? Tak Kenal maka Tak Sayang, Tak Sayang, maka Tak Tanya Tak Tanya, maka Tak Tahu -- Vincent Tatan --
  • 3. Proprietary + Confidential Meet Vincent Safe Browsing Analyst (Machine Learning) Google Trust & Safety Medium : towardsdatascience.com/@vincentkernn Linkedin : linkedin.com/in/vincenttatan/ Data Podcast: https://datacast.simplecast.com/
  • 4. Proprietary + Confidential Path to Google Lazada Group Data Scientist Intern Dec 16 - Apr 17 B.Sc., Management Information Systems and Services Aug 13 - July 17 Visa Data & Architecture Engineer Jun 17 - Aug 19 Google Data Analyst, Machine Learning Aug 19 - Present
  • 5. Proprietary + Confidential Google: Trust and Safety To prevent phishing @ scale with Data Analytics and ML
  • 6. Proprietary + Confidential So, what do I do at Google?
  • 7. Proprietary + Confidential Trust and Safety Protect more than 3 billion devices worldwide 1. Google notifies your browsers to prevent phishing and malware. 2. Using machine learning-based detection, we contributed to 99.9% accuracy in spam detection 3. So if you see this, beware!
  • 8. What is machine learning?
  • 9. Proprietary + Confidential Machine Learning Computational methods using experience to improve performance
  • 10. Proprietary + Confidential Machine Learning Using Computer and data to achieve objective ● Computer → Algorithm, complexity analysis, theoretical guarantees. ● Data analysis→ Statistics, probability ● Achieve Objective → Understanding the problem, simulation, evaluation, etc
  • 13. Proprietary + Confidential Unlabeled Training Data Labeled Training Data Unseen Test Data Unsupervised Learning : No labeled data. Finding patterns/insights Supervised Learning: Most common learning scenarios
  • 14. Proprietary + Confidential Labeled Training Data Unlabeled Training Data Semi Supervised Learning : With labeled and unlabeled training data Unseen Test Data } Why? Training data might imply same distributions.
  • 16. Real World Impact What are the applications of AI and ML?
  • 17. 17 Important Resources (Teachable Machine) https://teachablemachine.withgoogle.com/train/image
  • 18. Half screen photo slide if text is necessary
  • 19. 19 Street nameStreet number Street View Sign Business facade Sign Business name Traffic light Traffic signStreet number
  • 21. What do I do mostly?
  • 22. Proprietary + Confidential My Life at Google...
  • 23. Proprietary + Confidential Focus Work: The cycle of Data Project ● Generate Insights from Escalation ● Conduct EDA ● Create Prelim Un/Supervised Model Policy Making ● Action in case of Phishing/SE Attacks ● Analyse Reports and Detect Causes ● Create Data Dashboard to understand impacts Escalation ● Creating Deep Machine Learning Model ● Research and Analyse Effectiveness ● Deployment & Governance Automation (ML, DNN)
  • 25. Proprietary + Confidential Frame the Problem What is your goal? Who are your stakeholders? How do you add value to them?
  • 26. Proprietary + Confidential ML Pipeline Data Collection + Preprocessing Model Training and Evaluation Machine Learning Operations (MLOps)
  • 27. Proprietary + Confidential Data Collection More data beats smarter algorithms 1. But it is not practical 2. Data is expensive. Money and time to collect labels 3. Big data might be overkill
  • 28. Proprietary + Confidential Model Training Based on different use cases 1. Regression: n dim-Polynomial? 2. Classification: Decision tree, logistic regression SVM 3. Each of the algorithm has multiple characteristics: a. Susceptible to outliers b. Explainability
  • 29. Proprietary + Confidential Model Evaluation Is it useful? 1. Regression: Root Mean Squared Error (RMSE) 2. Classification: Confusion metrics, AUC, Precision, Recall, F1 3. Complexity, explainability, latency (time and space) 4. Eager/Lazy learners
  • 30. Proprietary + Confidential ML Ops Operating real ML for real Use Case 1. Model Push 2. Model Validation 3. Monitoring/Anomaly Detection
  • 32. Proprietary + ConfidentialDevelopment + IDE Language + Library Data + ML Ops
  • 33. Analytics / ML Trend How Analytics enter/menyurupi our lives?
  • 34. Proprietary + Confidential Analytics Development in Indonesia 4G Technology is out in Indonesia First time 4G is out in Indonesia. High smartphone adoption with large digital market. 2014 Large Growth from now “unicorn” Tech Gojek Launches Android and iOS apps for 4 services: transportation, courier, and shopping Traveloka becomes the leading choice of online flights and hotel bookings. 2015 UI opens the first big data curriculum The first analytics curriculum opened by tertiary institutions The expansion of Jakarta Smart City supported by Ahok for Government and Startup Collaboration 2016 Strong Political Support of Indonesia Tech- Education Movement Nadiem was appointed as the minister of Education Merdeka Hackathon and Merdeka belajar is held Sandiaga Uno and Anies greatly supports these initatives. 2018- 2019 Corona Virus Corona virus affects livelihood while promoting the attractiveness of data science / IT sector 2020 (now) Data is the new Electricity The unity of political movements, businesses, startups, and capital labour to boost data analytics movement in Indonesia The coming of 5G in Indonesia 2021/ next Infrastructure & Business Growth Education & Political Supports Crisis & Innovation Movement
  • 35. Proprietary + Confidential Analytics Development in Indonesia 4G Technology is out in Indonesia First time 4G is out in Indonesia. High smartphone adoption with large digital market. 2014 Large Growth from now “unicorn” Tech Gojek Launches Android and iOS apps for 4 services: transportation, courier, and shopping Traveloka becomes the leading choice of online flights and hotel bookings. 2015 UI opens the first big data curriculum The first analytics curriculum opened by tertiary institutions The expansion of Jakarta Smart City supported by Ahok for Government and Startup Collaboration 2016 Strong Political Support of Indonesia Tech- Education Movement Nadiem was appointed as the minister of Education Merdeka Hackathon and Merdeka belajar is held Sandiaga Uno and Anies greatly supports these initatives. 2018- 2019 Corona Virus Corona virus affects livelihood while promoting the attractiveness of data science / IT sector 2020 (now) Data is the new Electricity The unity of political movements, businesses, startups, and capital labour to boost data analytics movement in Indonesia The coming of 5G in Indonesia 2021/ next Infrastructure & Business Growth Education & Political Supports Crisis & Innovation Movement
  • 36. Proprietary + Confidential Analytics Development in Indonesia 4G Technology is out in Indonesia First time 4G is out in Indonesia. High smartphone adoption with large digital market. 2014 Large Growth from now “unicorn” Tech Gojek Launches Android and iOS apps for 4 services: transportation, courier, and shopping Traveloka becomes the leading choice of online flights and hotel bookings. 2015 UI opens the first big data curriculum The first analytics curriculum opened by tertiary institutions The expansion of Jakarta Smart City supported by Ahok for Government ,Education and Startup Collaboration 2016 Strong Political Support of Indonesia Tech- Education Movement Nadiem was appointed as the minister of Education and Culture Merdeka Hackathon and Kampus Merdeka is held Sandiaga Uno and Anies greatly supports these initatives. 2018- 2019 Corona Virus Corona virus affects livelihood while promoting the attractiveness of data science / IT sector 2020 (now) Data is the new Electricity The unity of political movements, businesses, startups, and capital labour to boost data analytics movement in Indonesia The coming of 5G in Indonesia 2021/ next Infrastructure & Business Growth Education & Political Supports Crisis & Innovation Movement
  • 37. Proprietary + Confidential Analytics Development in Indonesia 4G Technology is out in Indonesia First time 4G is out in Indonesia. High smartphone adoption with large digital market. 2014 Large Growth from now “unicorn” Tech Gojek Launches Android and iOS apps for 4 services: transportation, courier, and shopping Traveloka becomes the leading choice of online flights and hotel bookings. 2015 UI opens the first big data curriculum The first analytics curriculum opened by tertiary institutions The expansion of Jakarta Smart City supported by Ahok for Government and Startup Collaboration 2016 Strong Political Support of Indonesia Tech- Education Movement Nadiem was appointed as the minister of Education Merdeka Hackathon and Merdeka belajar is held Sandiaga Uno and Anies greatly supports these initatives. 2018- 2019 Corona Virus Corona virus affects livelihood while promoting the attractiveness of data science / IT sector 2020 (now) Data is the new Electricity The unity of political movements, businesses, startups, and capital labour to boost data analytics movement in Indonesia The coming of 5G in Indonesia 2021/ next Infrastructure & Business Growth Education & Political Supports Crisis & Innovation Movement
  • 38. Proprietary + Confidential Analytics Development in Indonesia 4G Technology is out in Indonesia First time 4G is out in Indonesia. High smartphone adoption with large digital market. 2014 Large Growth from now “unicorn” Tech Gojek Launches Android and iOS apps for 4 services: transportation, courier, and shopping Traveloka becomes the leading choice of online flights and hotel bookings. 2015 UI opens the first big data curriculum The first analytics curriculum opened by tertiary institutions The expansion of Jakarta Smart City supported by Pak Ahok for Government and Startup Collaboration 2016 Strong Political Support of Indonesia Tech- Education Movement Nadiem was appointed as the minister of Education Merdeka Hackathon and Merdeka belajar is held Pak Sandiaga Uno and Pak Anies greatly support these initatives. 2018- 2019 Corona Virus Corona virus affects livelihood while promoting the attractiveness of data science / IT sector 2020 (now) Data is the new Electricity The unity of political movements, businesses, startups, and capital labour to boost data analytics movement in Indonesia The coming of 5G in Indonesia 2021/ next Infrastructure & Business Growth Education & Political Supports Crisis & Innovation Movement
  • 39. How can you excel in ML? I’m super excited! What’s next?
  • 41. Contribute more! Data science keeps expanding, so share your knowledge. 1. Read more: Keep experimenting with your learning styles (kinesthetic, auditory, visual). 2. Write more: Write articles and share them! 3. Speak more: Teach your peers or speak at conferences!
  • 42. Generalize and Specialize. 1. Strength: Utilize your biggest strengths. 2. Communicate: Communicate your strengths and impact more. 3. Learn in T: SQL & Python/R are the breadth, then domain knowledge is your depth. Read Deep Work. Follow it to the T.
  • 43. Smile! Data Science is Fun. 1. Play: Data science is tough, so have fun. 2. Hack: Use Saturdays to learn with friends. 3. Celebrate impact: Data science is about making an impact. Start small and celebrate!
  • 44. Contribute, Prepare, and Smile. This is the key to excelling.

Editor's Notes

  • #2: Welcome the audience. Introduce yourself. Tell them broadly what you are going to talk about. Transition to video.
  • #3: 5 real-world examples 4 Google products
  • #9: The material does not need to go too deep, Mas. An overview is enough. Because this is an intro to machine learning and the participants are beginners, the content should be roughly: 1. What is machine learning? 2. Its uses 3. Types (supervised, unsupervised) 4. Supervised and unsupervised algorithms 5. Example applications of each algorithm 6. Workflow of a machine learning project (e.g. data preprocessing, modelling, tuning, deployment, monitoring) 7. Which libraries are used most often 8. Which basic skills need to be prepared
  • #13: The material does not need to go too deep, Mas. An overview is enough. Because this is an intro to machine learning and the participants are beginners, the content should be roughly: 1. What is machine learning? 2. Its uses 3. Types (supervised, unsupervised) 4. Supervised and unsupervised algorithms 5. Example applications of each algorithm 6. Workflow of a machine learning project (e.g. data preprocessing, modelling, tuning, deployment, monitoring) 7. Which libraries are used most often 8. Which basic skills need to be prepared
  • #17: ML has already made a huge impact in the world, especially in science and health care. ML is affecting almost every industry, from manufacturing to sales and marketing, and from agriculture to astronomy.
  • #18: The simple, basic code that I am going to talk about uses this material from Google Colab. In case you don’t know what Google Colab is, it is an impressive tool that lets you use a GPU for free in an interactive notebook environment. So if you want to run your machine learning model quickly using TensorFlow, Keras, and many more, but you don’t want to invest a lot, you can come to this environment. It is easy. If you are still unsure, let me know. For now, just know that we are using this training tutorial as our simple intro to CNNs.
  • #19: In agriculture: In dairy farming, a cow’s health is vital to the survival of the business. Connecterra, a company in the Netherlands, wondered if it could use machine learning to keep cows healthy by tracking their behavior and giving farmers and veterinarians insights on actions to take to ensure happy, healthy cows with higher yields. So now, happy cows come not only from California but also from the Netherlands.
  • #20: Google Maps has created Street View-style visual guides for step-by-step directions overlaid onto the real world, as viewed through the smartphone camera. Further, Google plans to integrate its Assistant, equipped with the computer vision platform Google Lens, into Maps. That way, you’ll be able to pan over a city street and see pop-ups highlighting restaurants and other locations in real time.
  • #21: Google is now offering offline downloads for its AI-powered translator. So if you don’t have unlimited data, or you have a plan that doesn’t work internationally, you can now download neural machine translation from Google’s Android and iOS apps. Google Translate’s offline AI translations will first be available in 59 languages, including English, Arabic, Chinese, German, and Hindi, to name a few. They’ll take about 35MB per language, so they won’t use up too much of your device’s storage. Lower-specced phones should also be able to support the new update, as Google says it wants users in all markets to have access to the feature.
  • #22: 5 real-world examples 4 Google products
  • #25: 5 real-world examples 4 Google products
  • #26: P
  • #27: P
  • #32: 5 real-world examples 4 Google products
  • #34: 5 real-world examples 4 Google products
  • #35: P
  • #36: P
  • #37: P
  • #38: P
  • #39: P
  • #40: Now that we are aware of all the resources, let’s understand the framework for building ML models.
  • #45: Now that we are aware of all the resources, let’s understand the framework for building ML models.