Software Engineer @Criteo AI Lab
Gilles LEGOUX at Grenoble INP - Ensimag
2020-11-10 <g.legoux@criteo.com>
2 •
What is Criteo?
The leading advertising platform for the open internet
Open Internet, AI* Engine, E-commerce Dataset
Criteo was founded in 2005
2,700+ employees, including 650+ in R&D
See more details
AMERICAS, EMEA, APAC
Publisher Access
Advertiser Platform
*: Artificial Intelligence
Source: Criteo, 2019
30 locations worldwide, with R&D offices in Paris (FR), Grenoble (FR) and Ann Arbor (USA)
See more details
3 •
" I am a Software Engineer @Criteo R&D*, in the Criteo AI Lab, more precisely in the UC** Team "
https://ailab.criteo.com
*: Research & Development
**: Universal Catalog
4 •
2014, 2016, 2017, 2020
Information Systems Engineering specialization
Startup experience as Web Software Engineer
Post-master's degree in data science and big data
Software Reliability Engineer
Software Engineer at Criteo AI Lab
5 •
The Online Ad World
go, Go, GO!
6 •
Criteo demo
Ad choices
" Here is an online ad "
Publisher website
Advertiser website
7 •
Who are the top companies in the online ad world?
go, Go, GO!
Source: SimilarTech* online data
*: Bing is Microsoft's, DoubleClick is Google's, Taboola bought Outbrain; Amazon, a newer player, should also be present
for 10K sites (data viewed in 2020-10, but probably dating from ~2018)
Ads market share
8 •
How do online ads work?
Source: ad-exchange.fr
DSP: Demand-Side Platforms
SSP: Supply-Side Platforms
Cash flow
go, Go, GO!
CTR* 1%
x2 compared with the competitors' average
*: Click Through Rate
Cash flow
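The CTR above is the number of clicks divided by the number of ad displays, so a 1% CTR means about one click per hundred displays. Here is a minimal sketch of that ratio. The function name and the sample counts are hypothetical, not a Criteo API or Criteo data.

```python
def click_through_rate(clicks: int, displays: int) -> float:
    """Return clicks / displays, or 0.0 when nothing was displayed."""
    if clicks < 0 or displays < 0:
        raise ValueError("counts must be non-negative")
    if displays == 0:
        return 0.0
    return clicks / displays


# Hypothetical counts: 1,000 clicks on 100,000 displays gives a 1% CTR.
ours = click_through_rate(1_000, 100_000)
# A competitor average half as high matches the "x2" comparison on the slide.
competitor_average = click_through_rate(500, 100_000)
print(f"CTR: {ours:.1%}, ratio vs. competitor average: {ours / competitor_average:.1f}x")
```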
9 •
What products does Criteo provide?
https://marketing.criteo.com
Advertiser Platform
Criteo is a full DSP; our main business partners are advertisers:
• Import products
• Manage campaigns with budget & audience rules
• Analyze results
• Create ads
https://pmc.criteo.com
Publisher Access
Our business partners are publishers, but Criteo can also use other SSPs to provide ads.
10 •
Criteo datasets
User Events
Advertiser Configuration
E-commerce Dataset
AI* Engine
Universal Catalog
Advertiser catalogs
11 •
How do users interact with online ads?
RTB: Real Time Bidding
CAS: Criteo Ads Server
CAT: Criteo Ads Targeting
CRITEO
INTERNET
Billing
Views
Displays, Clicks
Events
View, List, Basket, Sale
Auctions & Bidding
Loading script
Browsing
Open Internet
Won auction
static.criteo.net
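Criteo acts as a DSP in these auctions. An exchange only waits a bounded time for each bid; the "Criteo in digits" slide quotes bid responses in 80ms or less.

Below is a minimal asyncio sketch of a DSP-side handler. It answers with a bid if scoring finishes within that budget and otherwise declines to bid. `score_request` is a hypothetical stand-in for the real bidding model, and the pricing rule (predicted CTR times value per click) is an illustrative assumption.

```python
import asyncio
from typing import Optional

BID_BUDGET_S = 0.080  # 80 ms response budget, from the "Criteo in digits" figures


async def score_request(request: dict) -> float:
    """Hypothetical model: returns a predicted click probability after some latency."""
    await asyncio.sleep(request.get("simulated_latency_s", 0.01))
    return request.get("predicted_ctr", 0.01)


async def handle_bid_request(request: dict, value_per_click: float) -> Optional[float]:
    """Return a bid price, or None (no bid) if scoring misses the deadline."""
    try:
        p_click = await asyncio.wait_for(score_request(request), timeout=BID_BUDGET_S)
    except asyncio.TimeoutError:
        return None  # a late answer would typically be ignored by the exchange
    # Illustrative pricing rule: expected value of the impression.
    return p_click * value_per_click


async def main() -> None:
    fast = {"predicted_ctr": 0.02, "simulated_latency_s": 0.005}
    slow = {"predicted_ctr": 0.02, "simulated_latency_s": 0.3}
    print(await handle_bid_request(fast, value_per_click=0.50))  # a bid
    print(await handle_bid_request(slow, value_per_click=0.50))  # None: too slow


asyncio.run(main())
```

Declining to bid on a timeout keeps a slow model from stalling the request path. Whether a real bidder falls back to a cached or simpler score instead is not covered by the slides.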
12 •
How is the universal catalog used?
Publisher Direct Access & SSPs
RTB
Render
Ads Creator
Reco*
Universal Catalog
12B+ products
Audience Budget
*: Recommendation
User Web Client
Arbitrage
CAS CAT
Campaign
Internal Criteo Network
The Internet
Advertiser
13 •
Criteo AI* Lab
go, Go, GO!
*: Artificial Intelligence
14 •
What is the Criteo AI Lab, and who works there?
• R&D department
• Machine Learning (ML)
• Researchers & Software Engineers
Infrastructure
Product Engineering
Site Reliability Engineering
Product Engineering
Engineering Program Management
Research & Development
Product Engineering
15 •
How is the Criteo AI Lab organized, and why?
• 4 groups of teams
• Provide state-of-the-art ML for Criteo
• Academic contributions & visibility
Criteo AI Lab Structure
Product Engineering
Research CAML**
ML Platform
Recommendation
UC* Team
*: Universal Catalog
**: Criteo Applied Machine Learning
16 •
Organization of a team
A yearly kick-off sets the Criteo strategy. We have a 9-month plan, several Objectives and Key Results (OKRs) per quarter, two-week Scrum sprints, and daily tasks.
" Every team owns its daily organization, within a common culture "
Team members
EPM*
Manager
Team lead
Software Engineer
Product Owner
*: Engineering Program Manager
17 •
Workday
8h-10h: Start
• Development: solo, pair or mob programming for maintenance, tech debt, features, hot fixes
• Meetings: demos, knowledge sharing, brainstorming, projects, 1:1 with team lead or manager
• Communication: email/Slack questions, news
• Documentation: user, developer, design, code, organization
• Events: social events, conferences, CAIL/R&D All Hands, CTF*, hackathons
• Learning: online courses, reading blog articles, competitions
• Breaks: coffee, lunch
17h-21h: End
*: Capture The Flag
18 •
Tools We Use
Instant messaging
Code versioning
Presentation, Email, Calendar management
Programming language
Online meetings platform
Ticketing management
Documentation management
Feedback platform
Award platform
Integrated Development Environment
19 •
Software Engineer Skills
Peer feedback twice a year (mid-year and end of year), assessed against a matrix of levels (junior, senior, staff, senior staff, principal, …) based on these 10 skills.
Hard skills
Soft skills
20 •
Interactions
Research
*: Engineering Program Manager
Software Engineer
Data scientist
Software Engineer
Site Reliability Engineer
Product Analyst
Manager
EPM* Product Owner
Users
21 •
Universal Catalog
go, Go, GO!
22 •
Our mission?
Outcome
Universal Catalog
12B+ enriched products
Advertiser catalogs
30K+ catalogs
Merge and unify all advertiser catalogs into a universal catalog.
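Merging many advertiser catalogs needs a key that identifies a product across all of them. Here is a minimal sketch, assuming the key is the pair (vendor, id) taken from the provided product features. It also assumes a "later record wins, field by field" policy for duplicates. Neither the key nor the policy is confirmed by the slides, and the sample catalogs are invented.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed record shape: each advertiser catalog is a list of dicts with
# at least "vendor" and "id" keys, as in the "provided features" of a product.
Product = Dict[str, object]
ProductKey = Tuple[str, str]


def merge_catalogs(catalogs: Iterable[List[Product]]) -> Dict[ProductKey, Product]:
    """Merge advertiser catalogs into one mapping keyed by (vendor, id).

    When the same (vendor, id) appears more than once, later records update
    earlier ones field by field (an assumed policy for illustration).
    """
    universal: Dict[ProductKey, Product] = {}
    for catalog in catalogs:
        for product in catalog:
            key = (str(product["vendor"]), str(product["id"]))
            merged = dict(universal.get(key, {}))
            merged.update(product)
            universal[key] = merged
    return universal


shop_a = [{"vendor": "shop_a", "id": "1", "title": "Red sneakers", "price": 59.9}]
shop_b = [
    {"vendor": "shop_b", "id": "1", "title": "Blue jeans", "price": 40.0},
    {"vendor": "shop_b", "id": "1", "price": 35.0},  # price update for the same product
]
catalog = merge_catalogs([shop_a, shop_b])
print(len(catalog))                       # 2 distinct products
print(catalog[("shop_b", "1")]["price"])  # 35.0
```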
23 •
How do we build this universal catalog?
Product → Model Prediction or Simple processing → Enriched Product
Build the universal catalog for Criteo business
with machine learning and data processing algorithms.
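Each enrichment comes either from a model prediction or from simple processing. One way to organize this is a list of enricher functions whose outputs are merged into a copy of the product.

This is a minimal sketch under stated assumptions:
- The keyword rule only stands in for a model call; it is not how the AI Engine works.
- The exchange rates are invented constants.
- All function names are hypothetical.

```python
from typing import Callable, Dict, List

Product = Dict[str, object]
Enricher = Callable[[Product], Dict[str, object]]

# Assumed, fixed exchange rates for illustration only.
EUR_PER_UNIT = {"EUR": 1.0, "USD": 0.85, "GBP": 1.10}


def predict_universal_category(product: Product) -> Dict[str, object]:
    """Placeholder for a model prediction; a keyword rule stands in for the model."""
    text = f"{product.get('title', '')} {product.get('description', '')}".lower()
    category = "Shoes" if "sneakers" in text else "Unknown"  # invented labels
    return {"universal_category": category}


def price_in_euros(product: Product) -> Dict[str, object]:
    """A 'simple processing' enrichment: deterministic, no model involved."""
    rate = EUR_PER_UNIT[str(product["currency"])]
    return {"price_eur": round(float(product["price"]) * rate, 2)}


def enrich(product: Product, enrichers: List[Enricher]) -> Product:
    """Run every enricher on the provided features and merge their outputs."""
    enriched = dict(product)  # copy, so the provided features stay untouched
    for enricher in enrichers:
        enriched.update(enricher(product))
    return enriched


raw = {"vendor": "shop_a", "id": "1", "title": "Red sneakers", "price": 100, "currency": "USD"}
print(enrich(raw, [predict_universal_category, price_in_euros]))
```

Each enricher sees only the provided features, so the enrichers are independent of each other and could run in any order.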
24 •
What are the features of an enriched product?
Product (provided features): vendor, id, title, description, category, brand, price
Enriched Product (outcome) = provided features + enrichments: universal brand, universal category, gender, price in euros, price range
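The two feature lists above map naturally onto two record types. Here is a sketch using Python dataclasses with the field names from the slide. The composition (an enriched product wrapping the provided one), the `Optional` enrichments and the price-range boundaries are assumptions; the slides do not give the real buckets.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed price-range boundaries in euros; the real buckets are not given.
PRICE_RANGES = [(0.0, 20.0, "low"), (20.0, 100.0, "medium"), (100.0, float("inf"), "high")]


@dataclass(frozen=True)
class Product:
    """Features provided by the advertiser catalog."""
    vendor: str
    id: str
    title: str
    description: str
    category: str
    brand: str
    price: float


@dataclass(frozen=True)
class EnrichedProduct:
    """Provided features plus the enrichments listed on the slide."""
    product: Product
    universal_brand: Optional[str]
    universal_category: Optional[str]
    gender: Optional[str]
    price_eur: Optional[float]
    price_range: Optional[str]


def price_range(price_eur: float) -> str:
    """Map a euro price to a coarse bucket, using half-open intervals [low, high)."""
    for low, high, label in PRICE_RANGES:
        if low <= price_eur < high:
            return label
    raise ValueError(f"price out of range: {price_eur}")


p = Product("shop_a", "1", "Red sneakers", "Running shoes", "Shoes", "Acme", 59.9)
e = EnrichedProduct(p, "acme", "Shoes", None, 59.9, price_range(59.9))
print(e.price_range)  # medium
```

Wrapping the original `Product` keeps the advertiser's raw values separate from the derived ones. A flat record with all twelve fields would work just as well.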
25 •
What is our data?
Universal Catalog
30K+ product catalogs
12B+ products
12 languages
Outcome
Universal product categories: 5K+
Universal product brands: 60K+
E-commerce Dataset
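Mapping raw catalog brands onto 60K+ universal brands starts with making spelling variants comparable. Here is a minimal sketch of one possible normalization step:
- fold case;
- strip accents via Unicode NFKD decomposition;
- keep only letters and digits;
- then look the key up in an alias table.

The alias table and function names are hypothetical.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical alias table: normalized raw brand -> universal brand.
BRAND_ALIASES: Dict[str, str] = {
    "adidas": "adidas",
    "adidasoriginals": "adidas",
    "levis": "Levi's",
    "levistrausco": "Levi's",
}


def normalize_key(raw_brand: str) -> str:
    """Fold case, strip accents and keep only alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", raw_brand)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.casefold() if c.isalnum())


def universal_brand(raw_brand: str) -> Optional[str]:
    """Return the universal brand for a raw catalog value, or None if unknown."""
    return BRAND_ALIASES.get(normalize_key(raw_brand))


print(universal_brand("Adidas Originals"))  # adidas
print(universal_brand("LÉVI'S®"))           # Levi's
```

`str.isalnum` keeps non-Latin letters, which matters for a catalog spanning 12 languages. Deciding that two different keys are the same brand still needs the alias table, or a model.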
26 •
What is the universal category model?
AI Engine: Deep Learning model
Input: product title and description
Output: predicted universal leaf category
Supervised classification model with K classes
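The slide describes a deep learning model. The sketch below is deliberately simpler and is not Criteo's model. It swaps in multinomial logistic regression: a single linear softmax layer over bag-of-words counts of title and description, trained with SGD on cross-entropy loss. It keeps the same supervised K-class setup: text in, one universal leaf category out. Class names and training examples are invented.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def tokenize(title: str, description: str) -> List[str]:
    """Lowercased whitespace tokens of title + description."""
    return (title + " " + description).lower().split()


class SoftmaxTextClassifier:
    """Linear bag-of-words classifier trained with SGD on cross-entropy loss."""

    def __init__(self, classes: List[str], lr: float = 0.5, seed: int = 0) -> None:
        self.classes = classes
        self.lr = lr
        self.rng = random.Random(seed)
        # weights[token][k]: contribution of one token occurrence to class k's score
        self.weights: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(classes))
        self.bias = [0.0] * len(classes)

    def _probs(self, tokens: List[str]) -> List[float]:
        scores = list(self.bias)
        for tok in tokens:
            w = self.weights.get(tok)  # unseen tokens contribute nothing
            if w is not None:
                for k in range(len(scores)):
                    scores[k] += w[k]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [x / total for x in exps]

    def fit(self, data: List[Tuple[List[str], str]], epochs: int = 20) -> None:
        data = list(data)  # copy so shuffling does not reorder the caller's list
        for _ in range(epochs):
            self.rng.shuffle(data)
            for tokens, label in data:
                probs = self._probs(tokens)
                y = self.classes.index(label)
                for k in range(len(self.classes)):
                    grad = probs[k] - (1.0 if k == y else 0.0)  # d(loss)/d(score_k)
                    self.bias[k] -= self.lr * grad
                    for tok in tokens:
                        self.weights[tok][k] -= self.lr * grad

    def predict(self, tokens: List[str]) -> str:
        probs = self._probs(tokens)
        return self.classes[max(range(len(probs)), key=probs.__getitem__)]


train = [
    (tokenize("red running sneakers", "light shoes for jogging"), "Shoes"),
    (tokenize("leather boots", "warm winter shoes"), "Shoes"),
    (tokenize("stainless steel pan", "non-stick frying pan for the kitchen"), "Kitchen"),
    (tokenize("chef knife", "sharp kitchen knife"), "Kitchen"),
]
model = SoftmaxTextClassifier(["Shoes", "Kitchen"])
model.fit(train)
print(model.predict(tokenize("trail sneakers", "running shoes")))
```

With K in the thousands and text in 12 languages, a real system would need learned text representations rather than raw token counts. That is presumably why the slide mentions a deep learning model.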
27 •
What is the technical environment?
annotate products
Import catalogs
meta store
ML labs & experiments
models
metadata
deploy model
sample products
enrich the products with predictions or simple processes
feed data sets
get data sets
Annotation API & UI
Jobs scheduler
Advertiser catalogs
data sets
AI Engine
data warehouse
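The environment includes a "sample products" step that feeds the annotation tool. The slides do not say how products are sampled. One standard way to draw a uniform sample from a catalog too large to hold in memory is reservoir sampling (Algorithm R), sketched here as an assumption. The function name and the product IDs are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible slots
            if j < k:
                reservoir[j] = item  # keep item i with probability k / (i + 1)
    return reservoir


product_ids = (f"product-{n}" for n in range(100_000))
to_annotate = reservoir_sample(product_ids, k=5, seed=42)
print(to_annotate)
```

The stream is read once and memory stays at k items, which suits a catalog that only fits in a data warehouse.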
28 •
What are our components?
• Scheduler with a Spark job
• Web Application
• Machine learning lab
29 •
What are the development cycle and the pipeline to go to (pre-)production?
Workstation
Build Tools Server
CI/CD* Server
Review Server
Gerrit server
Artifact stores
Deployment Server
Container platform
Preprod or prod?
Datacenter(s)?
.pex
.jar
*: Continuous Integration/Continuous Delivery
30 •
What is the production environment?
Container platform
meta store
Container platform
models
data warehouse
metadata
universal catalog databases
Spark job enricher
Jobs scheduler
31 •
What is the technical stack?
Jobs
Web applications
ML labs & experiments
Analytics
Monitoring
Container platforms
Storage
Thank you!
go, Go, GO!
33 •
" We are recruiting! "
Already 20+ graduates here
" Join us ✌️ "
Criteo Tech blog
Criteo Open Positions
criteo.com
Q & A
go, Go, GO!
g.legoux@criteo.com
@gilleslegoux
35 •
Criteoers* regularly create and contribute to open source projects, but we also keep some internal projects to stay ahead of our competitors!
*: name for Criteo employees
Criteo GitHub
Open source projects
See more details
Criteo GitLab
Experimental internal projects
Criteo Gerrit
Production internal projects
What about open source?
" We love Open Source projects "
36 •
The situation differs by location, but remote work is "strongly advised" until June 2021 in Paris and Grenoble. Covid-19 has had only a small impact on our business.
What is happening with Covid-19?
" Everyone is safe, business is good "
Covid-19 vs Criteo
See more details
37 •
Each team uses part of this common tech stack, and can use any technology for experiments.
What about the technical stack?
" It depends on your team and mission, but we have a common tech stack! "
Criteo Tech Stack
See more details
38 •
Each year we have 1 kickoff, 1 hackathon (3 days) and 2 conferences. You also get onboarding with a datacenter visit, paid external or internal training, tool licenses, level matrices (SRE, SDE, ML ENG, ...), 3 voyager programs, and peer feedback every 6 months with a promotion process … See "Working in R&D" to join us.
What about career and life at Criteo?
" Become a complete, happy engineer! "
Criteo Experience life
See more details
39 •
What are the voyager programs?
Appendices
go, Go, GO!
41 •
" We care about these issues "
Criteo is a commitment to society, not only a company for the open internet! See our values and what we care about.
Save environment
See more details
Respect private data
See more details
42 •
" Criteo in numbers? "
See more details
The development team of the future at Criteo
Here are a few figures, because we like data, yes indeed we do:
• 15 datacenters (9 with computing capacity + 6 dedicated to network connectivity) across US, EU, APAC
• More than 35K servers, running a mix of Linux and Windows
• One of the largest Hadoop clusters in Europe, with close to 171 PB of storage and 42,000 cores
• 250B HTTP requests and close to 4B unique banners displayed per day
• 130Gbps of bandwidth, half of it through peering exchanges
• Bid responses in 80ms or less, 24/7
• Close to 4M HTTP requests per second handled during peak times
• Less than 10ms on average to select the optimal campaign
• 10ms to find the best product in a catalogue of hundreds of millions of products
• Tens of TB of new data stored daily
• Largest public Machine Learning Dataset in the world, with over 4 billion lines and over 1TB in size
• Technologies: Hadoop, Couchbase, Redis, Mesos, Kafka, Storm, Cassandra, Spark, Vertica, Druid, …
Source: Criteo, 2019
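These figures are consistent with each other. Spreading 250B daily HTTP requests over 86,400 seconds gives an average of about 2.9M requests/s. The quoted peak of close to 4M requests/s is therefore roughly 1.4 times the daily average. Here is the arithmetic, using only numbers from the list above:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

http_requests_per_day = 250e9  # "250B HTTP requests ... per day"
banners_per_day = 4e9          # "close to 4B unique banners displayed per day"
peak_requests_per_s = 4e6      # "close to 4M HTTP requests per second" at peak

avg_requests_per_s = http_requests_per_day / SECONDS_PER_DAY
avg_banners_per_s = banners_per_day / SECONDS_PER_DAY

print(f"average HTTP load: {avg_requests_per_s / 1e6:.2f}M requests/s")
print(f"peak-to-average ratio: {peak_requests_per_s / avg_requests_per_s:.2f}")
print(f"average display rate: {avg_banners_per_s / 1e3:.0f}K banners/s")
```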
43 •
" What are the Criteo datacenters? "
Source: Criteo, 2020
44 •
" How is a data center installed at Criteo? "
You can visit it!
go, Go, GO!


Tech Job Conference: Software Engineer @Criteo
