130 million customers
in over 190 countries
streaming 140 million hrs/day
Going FaaSter, Functions as a Service at Netflix
We use a data driven approach via A/B testing for most changes to our
product — ensuring every change delights our customers
source: https://www.optimizely.com/optimization-glossary/ab-testing/
1000s of A/B tests a year
Netflix API
http://api.netflix.com
[Diagram: columns labeled Clients, Client API, Edge API, Backend Services. Clients: TV, iOS, Android, Windows, Browsers. Middle: Remote Service Layer. Backend services: Search, MAP, GPS, Playback, …]
The Netflix API decouples clients from the backend services, providing an
integration point for both services and clients
BFF
The Netflix API uses the BFF (backend for frontend) pattern, where the
BFF is tightly coupled to each device — making it easier to define and
adapt the UI, and streamlining releases
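A minimal sketch of the BFF idea follows. The backend stubs (`fetchMovies`, `fetchProfile`) and the response shape are assumptions for illustration, not from the talk. A device-specific endpoint composes backend calls and returns only what that device's UI needs.

```javascript
// Hypothetical backend clients; synchronous stubs stand in for RPC calls.
function fetchMovies() {
    return [
        { id: 1, title: 'Movie A', synopsis: 'long text', boxArtUrl: 'a.jpg' },
        { id: 2, title: 'Movie B', synopsis: 'long text', boxArtUrl: 'b.jpg' }
    ];
}
function fetchProfile(userId) {
    return { userId: userId, name: 'Sam', maturityLevel: 'adult' };
}

// BFF endpoint for a hypothetical TV client: one call, UI-shaped payload.
function tvHomeScreen(userId) {
    const movies = fetchMovies();
    const profile = fetchProfile(userId);
    return {
        greeting: 'Hi ' + profile.name,
        // Drop fields the TV UI does not render (for example the synopsis).
        rows: movies.map(function (m) {
            return { id: m.id, title: m.title, art: m.boxArtUrl };
        })
    };
}

const screen = tvHomeScreen('u1');
console.log(JSON.stringify(screen));
```

Because the endpoint is shaped for one UI, the team that owns that UI can change the payload without coordinating with other clients.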
These BFFs are maintained by the UI teams, since they are tightly coupled to
their UIs
Netflix API requirements
Velocity Reliability
Ergonomic No Operations
Going FaaSter: Function as a Service at Netflix
Yunong Xiao, Principal Software Engineer, Netflix
FaaS Evolution
[Diagram: the stack layers (Services, Platform, Application, λ) shown for Pre-Cloud/On Prem, IaaS, PaaS, and FaaS, with each layer marked "You Manage" or "Others Manage"]
Pros Cons
No-ops Homogenous architecture
Accessible monitoring & debugging
Velocity Netflix stack integration
Reliable service platform Limits: latency, memory,
execution time
Build or buy?
We’ll cover:
Runtime platform architecture
Developer experience
Management & operations
We are almost completely hosted in the cloud using AWS
EC2 makes up the foundation of infrastructure at Netflix
VMs or Containers?
We chose containers as the foundation of our FaaS platform because they
let us build a platform that is ergonomic and efficient, with high
deployment velocity
Lightweight & fast deployments
Portability across environments
Efficient bin packing
We built Titus — our own container management platform — capable of
launching millions of containers a day
We have created a reliable, open source services platform
Service Discovery: Eureka https://github.com/Netflix/eureka
RPC: Ribbon (HTTP), gRPC https://github.com/Netflix/ribbon
Configuration: Archaius https://github.com/Netflix/archaius
Metrics: Atlas https://github.com/Netflix/atlas
Fault tolerance: Hystrix https://github.com/Netflix/hystrix
External LB: Zuul https://github.com/Netflix/zuul
Tracing: Mantis, Salp
…
Assembling these components yourself is time consuming, difficult, and
error prone
You always have to keep components updated to the latest versions
yourself
You have to ensure that metrics and dashboards are created for your
service
You’re on the hook for managing and operating the infrastructure
You shouldn’t have to set everything up from scratch every time when all
you care about is the business logic
We set out to build our runtime FaaS platform that solves these issues
No assembly required
Automatic updates
Observable metrics
Managed operations
The platform is a services container that has been pre-assembled with all
of the components needed for a production ready service
[Diagram: platform container with Server, Service Registration, Configuration, Metrics, Stream Processing, RPC Clients, Auth, Throttling, Service Discovery Daemon, Metrics Daemon, Log rotation]
All that’s needed is for customers to insert their business logic
[Diagram: the platform container (Server, Service Registration, Configuration, Metrics, Stream Processing, Service Discovery Daemon, Metrics Daemon, Log rotation, RPC Clients, Auth, Throttling) with customer routes added: Route /foo, Route /bar, …]
We package and version the platform as a single entity, so we can upgrade
and test the components once and ensure everyone receives the upgrade
Because we control the runtime, the platform can emit a consistent set of
application, RPC, and system metrics for every function
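One way a platform that owns the runtime could emit the same metrics for every function is to wrap each handler before mounting it. The sketch below assumes a hypothetical in-memory metrics sink and a hypothetical `withMetrics` wrapper. It is not the Atlas client API or the platform's actual code.

```javascript
// Hypothetical in-memory metrics sink; it is not the Atlas client API.
function createMetrics() {
    const counters = {};
    return {
        increment: function (name) { counters[name] = (counters[name] || 0) + 1; },
        counters: counters
    };
}

// Wrap a Connect-style handler so every function emits the same counters.
function withMetrics(routeName, handler, metrics) {
    return function (req, res, next) {
        metrics.increment(routeName + '.requests');
        return handler(req, res, function (err) {
            metrics.increment(routeName + (err ? '.errors' : '.success'));
            return next(err);
        });
    };
}

const metrics = createMetrics();
const ok = withMetrics('foo', function (req, res, next) { return next(); }, metrics);
const bad = withMetrics('bar', function (req, res, next) {
    return next(new Error('boom'));
}, metrics);

ok({}, {}, function () {});
ok({}, {}, function () {});
bad({}, {}, function () {});
console.log(metrics.counters);
```

Function authors write only the inner handler; the counters come from the wrapper, which is why they can be consistent across teams.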
{
    "service": {
        "org": "iosui",
        "name": "iphone"
    },
    "platformVersion": "^6.0.0",
    "routes": {
        "routes": {
            "movies": {
                "get": {
                    "source": "./lib/endpoints/movies.js"
                }
            },
            "profile": {
                "post": {
                    "source": "./lib/endpoints/profile.js"
                }
            }
        }
    },
    "sources": ["./lib"],
    "propertiesPath": "./etc",
    "startupHooks": [
        "./hooks/startupHook.js"
    ]
}
Functions are managed via a configuration API, where most fields are
optional.
Callouts, in field order:
service: Service name
platformVersion: FaaS platform version
routes: Function declarations
sources: Additional source code
propertiesPath: Configuration
startupHooks: Lifecycle management
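To make the function declarations concrete, here is a sketch of how a platform might turn the route declarations into a dispatch table. `buildRouteTable` and the `sources` map are hypothetical; the map stands in for loading each `source` file from disk so the example is self-contained.

```javascript
// Route declarations in the shape shown above (inner "routes" object only).
const routes = {
    movies: { get: { source: './lib/endpoints/movies.js' } },
    profile: { post: { source: './lib/endpoints/profile.js' } }
};

// Stand-in for require(): maps source paths to handlers.
const sources = {
    './lib/endpoints/movies.js': function (req, res, next) {
        res.body = 'movies';
        return next();
    },
    './lib/endpoints/profile.js': function (req, res, next) {
        res.body = 'profile';
        return next();
    }
};

// Build a "METHOD /path" -> handler table from the declarations.
function buildRouteTable(routeConfig, loadSource) {
    const table = {};
    Object.keys(routeConfig).forEach(function (name) {
        Object.keys(routeConfig[name]).forEach(function (method) {
            const key = method.toUpperCase() + ' /' + name;
            table[key] = loadSource(routeConfig[name][method].source);
        });
    });
    return table;
}

const table = buildRouteTable(routes, function (path) { return sources[path]; });
console.log(Object.keys(table));
```

Keeping routing declarative means the platform, not each team, decides how handlers are loaded and mounted.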
Business logic is implemented as Node.js “Connect”-style middleware,
which handles requests.
// req: HTTP request object, res: HTTP response, next: callback
module.exports = function(req, res, next) {
    res.send(200, req.query);
    return next();
};
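The `next` callback is what lets handlers be chained. The sketch below shows the general Connect-style pattern with a hypothetical `runChain` helper and a fake `res` object. It is not the platform's or Connect's actual implementation.

```javascript
// Run handlers in order; each calls next() to continue or next(err) to stop.
function runChain(handlers, req, res, done) {
    let index = 0;
    function next(err) {
        if (err || index === handlers.length) {
            return done(err);
        }
        const handler = handlers[index++];
        return handler(req, res, next);
    }
    return next();
}

function parseQuery(req, res, next) {
    req.query = { q: 'stranger things' }; // pretend this came from the URL
    return next();
}

// Same shape as the handler above.
function echo(req, res, next) {
    res.send(200, req.query);
    return next();
}

// Fake response object that records what was sent.
const res = {
    send: function (status, body) { this.status = status; this.body = body; }
};
let finished = false;
runChain([parseQuery, echo], {}, res, function (err) { finished = !err; });
console.log(res.status, res.body, finished);
```

Passing an error to `next` skips the remaining handlers, which gives the platform a single place to handle failures.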
Platform components such as metrics, loggers, or RPC clients are
available via the “req” object — providing a full runtime API for
developers
module.exports = function ping(req, res, next) {
    req.log.info('Hello World!');
    req.getRequestContext(); // request context
    req.getAtlas(); // metrics client
    req.getDNAClient(); // RPC client
    req.getProperties(); // configuration client
    req.getEdgar(); // tracing
    req.getMantis(); // stream processing client
    req.getGeo(); // geolocation
    req.getPassport(); // auth
    return next();
};
Long-lived third-party libraries can be managed via startup and shutdown
lifecycle hooks.
"startupHooks": [
    "./hooks/startupHook.js"
],
"shutdownHooks": [
    "./hooks/shutdownHook.js"
]
Hooks are initiated before the platform starts, have access to all platform
components, and allow for third party libraries to be made available on
the request object
// executed before platform starts
module.exports = function startuphook(opts, cb) {
    // access to all platform components
    opts.atlas;
    opts.infrastructureInfo;
    opts.log;
    // ...
    opts.properties;
    opts.serviceInfo;
    // return an object that will be made available
    // to all functions
    return cb(null, { foo: 'bar' });
};
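A sketch of how startup hooks with this `(opts, cb)` signature could be run in sequence, with their results merged into one shared object. `runStartupHooks`, the two example hooks, and the shape of `opts.serviceInfo` are assumptions for illustration, not the platform's code.

```javascript
// Run hooks one after another; collect what each returns via cb(null, value).
function runStartupHooks(hooks, opts, done) {
    const shared = {};
    let index = 0;
    function runNext(err, result) {
        if (err) {
            return done(err);
        }
        Object.assign(shared, result);
        if (index === hooks.length) {
            return done(null, shared);
        }
        return hooks[index++](opts, runNext);
    }
    return runNext(null, {});
}

// Two example hooks; the second reads a platform component from opts.
function cacheHook(opts, cb) {
    return cb(null, { cache: new Map() });
}
function bannerHook(opts, cb) {
    return cb(null, { banner: 'service=' + opts.serviceInfo.name });
}

const opts = { serviceInfo: { name: 'iphone' } }; // hypothetical shape
let hookContext = null;
runStartupHooks([cacheHook, bannerHook], opts, function (err, shared) {
    if (err) { throw err; }
    hookContext = shared;
});
console.log(hookContext.banner);
```

A failing hook stops the sequence, so a function never starts serving with a half-initialized library.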
External dependencies can be imported from
Our goal is to create a local function development experience that
improves the software development life cycle for developers
We created a developer workflow tool called NEWT (Netflix Workflow
Toolkit) which simplifies and facilitates common developer tasks
Development
Debugging
Testing
Publishing
Deployment
One-click setup for a consistent development environment. Installs
dependencies and keeps them updated
We created a development FaaS platform for local development —
enabling engineers to interactively test functions in seconds —
reducing friction and increasing velocity
[Diagram: Dev FaaS platform (the same pre-assembled container) running local functions with live reload]
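The talk does not say how live reload is implemented. One common Node.js technique is to evict a module from `require.cache` so the next `require` reads the file again. The sketch below demonstrates that technique only; `loadFresh` and the temporary file names are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Load a module fresh by evicting any cached copy first.
function loadFresh(modulePath) {
    const resolved = require.resolve(modulePath);
    delete require.cache[resolved];
    return require(resolved);
}

// Simulate a developer editing a function on disk between two requests.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'faas-dev-'));
const file = path.join(dir, 'ping.js');

fs.writeFileSync(file, "module.exports = function () { return 'v1'; };");
const before = loadFresh(file)();

fs.writeFileSync(file, "module.exports = function () { return 'v2'; };");
const after = loadFresh(file)();

console.log(before, after);
fs.rmSync(dir, { recursive: true, force: true }); // clean up the temp directory
```

A real dev server would typically trigger the reload from a file watcher rather than on every call.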
Local debugging further increases velocity and reduces friction of the
SDLC
[Diagram: Dev FaaS platform with an attached debugger, local testing, and logs]
The local FaaS platform can be integrated and routed within the Netflix
cloud, enabling seamless end to end testing
[Diagram: Device; Zuul: Auth, SSL, …; local functions on the dev FaaS platform; Backend services]
Teams also want to test functions in isolation without having to connect
to or depend on upstream and downstream services
[Diagram: isolated local functions, shown alongside Device; Zuul: Auth, SSL, …; Backend services]
The FaaS platform provides mocks and unit test APIs which allow teams
to test functions in isolation without having to connect to or depend on
upstream and downstream services
module.exports = function ping(req, res, next) {
    req.log.info('Hello World!');
    req.getRequestContext(); // request context
    req.getAtlas(); // metrics client
    req.getDNAClient(); // RPC client
    req.getProperties(); // configuration client
    req.getEdgar(); // tracing
    req.getMantis(); // stream processing client
    req.getGeo(); // geolocation
    req.getPassport(); // auth
    return next();
};
Runtime API requires downstream services to be available
// Unit test
it('should create all mocks', function(done) {
    mocks.create(function(err, allMocks) {
        assert.ifError(err);
        assert.isObject(allMocks);
        assert.isObject(allMocks.log);
        assert.isObject(allMocks.properties);
        // ...
        assert.isObject(allMocks.req);
        assert.isObject(allMocks.res);
        return done();
    });
});
Mocks are available from the unit test API
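To show what testing in isolation looks like in practice, here is a sketch that exercises a handler against hand-rolled mocks. `createMocks` is a hypothetical stand-in written for this example. It is not the platform's `mocks.create` API, whose full shape the slides do not show.

```javascript
// Function under test: logs, then echoes a configuration property.
function ping(req, res, next) {
    req.log.info('Hello World!');
    res.send(200, { region: req.getProperties().region });
    return next();
}

// Hypothetical hand-rolled mocks; no network or downstream services involved.
function createMocks(properties) {
    const logged = [];
    const sent = [];
    return {
        logged: logged,
        sent: sent,
        req: {
            log: { info: function (msg) { logged.push(msg); } },
            getProperties: function () { return properties; }
        },
        res: {
            send: function (status, body) { sent.push({ status: status, body: body }); }
        }
    };
}

const mocksForPing = createMocks({ region: 'us-east-1' });
let nextCalled = false;
ping(mocksForPing.req, mocksForPing.res, function () { nextCalled = true; });
console.log(mocksForPing.sent[0], mocksForPing.logged, nextCalled);
```

Since the handler only touches `req` and `res`, replacing those two objects is enough to run it with no services available.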
This development platform can also be easily deployed to Jenkins using
NEWT, unlocking CI/CD tests for both the FaaS platform and functions
themselves
Publish
Deploy
Operate
Functions are published using our NEWT tool, and are immutably
versioned and saved in a central registry
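Immutable versioning can be illustrated with a registry that refuses to overwrite a published version. The sketch below is an in-memory stand-in with hypothetical names (`createRegistry`, the artifact strings); the real registry's interface is not described in the slides.

```javascript
// Hypothetical in-memory registry: a published name@version can never change.
function createRegistry() {
    const entries = new Map();
    return {
        publish: function (name, version, artifact) {
            const key = name + '@' + version;
            if (entries.has(key)) {
                throw new Error(key + ' is already published');
            }
            entries.set(key, Object.freeze({
                name: name,
                version: version,
                artifact: artifact
            }));
            return key;
        },
        get: function (name, version) {
            return entries.get(name + '@' + version);
        }
    };
}

const registry = createRegistry();
registry.publish('iosui/iphone', '1.0.0', 'image-digest-a');

// Publishing the same version again is rejected rather than overwritten.
let republishError = null;
try {
    registry.publish('iosui/iphone', '1.0.0', 'image-digest-b');
} catch (err) {
    republishError = err.message;
}
console.log(republishError);
```

With this rule, a version number always refers to the same artifact, so a rollback redeploys exactly what ran before.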
Under the hood, a Docker image is created at publish time by
combining the functions and the platform into one image, achieving
immutability
[Diagram: FaaS base platform image combined with Customer Functions (/etc/functions: myrepo/config.json, myrepo/foo.js, myrepo/bar.js) to form the Customer function image]
The centralized function registry can be used to manage published
functions
These published functions can be deployed to the cloud via the NEWT
deploy commands
Functions are deployed using Titus, with most functions scheduled within
a few minutes
[Diagram: Registry, Titus Container Scheduler, and multiple scheduled function containers]
Canary deployment and analysis can be used as part of deployment,
minimizing outages and increasing availability
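Canary analysis, in its simplest form, compares a health metric of the new version against the current one before widening the rollout. The sketch below uses error rate with an illustrative threshold. The function name, inputs, and the 20% tolerance are assumptions, not details from the talk.

```javascript
// Compare canary and baseline error rates; the threshold is illustrative only.
function analyzeCanary(baseline, canary, maxRelativeIncrease) {
    const baselineRate = baseline.errors / baseline.requests;
    const canaryRate = canary.errors / canary.requests;
    const allowed = baselineRate * (1 + maxRelativeIncrease);
    return {
        baselineRate: baselineRate,
        canaryRate: canaryRate,
        pass: canaryRate <= allowed
    };
}

const healthy = analyzeCanary(
    { requests: 10000, errors: 50 },
    { requests: 1000, errors: 5 },
    0.2
);
const regressed = analyzeCanary(
    { requests: 10000, errors: 50 },
    { requests: 1000, errors: 20 },
    0.2
);
console.log(healthy.pass, regressed.pass);
```

A production analysis would look at several metrics and account for statistical noise; note that with zero baseline errors this simple rule fails any canary error at all.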
Each deployed function version can be managed via the control plane,
with access to detailed runtime information
A detailed history of deployment and management activity is available to aid
debugging
Autoscaling is used to automatically scale the infrastructure for each
function, saving costs and increasing availability. We require an initial
baseline configuration for each function
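As a rough illustration of why a baseline configuration is needed, the sketch below computes a desired instance count in a target-tracking style and clamps it to a per-function minimum and maximum. The formula, names, and numbers are assumptions; the slides do not describe the actual scaling policy.

```javascript
// Target-tracking style calculation; names and numbers are assumptions.
function desiredInstances(current, observedLoad, targetLoad, baseline) {
    const raw = Math.ceil(current * (observedLoad / targetLoad));
    return Math.min(baseline.max, Math.max(baseline.min, raw));
}

// Hypothetical per-function baseline configuration.
const baseline = { min: 2, max: 20 };

const scaleOut = desiredInstances(4, 90, 60, baseline);   // load above target
const scaleIn = desiredInstances(4, 15, 60, baseline);    // load far below target
const capped = desiredInstances(10, 180, 60, baseline);   // would exceed the max
console.log(scaleOut, scaleIn, capped);
```

The minimum keeps a function available when traffic is quiet, and the maximum bounds cost if a metric misbehaves.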
Metrics and dashboards are automatically generated for each function
Alerts are automatically generated based on metrics
Real time and historical logs are available
Profiling and post mortem debugging tools are made available
The infrastructure and operations of the platform and the application itself are
handled by the centralized API platform team. UI teams are only
responsible for managing their individual functions
Netflix FaaS Platform
Runtime platform architecture
Developer experience
Management & operations
Questions?
@yunongx
yunong@netflix.com
linkedin.com/in/yunongxiao/
