Going Serverless with AWS Lambda at ReportGarden

Going Serverless
with AWS Lambda
Gandhi Jay

Why talk about Serverless?
● Web Scraping - Actual use case from Tribelocal
○ The use case is to find out whether a business is listed correctly, listed incorrectly, or not listed in a directory.
○ Process
■ Scrape the directory’s web page with parameters
■ Parse the businesses and their related information
■ Find the matching record in the parsed data
■ If any information is incorrect or the business is not listed, notify the user.
○ Each directory might expose its data in a different way.
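
The matching step of the process can be sketched roughly as follows. This is a minimal illustration, not Tribelocal's actual code: the `Listing` fields, the name-based matching rule and the status strings are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Listing:
    # Hypothetical fields; a real directory may expose more or different ones.
    name: str
    phone: str
    address: str


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so cosmetic differences don't count."""
    return " ".join(text.lower().split())


def find_match(business: Listing, parsed: Iterable[Listing]) -> Optional[Listing]:
    """Return the first parsed listing with the same normalized name (assumed rule)."""
    for candidate in parsed:
        if normalize(candidate.name) == normalize(business.name):
            return candidate
    return None


def check_listing(business: Listing, parsed: Iterable[Listing]) -> str:
    """Classify the business as 'correct', 'incorrect' or 'not_listed'."""
    match = find_match(business, parsed)
    if match is None:
        return "not_listed"
    same = (
        normalize(match.phone) == normalize(business.phone)
        and normalize(match.address) == normalize(business.address)
    )
    return "correct" if same else "incorrect"
```

Anything other than 'correct' would be the point at which the user gets notified.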

Traditional approach
● In a monolith,
○ Create a Directory DataLoader
○ Write XPaths/CSS selectors
○ Write code to scrape the directory’s web page
○ Write code to parse businesses
○ Write code to match the business and find out whether it is correct, incorrect, duplicate or not listed.
● Repeat the process for each additional directory.

Traditional approach: is it good?
● As the number of locations increases, the number of requests increases.
● A simple web scraping problem suddenly becomes a distributed computing nightmare.

What is Serverless?
● A service takes independent functions
● Runs them in parallel "containers"
○ can be monitored separately
○ can be scaled separately and automatically
● Pay per execution (metered in ms).
● Lets you focus on the application, not the infrastructure.
There are actual servers located somewhere, but you don’t have to worry about them.
Someone said: use Serverless.
Backend as a Service + Function as a Service = Serverless

Serverless Approach
● Choose your runtime.
● Write code for AWS Lambda.
● Put it into S3
● Configure AWS Lambda
● Configure Services like API Gateway, SNS etc.
● Test it.
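
For the "write code for AWS Lambda" step, a minimal sketch of a Python handler is shown below. It assumes the function sits behind an API Gateway proxy integration; the `name` field in the request body is purely hypothetical.

```python
import json


def handler(event, context):
    """Entry point that AWS Lambda calls with the event and a context object."""
    # Assumption: API Gateway proxy integration, so the request body
    # arrives as a JSON string under the "body" key (or is missing/None).
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")  # "name" is a hypothetical field
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Because the handler is a plain function, the "test it" step can start locally by calling it with a hand-built event dict before anything is deployed.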

Benefits of Serverless
● Increases flexibility to scale
○ Provisioning based on usage, not instances
○ Self auto-scaling and auto-provisioning
● Reduced time to write code
● Minimal risk
○ No long-lived hosts or instances to maintain
● Decreased Time to Market
● Encourages a modular, well encapsulated, loosely coupled architecture
● Reduced resource cost
○ Costs based on precise usage (no usage = no cost)

Service Providers - Serverless
● AWS Lambda
● Google Cloud Functions
● Azure Functions
● Auth0 Webtask
● IBM OpenWhisk

At ReportGarden - AWS Lambda
● Function as a Service platform
○ Billed per 100ms
○ Node.js, Python, C# and Java (OpenJDK)
● Event-driven
● Asynchronous invocation
● Can be integrated with other AWS services
● AWS Serverless Ecosystem
○ Lambda
○ API Gateway
○ SNS
○ SQS
○ DynamoDB
○ S3
○ Kinesis
○ CloudWatch
○ Step Functions
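
To make "billed per 100ms" concrete, here is a small sketch of how a run time rounds up to a billing increment and converts to GB-seconds, the unit Lambda compute is metered in. Prices are left out on purpose; the function names are illustrative only.

```python
import math


def billed_ms(duration_ms: float, increment_ms: int = 100) -> int:
    """Round a run time up to the next billing increment (100ms per the slide)."""
    return math.ceil(duration_ms / increment_ms) * increment_ms


def gb_seconds(duration_ms: float, memory_mb: int) -> float:
    """Billed compute for one invocation, in GB-seconds."""
    return (billed_ms(duration_ms) / 1000) * (memory_mb / 1024)
```

Under this rounding, a 101ms invocation is billed as 200ms, so trimming a function from just over to just under an increment boundary has a visible effect on cost.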

AWS Lambda Runtime Environment
● Memory available from 128MB to 1.5GB
○ Minimum CPU speed and I/O bandwidth scale with the configured memory
● 2 Virtual CPUs
● 512MB /tmp storage
● 50MB compressed (jar/zip) function package, 250MB uncompressed deployment package
● STDOUT and STDERR go to CloudWatch Logs
AWS Lambda - http://docs.aws.amazon.com/lambda/latest/dg/welcome.html
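
A pre-deployment check against the package limits above might look like the sketch below. It reads the 250MB figure as the uncompressed size, which is an interpretation of the slide; the function name and messages are made up for illustration.

```python
import io
import zipfile

MAX_ZIPPED_BYTES = 50 * 1024 * 1024     # 50MB compressed limit from the slide
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024  # 250MB limit, read here as uncompressed


def package_problems(zip_bytes: bytes,
                     max_zipped: int = MAX_ZIPPED_BYTES,
                     max_unzipped: int = MAX_UNZIPPED_BYTES) -> list:
    """Return a list of limit violations for a zip/jar package (empty if none)."""
    problems = []
    if len(zip_bytes) > max_zipped:
        problems.append("compressed package exceeds limit")
    # A jar is a zip archive, so zipfile can read both.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        unzipped = sum(info.file_size for info in archive.infolist())
    if unzipped > max_unzipped:
        problems.append("uncompressed contents exceed limit")
    return problems
```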

Functions run when events are triggered
● A POST request is sent to API Gateway
● A scheduled cron triggers
● A new item is added to an S3 bucket
● An infrastructure event matches a rule.
● A SaaS webhook is fired.
● A link in an email is clicked.
● A new item is added to a DB.
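
Each trigger delivers a differently shaped event to the function. The sketch below guesses the source from a few well-known marker fields; it covers only some of the triggers listed above, and the returned labels are arbitrary.

```python
def event_source(event: dict) -> str:
    """Best-effort guess at which AWS service produced a Lambda event."""
    records = event.get("Records") or []
    if records:
        first = records[0]
        # S3 and DynamoDB records use "eventSource"; SNS records use "EventSource".
        source = first.get("eventSource") or first.get("EventSource")
        if source == "aws:s3":
            return "s3"
        if source == "aws:sns":
            return "sns"
        if source == "aws:dynamodb":
            return "dynamodb"
    if event.get("source") == "aws.events":
        return "schedule"  # scheduled (cron) rule
    if "httpMethod" in event:
        return "api_gateway"  # API Gateway proxy integration
    return "unknown"
```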

Async, Latency Tolerant Web Scraping with AWS Lambda Framework - Scrapie
[Architecture diagram]
● Components
○ Client System: sends a request with a JSON spec
○ AWS API Gateway
○ Trigger Lambda
○ AWS SNS topic: scrape-job
○ AWS Lambda ScraperJob
○ AWS Lambda ScraperNotifier
○ Callback: hit the callback URL with messageid and content
○ Helium Scraping Service, Proxy
● AWS CloudWatch & Rollbar (Monitoring & Logging)
● AWS CloudFormation (Orchestration)
● Gradle & Serverless (Build & Deployment)
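
The SNS-to-callback part of this flow could be sketched as below. This is not Scrapie's code: the JSON spec fields `url` and `callback_url`, the use of the SNS `MessageId` as the messageid, and the plain `urllib` fetch standing in for the scraping service are all assumptions.

```python
import json
import urllib.request


def fetch_page(spec: dict) -> str:
    """Placeholder scraper: fetch the page named in the (hypothetical) spec."""
    with urllib.request.urlopen(spec["url"]) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_callback(sns_record: dict, content: str) -> urllib.request.Request:
    """Build the POST that reports messageid and content to the callback URL."""
    sns = sns_record["Sns"]
    spec = json.loads(sns["Message"])  # JSON spec forwarded through SNS
    payload = {"messageid": sns["MessageId"], "content": content}
    return urllib.request.Request(
        spec["callback_url"],  # hypothetical field name
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def handler(event, context, scrape=fetch_page, send=urllib.request.urlopen):
    """SNS-triggered entry point; scrape and send are injectable for testing."""
    for record in event["Records"]:
        spec = json.loads(record["Sns"]["Message"])
        send(build_callback(record, scrape(spec)))
```

Lambda only ever passes `event` and `context`, so the extra keyword arguments keep their defaults in production and can be swapped for fakes in a local test.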

Configure Lambda, SNS, API Gateway...
Still a lot of work, and we are lazy.

DEMO
● Prerequisite
○ Install NVM - Node Version Manager
○ Install the Serverless Framework (serverless)
○ Install the AWS CLI
○ Run aws configure with your access key and secret key
○ Good to go!

Challenges - Serverless
● State
● Latency
● Loss of control
● Testing/Tooling
○ Serverless-offline
○ localstack/localstack
○ But not a lot of options.

Terrible use cases - Serverless
● Very low latency
○ High Frequency Trading
● Large scale, in-memory, stateful
○ MFTs of TBs of data.
● Long running, stateful
○ Synchronous external transactions

Awesome Serverless use cases.
● Async, Latency tolerant
○ Data Pipelines, Automatic Thumbnail generation, New User welcome emails, PDF Generation
● Sync, Latency tolerant
○ Web apps, APIs
● Glue
○ Infrastructure automation, orchestration
● Analytics Stack
○ AWS Athena, AWS Glue
