Hadoop Image Processing Pipeline (HIP)
June 10, 2015
Russell Foltz-Smith
Anil Gupta
Image Processing Pipeline
● Acquire Images of Vehicle
● Identify updates/deletes to Images
● Generate unique URL for Images
● Crop and Resize Images
● Copy images to Asset Servers
● Dedupe Images
Image Processing Pipeline Example
HIP
Why Hadoop?
● High Scalability
● Store historical data of Images
● Fault tolerance
● Identify updates to images on the basis of the content of the URL
HIP Components
1. HBase: Datastore for Images and archiving Images
2. MapReduce: Computation engine for Image Processor
3. Kafka: Publisher/Subscriber for pushing images to Asset Servers
4. OpenCV Java: Image Processing library
5. Avro: Serialization library for storing data on HDFS
HBase Data Model
Tables:
1. IMAGE: Stores the current set of Images with some metadata
2. IMAGE_ARCHIVE: Stores historical data of Vehicles and Original Images
HBase Data Model
Table: IMAGE
RowKey: <Vin_Number>
Column Family | Description | Versions
I | Stores all images of a vehicle, one Image in each Column | 1
H | Stores metadata of all Images | 1
Read patterns for “I” and “H” are mutually exclusive
HBase Data Model
Table: IMAGE_ARCHIVE
RowKey: <Provider_id><Dealer_Id><vehicle_vin><Image_Index>
Column Family | Description | Versions
I | Stores original images of vehicle. Only 1 column is stored. | 10
A | Stores fields of Avro Object of Vehicle and Image for analytics | 10
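The composite IMAGE_ARCHIVE row key can be illustrated with a small sketch. The slide only gives the field order (<Provider_id><Dealer_Id><vehicle_vin><Image_Index>); the fixed widths, the zero padding, and the function name below are assumptions made for illustration, not the authors' actual encoding.

```python
# Hypothetical field widths: the slides do not say how the fields are encoded.
PROVIDER_WIDTH = 6
DEALER_WIDTH = 6
INDEX_WIDTH = 3


def archive_row_key(provider_id: int, dealer_id: int, vin: str, image_index: int) -> bytes:
    """Concatenate the four fields in the order shown on the slide.

    Zero-padding the numeric fields (an assumption) keeps keys of the same
    vehicle adjacent and makes them sort by image index.
    """
    key = (
        f"{provider_id:0{PROVIDER_WIDTH}d}"
        f"{dealer_id:0{DEALER_WIDTH}d}"
        f"{vin}"
        f"{image_index:0{INDEX_WIDTH}d}"
    )
    return key.encode("ascii")


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(archive_row_key(42, 7, "VIN00000000000001", 3))
```

With padded fields, all images of one vehicle share a common key prefix, so a prefix scan over that vehicle would return them in index order.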
HBase Tuning
● Pre-split tables
● Keep Column names short (2-8 letters)
● Region size 8-10 GB
● Asynchronous clients should buffer Put operations (autoFlush=false)
● Disable periodic Major Compaction
Pipeline Dataflow Overview
InventoryProcessor Output → [Mapper] Parse & Validate Records → [Reducer] Identify CRUD Operation
● [Reducer] → HBase
● [Reducer] → Kafka → Asset Servers
CRUD in Reducer
● Start: Is Deleted? Yes: Delete Row in HBase
● No: Is Insert? Yes: Download Images, then Generate 6 Sizes of Image
● No: Get HTTP Headers of ImageURL and Compare with Existing. Header Mismatch? No: Do Nothing
● After an insert, or on a header mismatch (Yes):
1. Write to HBase
2. Write to Kafka
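The reducer's decision flow can be sketched as a small pure function. This is a minimal illustration, not the authors' code: the `Action` names, the `decide` function, and the choice of which HTTP headers to compare are assumptions, and the sketch assumes that a header mismatch triggers the same download-and-write path as an insert.

```python
from enum import Enum
from typing import Mapping, Optional


class Action(Enum):
    DELETE_ROW = "delete row in HBase"
    PROCESS_AND_WRITE = "download, resize, write to HBase and Kafka"
    DO_NOTHING = "do nothing"


def decide(
    is_deleted: bool,
    stored_headers: Optional[Mapping[str, str]],
    current_headers: Mapping[str, str],
) -> Action:
    """Pick the CRUD action for one image record.

    stored_headers is None when the image has never been seen (an insert).
    Which headers are compared is not stated on the slide; any subset that
    changes when the image content changes would do.
    """
    if is_deleted:
        return Action.DELETE_ROW
    if stored_headers is None:
        return Action.PROCESS_AND_WRITE  # insert
    if dict(stored_headers) != dict(current_headers):
        return Action.PROCESS_AND_WRITE  # update detected via header mismatch
    return Action.DO_NOTHING


if __name__ == "__main__":
    old = {"ETag": "abc", "Content-Length": "1024"}
    new = {"ETag": "def", "Content-Length": "2048"}
    print(decide(False, old, new).value)
```

Comparing headers first means an unchanged image costs one lightweight header request instead of a full download.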
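Cascading Downloads
One JVM Process: [ChainReducer] ImageProcessorReducer → ImageProcessorMapper → ImageProcessorRetryMapper
● ImageProcessorMapper: Socket timeout in 500 milliseconds? No: 1. Write to HBase 2. Write to Kafka. Yes: pass the URL on to ImageProcessorRetryMapper
● ImageProcessorRetryMapper: Socket timeout in 5 seconds? No: 1. Write to HBase 2. Write to Kafka. Yes: Mark URL as “Cannot Process”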
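The cascading idea, a short timeout first and a longer one only for the URLs that failed, can be sketched as follows. In HIP the two attempts live in separate chained mappers inside one JVM; this sketch collapses them into one function for brevity. The `fetch` callable and the function name are hypothetical stand-ins for the real download code.

```python
from typing import Callable, Optional

FAST_TIMEOUT_S = 0.5  # first attempt: 500 milliseconds
SLOW_TIMEOUT_S = 5.0  # retry attempt: 5 seconds


def cascading_download(
    url: str, fetch: Callable[[str, float], bytes]
) -> Optional[bytes]:
    """Try the fast timeout, then the slow one.

    fetch(url, timeout_seconds) is assumed to return the image bytes or
    raise TimeoutError. Returns None when both attempts time out, in which
    case the caller would mark the URL as "Cannot Process".
    """
    for timeout in (FAST_TIMEOUT_S, SLOW_TIMEOUT_S):
        try:
            return fetch(url, timeout)
        except TimeoutError:
            continue
    return None


def fake_fetch(latency_s: float) -> Callable[[str, float], bytes]:
    """Build a stand-in downloader whose server answers after latency_s."""
    def fetch(url: str, timeout: float) -> bytes:
        if latency_s > timeout:
            raise TimeoutError(url)
        return b"image-bytes"
    return fetch


if __name__ == "__main__":
    print(cascading_download("http://example.invalid/a.jpg", fake_fetch(2.0)))
```

Presumably the benefit comes from fast hosts never waiting behind slow ones: only the URLs that miss the short timeout pay for the long one.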
Kafka Producer
● One message per Image file
● Producer Message Format:
  ● Key: ImageFileName (kafka.serializer.StringEncoder)
  ● Value: Image (kafka.serializer.DefaultEncoder)
Example:
Key: /inventory/10584/15/5YJSA1DP0DFP1156/6ZBQHFKBVMY7OTBO-251.jpg
Value: [image]
Kafka Producer Tuning
Property | Value | Default Value
request.required.acks | 1 | 0
message.send.max.retries | 30 | 3
retry.backoff.ms | 5000 | 100
client.id | HIP | “”
For the Producer to sustain a NODE failure:
retry.backoff.ms * message.send.max.retries > Zookeeper Timeout (default: 60000)
With the defaults (100 * 3), failure recovery would have to complete in 300 ms. Really?
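The retry rule above is simple arithmetic, shown here as a worked check. The function names are illustrative; the numbers are the ones quoted on the slide.

```python
def retry_window_ms(retry_backoff_ms: int, max_retries: int) -> int:
    """Total time the producer keeps retrying a failed send."""
    return retry_backoff_ms * max_retries


def sustains_node_failure(
    retry_backoff_ms: int, max_retries: int, zookeeper_timeout_ms: int
) -> bool:
    """The slide's rule: the retry window must exceed the Zookeeper timeout."""
    return retry_window_ms(retry_backoff_ms, max_retries) > zookeeper_timeout_ms


if __name__ == "__main__":
    zk_timeout_ms = 60000  # value quoted on the slide
    # Defaults: 100 ms * 3 retries = 300 ms, far below the timeout.
    print(retry_window_ms(100, 3), sustains_node_failure(100, 3, zk_timeout_ms))
    # HIP settings: 5000 ms * 30 retries = 150000 ms, above the timeout.
    print(retry_window_ms(5000, 30), sustains_node_failure(5000, 30, zk_timeout_ms))
```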
Kafka Brokers Tuning
Property | Value | Default Value
log.retention.bytes | 24 GB | -1 (unlimited)
socket.send.buffer.bytes | 10485760 | 1048576
socket.receive.buffer.bytes | 10485760 | 1048576
1. Data is purged when either log.retention.bytes or log.retention.hours is exceeded.
2. log.retention.bytes = diskspace/number_of_partitions on each node
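The sizing rule for log.retention.bytes can be worked through with a short calculation. The disk size and partition count below are invented for illustration (the slides give only the resulting 24 GB); they are chosen so that the rule reproduces that value.

```python
GIB = 1024 ** 3


def log_retention_bytes(disk_bytes_per_node: int, partitions_per_node: int) -> int:
    """Apply the slide's rule: disk space divided by partitions on each node.

    log.retention.bytes is a per-partition limit, so dividing the disk budget
    by the partition count keeps the total within the budget.
    """
    if partitions_per_node <= 0:
        raise ValueError("partitions_per_node must be positive")
    return disk_bytes_per_node // partitions_per_node


if __name__ == "__main__":
    # Hypothetical node: 240 GiB reserved for Kafka logs, 10 partitions hosted.
    print(log_retention_bytes(240 * GIB, 10))
```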
OpenCV
● Used Java bindings of OpenCV to avoid using Hadoop Streaming
● The Java API is quite straightforward for encoding, decoding, cropping and resizing.
Memory Leak:
Mat.release() has to be called to free the memory used by a Mat.
Performance
[Chart: Hours (0-400) vs. Images (Millions) (3-18); series: HIP, ImageProcessor1.0]
HIP scales linearly and is at least 10x faster
Cascading Downloads
[Chart: Hours (0-14) vs. Images (Millions) (3-18); series: HIP with Cascading, HIP without Cascading]
20% performance gain
FUTURE…
Machine Learning!
Thanks!
Questions?
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
