Dave Wetzel, COO and CTO, MLS Listings
Sergey Ermolin, Solutions Architect, Intel
Real Estate Search Ranking with BigDL Framework on Microsoft Azure Platform
#ExpSAIS16
MLSListings Inc, Sunnyvale, California
MLSListings Business Use-Case:
Personalized Visual Search Ranking
Image similarity is an extra search parameter along with area, location, size, price, etc.
If you looked at this house... you will want to look at this one, too.
Business need: real estate search results need to be sorted by the image similarity of the attached photos.
Implementation Example - 1
Implementation Example - 2
LIVE DEMO
High-Level Data Workflow
[Diagram labels: OData Media API; Machine Learning Process; Property Details Web Page API; Ranked Images; Top 10]
Deep Learning Data Flow: Training
Labeled Dataset of Real Estate Images (Bing Search) + BigDL VGG Architecture = BigDL Trained Model
Diagram labels: Data Engineering; Deep Learning
Compute:
● Long compute: 8,500 images
● 2 nodes, 28 cores per node; 3 minutes for a single pass
● Model parameters change during training
● Repeat until convergence
● But: only done once!
Note 1: Images are *not* stored in the model.
Note 2: You can trade compute resources for time.
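The training figures above (8,500 images, 2 nodes with 28 cores each, 3 minutes per pass) imply a rough throughput, and the remark about trading compute for time can be illustrated with a back-of-the-envelope estimate. The sketch below is only arithmetic on the slide's numbers; the linear-scaling assumption and the 4-node what-if are hypothetical and were not measured by the authors.

```python
# Figures quoted on the training slide.
IMAGES = 8500
NODES = 2
CORES_PER_NODE = 28
PASS_SECONDS = 3 * 60


def pass_throughput(images, seconds):
    """Images processed per second during one pass over the dataset."""
    return images / seconds


def estimated_pass_seconds(images, per_core_rate, nodes, cores_per_node):
    """Idealized pass time assuming perfectly linear scaling.

    This is an assumption for illustration: real clusters lose some
    efficiency to communication and synchronization overhead.
    """
    return images / (per_core_rate * nodes * cores_per_node)


cluster_rate = pass_throughput(IMAGES, PASS_SECONDS)
per_core_rate = cluster_rate / (NODES * CORES_PER_NODE)
print(f"cluster: {cluster_rate:.1f} images/s, per core: {per_core_rate:.2f} images/s")

# Hypothetical what-if: twice as many nodes under ideal scaling.
what_if = estimated_pass_seconds(IMAGES, per_core_rate, 4, CORES_PER_NODE)
print(f"4 nodes (ideal scaling): {what_if:.0f} s per pass")
```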
Deep Learning Data Flow: Inference
Listing image + BigDL Trained Model = Feature Vector
Feature Vector contents:
● Image Class (Front, Bdr, Bath, …)
● House Style Tag (Ranch, Victorian, …)
● House Levels (1, 2, …)
● Latent Features (25k entries)
Compute Only:
● Short compute, real time
● 1 node, 1 core per node
● Model parameters unchanged
● Only run once per image
● But: must be done for every image in the searched dataset!
Deep Learning Data Flow – putting it together
Training: Labeled Dataset of Real Estate Images (Bing Search) + BigDL VGG CNN = BigDL Trained Model
Inference: Listing image + BigDL Trained Model = Feature Vector
Feature Vector contents:
● Image Class (Front, Bdr, Bath, …)
● House Style Tag (Ranch, Victorian, …)
● House Levels (1, 2, …)
● Latent Features (25k entries)
Deep Learning Data Flow: Real-time Ranking
Listing images → BigDL Trained Model → Feature Vectors
MLS Query Landing Page → BigDL Trained Model → Cosine Similarity + Tag + Class → Rank (Weighted) → Top 10 → MLS Listings Front End
Deep Learning Data Flow
Labeled Image Dataset (Bing Search) + BigDL = Model
Listing Images → BigDL + Model (shown twice in the diagram) → Cosine Similarity + Tag + Class Rank → Top 10 → MLS Listings Front End
Model outputs per image:
• Room Type (Class)
• House Style (Tag)
• House Levels (Tag)
• Features (25k vec)
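The ranking step in the data flow above (cosine similarity plus tag and class, weighted, top 10) could be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: the weights, the dictionary field names (`features`, `room_type`, `style`, `levels`), and the way tag matches are averaged are all assumptions, and the toy vectors have 3 entries instead of the 25k latent features mentioned in the slides.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def score(query, candidate, w_cos=0.6, w_class=0.2, w_tag=0.2):
    """Weighted sum of feature similarity, class match, and tag matches.

    The weights are hypothetical; the slides only say the rank is weighted.
    """
    cos = cosine_similarity(query["features"], candidate["features"])
    same_class = 1.0 if query["room_type"] == candidate["room_type"] else 0.0
    tag_matches = [
        query["style"] == candidate["style"],
        query["levels"] == candidate["levels"],
    ]
    tag = sum(tag_matches) / len(tag_matches)
    return w_cos * cos + w_class * same_class + w_tag * tag


def rank_top_k(query, candidates, k=10):
    """Return the k best (score, id) pairs, highest score first."""
    scored = [(score(query, c), c["id"]) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data for illustration only.
query = {"id": "q", "features": [1.0, 0.0, 0.0],
         "room_type": "Front", "style": "Ranch", "levels": 1}
listings = [
    {"id": "a", "features": [0.9, 0.1, 0.0],
     "room_type": "Front", "style": "Ranch", "levels": 1},
    {"id": "b", "features": [0.0, 1.0, 0.0],
     "room_type": "Bath", "style": "Victorian", "levels": 2},
    {"id": "c", "features": [0.7, 0.7, 0.0],
     "room_type": "Front", "style": "Victorian", "levels": 1},
]
for value, listing_id in rank_top_k(query, listings):
    print(f"{listing_id}: {value:.3f}")
```

Because the feature vectors are computed once per listing image at inference time, only the scoring and sorting need to run when a query arrives, which is what makes real-time ranking plausible.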
Implementation Example - 3
Implementation - BigDL
Building BigDL Graph
• Prepare Training/Validation data.
• Image Transformer:
– Image scale/crop
– Channel color normalizing
• Caffe Model Import
• Render BigDL as SparkML Transformer
• Create BigDL Linear SoftMax model
• Define Classifier, SparkML Transformer
• Set up SparkML Pipeline
Executing BigDL Graph
Reference (GitHub): https://github.com/intel-analytics/analytics-zoo
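The Image Transformer step listed above (scale/crop, then channel color normalizing) can be illustrated without any framework. The sketch below uses plain Python nested lists rather than BigDL's own transformers, so it shows the idea only; the 256/224 sizes and the per-channel mean values are assumptions in the style of common VGG preprocessing, not values taken from the slides.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C image stored as nested lists."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def center_crop(image, size):
    """Cut a size x size window from the middle of the image."""
    h, w = len(image), len(image[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def normalize_channels(image, means, stds):
    """Subtract a per-channel mean and divide by a per-channel std."""
    return [
        [[(v - m) / s for v, m, s in zip(pixel, means, stds)] for pixel in row]
        for row in image
    ]


def preprocess(image, resize_to=256, crop_to=224,
               means=(123.0, 117.0, 104.0), stds=(1.0, 1.0, 1.0)):
    """Scale, crop, and normalize; all default values are hypothetical."""
    resized = resize_nearest(image, resize_to, resize_to)
    cropped = center_crop(resized, crop_to)
    return normalize_channels(cropped, means, stds)


# Tiny 4 x 6 test image; the first channel encodes the pixel position.
tiny = [[[r * 10 + c, 0, 255] for c in range(6)] for r in range(4)]
out = preprocess(tiny, resize_to=4, crop_to=2,
                 means=(0.0, 0.0, 0.0), stds=(1.0, 1.0, 1.0))
print(len(out), len(out[0]), out[0][0])
```

In a Spark ML pipeline, a step like this would typically run as a transformer stage ahead of the model stage, so every image reaches the network with the same size and channel statistics it was trained on.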
BigDL: High Performance Deep Learning for Apache Spark* on CPU Infrastructure
BigDL is an open-source distributed deep learning library for Apache Spark* that can run directly on top of existing Spark or Apache Hadoop* clusters
Feature Parity & Model Exchange with TensorFlow*, Caffe*, Keras, Torch*
Lower TCO and improved ease of use with existing infrastructure
Deep Learning on Big Data Platform, Enabling Efficient Scale-Out
[Diagram labels: BigDL; Spark Core]
No need to deploy costly GPUs, duplicate data, or suffer through scaling headaches!
Designed and Optimized for Intel® Xeon®
Ideal for DL Models TRAINING and INFERENCE
Powered by Intel® MKL and multi-threaded programming
Models Interoperability Support
• Model Snapshot
• Long training work checkpoint
• Model deployment and sharing
• Fine-tune
• Caffe/Torch/TensorFlow Model Support
• Model file load
• Easy to migrate your Caffe/Torch/TensorFlow code base to Spark
• NEW - BigDL supports loading pre-defined Keras models (Keras 1.2.2)
[Diagram labels: Caffe Model File; Torch Model File; TensorFlow Model File; Load; BigDL; Save; BigDL Model File; Storage]
Visualization for Learning
BigDL integration with TensorBoard
• TensorBoard is a suite of web applications from Google for visualizing and understanding deep learning applications
BigDL Analytics Zoo Stack (2018)
Reference Use Cases: Anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction, etc.
Built-In Algorithms and Models: Image classification, object detection, text classification, recommendations, GAN, etc.
Feature Engineering and Transformations: Image, text, speech, 3D imaging, time series, etc.
High-Level Pipeline APIs: DataFrames, ML Pipelines, Autograd, Transfer Learning, etc.
Runtime Environment: Spark, BigDL, Python, etc.
Making it easier to build end-to-end analytics + AI applications
Engineering Team
● Data scientist, proficient in Machine Learning / Deep Learning
● Software Engineer, experience with Apache Spark
● Technical project manager
Domain Expertise:
● Machine Learning / Deep Learning
● Python, Scala
● Software Engineer, Web API
● Software Engineer, Web UI
Domain Expertise:
● OData, .NET Core, MSSQL
● C#, HTML, JavaScript
“ROAD AHEAD”
“Fireplace in the living room”
1. Feature extraction and tagging.
2. Image-based listings search
3. Feature verification based on listing images.
4. Image-based compliance and quality check
LIVE DEMO
Infrastructure
MLS Listings Apps and Services
[Diagram labels: Microsoft Azure; RESO OData API; homes.mlslistings.com; MLSL; Azure Web App Service; Azure SQL DB; Data Feed Customers; Full Payload; VOW Payload; IDX Payload; OpenID Connect; BigDL]
Roles and Responsibilities
• Microsoft - Apache Spark Cloud Service Provider
• Intel - BigDL distributed Deep Learning Library
• MLSListings - RESO Web API Provider
• Microsoft: Microsoft's data science team in Mountain View, CA participated in project discussions and provided an Azure Data Science VM to train and deploy the deep learning model.
• Intel: Team members worked to integrate MLSListings' OData Media Services to deploy a custom real estate image similarity comparison solution on Azure using BigDL.
• MLSListings: The MLSListings team working on the new web portal provided the Media API and worked on the user interface to integrate with the BigDL API.
BigDL: Python API
• Supports deep learning model training, evaluation, and inference
• Supports Spark v1.6 - 2.2
• Supports Python 2.7/3.5/3.6
• Based on PySpark, the Python API in BigDL allows use of existing Python libraries (NumPy, SciPy, Pandas, scikit-learn, NLTK, Matplotlib, etc.)

Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience at Scale with Sergey Ermolin and Dave Wetzel
