Transitioning from
Traditional DW to Spark
in OR Predictive Modeling
Ayad Shammout and Denny Lee
October 21st, 2015
About Ayad Shammout
• Director of Business Intelligence, Beth Israel Deaconess
Medical Center
• Helped build Business Intelligence, highly available /
disaster recovery infrastructure for BIDMC
About Denny Lee
• Technology Evangelist, Databricks
• Former Sr. Director of Data Sciences Engineering, Concur
• Helped bring Hadoop onto Windows and Azure
We are Databricks, the company behind Spark
Founded by the creators of
Apache Spark in 2013
Share of Spark code
contributed by Databricks
in 2014
75%
Data Value
Created Databricks on top of Spark to make big data simple.
Why is Operating Room Scheduling
Predictive Modeling Important?
$15-$20 / minute for a
basic surgical procedure
Time is an OR's most valuable resource
Lack of OR availability
means loss of patient
OR efficiency differs depending on the
OR staffing and allocation (8, 10, 13, or 16h),
not the workload (i.e. cases)
“You are not going to get the elephant to shrink or change its size. You
need to face the fact that the elephant is 8 OR tall and 11 hr wide”
Steven Shafer, MD
Operating Room
Better utilization =
Better profit margins
Reduce support and
maintenance costs
Medical Staff
Better utilization =
Better profit margins
Better medical staff
efficiencies = Better
outcomes
Patients
Shorter wait times
and fewer cancellations
Better medical staff
efficiencies = Better
outcomes
Develop Predictive Model
• Develop a predictive model that would identify
available OR time 15 business days in advance.
• Allow us to confirm wait list cases two weeks in
advance, instead of waiting until the blocks are normally
released four days out.
Forecast OR Schedule
• Case load 15 business days in advance
• Book more cases weeks in advance to prevent
underutilization
• Reduce staff overtime and idle time
Background
• Three surgical groups
• GYN, urology, general surgery, colorectal, surgical
oncology
• Eyes, plastics, ENT
• Orthopedics, podiatry
• Currently built using SQL Server Data Mining
Using Traditional Data Warehousing
Techniques
Traditional Data Warehousing & Data Mining
[Architecture diagram. Components: Data Sources, OR DW, SSAS Data Mining, OR Predictive Model, OR Prediction DB, OR Reports. Labels: data inserts every 3 hours; process mining model every 3 hours; prediction results.]
Original Design
• Multiple data sources pushing data into SQL Server and SQL
Server Analysis Services Data Mining
• Hand built 225 different DM modules (5 days, 15 business days
ahead, 3 different groups)
• Pipeline process had to run 225 times / day (3 pools x 75
modules)
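The count of 225 follows from the three dimensions listed above (5 x 15 x 3). A minimal sketch of that enumeration, assuming "5 days" means the five weekdays; the group names are placeholders, not taken from the deck:

```python
from itertools import product

# Assumption: "5 days" refers to the five weekdays of the OR schedule.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
# Forecast horizon: 1 to 15 business days ahead.
DAYS_AHEAD = range(1, 16)
# Placeholder names for the three surgical groups.
GROUPS = ["group_1", "group_2", "group_3"]

# One hand-built module per (group, weekday, horizon) combination.
model_keys = list(product(GROUPS, WEEKDAYS, DAYS_AHEAD))
per_group = len(model_keys) // len(GROUPS)

print(len(model_keys), per_group)
```

Each group contributes 5 x 15 = 75 combinations, which matches the "3 pools x 75 modules" figure on the slide.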
Regression Calculations
SSAS Data Mining    T-SQL Code
Intercept           R²
Mean                Adjusted R²
Coefficients        Standard Deviation
Variance            Standard Error
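For reference, the goodness-of-fit figures named on this slide can be computed from actual and predicted values with the standard textbook formulas. The sketch below is illustrative only; it is not the T-SQL code from the original pipeline, and the function name and sample numbers are made up:

```python
from math import sqrt


def regression_stats(actual, predicted, num_predictors):
    """Return (R², adjusted R², residual standard error) for a fitted regression."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Total variation of the target around its mean.
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    # Variation left unexplained by the model.
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    r2 = 1 - ss_residual / ss_total
    dof = n - num_predictors - 1  # residual degrees of freedom
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / dof
    std_error = sqrt(ss_residual / dof)
    return r2, adjusted_r2, std_error


# Made-up numbers: a perfect fit, and a model that always predicts the mean.
perfect = regression_stats([1, 2, 3, 4], [1, 2, 3, 4], num_predictors=1)
mean_only = regression_stats([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5], num_predictors=1)
print(perfect, mean_only)
```

Adjusted R² penalizes extra predictors, so it can fall below zero when the model explains nothing, as in the second call.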
Taking advantage of Spark’s DW
Capabilities and MLlib
OR Predictive Model in Spark
[Architecture diagram. Components: Data Sources, OR DW, OR Reports. Label: data inserts every 3 hours.]
Demo: OR Block Scheduling
Extract historical data and run linear regression with SGD
on multiple variables
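The demo itself is not reproduced in this transcript. As a rough illustration of the technique it names, here is a minimal plain-Python sketch of linear regression with several input variables fitted by stochastic gradient descent. It does not use Spark MLlib, and the function name, step size, and synthetic data are all assumptions made for this example:

```python
import random


def sgd_linear_regression(rows, targets, step=0.1, epochs=500, seed=0):
    """Fit y ≈ w·x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            x, y = rows[i], targets[i]
            # Prediction error for this single sample.
            error = sum(w * v for w, v in zip(weights, x)) + bias - y
            # Gradient step on this sample (constant factor folded into step).
            weights = [w - step * error * v for w, v in zip(weights, x)]
            bias -= step * error
    return weights, bias


# Synthetic, noise-free data from y = 2*x1 + 3*x2 + 1 (made-up coefficients).
rows = [[a / 4, b / 4] for a in range(5) for b in range(5)]
targets = [2 * x1 + 3 * x2 + 1 for x1, x2 in rows]

weights, bias = sgd_linear_regression(rows, targets)
print(weights, bias)  # should be close to [2, 3] and 1
```

The features are kept in the range 0 to 1 so that a fixed step size stays stable; real scheduling data would likely need feature scaling first.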
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predictive Modeling
OR Schedule Report (example)
Why the model is working
• Can coordinate waitlist scheduling logistics with physicians and
patients within two weeks of the surgery
• Plan staff scheduling and resources so there are fewer last-minute
staffing issues for nursing and anesthesia
• Utilization metrics are showing us where we can maximize our
elective surgical schedule and level demand
Key Learnings when Migrating from
Traditional DW to Spark
Transitioning to the Cloud
Beth Israel Deaconess Medical Center is increasingly moving to cloud
infrastructure services with the hopes of closing its data center when the
hospital's lease is up in the next five years. CIO John Halamka says he's
decommissioning HP and Dell servers as he moves more of his compute
workloads to Amazon Web Services, where he's currently using 30 virtual
machines to test and develop new applications. "It is no longer cost
effective to deal with server hosting ourselves because our challenge isn't
real estate, it's power and cooling," he says.
Transitioning to the Cloud
• Need time for engineers, analysts, and data scientists to learn
how to build for the cloud
• Build for security right from the start: process heavy, a lot of
documentation, audits / reviews
• Differentiating data engineers and engineers (REST APIs,
services, elasticity, etc.)
Transitioning to Spark
• No more stored procedures or indexes
• Good for Spark SQL, services design
• Prototype, prototype, prototype
• Leverage existing languages and skill sets
• Leverage the MOOCs and other Spark training
• Break down the silos of data engineers, engineers, data
scientists, and analysts
Transitioning DW to Spark
• Understand Partitioning, Broadcast Joins, and Parquet
• Not all Hive functions are available in Spark (99% of the time that is okay) due
to Hive context
• Don’t limit yourself to building star schemas / snowflake schemas
• Expand outside of traditional DW: machine learning, streaming
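To make the broadcast join idea from the first bullet concrete: when one side of a join is small, a copy of it can be sent to every partition of the large table, so each partition joins locally and the large table is not shuffled. The plain-Python sketch below only illustrates that idea; it is not Spark code, and the table contents and names are hypothetical:

```python
def broadcast_join(fact_partitions, small_table):
    """Inner-join each partition of a large table against a copy of a small one."""
    joined = []
    for partition in fact_partitions:
        # Each partition receives its own copy of the small lookup table.
        lookup = dict(small_table)
        for group_id, minutes in partition:
            if group_id in lookup:  # inner join: unmatched rows are dropped
                joined.append((group_id, lookup[group_id], minutes))
    return joined


# Hypothetical data: a small dimension table and a fact table in two partitions.
surgical_groups = {1: "Orthopedics", 2: "ENT"}
case_partitions = [
    [(1, 120), (2, 45)],
    [(2, 60), (3, 90)],  # group 3 has no match in the small table
]

result = broadcast_join(case_partitions, surgical_groups)
print(result)
```

The trade-off is memory: the small table must fit comfortably on every worker, which is why this approach suits dimension-style lookups rather than joins between two large tables.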
Thank you.
For more information, please contact
ayad.shammout@hotmail.com
denny@databricks.com

