EXTENDING YOUR HADOOP IMPLEMENTATION TO THE CLOUD
Matt Winkler
Principal Lead Program Manager
Big Data @ Microsoft  We’re Hiring
@mwinkle
AGENDA
Decisions
IaaS
Cloud Storage
Hadoop as a Service
Hybrid Scenarios
Next Steps
WHY CLOUD?
Elasticity
Cost Optimization
Economic flexibility
Support for bursting workloads
Global footprint
WHY ON-PREMISES?
Compliance requirements
Specific control over hardware/networking
Integration requirements for additional apps to be close to cluster
WHEN CLOUD?
Data born in the cloud
Global apps
Satisfy geopolitical or compliance constraints
Dev/Test
Backup
Geo-Redundancy
Bursting to cloud
IAAS – RUN YOUR HADOOP IN THE CLOUD
IaaS offerings across the cloud providers offer:
• OS choice
• Node configuration
• Customized networking topology
• Repeatable, scriptable deployments
You still have to:
• Set up the cluster
• Manage data movement into the cluster
• Integrate with your other applications
• Manage patching and updates of OS and apps
• Obtain support and/or licenses
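One way to picture "repeatable, scriptable deployments" is a small declarative spec that always expands to the same node list. The sketch below is illustrative only: `ClusterSpec`, `plan_nodes`, the host naming scheme and the `vm_size`/`os_image` labels are all hypothetical and do not correspond to any provider's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of a Hadoop cluster running on IaaS VMs."""
    name: str
    worker_count: int
    vm_size: str   # placeholder label for the node configuration
    os_image: str  # placeholder label for the OS choice


def plan_nodes(spec: ClusterSpec) -> List[Dict[str, str]]:
    """Expand a spec into a deterministic list of nodes to create."""
    nodes = [{"host": f"{spec.name}-master-0", "role": "master"}]
    nodes += [
        {"host": f"{spec.name}-worker-{i}", "role": "worker"}
        for i in range(spec.worker_count)
    ]
    for node in nodes:
        node["vm_size"] = spec.vm_size
        node["os_image"] = spec.os_image
    return nodes


spec = ClusterSpec(name="hdp-dev", worker_count=3, vm_size="large", os_image="centos")
plan = plan_nodes(spec)
for node in plan:
    print(node["host"], node["role"])
```

Because the plan depends only on the spec, running it twice yields the same result. Cluster setup, patching and data movement still have to be scripted separately.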
DEMO
Deploying a Hadoop Cluster to Azure
LEVERAGE CLOUD STORAGE FOR FLEXIBILITY
Cloud storage enables economic flexibility, scale and rich features
• Size clusters independently of storage needs
• Clusters become stateless and can operate across the same data
• Prices continue to decrease
• Geo-redundancy allows for business continuity/disaster recovery planning
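A rough worked example of sizing clusters independently of storage needs. With data on cluster disks, the node count has to cover both compute and replicated storage. With data in cloud storage, it only has to cover compute. The data volume, disk size and compute node count below are made-up inputs, and the sketch ignores headroom for temporary data. The replication factor of 3 is the HDFS default.

```python
import math

HDFS_REPLICATION = 3  # HDFS default replication factor


def nodes_for_hdfs(data_tb: float, disk_tb_per_node: float,
                   replication: int = HDFS_REPLICATION) -> int:
    """Nodes needed just to hold the data when it lives on cluster disks."""
    return math.ceil(data_tb * replication / disk_tb_per_node)


def cluster_size(data_tb: float, disk_tb_per_node: float,
                 compute_nodes: int, data_in_cloud_storage: bool) -> int:
    """Node count driven by compute alone, or by compute and storage together."""
    if data_in_cloud_storage:
        return compute_nodes
    return max(compute_nodes, nodes_for_hdfs(data_tb, disk_tb_per_node))


# Illustrative inputs: 100 TB of data, 8 TB of disk per node, 10 nodes of compute.
print(cluster_size(100, 8, 10, data_in_cloud_storage=False))  # 38
print(cluster_size(100, 8, 10, data_in_cloud_storage=True))   # 10
```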
CLOUD STORAGE USAGE PATTERNS
HDFS within the cluster
• Move data in from cloud storage on boot
• (optional) backup/age data to cloud storage
• (optional) move data out to cloud storage to rebuild cluster
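The data movement in this pattern could be scripted with `hadoop distcp`, which copies between any two file systems Hadoop can address. This is a sketch under assumptions: the slides do not say which tool was used, and the container, account and path names are placeholders. The code only builds the command lines and does not run them.

```python
import shlex
from typing import List


def distcp_command(source: str, target: str) -> List[str]:
    """Build a `hadoop distcp` invocation copying source to target."""
    return ["hadoop", "distcp", source, target]


# Placeholder locations; replace with real container, account and paths.
blob_landing = "wasb://mycontainer@myaccount.blob.core.windows.net/landing"
blob_backup = "wasb://mycontainer@myaccount.blob.core.windows.net/backup"
hdfs_landing = "hdfs:///data/landing"

load_on_boot = distcp_command(blob_landing, hdfs_landing)  # cloud storage to HDFS
backup = distcp_command(hdfs_landing, blob_backup)         # HDFS to cloud storage

for command in (load_on_boot, backup):
    print(" ".join(shlex.quote(part) for part in command))
```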
Default file system using cloud storage connectors
• Hadoop apps just see a path to data, and most things “just work”
• Apps that rely specifically on HDFS may encounter compatibility issues
• The physics change in exchange for flexibility: data is read over the network from cloud storage rather than from node-local disks
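As a sketch of the connector pattern with Azure Blob storage: the Hadoop connector addresses data as `wasb://<container>@<account>.blob.core.windows.net/<path>`, and setting `fs.defaultFS` to such a URI makes it the default file system. The container and account names below are placeholders, and a real configuration also needs the storage account credentials, which are omitted here.

```python
import xml.etree.ElementTree as ET


def wasb_uri(container: str, account: str, path: str = "/", secure: bool = False) -> str:
    """Build a WASB URI for a path inside a blob container."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def core_site_property(name: str, value: str) -> str:
    """Render one <property> element in the style of core-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


default_fs = wasb_uri("mycontainer", "myaccount")
snippet = core_site_property("fs.defaultFS", default_fs)
print(snippet)
```

With a default file system like this, an app that opens `/data/logs` resolves it against blob storage without a code change. Apps that depend on HDFS-specific behaviour are the ones to test first.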
LEVERAGING HADOOP AS A SERVICE
Hadoop Services
• Cluster creation on demand
• Default integration with cloud storage
• Integration across services and apps
• Higher level abstractions
• API set for integrating into apps
Azure HDInsight
• Clusters provisioned on top of Azure Blob storage
• Deploy clusters of any size
• Entire stack supported by Microsoft
Some example services: Azure Active Directory, Service Bus, Scheduler, Multi-Factor Authentication, ExpressRoute, Azure SQL Database, Azure Web Sites
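To illustrate an "API set for integrating into apps", the sketch below prepares a Hive query submission through WebHCat (Templeton), the REST interface that HDInsight clusters expose. The slides do not name a specific API, so treat this choice as an assumption. The cluster name, credentials and status directory are placeholders, and the request is only built, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_hive_request(cluster: str, user: str, password: str,
                       query: str, status_dir: str) -> urllib.request.Request:
    """Prepare a POST to the WebHCat Hive endpoint of an HDInsight cluster."""
    url = f"https://{cluster}.azurehdinsight.net/templeton/v1/hive"
    body = urllib.parse.urlencode({
        "user.name": user,        # user the job runs as
        "execute": query,         # HiveQL to run
        "statusdir": status_dir,  # where job output and status are written
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder values only; sending the request needs a live cluster.
request = build_hive_request("mycluster", "admin", "not-a-real-password",
                             "SHOW TABLES;", "/example/status")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would submit the job against a real cluster.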
DEMO
Getting Started with HDInsight
HYBRID SCENARIOS
Key scenarios
• Offsite backup
• Dev/Test
• Burst to Cloud
The decision is not an XOR, it’s on-premises AND cloud
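A toy sketch of the burst-to-cloud idea: work fills on-premises capacity first and only the overflow goes to the cloud. The function name and the notion of "task slots" are hypothetical simplifications, not a description of any scheduler.

```python
from typing import Dict


def split_workload(pending_tasks: int, onprem_free_slots: int) -> Dict[str, int]:
    """Place tasks on-premises first and send the overflow to the cloud."""
    if pending_tasks < 0 or onprem_free_slots < 0:
        raise ValueError("task and slot counts must be non-negative")
    on_premises = min(pending_tasks, onprem_free_slots)
    return {"on_premises": on_premises, "cloud": pending_tasks - on_premises}


print(split_workload(pending_tasks=40, onprem_free_slots=100))   # everything stays on-premises
print(split_workload(pending_tasks=250, onprem_free_slots=100))  # 150 tasks burst to the cloud
```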
DEMO
Microsoft Azure: Azure Storage, HDInsight (Hadoop), Hadoop cluster deployed to IaaS
On-Premises: Hadoop Cluster (HDP 2.1) running on CentOS, with HDFS, YARN, Tez, Hive, MR, Falcon
GETTING STARTED
Get started in the cloud (getting started cards available @ the Microsoft booth and up here at the stage)
Create an HDInsight cluster, or try out deploying a Hadoop cluster to Azure
http://aka.ms/howtohdinsight
Falcon command line
Falcon configuration files
FALCON CONFIGURATION…
Register & schedule
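The "register & schedule" step can be sketched with the Falcon CLI, where entities are submitted from XML definitions and feeds are then scheduled by name. The entity file names and the feed name below are hypothetical, since the actual configuration files from the demo are not reproduced in the slides. The code only assembles the commands.

```python
from typing import List


def falcon_submit(entity_type: str, xml_file: str) -> List[str]:
    """Register (submit) an entity definition with Falcon."""
    return ["falcon", "entity", "-type", entity_type, "-submit", "-file", xml_file]


def falcon_schedule(entity_type: str, name: str) -> List[str]:
    """Schedule an entity that has already been submitted."""
    return ["falcon", "entity", "-type", entity_type, "-schedule", "-name", name]


# Hypothetical entities: an on-prem cluster, a cloud cluster, and a replicated feed.
commands = [
    falcon_submit("cluster", "onprem-cluster.xml"),
    falcon_submit("cluster", "cloud-cluster.xml"),
    falcon_submit("feed", "landing-feed.xml"),
    falcon_schedule("feed", "landingFeed"),
]
for command in commands:
    print(" ".join(command))
```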
Data being landed into HDFS, on-prem
Syncing to blob store
New file @ 1:38pm
@ 2:01 pm
From HDI Cluster
