Microsoft Ignite 2018: SQL Server 2019 big data clusters - intro session
The Future of SQL Server 2019
and Big Data
In 2016, 16.1 ZB of data was generated
In 2025, 163 ZB of data will be generated
*IDC White Paper, Data Age 2025: The Evolution of Data to Life-Critical
Barriers to insights are
barriers to success
The task of generating insights from ever-increasing data is tough
Organizations that transform data into insights
outperform the competition
Source: Keystone Strategy interviews Oct 2015 - Mar 2016
74% of leaders use predictive models
37% of leaders dynamically update data models
Leaders combine structured and unstructured data in a data lake 8X as often
What do these organizations do differently?
Integrate data without ETL
Combine data in a central data store
Perform predictive analytics
Build intelligent apps and
AI with all your data
Analyzing all data
Easily and securely manage
data big and small
Managing all data
Simplified management and analysis through a unified deployment, governance, and tooling
SQL Server enables
intelligence over all your data
Unified access to all your data with
unparalleled performance
Integrating all data
Integrating
all data
Data movement is a barrier to faster insights
Costs: duplicated storage costs; engineering effort to build and maintain data pipelines
Speed: delays in integrating data before it can be used; increased data latency
Security: increased attack surface area; inconsistent security models
Quality: data quality issues can be created by ETL pipelines
Compliance: increased governance issues
3/4 of respondents say that untimely data has inhibited business opportunities (Yes, 76%; No, 19%; Don't know, 5%)
*IDC 3rd Platform Information Management Requirements Survey, Oct 2016
Data virtualization creates solutions
Costs: lower storage costs; less dev time spent on integration
Speed: rapid iterations and prototypes; timely data
Security: smaller attack surface area; consistent security model
Quality: fresh and accurate data
Compliance: easier data governance
Data virtualization integrates data from disparate
sources, locations and formats, without replicating or
moving the data, to create a single "virtual" data fabric
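The idea of querying data where it lives can be sketched with Python's built-in sqlite3 module. This is only an analogy, not PolyBase: two separate in-memory SQLite databases stand in for two independent stores, and the table and column names are invented for illustration. The join runs across both stores without copying rows from one into the other.

```python
import sqlite3


def total_spend_by_customer():
    """Join two separate databases in place, without copying rows between them."""
    conn = sqlite3.connect(":memory:")                    # store 1, addressed as "main"
    conn.execute("ATTACH DATABASE ':memory:' AS sales")   # store 2, an independent database

    # Hypothetical tables: customers live in one store, orders in the other.
    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)",
        [(1, "Contoso"), (2, "Fabrikam")],
    )
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )

    # One query spans both stores; each table is read where it is stored.
    rows = conn.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM main.customers AS c
        JOIN sales.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


print(total_spend_by_customer())
```

The caller sees a single result set and never needs to know that the two tables belong to different databases, which is the property the "virtual data fabric" description is pointing at.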
[Diagram: analytics and apps query SQL Server with T-SQL; PolyBase external tables connect SQL Server to ODBC, NoSQL, relational database, and big data sources]
SQL Server is the hub for integrating data
Easily combine data across relational and non-relational data stores
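As a rough illustration of what a PolyBase external table definition involves, the Python helper below renders the two T-SQL statements typically used: one that registers the remote system and one that maps a remote object to a local table name. The statement shapes follow the documented CREATE EXTERNAL DATA SOURCE and CREATE EXTERNAL TABLE syntax as best understood here and should be checked against the product documentation; the helper itself, the host, the credential, and every object name are hypothetical.

```python
def external_table_ddl(source_name, location, credential, table, columns, remote_object):
    """Render T-SQL that exposes a remote object as a PolyBase external table.

    For illustration only: values are interpolated directly, so never feed
    this helper untrusted input.
    """
    column_list = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL DATA SOURCE {source_name}\n"
        f"WITH (LOCATION = '{location}', CREDENTIAL = {credential});\n"
        "\n"
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"    {column_list}\n"
        ")\n"
        f"WITH (LOCATION = '{remote_object}', DATA_SOURCE = {source_name});\n"
    )


ddl = external_table_ddl(
    source_name="OracleSales",              # hypothetical data source name
    location="oracle://oracle-host:1521",   # hypothetical host and port
    credential="OracleCredential",          # hypothetical database scoped credential, created beforehand
    table="dbo.Customers",                  # hypothetical local name for the external table
    columns=[("CustomerId", "INT"), ("Name", "NVARCHAR(100)")],
    remote_object="XE.SALES.CUSTOMERS",     # hypothetical remote database.schema.table
)
print(ddl)
```

Once such a table exists, ordinary T-SQL queries can join it to local tables, which is what makes SQL Server the integration hub in this design.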
Managing
all data
Complex scale-out deployment
Time-consuming patching and upgrades
Cumbersome security management
Easily deploy and manage a
SQL Server + Big Data cluster
Easily deploy and manage a Big Data cluster using Microsoft’s
Kubernetes-based Big Data solution built into SQL Server
Hadoop Distributed File System (HDFS) storage, the SQL Server
relational engine, and Spark analytics are deployed as containers
on Kubernetes in one easy-to-manage package
Simplified deployment with
containers & Kubernetes
A container is a standardized unit of software that includes
everything needed to run it
Kubernetes is a container hosting platform
Benefits of containers and Kubernetes:
1. Fast to deploy
2. Self-contained – no installation required
3. Upgrades are easy: just upload a new image
4. Scalable, multi-tenant, designed for elasticity
[Diagram: a Kubernetes pod containing SQL Server, an HDFS Data Node, and Spark]
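To make the pod idea and the "upgrade by swapping the image" benefit concrete, here is a minimal sketch that builds a Kubernetes Pod manifest as a Python dictionary and prints it as JSON, a format Kubernetes accepts for manifests. The pod name, container names, and image references are hypothetical placeholders; this is not how the product actually lays out or names its pods.

```python
import copy
import json


def make_pod(name, containers):
    """Build a minimal Kubernetes Pod manifest from (container_name, image) pairs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{"name": c, "image": image} for c, image in containers]
        },
    }


def with_new_image(pod, container_name, new_image):
    """Return a copy of the manifest with one container pointed at a new image."""
    upgraded = copy.deepcopy(pod)
    for container in upgraded["spec"]["containers"]:
        if container["name"] == container_name:
            container["image"] = new_image
    return upgraded


# Hypothetical pod mirroring the three components named on the slide.
pod = make_pod("storage-0", [
    ("sql-server", "example.registry/sql-server:1.0"),
    ("hdfs-datanode", "example.registry/hdfs-datanode:1.0"),
    ("spark", "example.registry/spark:1.0"),
])

# An upgrade is expressed as a new image tag, not as an in-place installation.
upgraded = with_new_image(pod, "spark", "example.registry/spark:1.1")
print(json.dumps(upgraded, indent=2))
```

Because the manifest is declarative, the desired state after the upgrade is just the same description with a different image reference, and the original description is left untouched for rollback.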
SQL Server can now read directly from HDFS files
Elastically scale compute and storage using HDFS-based
storage pools with SQL Server and Spark built in
Apps, BI, and analytics access Big Data through the
SQL Server master instance
Scale Big Data on demand
[Diagram: custom apps, BI, and analytics send SQL to the SQL Server master instance, which fronts Kubernetes pods spread across nodes; each pod contains SQL Server, an HDFS Data Node, and Spark, backed by persistent storage]
Scale-out data pools combine and cache data from many
sources for fast querying
Scenario
• A global car manufacturing company wants to join data
from across multiple sources including HDFS, SQL Server,
and Cosmos DB
Solution
• Query data in relational and non-relational data stores with
new PolyBase connectors
• Create a scale-out data pool cache of combined data
• Expose the datasets as a shared data source, without
writing code to move and integrate data
[Diagram: SQL Server uses PolyBase connectors to reach HDFS, Cosmos DB, and SQL Server sources and fills a scale-out data pool made of shards 1 through n]
[Diagram: IoT data, SQL Server, and a scale-out data pool of shards 1 through n on persistent storage]
Extend SQL Server with a scale-out storage tier by
partitioning the data across multiple instances
Speed up query performance by scaling out the filtering
and local aggregation across multiple instances
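The scale-out pattern described here (partition the rows, filter and aggregate on each instance, then combine the partial results) can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the data pool's actual implementation: the round-robin placement, the function names, and the sample IoT readings are all made up for the example.

```python
from collections import defaultdict


def shard_rows(rows, shard_count):
    """Distribute rows across shards round-robin (a simple stand-in for partitioning)."""
    shards = [[] for _ in range(shard_count)]
    for i, row in enumerate(rows):
        shards[i % shard_count].append(row)
    return shards


def local_aggregate(shard, min_temp):
    """Runs on each shard: filter locally, then keep a partial [sum, count] per device."""
    partial = defaultdict(lambda: [0.0, 0])
    for row in shard:
        if row["temp"] >= min_temp:
            partial[row["device"]][0] += row["temp"]
            partial[row["device"]][1] += 1
    return partial


def merge(partials):
    """Runs on the coordinator: combine partial results into per-device averages."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for device, (temp_sum, count) in partial.items():
            totals[device][0] += temp_sum
            totals[device][1] += count
    return {device: temp_sum / count for device, (temp_sum, count) in totals.items()}


# Hypothetical IoT readings.
readings = [
    {"device": 1, "temp": 20.0},
    {"device": 1, "temp": 30.0},
    {"device": 2, "temp": 10.0},
    {"device": 2, "temp": 40.0},
    {"device": 3, "temp": 5.0},
]
shards = shard_rows(readings, shard_count=3)
averages = merge(local_aggregate(shard, min_temp=10.0) for shard in shards)
print(averages)
```

Each shard returns only a small sum-and-count summary rather than its raw rows, so the coordinator does far less work than it would if it had to scan all the data itself.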
Increase analytics and apps performance
[Diagram: analytics, custom apps, and BI send SQL to the SQL Server master instance; compute pools of SQL Compute Nodes sit in front of a data pool of SQL Data Nodes (fed by IoT data) and a storage pool of Kubernetes pods, each containing SQL Server, Spark, and an HDFS Data Node that is read directly from HDFS; everything runs across Kubernetes nodes with persistent storage]
Azure Data Studio provides a unified tool for
querying data using a notebook experience for
both T-SQL and Spark
Easily access all your data across SQL Server and
HDFS
The cluster administration portal provides easy-to-use,
cloud-style managed services for HA,
monitoring, backup/recovery, security, and
provisioning
The REST API and command-line tools simplify
automation
The development and management experience is
consistent regardless of where you run: on-premises
or on any of the major cloud providers
Integrated Big Data and SQL Server security model
Simple, single sign-on with Active Directory authentication
Manage data access with SQL Server security roles
Access reporting for audit and compliance
[Diagram: app and AI developers authenticate through Active Directory; central security and governance uses impersonation to reach external data sources]
Analyzing
all data
Developers struggle to access
insights from Big Data
Data science is siloed from
operational data
Lengthy time to train and
operationalize models
Access relational and non-relational data using familiar
T-SQL commands and development frameworks
Enrich apps with data from other sources like Oracle
Database and MongoDB
Build intelligent applications with access to unstructured,
high-volume, and high-velocity data
Train R and Python models against Big Data stored in
Hadoop and score your application data without ever
leaving SQL Server
Apply easy-to-use tools like Azure Data Studio and Visual
Studio Code
[Diagram: an application built on the Django framework connects to the SQL Server master instance, which fronts a storage pool of pods that each contain SQL Server, an HDFS Data Node, and Spark]
Data scientists can use familiar tools to analyze
structured and unstructured data
1. Use Azure Data Studio notebooks to run a Spark
job over structured and unstructured data
2. Spark jobs can access data in SQL Server
through JDBC, Tedious, etc.
3. Queries can access data from other sources
like Oracle Database and MongoDB via
external tables
4. The Spark job returns the data to the notebook
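Step 2 above, a Spark job reading SQL Server over JDBC, might look roughly like the following in a PySpark notebook. This is a sketch under assumptions: it relies on Spark's generic JDBC data source and the Microsoft JDBC driver being available to the cluster, and the host, database, table, and login values are placeholders. Only the option-building part runs on its own; `read_table` needs a live SparkSession.

```python
def sqlserver_jdbc_options(host, port, database, table, user, password):
    """Build the option map for Spark's generic JDBC data source."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def read_table(spark, options):
    """Load a SQL Server table as a Spark DataFrame.

    `spark` is an existing SparkSession, such as the one a notebook provides.
    """
    return spark.read.format("jdbc").options(**options).load()


# Placeholder connection details; substitute real values and a secure secret store.
options = sqlserver_jdbc_options(
    host="master-instance",
    port=1433,
    database="sales",
    table="dbo.Customers",
    user="spark_reader",
    password="<secret>",
)
print(options["url"])
```

In a notebook, `read_table(spark, options)` would return a DataFrame that can be joined with data read from HDFS, which covers the "structured and unstructured" combination the slide describes.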
[Diagram: SQL Ops Studio connects to the SQL Server master instance, which reaches external data sources and a storage pool running three Spark instances]
Integrate structured and unstructured data
Simplified management and analysis through a unified deployment, governance, and tooling
[Diagram: a pipeline from sources (business/custom apps, structured; logs, files and media, unstructured; sensors and IoT, unstructured) through the stages Ingest (Spark streaming), Store (HDFS, SQL Server data pools), Prep & train (Spark, Spark ML, SQL Server ML Services), and Model & serve, ending at predictive apps and BI tools; other labels: SQL Server master instance, REST API containers for models, SQL Server Integration Services]
Volume, Variety, Velocity, Veracity
Mount and manage remote stores through HDFS
Mount various on-prem and cloud data stores
Accelerate computation by caching data locally
Disaster recovery/Data backup
[Diagram: the SQL Server master instance and Spark access a storage pool of pods, each containing SQL Server, an HDFS Data Node, and Spark; the pool mounts another HDFS store and a remote cloud store]
SQL Server 2019 big data & analytics
Managed SQL Server, Spark, and data lake
Store high-volume data in a data lake and access
it easily using either SQL or Spark
Management services, admin portal, and
integrated security make it all easy to manage
Data virtualization
Combine data from many sources without
moving or replicating it
Scale out compute and caching to boost
performance
Complete AI platform
Easily feed integrated data from many sources to
your model training
Ingest and prep data and then train, store, and
operationalize your models all in one system
[Diagram: analytics and apps use T-SQL against SQL Server, which reaches NoSQL stores, relational databases, and HDFS through open database connectivity; layers shown are SQL Server external tables, compute pools and data pools, Spark, scalable shared storage (HDFS), external data sources, admin portal and management services, integrated AD-based security, SQL Server ML Services, Spark and Spark ML, and REST API containers for models]
Intelligence
over all data
drives innovation
Simplified management and analysis through a unified deployment, governance, and tooling model
Analyzing all data
Managing all data
Integrating all data
Apply to join the SQL Server 2019
Early Adoption Program



Editor's Notes

  • #9: Source: 3rd Platform Information Management Requirements Survey, IDC, October, 2016, n=110 An IDC InfoBrief | May 2017 | “Choosing a DBMS to Address the Challenges of the Third Platform”