Building Big data solutions in Azure
Session Objectives And Takeaways
• Understanding HDInsight cluster types & tiers in Azure
• HBase as a Hadoop NoSQL database
• Hive as data warehouse software for managing large datasets using SQL
• Understanding data processing options in the Hadoop ecosystem using Storm and Spark
• HDInsight is a cloud implementation on Microsoft Azure of the rapidly expanding Apache Hadoop technology stack, the go-to solution for big data analysis.
• It includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie, Ambari, and others.
• HDInsight also integrates with business intelligence (BI) tools such as Power BI, Excel, SQL Server Analysis Services, and SQL Server Reporting Services.
• HDInsight is available on Windows and Linux:
  • HDInsight on Linux: a Hadoop cluster on Ubuntu
  • HDInsight on Windows: a Hadoop cluster on Windows Server 2012 R2
What is HDInsight
• HDInsight provides cluster types and custom configurations for:
  • Hadoop (HDFS)
  • HBase
  • Storm
  • Spark
  • R Server (Preview)
• No hardware to purchase or maintain.
• HDInsight has powerful programming extensions for languages including C#, Java, and .NET. Use your programming language of choice on Hadoop to create, configure, submit, and monitor Hadoop jobs.
HDInsight clusters on Azure
• Apache HBase is an open-source NoSQL database that is built on Hadoop and modeled after Google Bigtable.
• HBase provides random access and strong consistency for large amounts of unstructured and semi-structured data in a schemaless database organized by column families.
• Data is stored in the rows of a table, and data within a row is grouped by column family.
• The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop ecosystem.
What is HBase
Relational (flat) row:

Order No | Customer Name | Customer Phone | Company Name | Company Address
12012015 | Mostafa       | 101-232-2345   | Microsoft    | Redmond, WA

The same row grouped into column families (Customer, Company):

         | Customer                       | Company
Order No | Customer Name | Customer Phone | Company Name | Company Address
12012015 | Mostafa       | 101-232-2345   | Microsoft    | Redmond, WA
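The grouping in the table above can be sketched in plain Python as row key -> column family -> qualifier -> value. This is only a toy model of the layout, not the HBase client API; the helper name `to_column_families` and the rule of splitting each field name at its first space are assumptions made for illustration.

```python
# Toy model of the HBase layout: row key -> column family -> qualifier -> value.
# Not an HBase API; the names and the splitting rule are illustrative assumptions.
flat_row = {
    "Order No": "12012015",
    "Customer Name": "Mostafa",
    "Customer Phone": "101-232-2345",
    "Company Name": "Microsoft",
    "Company Address": "Redmond, WA",
}


def to_column_families(row, key_field="Order No"):
    """Group 'Family Qualifier' fields under their family, keyed by the row key."""
    row_key = row[key_field]
    families = {}
    for field, value in row.items():
        if field == key_field:
            continue
        # "Customer Name" -> family "Customer", qualifier "Name"
        family, qualifier = field.split(" ", 1)
        families.setdefault(family, {})[qualifier] = value
    return {row_key: families}


hbase_row = to_column_families(flat_row)
print(hbase_row)
```

Reading a single cell then becomes a three-step lookup, for example `hbase_row["12012015"]["Company"]["Address"]`.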
• HBase commands:
  • create: equivalent to CREATE TABLE in T-SQL
  • get: equivalent to a SELECT statement in T-SQL
  • put: equivalent to an UPDATE or INSERT statement in T-SQL
  • scan: equivalent to a SELECT with no WHERE condition in T-SQL
• The HBase shell is the query tool for running CRUD commands against an HBase cluster.
• Data can also be managed using the HBase C# API, which provides a client library on top of the HBase REST API.
• An HBase database can also be queried from Hive by using HiveQL.
What is HBase
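To make the four commands concrete, here is a minimal in-memory sketch that mimics their behavior. `ToyHBase` is a hypothetical class written for this illustration; it is not the HBase shell or any HBase client library, and the table, column names, and the second phone number are made up.

```python
class ToyHBase:
    """In-memory stand-in for create/put/get/scan (illustration only)."""

    def __init__(self):
        self.tables = {}

    def create(self, table, *families):
        # Like CREATE TABLE: only the column families are declared up front.
        self.tables[table] = {"families": set(families), "rows": {}}

    def put(self, table, row_key, column, value):
        # Like INSERT or UPDATE: writing to an existing cell overwrites it.
        family = column.partition(":")[0]
        if family not in self.tables[table]["families"]:
            raise KeyError(f"unknown column family: {family}")
        self.tables[table]["rows"].setdefault(row_key, {})[column] = value

    def get(self, table, row_key):
        # Like SELECT for one row, addressed by its row key.
        return dict(self.tables[table]["rows"].get(row_key, {}))

    def scan(self, table):
        # Like SELECT without WHERE: every row, ordered by row key.
        rows = self.tables[table]["rows"]
        return [(key, dict(rows[key])) for key in sorted(rows)]


db = ToyHBase()
db.create("Orders", "Customer", "Company")
db.put("Orders", "12012015", "Customer:Name", "Mostafa")
db.put("Orders", "12012015", "Customer:Phone", "101-232-2345")
db.put("Orders", "12012015", "Company:Name", "Microsoft")
db.put("Orders", "12012015", "Customer:Phone", "101-232-0000")  # acts as an update
print(db.get("Orders", "12012015"))
print(db.scan("Orders"))
```

Note that `put` covers both insert and update, which is why the slide maps it to two T-SQL statements.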
• Apache Hive is a data warehouse system for Hadoop, which enables data summarization, querying, and analysis of data by using HiveQL (a query language similar to SQL).
• Hive understands how to work with structured and semi-structured data, such as text files where the fields are delimited by specific characters.
• Hive also supports custom serializer/deserializers (SerDe) for complex or irregularly structured data.
• Hive can also be extended through user-defined functions (UDFs).
• A UDF allows you to implement functionality or logic that isn't easily modeled in HiveQL.
What is Hive
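One way to add custom logic is Hive's TRANSFORM clause, which streams rows to an external script as tab-delimited lines on standard input and reads the results back from standard output. The sketch below shows what such a script's core could look like; the two-column layout (name, phone) and the function names are assumptions for illustration, and the input is simulated with an in-memory stream.

```python
import io


def transform_line(line):
    """Normalize one tab-delimited row: upper-case the name, keep only phone digits."""
    name, phone = line.rstrip("\n").split("\t")
    digits = "".join(ch for ch in phone if ch.isdigit())
    return "\t".join([name.upper(), digits])


def run(stream_in, stream_out):
    """Read rows from stream_in and write one transformed row per line to stream_out."""
    for line in stream_in:
        if line.strip():
            stream_out.write(transform_line(line) + "\n")


# Simulated input; a real script would pass sys.stdin and sys.stdout instead.
fake_input = io.StringIO("Mostafa\t101-232-2345\n")
fake_output = io.StringIO()
run(fake_input, fake_output)
print(fake_output.getvalue(), end="")
```

Keeping the per-row logic in `transform_line` makes it testable without a cluster.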
• Apache Storm is a distributed, fault-tolerant, open-source computation system that allows you to process data in real time with Hadoop.
• Apache Storm on HDInsight allows you to create distributed, real-time analytics solutions in the Azure environment by using Apache Hadoop.
• Storm solutions can also provide guaranteed processing of data, with the ability to replay data that was not successfully processed the first time.
• Storm components can be written in C#, Java, and Python.
• The cluster can be scaled up or down in Azure without impacting running Storm topologies.
• Easy to provision and use in the Azure portal.
• Visual Studio project templates for Storm apps.
What is Apache Storm
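The replay idea can be sketched in plain Python: a message that fails is put back on the queue and delivered again. This is a toy model only, not Storm's API; the function name `process_with_replay` and the `max_attempts` limit are assumptions for illustration.

```python
from collections import deque


def process_with_replay(messages, handler, max_attempts=3):
    """Deliver each message until the handler succeeds or attempts run out."""
    pending = deque((msg, 1) for msg in messages)
    done, dropped = [], []
    while pending:
        msg, attempt = pending.popleft()
        try:
            handler(msg)
            done.append(msg)  # success: the message is acknowledged
        except Exception:
            if attempt < max_attempts:
                pending.append((msg, attempt + 1))  # failure: replay later
            else:
                dropped.append(msg)
    return done, dropped


already_failed = set()


def flaky_handler(msg):
    """Fails the first time it sees 'b', then succeeds on the replay."""
    if msg == "b" and msg not in already_failed:
        already_failed.add(msg)
        raise RuntimeError("transient failure")


done, dropped = process_with_replay(["a", "b", "c"], flaky_handler)
print(done, dropped)
```

Because a message can be delivered more than once, handlers in this style should tolerate duplicates.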
• Apache Storm apps are submitted as topologies.
• A topology is a graph of computation that processes streams.
• Stream: An unbounded collection of tuples. Streams are produced by spouts and bolts, and they are consumed by bolts.
• Tuple: A named list of dynamically typed values.
• Spout: Consumes data from a data source and emits one or more streams.
• Bolt: Consumes streams, performs processing on tuples, and may emit streams. Bolts are also responsible for writing data to external storage, such as a queue, HDInsight, HBase, a blob, or other data store.
• Nimbus: Provides functionality similar to the JobTracker in Hadoop; it distributes jobs across the cluster and monitors for failures.
Apache Storm Components
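The spout and bolt roles can be modeled with Python generators: a spout emits tuples, and each bolt consumes one stream and may emit another. This is an in-process sketch of the concepts only, not a Storm topology; the function names and the hard-coded sentences are made up for illustration.

```python
from collections import Counter


def sentence_spout():
    """Spout: reads from a (hard-coded) data source and emits a stream of tuples."""
    for sentence in ["big data on azure", "big data with storm"]:
        yield {"sentence": sentence}


def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield {"word": word}


def count_bolt(stream):
    """Bolt: consumes word tuples and keeps a running count per word."""
    counts = Counter()
    for tup in stream:
        counts[tup["word"]] += 1
    return counts


# Wiring the components together plays the role of the topology graph.
word_counts = count_bolt(split_bolt(sentence_spout()))
print(word_counts)
```

In a real cluster each component runs as parallel tasks across nodes; here everything runs in a single process.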
• Apache Spark™ is a fast and general engine for large-scale data processing.
• Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
• Write applications quickly in Java, Scala, Python, and R.
• Combine SQL, streaming, and complex analytics.
• Spark's in-memory computation capabilities make it a good choice for iterative algorithms in machine learning and graph computations.
• Spark is also compatible with Azure Blob storage (WASB), so your existing data stored in Azure can easily be processed via Spark.
• Support for R Server and Azure Data Lake.
What is Apache Spark
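Why in-memory data helps iterative algorithms can be shown with a toy example in plain Python (no Spark involved). An iterative fit either reloads its input on every pass or loads it once and reuses it; `CountingSource` and `fit_mean` are hypothetical names, and the load counter stands in for expensive reads from storage.

```python
class CountingSource:
    """Pretend storage that counts how often the data is loaded."""

    def __init__(self, values):
        self._values = list(values)
        self.loads = 0

    def load(self):
        self.loads += 1
        return list(self._values)


def fit_mean(load, iterations=20, learning_rate=0.5):
    """Gradient descent on squared error; the estimate converges to the mean."""
    estimate = 0.0
    for _ in range(iterations):
        data = load()  # one pass over the data per iteration
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= learning_rate * gradient
    return estimate


# Without caching: every iteration goes back to storage.
uncached = CountingSource([1.0, 2.0, 3.0, 4.0])
estimate_uncached = fit_mean(uncached.load)

# With caching: load once, then iterate over the in-memory copy.
cached = CountingSource([1.0, 2.0, 3.0, 4.0])
in_memory = cached.load()
estimate_cached = fit_mean(lambda: in_memory)

print(uncached.loads, cached.loads)
```

Both runs reach the same estimate; only the number of loads differs, and that gap grows with the number of iterations.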


Editor's Notes

  • #2: This session covers how to get started building big data solutions in Azure. Azure provides different cluster types for the Hadoop ecosystem. The session covers the basics of HDInsight clusters, including Apache Hadoop, HBase, Storm, and Spark, and how to integrate with HDInsight from .NET using different Hadoop integration frameworks and libraries. It is a jump start for engineers and DBAs with RDBMS experience who want to begin working with and developing Hadoop solutions. The session is demo-driven and covers the basics of the Hadoop open-source products.
  • #4: Ref: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/
  • #9: Ref: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-overview/
  • #10: Ref: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-overview/
  • #11: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-tutorial-get-started/ A) Working with the HBase shell: create a table, insert a record, update a record, delete a record, and create a Hive table that maps to the HBase table just created. B) Working with Hive: use the dashboard to create a database and tables.
  • #12: Apache Storm in HDInsight: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-overview/
  • #13: Apache Storm in HDInsight: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-overview/ Tip: The Nimbus node provides similar functionality to the Hadoop JobTracker, and it assigns tasks to other nodes in the cluster through ZooKeeper.
  • #14: Demo: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-develop-csharp-visual-studio-topology/ Overview of the HDInsight project templates in Visual Studio 2015: create a Storm application; create a Hive application.
  • #15: Ref: http://spark.apache.org/ https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-overview/
  • #16: Demo: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-ipython-notebook-machine-learning/ Apache Spark notebooks: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-jupyter-spark-sql/
  • #18: HDInsight main documentation: https://azure.microsoft.com/en-us/documentation/services/hdinsight/