APACHE HADOOP
            ON AZURE AND WINDOWS
                 MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISE




ELASTIC MAPREDUCE FOR AZURE AND ENTERPRISE PRIVATE CLOUDS
Brad Sarsfield
Engineering Architect
Microsoft Big Data | Hadoop
March 2012 | revision 1.02
ISOTOPE BRIDGES BI TO COLLABORATION TO CLOUD


       “The next frontier is all about uniting the power of the cloud
       with the power of data to gain insights that simply weren’t
       possible even just a few years ago”
                                                    Ted Kummert, CVP Business Platforms
                                                    SQL PASS, October 2011
BIG DATA IS HERE AND HADOOP IS CENTER STAGE
15 out of 17 sectors in the US have more data stored per company than the US Library of Congress
140,000-190,000 more deep analytical talent positions
1.5 million more data savvy managers in the US alone
50-60% increase in the number of Hadoop developers within organizations already using Hadoop within a year
€250 billion potential annual value to Europe’s public sector
$300 billion potential annual value to US healthcare

 ECONOMIC CONTEXT AND EXEMPLAR

                                   Special Report: The CEO’s Guide to Hadoop
                                    Learn how large corporations are coping with the increasing flow of
                                    unstructured data by using a free software program called Hadoop


                    http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html
THE 4Vs OF BIG DATA: VOLUME, VELOCITY, VARIABILITY, AND VARIETY
Isotope is designed to enable solution building with all key dimensions in mind
Deep integration and coordination with existing Microsoft enterprise, cloud, and BI tools
Cassandra             Hadoop                 BackType                MR/GFS                  SimpleDB
      Hive                  Oozie                  Hadoop                  Bigtable                Dynamo
      Scribe                PigLatin               Pig HBase               Dremel                  EC2/EMR/S3
      Hadoop                …                      Cassandra               …                       …




                         Internal [ Dryad | Cosmos] and External [ Isotope | Azure | Excel | BI | SQL DW | LTH ]




VIBRANT ECOSYSTEM IN ENTERPRISE AND CLOUD WITH MICROSOFT
Scalable machine learning and data mining [Mahout]
Statistical modeling and analysis [R]
Coordination and workflow [Oozie, Cascading]
Data integration and transformation [SQOOP, Flume]
Social network analytics and petascale graph learning [Pegasus]
Real-time stream analytics and business intelligence merged with petascale computation [HStreaming]
Scale-out caching and storage [Cassandra, HBase, Riak, Redis, Couchbase, S3]
Cloud-oriented data warehousing, pattern discovery, and transformation [Hive, Pig]
ENTER ISOTOPE
Isotope is the internal codename for Microsoft’s suite of products to support Hadoop in Windows and Azure
[Diagram: data sources, Hadoop and SQL services, and BI tools for business users]
Un- and Semi-Structured: Sensors | Crawlers | Devices | Bots | Apps
Structured: EIS | ERP | CRM | LOB
HADOOP | SQL REPORTING | SQL ANALYSIS | SQL DATA WAREHOUSING
Interactive Reports with Crescent | Excel with PowerPivot | Embedded BI Apps
Business Users

OUR DIFFERENTIATORS FOR CLOUD AND ENTERPRISE
Self-service business intelligence at any scale on premise or cloud
Complete integration of information assets from log files to collaboration artifacts to enterprise data stores
Familiar and integrated tools for analytics, insight, exploration, modeling, and strategic decision making
Transparent, federated identity and security management for all big data services
High availability data protection and recovery services for enterprises through cloud
Enterprise-grade support for all services, frameworks, and tools
HADOOP [Azure and Enterprise]
Java OM | Streaming OM | HiveQL | PigLatin | .NET/C#/F# | (T)SQL
OCEAN OF DATA [unstructured, semi-structured, structured]
NOSQL | ETL
HDFS
A SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS
EIS / ERP | RDBMS | File System | OData [RSS] | Azure Storage
PROJECT ISOTOPE OFFERINGS
•   Bi-directional connectors between Hadoop and SQL and PDW
•   ODBC driver for Hadoop
•   Hive plug-in for Excel
•   Hosted elastic Hadoop service on Azure
•   Microsoft’s Apache Hadoop-based solution for Windows Azure
•   Microsoft’s Apache Hadoop-based solution for Windows Server
•   JavaScript support for Hadoop, with web-based interactive environment
•   Contributions back to the open source community via the Apache Foundation
HIVE PLUG-IN FOR EXCEL
•   Connect Excel directly to Hive
•   Browse Hive objects – tables, columns, etc.
•   Construct and issue queries
HOSTED ELASTIC HADOOP SERVICE ON AZURE
•   Elastic MapReduce, Hive, PigLatin, .NET, JavaScript, and integration with BI, DW, and Office collaboration tools
•   Simple management UI
•   Full Hadoop compatibility
•   Native support for Azure Blob Storage from HDFS
Apache Hadoop for Windows Server and Windows Azure
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS AZURE

•   One-click deployment of Hadoop on Azure cluster
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS
•   All standard Hadoop modules supported:
             Hadoop | HDFS | Pig | Hive | Monitoring Pages
•   One-click installer
•   Simplified cluster configuration
•   Integration with Microsoft ecosystem
           System Center | Active Directory | etc.
// MapReduce functions in JavaScript (word count)
// ------------------------------------------------------------------

// Map: split each input line into words and emit (word, 1).
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// Reduce: sum the counts emitted for each word.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};




 ISOTOPE.JS: OUR VB MOMENT FOR BIG DATA
 •   Write MapReduce jobs in JavaScript
 •   Interactive development environment
 •   Interactive data query and analytics of petascale datasets
 •   HIVE command line for interactive HIVE
 •   Charting and graphing for insight and analytics visualization
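The word-count job on this slide can be tried out without a cluster. The following is only a minimal sketch under stated assumptions: `map` and `reduce` are the slide's functions, while `makeIterator` and `runLocally` are hypothetical helpers written for this illustration. They imitate the `context.write` and `hasNext()`/`next()` shapes the slide code relies on and are not part of any Microsoft or Hadoop API.

```javascript
// Local, in-memory sketch of the map -> group -> reduce flow.
// Only `map` and `reduce` come from the slide; the rest is a
// hypothetical harness for illustration.

// Word-count mapper: emit (word, 1) for each word in the line.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// Word-count reducer: sum the counts collected for one word.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical iterator with the hasNext()/next() shape the reducer expects.
function makeIterator(items) {
    var index = 0;
    return {
        hasNext: function () { return index < items.length; },
        next: function () { return items[index++]; }
    };
}

// Hypothetical single-process driver: map every line, group the
// emitted values by key, then reduce each group.
function runLocally(lines) {
    var groups = Object.create(null); // key -> list of emitted values
    var mapContext = {
        write: function (k, v) {
            if (!(k in groups)) { groups[k] = []; }
            groups[k].push(v);
        }
    };
    lines.forEach(function (line, offset) { map(offset, line, mapContext); });

    var results = {};
    var reduceContext = { write: function (k, v) { results[k] = v; } };
    Object.keys(groups).sort().forEach(function (k) {
        reduce(k, makeIterator(groups[k]), reduceContext);
    });
    return results;
}

var counts = runLocally(["Hadoop on Azure", "Hadoop on Windows"]);
console.log(counts);
```

For the two sample lines, the result counts "hadoop" and "on" twice and "azure" and "windows" once. The grouping step stands in for the shuffle that a real cluster performs between the map and reduce phases.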
“We are excited to work with Microsoft to help make Apache
      Hadoop a compelling platform for storing and processing data.
      Hortonworks welcomes Microsoft to the Hadoop ecosystem
      and looks forward to lending our deep domain expertise to
      help accelerate the delivery of Microsoft’s Apache Hadoop-based
      solution for Windows Server and service for Windows
      Azure.”
                                                  Eric Baldeschwieler
                                                  CEO

GIVING BACK AND PARTICIPATING IN THE HADOOP COMMUNITY
Microsoft will be working with the community to contribute back significant code to the Apache Foundation
Microsoft has announced a partnership with Hortonworks to help accelerate our open source support
SUMMARY
Please visit HadoopOnAzure.com to start using Microsoft’s elastic services for Apache Hadoop
Please visit www.microsoft.com/bigdata to learn more about project codename “Isotope” and the broader ecosystem of
products and services Microsoft is delivering in 2012 and beyond

Editor's Notes

  • #5: Key Message: Big Data is a real problem, and Hadoop’s star is rising. It is economically transformative in the way LAMP (Linux, Apache, MySQL, PHP/Python) was in the previous decade.
    Reference numbers come from the McKinsey Global Institute report “Big data: The next frontier for innovation, competition, and productivity” (http://www.mckinsey.com/mgi/publications/big_data/index.asp) and from http://www.karmasphere.com/images/documents/Karmasphere-HadoopDeveloperResearch.pdf
    Hadoop is moving into mainstream consciousness now. Businessweek recently had a special report dedicated to Hadoop, with half a dozen articles: http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html
  • #6: http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data
    KEY POINT: Hadoop is part of the solution -
  • #7: Hadoop is an AND, not an OR. But it requires a certain philosophy that MSFT has not historically embraced. A key benefit of Hadoop is the large, vibrant open source community around it. To succeed, Microsoft needs to not only acknowledge but thrive in this community.
  • #9: BIG self service BI
      - Billions+ of data items
      - Unstructured, semi-structured, log data
      - Real-time feeds
      - New analysis types leveraging large server clusters
      - Leverage the Hadoop ecosystem and ride its momentum
    IW centric design
      - Give business users direct access to the Big Data store
      - Deliver IW-centric experiences optimized for unstructured and semi-structured queries
      - Create, enrich, visualize and share big data sets through fun and immersive experiences
      - Do it all in the tool they already use - Excel
      - Increase the number of questions, reduce the cost of exploratory mining to zero
    Leverage new class of analytics and visualization
      - Enable new types of questions with new types of data and visualizations
      - Leverage analysis of text, sentiment, clickstream, time windows, classification, clustering
      - Visualize big data in impactful ways: tag clouds, graphs, timelines, tree maps, etc.
    Natural extension of our BI platform
      - Maintain a consistent semantic model, consistent expression language
      - Provide an iterative, experimental, business-driven workflow from the desktop to the Big Data cluster
      - Build on existing IW skills with the Microsoft BI platform (Excel, PowerPivot, Crescent)
    Optimized for cloud
      - Integrate with Azure DataMarket to connect to Bing and other public data sources
      - Host big data sets on Azure, integrated with MyData
      - Leverage Isotope to run analytics clusters
  • #10: Isotope is the all-up effort around Microsoft and Hadoop. It includes several components:
      - A full distribution of Apache Hadoop that runs on standard Windows hardware.
      - A full version of Apache Hadoop that runs on the Azure cloud.
      - Connectors from Hadoop (any Hadoop, not just Microsoft’s) to Microsoft’s key products – SQL, Excel, PDW, etc.
      - JScript shell for live scripting of Hadoop from the browser.
      - Admin, monitoring, and authoring tools to make Microsoft Hadoop best-in-class.