Tackling Big Data with Hadoop and
Open Source Integration




                         Ciaran Dynes
                         Remy Dubois
Agenda



 1. Talend’s Goal: Democratizing Integration

 2. What is Big Data (integration)?

 3. Big Data for the Masses: Talend’s strategy and vision




© Talend 2011                                               2
Our goal
Talend – The Market Leading Unified Integration Platform

 Talend Enterprise: Data Quality, Data Integration, MDM, ESB, BPM
  •  Commercial license
  •  Subscription model

 Studio, Repository, Deployment, Execution, Monitoring

 Talend Open Studio for Data Quality, Data Integration, MDM, ESB
  •  Open source license
  •  Free of charge
  •  Optional support

Recognized as the open source leader in each of its market
            categories by all industry analysts
Who uses Talend?

 A high adoption rate

  •  20 million downloads
  •  950,000 users
  •  3,500 customers
  •  1 product download every 30 seconds
  •  150 new customers per month

Trying to get from this…




 © Talend 2011 – Strictly Private & Confidential
to this…




 Why Talend…

 ONLY Talend generates code that is executed within MapReduce. This
 open approach removes the limitation of a proprietary “engine” and
 provides a powerful set of tools for big data.
Big data is…

 Hans Rosling – uses big data to analyze world health trends

 Key Takeaway #1
 transactions, interactions, observations

Big Data = Transactions + Interactions + Observations

 [Figure: nested data sources, growing from Mega, Giga, Tera to Peta bytes with increasing data variety and complexity. Source: Hortonworks]

  •  ERP: Purchase detail, Purchase record, Payment record
  •  CRM: Segmentation, Offer details, Customer Touchpoints, Support Contacts
  •  WEB: Web logs, Offer history, A/B testing, Dynamic pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels
  •  Big Data: Sensors/RFID/Devices, User Generated Content, Sentiment, Social Interactions & Feeds, Mobile Web, Spatial & GPS coordinates, User Clicks, External Demographics, Business Data Feeds, Video, Audio, Images, SMS/MMS

What is Big Data integration?
Traditional Data Flows

 [Diagram: CRM, ERP, Finance → ETL and Data Quality → Normalized Data → Traditional Data Warehouse, serving the Business Analyst, Business User, Warehouse Administrator and Executives]

 •  Scheduled daily or weekly, sometimes more frequently.
 •  Volumes rarely exceed terabytes.
The new world of big data

 [Diagram, built up over four slides: alongside CRM, ERP and Finance, new sources feed a growing Big Data store: Social Networking, Mobile Devices, Transactions, Network Devices, Sensors]
Key Takeaway #2

                 Forces us to think differently
But for Talend… big data is…

                …everything that is old is new again!

Data driven business

 [Diagram: data, governance → enables → information → supports → decisions → drives → your business]

  Information provides value to the business.
  If you can't rely on your information then the result can be missed
  opportunities, or higher costs.
      Matthew West and Julian Fowler (1999). Developing High Quality Data Models.
      The European Process Industries STEP Technical Liaison Executive (EPISTLE).
BIG data driven business

 [Diagram: BIG data, governance → enables → BIG information → supports → BIG decisions → drives → BIG business]

  Information provides value to the business.
  If you can't rely on your information then the result can be missed
  opportunities, or higher costs.
      Matthew West and Julian Fowler (1999). Developing High Quality Data Models.
      The European Process Industries STEP Technical Liaison Executive (EPISTLE).
“Big Data for the Masses”
Goal: Democratize Big Data

 Talend Open Studio for Big Data
  •  “Big Data for the Masses”
     •  Improves efficiency of big data job design with a graphical interface
     •  Abstracts and generates code
     •  Runs transforms inside Hadoop
     •  Native support for HDFS, Pig, HBase, Sqoop and Hive
     •  Apache License 2.0
     •  Embedded in Hortonworks Data Platform

 …an open source ecosystem
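
 A minimal sketch of what "abstracts and generates code" could look like in principle. This is not Talend's generator: the JobDesign class, its fields, and the HDFS paths are all hypothetical, invented for illustration. It only shows a declarative job description being rendered as a Pig Latin script (LOAD, FILTER, STORE) that Pig would then run inside Hadoop.

```python
from dataclasses import dataclass


@dataclass
class JobDesign:
    """Hypothetical stand-in for a job drawn in a graphical designer."""
    input_path: str     # HDFS path to read from (assumed)
    output_path: str    # HDFS path to write to (assumed)
    fields: list[str]   # Pig schema entries, e.g. "email:chararray"
    keep_if: str        # Pig boolean expression applied to each row


def generate_pig(job: JobDesign) -> str:
    """Render the design as a three-statement Pig Latin script."""
    schema = ", ".join(job.fields)
    return "\n".join([
        f"raw = LOAD '{job.input_path}' USING PigStorage(',') AS ({schema});",
        f"kept = FILTER raw BY {job.keep_if};",
        f"STORE kept INTO '{job.output_path}' USING PigStorage(',');",
    ])


# Illustrative design: keep only customer rows that have an email.
job = JobDesign(
    input_path="/data/crm/customers.csv",
    output_path="/data/clean/customers",
    fields=["id:int", "email:chararray"],
    keep_if="email IS NOT NULL",
)
script = generate_pig(job)
print(script)
```

 The point of the design is that the person building the job edits the description, not the script; the generated text is plain Pig Latin with no proprietary runtime attached.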
Let us show you…




© Talend 2012
Where to next?




How is big data integration being used?

 Use Cases
 •     Recommendation Engine
 •     Sentiment Analysis
 •     Risk Modeling
 •     Fraud Detection
 •     Marketing Campaign Analysis
 •     Customer Churn Analysis
 •     Social Graph Analysis
 •     Customer Experience Analytics
 •     Network Monitoring
 •     Research And Development

 BUT: to what level is data quality (DQ) required for your use case?
Poor Data Quality + Big Data = Big Problems
Poor Data Quality * Big Data = Big Problems^2




           Key Takeaway #3
           In big data…
           poor data quality can be magnified at huge scale

Two methods for inserting data quality into a big data job




 1.  Pipelining: as part of the load process


 2.  Load the cluster, then implement and execute
     a data quality MapReduce job




E-T-L
      Extract – Transform – Load

E-DQ-L
      Extract – Improve/Cleanse – Load
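
 The Extract – Improve/Cleanse – Load flow can be sketched as three chained steps. This is an illustrative toy under stated assumptions, not a Talend job: the sample CSV, the field names, and the cleansing rules (trim and lower-case the email, reject rows without one, upper-case the country) are invented, and a Python list stands in for the cluster.

```python
import csv
import io
from typing import Iterable, Iterator

# Made-up sample standing in for a CRM export.
SAMPLE_CSV = """id,email,country
1,Ann@Example.com ,fr
2,,us
3,bob@example.com,US
"""


def extract(text: str) -> Iterator[dict]:
    """Extract: read source rows one at a time."""
    yield from csv.DictReader(io.StringIO(text))


def cleanse(rows: Iterable[dict]) -> Iterator[dict]:
    """Improve/Cleanse: standardize values, drop rows failing a basic rule."""
    for row in rows:
        email = row["email"].strip().lower()
        if not email:
            continue  # reject rows with no email
        yield {
            "id": int(row["id"]),
            "email": email,
            "country": row["country"].strip().upper(),
        }


def load(rows: Iterable[dict], target: list) -> int:
    """Load: append cleansed rows to the target and return how many arrived."""
    before = len(target)
    target.extend(rows)
    return len(target) - before


cluster: list = []  # a list stands in for the big data store
loaded = load(cleanse(extract(SAMPLE_CSV)), cluster)
print(loaded, cluster)
```

 Because the steps are generators, each row is cleansed as it streams toward the target, so only rows that pass the rules ever land in the store.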
Pipelining: data quality with big data

 [Diagram: CRM, ERP, Finance, Social Networking and Mobile Devices pass through DQ steps on their way into Big Data]

  •  Use traditional data quality tools
  •  No new programming, no PhDs
  •  Once and done



Big data alternative: Load and improve within the cluster

 [Diagram: CRM, ERP, Finance, Social Networking and Mobile Devices load into Big Data; DQ runs inside the cluster]

  •  Load first, improve later
  •  Really complex to build, limited tools
  •  Constantly on, working in increments
  •  Insane performance
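
 A rough sketch of the load-first alternative: the raw records are already in the cluster, and a data quality job in map/reduce style cleans them afterwards. Everything here is an assumption made for illustration: the records, the deduplication rule (keep the most recently updated record per normalized email), and the run_job function, which only imitates locally what a real framework would do across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Raw records as they might sit in the cluster after a load-first ingest (made-up data).
RAW = [
    {"email": "Ann@Example.com", "updated": 1},
    {"email": "ann@example.com ", "updated": 3},
    {"email": "bob@example.com", "updated": 2},
]


def map_record(record: dict) -> Iterator[tuple[str, dict]]:
    """Map: emit (normalized key, record) so duplicates share a key."""
    yield record["email"].strip().lower(), record


def reduce_records(key: str, records: Iterable[dict]) -> dict:
    """Reduce: keep the most recently updated record for each key."""
    latest = max(records, key=lambda r: r["updated"])
    return {"email": key, "updated": latest["updated"]}


def run_job(records: Iterable[dict]) -> list[dict]:
    """Local stand-in for the framework: map, shuffle by key, then reduce."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return [reduce_records(key, groups[key]) for key in sorted(groups)]


deduped = run_job(RAW)
print(deduped)
```

 Even this toy shows why the slide calls the approach complex to build: the quality rule has to be expressed as key selection plus a per-key merge, rather than as a row-by-row check during the load.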


big data
 [Timeline: 2012, now, Q4, 2013]


Talend Open Studio for Big Data
•  Packaged within Hortonworks Data Platform
     …Eclipse tools for Hive, HDFS, Pig, Sqoop

     …supports Oozie, HCatalog, Kerberos


•  Free to download and use under the Apache license
   …democratizing big data through intuitive tools




Thanks for attending
Sessions will resume at 11:25am




                             Page 33

More Related Content

What's hot (20)

PPTX
Search2012 ibm vf
Isabelle Claverie-Berge
 
PDF
BI Forum 2009 - BI Mega Trends
OKsystem
 
PDF
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase
 
PPTX
OWF12/Java Michael hirt
Paris Open Source Summit
 
PDF
IBM Stream au Hadoop User Group
Modern Data Stack France
 
PDF
Mike Stolz Dramatic Scalability
deimos
 
PDF
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
BigDataCloud
 
PDF
The Best Analytics Tools
Datalicious
 
PDF
Big Data for Everyman
Michael Wilde
 
PDF
Katrina marques presentation
Ark Group Australia Pty Ltd
 
PDF
Analyse prédictive en assurance santé par Julien Cabot
Modern Data Stack France
 
PPTX
MWG Big Data & Media - Nick North (GfK UK)
MWG verbindt media
 
PDF
Vision - The Agile Data Center
incommoninc
 
PPTX
Module 3 Adapative Customer Experience Final
Vivastream
 
PDF
HCLT Brochure: E-Discovery and Document Review Solutions
HCL Technologies
 
PDF
Le Cloud de proximité by Monaco Telecom et Interxion
Yannick Quentel
 
PDF
Open Video Customer Presentation
MetroFiber
 
PDF
2012.04.26 big insights streams im forum2
Wilfried Hoge
 
PDF
Enterprise Security Architecture: From Access to Audit
Bob Rhubart
 
PPT
Striving for an Outstanding IT Organization
Huberto Garza
 
Search2012 ibm vf
Isabelle Claverie-Berge
 
BI Forum 2009 - BI Mega Trends
OKsystem
 
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase
 
OWF12/Java Michael hirt
Paris Open Source Summit
 
IBM Stream au Hadoop User Group
Modern Data Stack France
 
Mike Stolz Dramatic Scalability
deimos
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
BigDataCloud
 
The Best Analytics Tools
Datalicious
 
Big Data for Everyman
Michael Wilde
 
Katrina marques presentation
Ark Group Australia Pty Ltd
 
Analyse prédictive en assurance santé par Julien Cabot
Modern Data Stack France
 
MWG Big Data & Media - Nick North (GfK UK)
MWG verbindt media
 
Vision - The Agile Data Center
incommoninc
 
Module 3 Adapative Customer Experience Final
Vivastream
 
HCLT Brochure: E-Discovery and Document Review Solutions
HCL Technologies
 
Le Cloud de proximité by Monaco Telecom et Interxion
Yannick Quentel
 
Open Video Customer Presentation
MetroFiber
 
2012.04.26 big insights streams im forum2
Wilfried Hoge
 
Enterprise Security Architecture: From Access to Audit
Bob Rhubart
 
Striving for an Outstanding IT Organization
Huberto Garza
 

Viewers also liked (20)

PPTX
Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Hortonworks
 
PDF
The Next Generation of Big Data Analytics
Hortonworks
 
PDF
vBACD July 2012 - Apache Hadoop, Now and Beyond
CloudStack - Open Source Cloud Computing Project
 
PDF
Spark Streaming
Edureka!
 
PPTX
Practical Kerberos with Apache HBase
Josh Elser
 
PPTX
Apache Phoenix Query Server
Josh Elser
 
PDF
啟程:Data Technology 的待客之道
Etu Solution
 
PDF
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
Etu Solution
 
PDF
Data Leaders in Action - 資料價值領袖風範與關鍵行動
Etu Solution
 
PDF
那些你知道的,但還沒看過的 Big Data 風景
Etu Solution
 
PDF
Interface fonctionnelle, Lambda expression, méthode par défaut, référence de...
MICHRAFY MUSTAFA
 
PDF
Scala: Pattern matching, Concepts and Implementations
MICHRAFY MUSTAFA
 
PDF
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
Kai Wähner
 
PPTX
Apache Phoenix: Transforming HBase into a SQL Database
DataWorks Summit
 
PPTX
Apache phoenix: Past, Present and Future of SQL over HBAse
enissoz
 
PDF
Scala : programmation fonctionnelle
MICHRAFY MUSTAFA
 
PPTX
Mobile to Mainframe - the Challenges of Enterprise DevOps Adoption
Sanjeev Sharma
 
PDF
Spark RDD : Transformations & Actions
MICHRAFY MUSTAFA
 
PPTX
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
HARMAN Services
 
PDF
資料科學團隊人才培育分享 ─ 以 DSP 為例
Fred Chiang
 
Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Hortonworks
 
The Next Generation of Big Data Analytics
Hortonworks
 
vBACD July 2012 - Apache Hadoop, Now and Beyond
CloudStack - Open Source Cloud Computing Project
 
Spark Streaming
Edureka!
 
Practical Kerberos with Apache HBase
Josh Elser
 
Apache Phoenix Query Server
Josh Elser
 
啟程:Data Technology 的待客之道
Etu Solution
 
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
Etu Solution
 
Data Leaders in Action - 資料價值領袖風範與關鍵行動
Etu Solution
 
那些你知道的,但還沒看過的 Big Data 風景
Etu Solution
 
Interface fonctionnelle, Lambda expression, méthode par défaut, référence de...
MICHRAFY MUSTAFA
 
Scala: Pattern matching, Concepts and Implementations
MICHRAFY MUSTAFA
 
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
Kai Wähner
 
Apache Phoenix: Transforming HBase into a SQL Database
DataWorks Summit
 
Apache phoenix: Past, Present and Future of SQL over HBAse
enissoz
 
Scala : programmation fonctionnelle
MICHRAFY MUSTAFA
 
Mobile to Mainframe - the Challenges of Enterprise DevOps Adoption
Sanjeev Sharma
 
Spark RDD : Transformations & Actions
MICHRAFY MUSTAFA
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
HARMAN Services
 
資料科學團隊人才培育分享 ─ 以 DSP 為例
Fred Chiang
 
Ad

Similar to Tackling big data with hadoop and open source integration (20)

PDF
Talend Open Studio and Hortonworks Data Platform
Hortonworks
 
PDF
The Comprehensive Approach: A Unified Information Architecture
Inside Analysis
 
PDF
Hortonworks roadshow
Accenture
 
PDF
Hadoop: What It Is and What It's Not
Inside Analysis
 
PDF
Powering Next Generation Data Architecture With Apache Hadoop
Hortonworks
 
PDF
Unified big data architecture
DataWorks Summit
 
PPTX
The Evolution of Platforms - Drew Kurth and Matt Comstock
Razorfish
 
PPTX
2012 06 hortonworks paris hug
Modern Data Stack France
 
PDF
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks
 
PPTX
Break Through the Traditional Advertisement Services with Big Data and Apache...
Hortonworks
 
PPTX
Enterprise Services Solutions
Karya Technologies
 
PDF
Integrating social media monitoring, analytics and engagment marshall sponde...
Marshall Sponder
 
PPT
4.4.2013 Software, System, & IT Architecture - Good Design is Good Business:...
IBM Rational
 
PDF
Intel Cloud summit: Big Data by Nick Knupffer
IntelAPAC
 
PDF
Monitoring analytics workshop marshall sponder for london - march 26th prese...
Marshall Sponder
 
PDF
Evento Sugar Crm 2009 - Talend
DRI - Discovery/Reinvention/Integration/
 
PPTX
OpTier McKinsey Big Data Overview
nickychu
 
PPTX
McKinsey Big Data Overview
optier
 
PDF
Asug SAP HANA Presentation - Perceptive Technologies SAP
Brendan Kane
 
PPTX
McKinsey Big Data Overview
optier
 
Talend Open Studio and Hortonworks Data Platform
Hortonworks
 
The Comprehensive Approach: A Unified Information Architecture
Inside Analysis
 
Hortonworks roadshow
Accenture
 
Hadoop: What It Is and What It's Not
Inside Analysis
 
Powering Next Generation Data Architecture With Apache Hadoop
Hortonworks
 
Unified big data architecture
DataWorks Summit
 
The Evolution of Platforms - Drew Kurth and Matt Comstock
Razorfish
 
2012 06 hortonworks paris hug
Modern Data Stack France
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks
 
Break Through the Traditional Advertisement Services with Big Data and Apache...
Hortonworks
 
Enterprise Services Solutions
Karya Technologies
 
Integrating social media monitoring, analytics and engagment marshall sponde...
Marshall Sponder
 
4.4.2013 Software, System, & IT Architecture - Good Design is Good Business:...
IBM Rational
 
Intel Cloud summit: Big Data by Nick Knupffer
IntelAPAC
 
Monitoring analytics workshop marshall sponder for london - march 26th prese...
Marshall Sponder
 
Evento Sugar Crm 2009 - Talend
DRI - Discovery/Reinvention/Integration/
 
OpTier McKinsey Big Data Overview
nickychu
 
McKinsey Big Data Overview
optier
 
Asug SAP HANA Presentation - Perceptive Technologies SAP
Brendan Kane
 
McKinsey Big Data Overview
optier
 
Ad

More from DataWorks Summit (20)

PPTX
Data Science Crash Course
DataWorks Summit
 
PPTX
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
PPTX
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
PPTX
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
PPTX
Managing the Dewey Decimal System
DataWorks Summit
 
PPTX
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
PPTX
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
PPTX
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
PPTX
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
PPTX
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
PPTX
Security Framework for Multitenant Architecture
DataWorks Summit
 
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
PPTX
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
PPTX
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
PPTX
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
PPTX
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
PPTX
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
PDF
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
PPTX
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

Recently uploaded (20)

PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PPTX
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
PPTX
Agentforce World Tour Toronto '25 - MCP with MuleSoft
Alexandra N. Martinez
 
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
PPTX
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
PDF
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
PPTX
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
PDF
NLJUG Speaker academy 2025 - first session
Bert Jan Schrijver
 
DOCX
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
PDF
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
PDF
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
PDF
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
Edge AI and Vision Alliance
 
PDF
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
PDF
Mastering Financial Management in Direct Selling
Epixel MLM Software
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
PDF
Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
PDF
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
PDF
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
Agentforce World Tour Toronto '25 - MCP with MuleSoft
Alexandra N. Martinez
 
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
NLJUG Speaker academy 2025 - first session
Bert Jan Schrijver
 
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
Edge AI and Vision Alliance
 
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
Mastering Financial Management in Direct Selling
Epixel MLM Software
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 

Tackling big data with hadoop and open source integration

  • 1. Tackling Big Data with Hadoop and Open Source Integration Ciaran Dynes Remy Dubois
  • 2. Agenda 1. Talend’s Goal: Democratizing Integration 2. What is Big Data (integration)? 3. Big Data for the Masses: Talend’s strategy and vision © Talend 2011 2
  • 4. Talend – The Market Leading Unified Integration Platform Talend Enterprise Data Data MDM ESB BPM Quality Integration ¾  Commercial license ¾  Subscription model Studio Repository Deployment Execution Monitoring ¾  Open source license Talend Open Studio for ¾  Free of charge ¾  Optional support Data Data Quality Integration MDM ESB Recognized as the open source leader in each of its market category by all industry analysts © Talend 2011 4
  • 5. Who uses Talend? A high adoption rate § 20 million downloads § 950,000 users § 3,500 customers 1 product download 150 new customers every 30 seconds per month © Talend 2011 5
  • 6. Trying to get from this… © Talend 2011 – Stri2y Private & Confidential © Talend 2011 6
  • 7. to this… Why Talend… ONLY Talend generates code that is executed within map reduce. This open approach removes the limitation of a proprietary “engine” to provide a truly unique and powerful set of tools for big data.
  • 8. Big data is…. Hans Rosling – uses big data to analyze world health trends Key Takeaway #1 transactions, interactions, observations © Talend 2011 – Stri2y Private & Confidential © Talend 2011 8
  • 9. Big Data = Transactions + Interactions + Observations Sensors/RFID/Devices User Generated Content Big Data Mega, Giga, Tera, Peta bytes Sentiment Social Interactions & Feeds Mobile Web Spatial & GPS coordinates User Clicks External Demographics Web logs WEB Business Data Feeds Offer history A/B testing Video, Audio, Images Dynamic pricing SMS/MMS CRM Segmentation Affiliate Networks Search Marketing ERP Offer details Purchase detail Customer Touchpoints Behavioral Targeting Purchase record Support Contacts Dynamic Funnels Payment record Increasing Data Variety and Complexity Source: Hortonworks © Talend 2011 – Stri2y Private & Confidential © Talend 2011 9
  • 10. What is Big Data integration?
  • 11. Traditional Data Flows CRM ETL Normalized Traditional Data ERP Data Data Warehouse Quality Finance •  Scheduled–daily or weekly, sometimes more frequently. Business Business Analyst User •  Volumes rarely exceed terabytes Warehouse Administrator Executives © Talend 2011 – Stri2y Private & Confidential © Talend 2011 11
  • 12. The new world of big data. Sources feeding Big Data: CRM, ERP, Finance, Social Networking.
  • 13. The new world of big data. Sources feeding Big Data: CRM, ERP, Finance, Social Networking, Mobile Devices.
  • 14. The new world of big data. Sources feeding Big Data: CRM, ERP, Finance, Social Networking, Mobile Devices, Transactions.
  • 15. The new world of big data. Sources feeding Big Data: CRM, ERP, Finance, Social Networking, Mobile Devices, Transactions, Network Devices, Sensors.
  • 16. Key Takeaway #2: Forces us to think differently
  • 17. But for Talend… Big data is… everything that is old is new again!
  • 18. Data driven business. Diagram labels: data governance, information, decisions, your business; relations: enables, supports, drives. Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs. Source: Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • 19. BIG data driven business. Diagram labels: BIG data governance, BIG information, BIG decisions, BIG business; relations: enables, supports, drives. Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs. Source: Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • 20. “Big Data for the Masses”
  • 21. Goal: Democratize Big Data. Talend Open Studio for Big Data, part of an open source ecosystem: “Big Data for the Masses”; improves efficiency of big data job design with graphic interface; abstracts and generates code; run transforms inside Hadoop Pig; native support for HDFS, Pig, HBase, Sqoop and Hive; Apache License 2.0; embedded in Hortonworks Data Platform.
  • 22. Let us show you… © Talend 2012
  • 23. Where to next? © Talend 2012
  • 24. How is big data integration being used? Use cases: Recommendation Engine; Sentiment Analysis; Risk Modeling; Fraud Detection; Marketing Campaign Analysis; Customer Churn Analysis; Social Graph Analysis; Customer Experience Analytics; Network Monitoring; Research and Development. BUT: to what level is DQ required for your use case?
  • 25. Poor Data Quality + Big Data = Big Problems. Poor Data Quality * Big Data = Big Problems^2. Key Takeaway #3: In big data… poor data quality can be magnified at huge scale.
  • 26. Two methods for inserting data quality into a big data job: 1. Pipelining: as part of the load process. 2. Load the cluster, then implement and execute a data quality map reduce job.
  • 27. E-T-L: Extract – Transform – Load
  • 28. E-DQ-L: Extract – Improve/Cleanse – Load
  • 29. Pipelining: data quality with big data. Sources: CRM, ERP, Finance, Social Networking, Mobile Devices; DQ is applied on the way into Big Data. Use traditional data quality tools. No new programming, no PhDs. Once and done.
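The pipelining approach (E-DQ-L) can be illustrated with a minimal Python sketch. This is not Talend-generated code: the function names (`extract`, `cleanse`, `load`, `run_pipeline`), the record fields (`id`, `email`) and the cleansing rules are all assumptions chosen for illustration. It only shows the idea that records are cleansed once, in-stream, before they reach the target store.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def extract(rows: Iterable[Record]) -> Iterator[Record]:
    """Extract step: stream records from a source (here, an in-memory iterable)."""
    yield from rows


def cleanse(records: Iterable[Record]) -> Iterator[Record]:
    """DQ step (hypothetical rules): trim fields, lower-case emails,
    drop records without an id, and skip exact duplicates."""
    seen = set()
    for rec in records:
        customer_id = str(rec.get("id", "")).strip()
        if not customer_id:
            continue  # reject records that cannot be identified
        email = str(rec.get("email", "")).strip().lower()
        key = (customer_id, email)
        if key in seen:
            continue  # duplicate of a record already passed downstream
        seen.add(key)
        yield {"id": customer_id, "email": email}


def load(records: Iterable[Record], target: List[Record]) -> int:
    """Load step: append cleansed records to the target and return the count."""
    count = 0
    for rec in records:
        target.append(rec)
        count += 1
    return count


def run_pipeline(rows: Iterable[Record], target: List[Record]) -> int:
    """Extract, then Improve/Cleanse, then Load, in a single pass."""
    return load(cleanse(extract(rows)), target)
```

Because each stage is a generator, a record is cleansed as it flows through and nothing dirty is written to the target, which matches the "once and done" point on the slide.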
  • 30. Big data alternative: Load and improve within the cluster. Sources: CRM, ERP, Finance, Social Networking, Mobile Devices; DQ runs inside Big Data after loading. Load first, improve later. Really complex to build, limited tools. Constant on, increments. Insane performance.
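The load-first alternative can be sketched as a map/reduce-style data quality job. The example below is plain Python that simulates the map, shuffle and reduce phases in memory; it does not use Hadoop, and the names (`map_record`, `reduce_records`, `run_job`), the `email` key and the "keep the most complete record" rule are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]


def map_record(rec: Record) -> Iterator[Tuple[str, Record]]:
    """Map phase: emit (normalized email, record); drop records with no email."""
    email = str(rec.get("email", "")).strip().lower()
    if email:
        yield email, rec


def reduce_records(key: str, recs: List[Record]) -> Record:
    """Reduce phase (hypothetical rule): among records sharing a key,
    keep the one with the most non-empty fields."""
    best = max(recs, key=lambda r: sum(1 for v in r.values() if v))
    return {**best, "email": key}


def run_job(records: Iterable[Record]) -> List[Record]:
    """Simulate the job: map every record, group by key (shuffle), then reduce."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        for key, value in map_record(rec):
            groups[key].append(value)
    return [reduce_records(key, recs) for key, recs in sorted(groups.items())]
```

In a real cluster the grouping step is done by the framework across many nodes, so the same job can be rerun as new increments arrive; the sketch only shows the shape of the logic.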
  • 31. Big data now (timeline: 2012, Q4, 2013). Talend Open Studio for Big Data: packaged within Hortonworks Data Platform; Eclipse tools for Hive, HDFS, Pig, Sqoop; supports Oozie, HCatalog, Kerberos; free to download and use under the Apache license. …democratizing big data through intuitive tools
  • 33. Sessions will resume at 11:25am