1
©2015 Talend Inc
Embedding Machine Learning for Actionable
Insight
Introducing Talend 6.1
2
Connecting the Data-Driven Enterprise
Data-Driven companies…
· 23 times greater customer acquisition
· 6 times greater customer retention
· 19 times more profitability
McKinsey’s DataMatics 2013 Survey - Using customer analytics to boost corporate performance
3
• Data-Driven Opportunities and
Challenges
• Introducing Talend 6.1
• Demo
• Next Steps
Agenda
4
Data
Explosion
44
Trillion
Gigabytes
Cloud
Success
$7B
AWS Growing
at 81%
80%
time fixing
data
Self-service
Data
1. 7th Annual IDC Digital Universe Study estimates the digital universe will be 44 Zettabytes by 2020
2. Re:Invent 2015 Keynote - Andy Jassy
3. Recent report by Crowdflower found that data scientists spend 80% of their time wrangling data.
What We Believe: Market Changes
5
INTERNET OF THINGS
Potential OpEx Savings (15 Year Timeframe): +276B
• SMART ENERGY (Oil & Gas): 1% Reduction in Capital Expenditures, +90B
• SMART UTILITIES: 1% Fuel Savings, +66B
• SMART HEALTHCARE: 1% Improvement in Operational Efficiency, +63B
• SMART AVIATION: 1% Fuel Savings, +30B
• CONNECTED TRANSPORTATION: 1% Improvement in Operational Efficiency, +27B
Source: http://www.ge.com/docs/chapters/Industrial_Internet.pdf
6
• Data Science
• Extracting information from large
volumes of data to improve decisions
• Machine Learning
• A form of artificial intelligence, where
computers can learn and act to make
decisions
• Examples:
• Fraud detection
• Predict machine failure
• Customer recommendations
The Importance of Data Science and Machine Learning
Benefit: Making data applications intelligent
7
• Acquiring skills to implement analytics
• Analysis is often on a sample of data
instead of all your data
• 67% state cleaning and organizing data is
the most time-consuming task
• It takes a long time to put a model into
production (months)
• Analytics models change frequently
requiring system updates
• Many tasks are unproductive hand-coding
New Challenges In Becoming Data-Driven
8
?
How Do You Turn Data Science Into Production?
• Use data science tools
• Program in R, Spark
• Acquire data, model,
analyze
• Create a prediction,
model, score
• Use data integration tools
• Little coding
• Continuous integration –
dev, test, deploy, maintain
• Deploy a prediction,
model, score
Need to move from deploying models in months to days
9
What if there was an
easier way to
operationalize analytics?
10
Faster
deployment of
advanced
analytics
Protect all your
data at the
speed of Spark
Expands your
big data
ecosystem
Improves
usability and
collaboration
Introducing Talend 6.1
Talend Continues Big Data Integration Innovation
11
Benefits: Make decisions faster. Tremendous developer productivity.
• Visually develop jobs that run 100% on Spark
• 5X faster using independent benchmarks
• 10X developer productivity gained over hand-
coding Spark
• Over 100 new drag-n-drop Spark components
• HDFS, RDBMS, NoSQL, Cloud Storage,
Transformation, Messaging, In-memory
analytics & machine learning
recommendations, and much more
• In-memory data caching & “windowed”
computations
• Click to enable Spark Streaming for real-time
data processing
Talend 6 Introduced Talend Real-time Big Data
1st Data Integration Platform on Spark
12
New Components to Operationalize Analytics
Talend 6.1 Continues Big Data Innovation
Entity | Question | Model Type | Talend Components (MLlib)
Customer | Buy / No Buy, Fraud / Not Fraud | Classification | Random Forest, Naïve Bayes (6.0)
Customer | Predict Churn, Forecasting | Regression | Logistic Regression
Customer | Gold Customers (Segmentation) | Clustering | K-means
Product | Recommendation | Collaborative Filtering | Alternating Least Squares (ALS) (6.0)
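To make the clustering row concrete, here is a minimal sketch of k-means segmentation in plain Python. It is not the Talend component or Spark MLlib code; the customer tuples (annual spend, orders per year), the starting centroids, and the function name `kmeans` are all assumptions made up for illustration.

```python
from math import dist
from statistics import fmean

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(fmean(axis) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (annual spend, orders per year).
customers = [(100, 2), (120, 3), (90, 1), (900, 20), (1000, 25), (950, 22)]
initial = [(100, 2), (900, 20)]  # assumed starting centroids
centroids, clusters = kmeans(customers, initial)
print(clusters[1])  # the high-spend segment, i.e. the "gold customers"
```

Fixed starting centroids keep the sketch deterministic; a production library would normally pick them with a seeding strategy and run on distributed data.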
13
• Fast data masking performance running on
Hadoop and Spark
Benefit: Meet compliance mandates and prevent data breaches.
Data
Masking
More Secure Data – Now Runs on Spark
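As a rough illustration of what data masking means in practice, the sketch below shows two common techniques in plain Python: partial masking and keyed pseudonymization. It does not reproduce Talend's masking components or their options; the function names, the sample card number, and the key are hypothetical.

```python
import hashlib
import hmac

def mask_keep_last(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` characters of a value."""
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed hash (HMAC-SHA256).

    Equal inputs map to equal tokens, so masked columns can still be joined,
    but the original value cannot be read back from the token.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

key = b"demo-key"  # assumption: a real key would come from a secrets store
masked_card = mask_keep_last("4111111111111111")
token_a = pseudonymize("alice@example.com", key)
token_b = pseudonymize("alice@example.com", key)
token_c = pseudonymize("bob@example.com", key)
print(masked_card)  # ************1111
```

Partial masking suits values people still need to recognize, while keyed hashing suits identifiers that only need to stay consistent across data sets.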
14
Building Intelligent Data Applications
Talend 6.1 Operationalizes Analytics
Data Integrate Learn Act Value
15
Building Intelligent Data Applications
Talend 6.1 Operationalizes Analytics
Data
Fuel
learning
Apply
learning
Integrate Learn Act Value
• Graphical tools simplify
development
• Continuous integration
speeds delivery
• Deploy on Spark and
Hadoop
• Batch and streaming
• Data cleansing and masking
• Machine learning
• Time-based analysis
• 900+ components
Benefit: Easily move data science into data processing applications
16
Real-Time Analytics Use Cases
Predictive Maintenance
Personalized
Patient Care Smart Cities
Product Recommendation Precision Farming
Dynamic Pricing
17
Demo Time
18
1. Highlight Predictive Analytics scenario
2. Show Data Masking
3. Introduction to the Talend 6 Real-time Big Data Sandbox
Talend 6.1 Demonstration
19
Discover Spark with Talend Sandbox
Create a streaming
data flow
with Kafka
Create a
recommendation
model with Spark
MLlib
Create a real-time
Spark
recommendation
engine
<Sandbox link>
#TalendConnect
20
Talend Real-Time Big Data Sandbox
Big Data Insights
21
Where do I get the Sandbox?
http://www.talend.com/products/real-time-big-data
22
Faster
deployment of
advanced
analytics
Protect all your
data at the
speed of Spark
Expands your
big data
ecosystem
Improves
usability and
collaboration
Introducing Talend 6.1
Talend Continues Big Data Integration Innovation
23
BIG DATA
Cloudera Navigator 2.3 certification
New and updated connectors
• Cassandra 2.2
• Cloudera 5.5
• Hortonworks 2.3
• MarkLogic 8
• Microsoft Azure HDInsight 3.2 on Spark
• MapR 5.0
DATA INTEGRATION AND DATA QUALITY
Git distributed revision control system
24
MDM
Graphical entity / relationship visualizer
Hierarchy search panel
ESB / DATA MAPPER
Continuous delivery for agile development
Routes and routelets improve reuse
XA transaction support for complex
transactions
Easier mapping of SAP IDoc
Application profiles for provisioning service
25
http://www.talend.com/products/real-time-big-data
Learn More
Free Spark Real-Time Big Data Sandbox!
https://info.talend.com/prodevaltpbdrealtimesandbox.html
Talend 6.0 Webinar-on-Demand
http://www.talend.com/resources/webinars/what’s-new-in-talend-6

Talend 6.1 - What's New in Talend?


Editor's Notes

  • #2: Welcome to this webinar, Embedding Machine Learning for Actionable Insight: Introducing Talend 6.1. I am <speaker, title>, and today we have a very exciting session to show how easy it is to start moving your data science projects into production. We are going to present the new features in Talend 6.1 and demonstrate how you can “drag-and-drop” your way, using graphical tools and pre-built Spark machine learning components, to operationalize and embed analytics into your systems. That will not only bring significant new insight into your business, but will also enable you to act on this insight in real time.
  • #3: As a brief introduction, Talend is the leading open source integration software provider focused on enabling organizations to become data-driven enterprises. A recent report from McKinsey Global Institute highlights the impact of making decisions based on data-driven insights. Companies that are data-driven, that can gather, process and analyze data as it flows through the enterprise, make better decisions. This approach results in a 23 times greater likelihood of customer acquisition, a 6 times greater likelihood of customer retention and a 19 times greater likelihood of profitability. Talend 6.1 delivers new capabilities that make companies even more data-driven and able to turn all their data into decisions.
  • #4: Today we are going to talk about data-driven opportunities and both the organizational and technical challenges companies face in becoming data-driven. Next we will review the new capabilities in Talend 6.1 that allow you to easily get more out of your data than ever before. It is about more than just moving data from point A to point B: today companies need to create intelligent data applications that can sense and respond to opportunities and threats as they occur. Then we will run a demonstration showing some of the new machine learning and data masking capabilities that Talend 6.1 delivers. Finally, we will wrap up with some next steps that you can take today to get quick wins in your company and be on the path to a truly data-driven enterprise.
  • #5: We are seeing three significant trends that are impacting every company today. First, the amount of data being generated is staggering. IDC estimates that by 2020 the digital universe will reach 44 trillion gigabytes, and the amount of data being created and consumed is doubling every 2 years. Second, cloud is becoming the new frontier for applications, home to tens of thousands of applications. As an example, Amazon recently reported record revenue of $7B for Amazon Web Services, which is growing at 81% year over year. That is amazing. Finally, a recent data scientist and analytics report by Crowdflower found that data scientists spend 80% of their time wrangling data. Given the appetite for data that everyone in your company has in order to make better decisions, it is necessary that everyone can get to clean data and use it for their analysis. In summary, these 3 trends are having a profound impact on how companies manage their data today. Sources: 1. 7th Annual IDC Digital Universe Study estimates the digital universe will be 44 Zettabytes by 2020. 2. Re:Invent 2015 Keynote - Andy Jassy. 3. Recent report by Crowdflower found that data scientists spend 80% of their time wrangling data.
  • #6: By building intelligent data applications, instead of ETL pipelines that just push data from point A to point B, you can get tremendous benefits. A recent GE Internet of Things research study shows that a modest 1% improvement in operational efficiency in key industries such as aviation, energy, transportation and logistics, and healthcare could drive a potential $276B in OpEx savings over the next 15 years. Today Talend customers including GE and Otto are achieving significant benefits through intelligent data applications. You need to ask yourself: what is your 1%? What if you could reduce operational expenses by 1%? Source: http://www.ge.com/docs/chapters/Industrial_Internet.pdf
  • #7: Data science and machine learning are two topics that are getting a lot of interest today, as companies move from batch analytics to predictive analytics to prescriptive analytics. Data is the lifeblood of your business. You need not only to absorb as much as you can, but also to analyze it and act on it to make data-driven decisions. But what exactly are data science and machine learning? Data science is about extracting information from large volumes of data to improve decisions. It means using data instead of your intuition, and it combines computer science, statistics, mathematics and domain expertise. One example of data science is finding the locations with the highest car theft in a city. San Francisco did such a study and found them to be near parks that had a number of access points, so the city changed how it monitors for these events. Another example is how the city of Portland, Oregon optimized traffic signals to eliminate 157,000 tons of CO2 emissions in 6 years, the equivalent of taking 30,000 cars off the road for a year. Machine learning is a form of artificial intelligence and data science in which computers can learn and act to make decisions. Examples include fraud detection, looking through terabytes of data to find patterns so that you can not only stop fraud but also predict where it will occur; predicting machine failure, for example one of Talend’s customers, GE, can predict whether wind turbines are going to fail; and customer recommendations, such as product recommendations on a website based on past purchases. As we know, there is a huge demand today for data scientists: people who can create these models so that you make decisions based on data, NOT intuition.
  • #8: Amid all this opportunity is the operational question of how you turn data into decisions, or how you move data science into reality. Think of the back office, or an assembly line. You have all these data sources producing thousands or millions of events per day coming into your business, and you need to process them intelligently and quickly. This way you can tell, for example, whether there is an opportunity to upsell a customer, or predict whether someone is going to abandon their shopping cart. The challenges companies are finding, however, are these. Acquiring skills to implement analytics: data scientists are expensive and hard to find. Analysis is often on a sample of data instead of all your data: although the sample size may be good enough, there are inherent risks that it is not. Dirty data means dirty insight: 67% of respondents reported that cleaning and organizing data is the most time-consuming task. This stat is amazing; we want data scientists to model and analyze, not clean data. It takes a long time to put a model into production: a recent TDWI report states that for most companies it takes months to deploy a model into operational use. Once a model is deployed, it can change frequently, requiring deployed updates; otherwise out-of-date models mean out-of-date insight. And finally, many tasks are unproductive hand-coding, for both the data scientist and the big data developer. <next slide> More: “The Four Things Data Scientists Wish You Knew: Get The Most Out Of Your Data Science Investments” by Brian Hopkins, October 26, 2015. A study by McKinsey projects that “by 2018, the U.S. alone may face a 50 percent to 60 percent gap between supply and requisite demand of deep analytic talent.” Data scientists cost $200K/yr and up, or approximately 50% more than a Java developer.
Sometimes, after the initial exploration phase, the work of a data scientist will be “productized,” or extended, hardened (i.e., made fault-tolerant), and tuned to become a production data processing application, which itself is a component of a business application. For example, the initial investigation of a data scientist might lead to the creation of a production recommender system that is integrated into a web application and used to generate product suggestions to users. Often it is a different person or team that leads the process of productizing the work of the data scientists, and that person is often an engineer.
  • #9: The question on everyone's mind is: how do I turn these insightful models into an assembly line of intelligent data that is sensing and responding in the moment? The data scientist will acquire data and build the model, but then the IT developer needs to deploy that model. How do I take the data scientist's work and, as Brian Hopkins from Forrester states, “deploy it in a way that prompts action”? How do you move a model into something that is scalable, is easy to maintain, accesses all your data, and acts in real time based on what is happening, such as when a machine is about to fail? Today, we need data scientists and developers to collaborate more. Supporting quote: “The output of data science (a prediction, a model, or a score) is useless until the organization deploys it in a way that prompts action.” Brian Hopkins, Forrester
  • #10: What if there was an easier way to operationalize analytics? This is what Talend 6.1 enables….
  • #11: We are very excited to introduce Talend 6.1, which continues Talend’s innovation in big data integration. Talend 6.1 delivers new machine learning components based on Spark MLlib, so you can build intelligent data pipelines and deploy advanced analytics faster. You can easily move data models into production, supporting fast and frequent iterations, and it improves big data design and development collaboration between data scientists and developers. We are not saying that developers can do all the work of the data scientist; what we are offering are tools to help developers build and deploy analytics models. Gartner recently forecast the role of a “citizen data scientist”, someone who can do some of that work, freeing up the data scientist. Talend 6.1 provides new data masking capabilities on Spark, so you can make data private across the data lake and connected systems, reducing the associated risks. It expands your big data connectivity options with support for the latest big data technologies from Cloudera, MapR, Hortonworks, Amazon and Microsoft, and new partnerships with MarkLogic and ServiceNow. Talend 6.1 also includes numerous enhancements, such as Git support, to help your development and operations teams be much more productive, supporting fast and frequent iterations of machine learning models. And there are new ESB and MDM capabilities that increase the productivity of your team. <stop> A recent study showed that the most time-consuming task for data scientists is cleaning and organizing data. Now, with Talend 6.1 support for Spark machine learning libraries (MLlib) and other cool new features, developers can easily move data science models into production, supporting fast and frequent iterations. With Talend 6.1, developers use pre-built components and drag-and-drop tools to build Spark analytics models for customer segmentation, forecasting, classification, regression analysis and more. Join our live webinar to learn more about Talend 6.1, including: Spark machine learning algorithms for advanced analytics; data masking on Spark; continuous delivery enhancements with Git; new and updated big data components and connectors.
  • #12: For those who might have missed our recent announcement, we just announced Talend Real-Time Big Data, the first data integration platform on Spark. It provides native support for Spark, Spark Streaming, and Hadoop, along with enterprise messaging capabilities like Kafka and Kinesis. Using the graphical Talend Studio, you visually build integration jobs that generate native Spark code. That code can then be run inside a Hadoop platform like Cloudera, Hortonworks, or MapR, or it can be run standalone; in fact, our new Real-Time Big Data Sandbox runs standalone. It can also run in the cloud, for example in Amazon EMR. According to independent benchmarks, Talend 6 with Spark is 5X faster than MapReduce, and as a developer you are 10 times more productive using Talend model-driven tooling instead of hand-coding Spark. Talend 6 provides over 100 new drag-and-drop Spark components, so you can immediately connect to traditional data sources, Hadoop, NoSQL, and cloud storage. There are components for transformation, for connecting to messaging systems like Apache Kafka and Amazon Kinesis, and for machine learning and caching as well. Another important capability that comes with Spark and Talend 6 is running processes in memory and doing window-based computations for time series analysis, which is extremely important for building intelligent data applications. For example, if you are ingesting a stream of data, you can set a time period over which to analyze that data stream for any changes.
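To make the window-based computation from the slide 12 notes concrete, here is a minimal sketch in plain Python. It is not Spark Streaming code and not what Talend generates; the function name `sliding_window_mean` and the sample readings are hypothetical, chosen only to illustrate analyzing a stream over a fixed time period.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_window_mean(
    events: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, mean of the values seen in the last window_seconds)."""
    window: Deque[Tuple[float, float]] = deque()  # readings inside the window
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict readings that have fallen out of the time window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


# Hypothetical sensor readings as (timestamp in seconds, value) pairs.
readings = [(0, 10.0), (1, 12.0), (2, 11.0), (5, 30.0), (6, 32.0)]
means = list(sliding_window_mean(readings, window_seconds=3))
```

With a 3-second window, the jump in the readings at timestamp 5 shows up immediately in the windowed mean, because the older, lower readings have already been evicted. That is the kind of change a time-windowed analysis is meant to surface.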
  • #13: Talend 6.1 continues big data innovation with new components to operationalize analytics. When analytics are embedded into business systems, they become more consumable, which means that more people can make use of analytics output. It also makes analytics actionable. Talend 6.1 adds Classification, Regression and Clustering components to our Talend 6 components, so you can “drag-and-drop” your way to operational analytics with easy iterative development. Best of all, Talend generates native Spark code, so there is no hand-coding, and with Continuous Delivery integration your development team is extremely productive in developing, deploying and maintaining analytics models. This also means that you can offload some analytics tasks to your development team. This table highlights what both Talend 6.0 and 6.1 deliver <step through each row in table ….> • Will an event happen in the future? Classification of fraud, churn, propensity to buy. • How much, or what is the forecast? Regression: estimate or predict the potential outcome of actions. • What are the different profiles in this population? Clustering customers or sites by similar behavior. • Which products are likely to be bought together? Recommendation. --------background information -------------------- 6.0 shipped with 2 models for 2 families: Recommendation (tALSModel) and Classification (tNaiveBayes). 6.1 extends it with a component to help with featurization, 3 new models in the Classification family, and 1 new model in the Clustering family. It is entirely based on Spark ML and MLlib, to take advantage of all the framework’s capabilities, especially from Spark 1.4+. What is Spark MLlib? Spark comes with a library containing common machine learning (ML) functionality, called MLlib.
MLlib provides multiple types of machine learning algorithms, including classification, regression, clustering, and collaborative filtering, as well as supporting functionality such as model evaluation and data import. It also provides some lower-level ML primitives, including a generic gradient descent optimization algorithm. All of these methods are designed to scale out across a cluster.
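As an illustration of the gradient descent primitive mentioned above, the sketch below fits a straight line by batch gradient descent in plain Python. It does not call MLlib and makes no claim about how MLlib implements its optimizer; the function name `fit_line`, the learning rate and the step count are assumptions chosen for this toy example.

```python
from typing import List, Tuple


def fit_line(
    xs: List[float], ys: List[float], lr: float = 0.05, steps: int = 5000
) -> Tuple[float, float]:
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
```

On this toy data the fitted slope and intercept approach 2 and 1. A distributed library applies the same idea with the gradient sums computed across a cluster rather than in a single loop.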
  • #14: In Talend 6.1, we extend Talend Data Masking features to run on Spark, so now you can privatize the information in your data lake. This helps meet compliance mandates or privacy codes of conduct, and protects data against abuses or breaches. This component (tDataMasking) obfuscates data (numbers, strings, dates, personally identifiable information and so on) without impacting the rules that surround that data or allowing other users to see the data. This is very important for all companies, but in particular for healthcare and finance companies, where sensitive data is shared. Customers have been asking for this and we are delighted to introduce it.
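The idea of obfuscating values while preserving their layout can be sketched in plain Python as follows. This is not the tDataMasking implementation and does not reflect its options; the helpers `pseudonymize` and `mask_digits`, the sample record and the demo secret are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes) -> str:
    """Replace a value with a keyed hash so equal inputs map to equal tokens."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_digits(value: str, keep_last: int = 4) -> str:
    """Replace all but the last keep_last digits with 'X', keeping the layout."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - keep_last, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append("X")
            to_mask -= 1
        else:
            out.append(ch)  # separators and trailing digits pass through
    return "".join(out)


# Hypothetical record; the secret is for demonstration only.
record = {"name": "Jane Doe", "card": "4111-1111-1111-1234"}
secret = b"demo-only-secret"
masked = {
    "name": pseudonymize(record["name"], secret),
    "card": mask_digits(record["card"]),
}
```

The keyed hash keeps joins and group-bys working, because the same name always yields the same token. The digit mask keeps the field's length and separators, so format validation rules downstream still pass.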
  • #15: In summary, operationalizing and embedding analytics is about integrating actionable insights into the systems and business processes used to make decisions. Streaming analytics is about applying statistical models, algorithms, or other analysis practices to data arriving continuously. By running predictive models on these flows, organizations can monitor events and processes and become more situationally aware, predictive and prescriptive. ------- more -------- This enables you to spot trends and patterns, find correlations, and respond to anomalous events. Organizations can also filter for relevance and enrich the quality of data flowing in real time with information from their other sources.
  • #16: And if we look at what Talend 6.1 delivers, it is all the tools you need to operationalize analytics. <Read list>
  • #17: Here are a few examples of what Talend customers are doing today. Predictive maintenance on wind turbines (GE) and on airplanes (Air France). Product recommendation and dynamic pricing for retailers (Otto). Precision farming: Springg, an agricultural company in the Netherlands, uses sensors to measure soil and other information such as weather, in order to recommend the right amount of fertilizer to optimize crop yields and reduce costs. Personalized patient care: another Talend customer collects Fitbit information from patients, which is then exchanged with their physician. The doctor can then adjust the patient’s medication based on activity levels and other physical, biological, and health tests. And finally, Talend customer m2ocity, a French telecom operator, is making cities smarter. Talend helps m2ocity seamlessly collect and process up to four million pieces of data per day from its network of millions of smart meters on water, gas meters and other sensors. They are able to provide this information to their clients for behavioral analysis and value-added services. -----------more --------------- Optical character recognition: categorize images of handwritten characters by the letters represented. Face detection: find faces in images (or indicate if a face is present). Spam filtering: identify email messages as spam or non-spam. Topic spotting: categorize news articles (say) as to whether they are about politics, sports, entertainment, etc.
Spoken language understanding: within the context of a limited domain, determine the meaning of something uttered by a speaker to the extent that it can be classified into one of a fixed set of categories. Medical diagnosis: diagnose a patient as a sufferer or non-sufferer of some disease. Customer segmentation: predict, for instance, which customers will respond to a particular promotion. Fraud detection: identify credit card transactions (for instance) which may be fraudulent in nature. Weather prediction: predict, for instance, whether or not it will rain tomorrow.
  • #18: Now for the demonstration part of the webinar. I would like to turn it over to Mark Balkenende.
  • #19: Mark
  • #20: IN
  • #23: Thanks xxxx. In summary, we are very excited to introduce Talend 6.1, which continues Talend’s innovation in big data integration. Talend 6.1 delivers new machine learning algorithms, so you can operationalize analytics and turn data science into production. Talend 6.1 enables you to graphically build analytics projects that include customer segmentation, forecasting, classification, regression analysis and more. The benefit is threefold. You can significantly increase the number of analytics projects you deliver, so your business is getting more insight. You can deliver analytics projects and updates faster, so instead of updating models once a month and using out-of-date data, you are using the latest model and getting better insight and results. And finally, with Talend’s rich data integration and data quality functions, you can analyze and cleanse all your data at the speed of Spark. Talend 6.1 provides new data masking capabilities on Spark, so you can make data private across the data lake and connected systems. It expands your big data connectivity options with support for the latest big data technologies and new partnerships with MarkLogic and ServiceNow. And finally, Talend 6.1 includes numerous enhancements, such as Git support to facilitate collaboration, and ESB and MDM enhancements that increase the productivity of your team.
  • #24: There are numerous other Talend 6.1 enhancements that we did not get a chance to talk about, but you can learn more about them at www.talend.com
  • #26: Thank you for attending this on-demand webinar. To learn more about Talend Real-Time Big Data, please visit http://www.talend.com/products/real-time-big-data. From that URL or the one shown here, you can also get your FREE Real-Time Big Data Sandbox, so you can be operationalizing analytics in the next few hours! Finally, if you would like to learn more about the cool new features in Talend 6.0, including Spark and Internet of Things, there is an on-demand webinar located here.