Big Data and
The Informatica Platform
9/8/2015
David Ramirez
Senior Solution Architect
Oil and Gas Accounts
About Informatica
• Founded: 1993 (Nasdaq: INFA)
• 2014 Revenue: $1.2b
• Partners: 450+
• Major SI, ISV, OEM and On-Demand Leaders
• Customers: 5,000+
• > 70% of the Global 500
• Customers in 82 Countries
• Direct Presence in 26 Countries
• #1 in Customer Loyalty Rankings (7 Years in a Row)
Proven Technology Leadership
B2B Data Exchange
Informatica supports the requirements of cross-organizational data exchange, so users apply familiar & trusted data integration tools and techniques to the growing practice of B2B data integration.
Cloud Data Integration
Enterprise Data Integration
Complex Event Processing
Informatica received high praise for its services from customers. For deployments involving systems monitoring use cases, Informatica offers a five-day stand-up of RulePoint.
Ultra Messaging
In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market.
Data Quality
Master Data Management
Application ILM
Intelligent Data Lake
Manage Supply & Demand of Data
Problem:
• Analytics teams spend most of their time looking for and preparing data, not analyzing it
• The result: project delays, cost overruns, missed opportunities
Data Lake Solution
• A single place to manage the supply and demand of data
• Converts raw big data into fit-for-purpose, trusted, and secure information
80% of the work in big data projects is data intelligence
“I spend more than half my time integrating, cleansing, and transforming data without doing any actual analysis.”
“80% of the work in any data project is in cleaning the data”
“70% of my value is an ability to pull the data, 20% of my value is using data-science…”
Sources: (1) DJ Patil, Data Jujitsu; (2-3) Kandel, et al. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Visual Analytics Science and Technology (VAST), 2012
Intelligent Data Lake
Platform for Big Data Projects
First Pilot(s)
Data Warehouse Optimization
Data Discovery
Real-Time Operational Intelligence
Lower operational IT costs
Big Data Analytics
Operationalize Big Data Insights
Predictive Maintenance
Lower Total Cost of Care
Customer X/Up-Sell
Public Safety
Fraud Detection
Machine Device, Cloud
Documents and Emails
Relational, Mainframe
Social Media, Web Logs
Driven by IT
Driven by Business
Lower Infrastructure Cost
Added Business Value
What’s Hadoop?
Informatica knows the Data Lifecycle Related Challenges
Source: Gartner
Informatica Platform
Data Ingestion
Refinement
Mastery/Delivery
Data Security
Data Retirement
• Data Quality
• Exception Management
• Any Platform, Application
• Structured, Unstructured
• Any latency
• Master Data Management
• Data Integration Hub
• Data Archive
• Records Retention/Discovery
• Data Masking
Informatica Platform Overview
Relational DB
.pdf, email
Dev
Test
Prod
Archive
1. Profile
2. Define Targets
3. Analyze
4. Build Rules
5. Monitor
DATA QUALITY
SECURITY
ETL
MDM
Materials, Wellhead, Customer
Databases
Unstructured Data
Big Data
Cloud
Visualizations
The Informatica DI Platform
Comprehensive, Unified, Open and Economical platform
Application
Database
Partner Data
SWIFT, NACHA, HIPAA, …
Cloud Computing
Unstructured
Data Warehouse
Data Migration
Test Data Management & Archiving
Master Data Management
Data Synchronization
B2B Data Exchange
Data Consolidation
Data Sources
Applications
Data Warehouse
MDM / PIM
Data Ingestion
Visualization
Data Governance
Data Security
Archiving
Replication
Data Streaming
Change Data Capture
Batch Load
Data Virtualization
Event-Based Processing
Data Integration Hub
Data Integration & Data Quality
Agile Analytics
Advanced Analytics
Machine Learning
Virtual Data Machine
Data Management
Data Delivery
Machine Device, Cloud
Documents and Emails
Relational, Mainframe
Social Media, Web Logs
Mobile Apps
Visualization & Analytics
Real-Time Alerts
Batch Load
Pub / Sub
Data Service
Integrate & Prepare
Loose Coupling & Abstraction
1. Development Agility
Jumpstart/Accelerate Projects
1. Instant Business-IT Collaboration with Analyst Tool
2. Profile to Discover Data Patterns and Issues
3. Prototype and Validate Results
4. Fine-tune and Deploy Desired Solution in Days
Logical Data Objects: PRODUCT, CUSTOMER, ORDER, …
Data Source
Business, IT
Common Repository
Entire Life Cycle Supported by PowerCenter Standard Edition 9.
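As an illustration of what the "Profile to Discover Data Patterns and Issues" step can involve, here is a minimal Python sketch of column profiling. It is not Informatica's profiler; the function name `profile_column` and the sample wellhead identifiers are hypothetical and were invented for this example.

```python
from collections import Counter
import re


def profile_column(values):
    """Summarize one column: row count, null rate, distinct count, value shapes."""
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    # Reduce each value to a shape: every digit becomes 9, every letter becomes A.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(v))) for v in non_null
    )
    return {
        "rows": total,
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "distinct": len(set(non_null)),
        "top_patterns": patterns.most_common(3),
    }


# Hypothetical sample data, not taken from the presentation.
wellhead_ids = ["WH-0012", "WH-0487", None, "wh0031", "WH-0012", ""]
report = profile_column(wellhead_ids)
print(report)
```

A dominant shape such as `AA-9999` alongside a rare one such as `AA9999` is the kind of pattern issue a profiling pass is meant to surface before rules are built.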
2. Enterprise Scalability
Scale-up As Your Needs Grow
High Availability
Pushdown Optimization
Enterprise Grid
Concurrent Users
Partitioned Data
Included in PowerCenter Advanced Edition 9.6
Manage Metadata for Better Data Insights
Data Lineage
Consolidated Metadata Catalog
Federated Business Glossary
Mainframe, Flat Files, Database, Data Modeling, BI Tools, ERP
Metadata Repository
Custom
Metadata
Reports
3rd party BI
Metadata
Bookmarks
Common Biz Language Via Business Glossary
Provide a common vocabulary of business terms
Easily search for glossary assets with workflow
Manage relationships with other assets
Manage business policies governing the assets
Analyst
3. Operational Confidence
Improve Operational Confidence
With Automated Testing and Monitoring
End-to-End Agility
Requirements Gathering
Prototype & Validate
Deploy
Satisfied Business-IT Collaboration
Develop
Self Service
Monitor
Test
Business, IT
Automate Data Validation Testing
Data Validation Testing Capability
Enterprise Data
PowerCenter
DVO Clients
DVO Repository & Warehouse
Define Tests
Execute Tests
Write Results
Reports
Database Views: V_Summary, V_Tests, V_Results
Columns shown for each view:
Id: name
name: string
Price: integer
Date in: date
Date out: date
Salary: float
Data Accessed
• Relational databases
• Flat files
• Mainframe data
• DW Appliances
• Cloud-based data
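The define, execute, and write-results flow on this slide can be illustrated with a small source-to-target comparison. The sketch below is not the DVO API; it uses Python's standard `sqlite3` module, and the table names (`src_orders`, `tgt_orders`, `test_results`) and test names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, price INTEGER);
    CREATE TABLE tgt_orders (id INTEGER, price INTEGER);
    INSERT INTO src_orders VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO tgt_orders VALUES (1, 100), (2, 250);
    CREATE TABLE test_results (test_name TEXT, source_value, target_value, passed INTEGER);
""")

# Define tests: each pairs a query on the source with the same query on the target.
tests = {
    "row_count": ("SELECT COUNT(*) FROM src_orders", "SELECT COUNT(*) FROM tgt_orders"),
    "sum_price": ("SELECT SUM(price) FROM src_orders", "SELECT SUM(price) FROM tgt_orders"),
    "min_id": ("SELECT MIN(id) FROM src_orders", "SELECT MIN(id) FROM tgt_orders"),
}


def run_tests(conn, tests):
    """Execute every test and write one result row per test."""
    for name, (src_sql, tgt_sql) in tests.items():
        src_val = conn.execute(src_sql).fetchone()[0]
        tgt_val = conn.execute(tgt_sql).fetchone()[0]
        conn.execute(
            "INSERT INTO test_results VALUES (?, ?, ?, ?)",
            (name, src_val, tgt_val, int(src_val == tgt_val)),
        )
    return conn.execute(
        "SELECT test_name, passed FROM test_results ORDER BY test_name"
    ).fetchall()


results = run_tests(conn, tests)
for name, passed in results:
    print(name, "PASS" if passed else "FAIL")
```

Here the target is missing one row, so the row-count and sum checks fail while the minimum-id check passes. Storing results in a table mirrors the idea of reporting from views over a results repository.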
Proactively Monitor with PowerCenter 9.6
PowerCenter
WS Hub
PowerCenter Repository
Environment Information
Configure / Build Rules
Get PowerCenter Statistics
Get Operating System, Database Statistics
Monitor PowerCenter Operations (step 3)
Send Alerts to Stakeholders
Automated Monitoring and Detection (Source Feeds, Rules/Templates, Watchlists, Alerts)
Analyst, IT, IT Operations
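The monitoring loop on this slide (collect statistics, evaluate rules, alert stakeholders) can be sketched generically. This is not the PowerCenter or RulePoint API; the `Rule` class, the metric names, and the thresholds below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str                          # human-readable rule name
    metric: str                        # key to look up in the collected statistics
    breached: Callable[[float], bool]  # returns True when an alert should fire


def evaluate(rules: List[Rule], stats: Dict[str, float]) -> List[str]:
    """Return one alert message for every rule whose condition is breached."""
    alerts = []
    for rule in rules:
        value = stats.get(rule.metric)
        if value is not None and rule.breached(value):
            alerts.append(f"ALERT {rule.name}: {rule.metric}={value}")
    return alerts


# Hypothetical rules and statistics, for illustration only.
rules = [
    Rule("long-running session", "session_minutes", lambda v: v > 60),
    Rule("rejected rows", "rejected_rows", lambda v: v > 0),
    Rule("low disk space", "disk_free_pct", lambda v: v < 10),
]
stats = {"session_minutes": 95, "rejected_rows": 0, "disk_free_pct": 42}

alerts = evaluate(rules, stats)
for message in alerts:
    print(message)
```

Keeping rules as data separate from the evaluation loop is what lets an analyst configure them without touching the collection code.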
Informatica on Hadoop
Informatica Execution on Hadoop Architecture
1. Entire Informatica mapping translated to optimal open source project
2. Currently, MapReduce submitted to Hadoop cluster.
3. Advanced mapping transformations executed on Hadoop through User Defined Functions using Vibe
MapReduce
UDF
Flink
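To make the MapReduce execution model concrete, here is a toy, single-process Python sketch of a filter-and-aggregate mapping expressed as map, shuffle, and reduce phases. It is only an illustration of the programming model; it does not show the code Informatica generates, and the record layout and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical mapping: drop invalid readings, then total barrels per well.
def mapper(reading):
    if reading["valid"]:
        yield reading["well"], reading["barrels"]


def reducer(well, barrels):
    return sum(barrels)


readings = [
    {"well": "A", "barrels": 120, "valid": True},
    {"well": "B", "barrels": 80, "valid": True},
    {"well": "A", "barrels": 30, "valid": True},
    {"well": "B", "barrels": 999, "valid": False},
]

totals = reduce_phase(shuffle(map_phase(readings, mapper)), reducer)
print(totals)
```

On a cluster the map and reduce functions run in parallel across nodes and the shuffle moves data over the network; the logic per record stays the same.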
INFA’s Unified Platform = Strong Time-to-Value
“Informatica and Microsoft are so much more consistent than their competitors [because] the platforms provided by these companies support transferable skills across projects more flexibly than do their rivals.”
TCO – Informatica vs. Hand Coding
Average Costs (3-year TCO) per project per end point
Informatica: $8,500
Hand Coding: $11,500
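The gap between the two TCO figures can be worked out directly. This short sketch assumes that $8,500 is the Informatica bar and $11,500 the hand-coding bar, following the order in which the values and labels appear on the slide.

```python
# Assumption: values map to labels in the order they appear on the slide.
informatica_tco = 8_500
hand_coding_tco = 11_500

savings = hand_coding_tco - informatica_tco
savings_pct = 100 * savings / hand_coding_tco

print(f"${savings:,} lower per project per end point ({savings_pct:.1f}% less)")
```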
Informatica is Far More Productive than Hand Coding
Average Time to Develop by Project Type (Weeks)
Project types: Master Data Management, Data Warehousing, Data Migration, Application Integration
Data labels (hand coding vs. Informatica): 2.4 vs. 1; 2.4 vs. 0.7; 5.3 vs. 1.2; 2.7 vs. 0.8
Depending on the project, hand coding can take more than 4 weeks longer to develop!
Source: “Comparative Costs and Uses for Data Integration Platforms”, Bloor Research, March 2014
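The "more than 4 weeks longer" claim can be checked against the chart's data labels. The sketch below assumes each pair is (hand coding, Informatica) in weeks, in the order the labels appear; which project type each pair belongs to is not recoverable from the slide text, so pairs are numbered by chart position only.

```python
# Assumption: each pair is (hand coding weeks, Informatica weeks), in chart order.
pairs = [(2.4, 1.0), (2.4, 0.7), (5.3, 1.2), (2.7, 0.8)]

extra_weeks = [round(hand - infa, 1) for hand, infa in pairs]
ratios = [round(hand / infa, 1) for hand, infa in pairs]

for position, (extra, ratio) in enumerate(zip(extra_weeks, ratios), start=1):
    print(f"pair {position}: {extra} weeks longer by hand ({ratio}x)")

print("largest gap:", max(extra_weeks), "weeks")  # largest gap: 4.1 weeks
```

Only one of the four pairs exceeds a four-week gap, which matches the slide's "depending on the project" wording.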
Big Data – Data Profiling on Hadoop
• Demo – Data Profiling on Hadoop
https://www.youtube.com/watch?v=Nd6UfuteiTY

Oil and gas big data edition
