Big Data Analytics – Scaling R to Enterprise Data
useR! 2013 – Albacete Spain #useR2013
Luis Campos
Big Data Solutions Lead, Oracle EMEA
@luigicampos

Mark Hornick
Director, Oracle Database Advanced Analytics
@MarkHornick

Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

The girl with all the questions!
“The real innovation here is that we can ask questions and get the answer back before we have forgotten why we asked the question in the first place.”

– Hilary Mason, Chief Scientist, Bit.ly, and member of NYC Mayor Bloomberg’s Technology and Innovation Advisory Council

Nexus of Forces, Platform 3.0, Four Pillars
What are analysts and industry groups saying?

New Information Challenges
• Data Explosion
[Figure: A Decade of Digital Universe Growth: Storage in Exabytes (Source: IDC’s Digital Universe Study, June 2011)]
• Combinatory Explosion
• Dimension Explosion

Big Data Solution = Data + Analytics + Tools
Source: McKinsey study “Big data: What’s your plan?” (March 2013)
http://www.mckinsey.com/insights/business_technology/big_data_whats_your_plan

• DATA: Any Data, Any Source
• ANALYTICS: Out-of-the-box Analytics, New Models
• TOOLS: Self Service, Data Discovery
On Premise, On Cloud, On Mobile

Oracle Complete Business Analytics Solution

• Big Data Appliance, Big Data Connectors, NoSQL DB
• Oracle Advanced Analytics (Data Mining, Oracle R Enterprise), Spatial and Graph, Real Time Decisions (RTD)
• OBIEE, Endeca, Collective Intellect (CI)
On Premise, Oracle Cloud, On Mobile

Apply Advanced Analytics on All Data
Visualise it with any BI Tool
[Diagram: data held in Hadoop (HDFS) and in relational stores, consumed by BI tools]

Oracle R Advantages
1. Keep the R tools
2. Keep the data where it sits (Relational or HDFS)
3. Keep the SQL Based BI Tools
4. Scale to LARGE data sets
[Diagram labels]
• Development: R workspace console
• Production: Oracle statistics engine, with function push-down of data transformation and statistics
• Consumption: OBIEE, Web Services

Oracle’s Advanced Analytics Strategic Offerings
Deliver enterprise-level advanced analytics in the Database
• Oracle in-Database Data Mining algorithms
– Access through free GUI from SQL Developer or programmatically from SQL, PL/SQL, R or Java
– Predictive model APIs for Oracle R Enterprise
– Exadata architecture advantages for up to 5x improvement with Smart Scan
• Oracle R Distribution
– Free download, pre-installed on Oracle Big Data Appliance, bundled with Oracle Linux
– Enhanced linear algebra performance: Intel’s Math Kernel Library, AMD’s Core Math Library (Windows and Linux), SUN Solaris and IBM AIX
– Enterprise support for customers of Oracle Advanced Analytics, Big Data Appliance, and Oracle Linux

Oracle’s Advanced Analytics Strategic Offerings
Deliver enterprise-level R in the Database or Hadoop
• Oracle R Enterprise
– Transparent access to database-resident data from R
– Embedded R script execution through database-managed R engines
– Statistics engine
– Enhanced support for high-speed Exadata scoring
• Oracle R Connector for Hadoop [ORCH] (part of Oracle Big Data Connectors)
– R interface to Oracle Hadoop Cluster on BDA and non-Oracle Hadoop clusters
– Access and manipulate data in HDFS, database, and file system
– Write MapReduce functions using R and execute through natural R interface
– Predictive models with execution in-cluster against Hadoop-stored data

Oracle R Components
Component layout
[Diagram: Oracle R components deployed across the Analyst Laptop, Oracle Database, Big Data Appliance, and Exadata. Components shown: Oracle R Distribution, Oracle R Enterprise Server Components, Oracle R Enterprise Client Packages, Oracle R Connector for Hadoop and its client. One item is marked “Optional with ORCH”.]

Knowledge Exploitation Process
Typical stages in a Big Data Project
[Diagram: stages arranged around the Data Scientist: Business Understanding, Data Selection, Data Preparation, Discovery, Model Building, Evaluation, Deployment]

Data Loading with Oracle R Enterprise
R> library(ORE)
R> df <- data.frame(A=1:26, B=letters[1:26])
R> dim(df)
[1] 26 2
R> class(df)
[1] "data.frame"
R> ore.create(df, table="DF_TABLE")
R> ore.ls()
[1] "DF_TABLE"
R> class(DF_TABLE)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> dim(DF_TABLE)
[1] 26 2

Discovery with Oracle R in-DB and HDFS
library(ORE)
ore.ls()          # list tables in DB
class(MY_TABLE)   # ore.frame
dim(MY_TABLE)
# overloaded R functions
head(MY_TABLE)
sample(MY_TABLE)
summary(MY_TABLE)

library(ORCH)
hdfs.ls()
hdfs.dim("myHDFSdata")
hdfs.head("myHDFSdata")
hdfs.sample("myHDFSdata")
hdfs.toHive("myHDFSdata", tablename="my_hive_data")
summary(my_hive_data)

Data Prep with Oracle R in-DB and HDFS
library(ORE)   # or library(ORCH)

# join
merge(MY_TABLE1, MY_TABLE2, by.x="x1", by.y="x2")

# project columns
df <- MY_TABLE[,c("X","Y","Z")]

# filter rows (conditions must reference columns kept by the projection above)
df <- df[df$Z<=4.3 | df$X=="B",1:3]

# binning
IRIS_TAB <- ore.push(iris[1:4])
IRIS_TAB$PetalBins <-
  ifelse(IRIS_TAB$Petal.Length < 2.0, "SMALL PETALS",
  ifelse(IRIS_TAB$Petal.Length < 4.0, "MEDIUM PETALS", "LARGE PETALS"))

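The join, projection, filter, and binning steps on this slide can also be tried locally on ordinary data frames. The following is a minimal sketch in base R, assuming no ORE or ORCH installation: the tables t1 and t2 and their values are hypothetical, and cut() is shown only as an alternative way to express the nested ifelse() binning.

```r
# Hypothetical local tables; names and values are illustrative only.
t1 <- data.frame(x1 = c(1, 2, 3), X = c("A", "B", "C"), stringsAsFactors = FALSE)
t2 <- data.frame(x2 = c(2, 3, 4), Z = c(5.1, 4.0, 2.2))

# join: match t1$x1 against t2$x2 (merge() performs an inner join by default)
joined <- merge(t1, t2, by.x = "x1", by.y = "x2")

# project columns, then filter rows on columns that survive the projection
proj <- joined[, c("x1", "X", "Z")]
kept <- proj[proj$Z <= 4.3 | proj$X == "B", ]

# binning: nested ifelse() as on the slide, and an equivalent cut() call
bins.ifelse <- ifelse(iris$Petal.Length < 2.0, "SMALL PETALS",
               ifelse(iris$Petal.Length < 4.0, "MEDIUM PETALS", "LARGE PETALS"))
bins.cut <- cut(iris$Petal.Length, breaks = c(-Inf, 2, 4, Inf), right = FALSE,
                labels = c("SMALL PETALS", "MEDIUM PETALS", "LARGE PETALS"))
print(table(bins.cut))
```

With right = FALSE the cut() intervals are closed on the left, which matches the strict "<" comparisons in the ifelse() version.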
“Densifying” data: custom MapReduce jobs
Count occurrence of hash tags in tweets per customer for select tags
mapHashTags <- function(k, v) {
  x <- strsplit(v$text, " ")
  x <- lapply(x, function(w) w[w != ""])   # drop empty tokens, keep one entry per tweet
  importantTags <- tolower(importantTags)
  for (twt in seq_along(x)) {
    for (tag in x[[twt]]) {
      if (substr(tag, 1, 1) == "#") {
        tagL <- tolower(tag)
        if (tagL %in% importantTags) {
          orch.keyval(v[twt, "screenName"], tagL)
        }
      }
    }
  }
}
reduceHashTags <- function(k, vals) {   # k = screenName, vals = vector(tags)
  importantTags <- tolower(importantTags)
  vals <- factor(vals$val, levels=importantTags)
  x <- as.data.frame(t(as.matrix(table(vals))))
  orch.keyval(k, x)   # k = screenName, x = df(importantTags as cols) with counts
}

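The logic of the mapper and reducer can be checked without a Hadoop cluster. The sketch below imitates the two steps in base R on a small, hypothetical tweets data frame: the sample rows are invented, and the explicit pairs table stands in for the key/value pairs that orch.keyval would emit. It is not the ORCH execution path, only the same counting idea.

```r
# Hypothetical sample data; column names mirror the slide's v$text and v$screenName.
tweets <- data.frame(
  screenName = c("user.a", "user.a", "user.b"),
  text = c("Loving #BigData and #SQL", "more #sql today", "#Oracle  #oracle #cloud"),
  stringsAsFactors = FALSE
)
importantTags <- c("#bigdata", "#database", "#oracle", "#sql")

# Map step: emit one (screenName, tag) pair per matching hash tag.
tokens <- strsplit(tweets$text, " ")
pairs <- do.call(rbind, lapply(seq_along(tokens), function(i) {
  tags <- tolower(tokens[[i]])
  tags <- tags[tags %in% importantTags]
  data.frame(key = rep(tweets$screenName[i], length(tags)), val = tags,
             stringsAsFactors = FALSE)
}))

# Reduce step: per key, count tags against the fixed level set so every
# tag gets a column, including tags with zero occurrences.
counts <- table(pairs$key, factor(pairs$val, levels = importantTags))
print(counts)
```

Fixing the factor levels to importantTags is what "densifies" the result: each screen name gets a full row of counts even for tags it never used.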
ORCH: Create your own MapReduce jobs
Count occurrence of hash tags in tweets per customer for select tags
importantTags <- c("#bigdata","#database","#oracle","#sql")
tag.summary <- hadoop.exec(tweets.id,
  mapper = mapHashTags,
  reducer = reduceHashTags,
  export = orch.export(importantTags=importantTags),
  config = new("mapred.config",
    job.name = "TwitterScreenNameHashTags",
    reduce.tasks = 5,
    map.output = data.frame(key='a', val='a'),
    reduce.output = data.frame(key='a', bigdata=0,
                               database=0, oracle=0, sql=0)))

> hdfs.get(tag.summary)
             key bigdata database oracle sql
1 twitter.user.1       4        7     37  91
2 twitter.user.2      15       19      1  32
3 twitter.user.3     104       57      8   0
4 twitter.user.4       0       64    549   0

Modelling with Oracle R in-DB and HDFS
# Clustering with ORE
X <- ore.push(data.frame(x))
km.mod1 <- ore.odmKMeans(~., X, num.centers=2, num.bins=5)
summary(km.mod1)
rules(km.mod1)
clusterhists(km.mod1)

# Regression with ORCH
mod.lm <- orch.lm(myFormula, myData, nReducers = 2)
summary(mod.lm)
pred <- predict.orch.lm(mod.lm, newdata = myData)
res.pred <- hdfs.get(pred)
head(res.pred)

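For comparison, the same two modelling steps can be run on client-side data with base R alone. This is only a local analogue, assuming synthetic data generated here: kmeans() and lm() from the stats package stand in for ore.odmKMeans and orch.lm, which need an ORE or ORCH installation and are not called.

```r
set.seed(42)  # make the synthetic data reproducible

# Two well-separated synthetic groups (hypothetical data, not from the slides).
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))
colnames(x) <- c("x1", "x2")

# Local clustering with base R's kmeans(), two centers as on the slide.
km.local <- kmeans(x, centers = 2)
print(km.local$size)

# Local regression with lm(), then scoring the same data with predict().
local.df <- data.frame(x)
mod.local <- lm(x2 ~ x1, data = local.df)
pred.local <- predict(mod.local, newdata = local.df)
print(head(pred.local))
```

The local versions hold all data in the R session's memory, which is the limit the in-database and in-cluster variants on the slide are meant to remove.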
In-database performance advantage
R lm vs. ORE ore.lm
Data: 500k to 1.5m records, 3 predictors
Performance: 2x-3x improvement for build, 4x improvement for scoring
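
The open-source R side of such a comparison can be timed with system.time(). The sketch below is an assumption-laden illustration, not the benchmark behind the slide: the data are synthetic, the coefficients are arbitrary, and the ore.lm side is omitted because it requires an ORE installation. Timings vary by machine.

```r
set.seed(1)
n <- 5e5  # lower bound of the record counts quoted on the slide

# Synthetic data with 3 predictors; the coefficients are arbitrary assumptions.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 0.5 * d$x2 + 0.25 * d$x3 + rnorm(n)

# Time the model build and the scoring pass separately.
build.time <- system.time(fit <- lm(y ~ x1 + x2 + x3, data = d))
score.time <- system.time(pred <- predict(fit, newdata = d))

print(build.time["elapsed"])
print(score.time["elapsed"])
```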

In-database performance advantage – lm

More tests at http://blogs.oracle.com/R/entry/oracle_r_enterprise_1_32

Deploying with Oracle R Enterprise
Load R scripts into ORE script repository
Invoke R scripts by name from SQL
Store R objects directly in Oracle Database (no separate files)
Optional return values:
• Data frame consumable by any SQL-ready application
• XML containing structured data, complex R objects, PNG images
• PNG table with BLOB column containing images for immediate consumption
Schedule for automatic execution

Oracle Advanced Analytics: Embedded R Execution
SQL interface rqEval – generate XML string for graphic output

Oracle PL/SQL (the R language function is passed as a string):
begin
  sys.rqScriptCreate('Example6',
    'function(){
       res <- 1:10
       plot( 1:100, rnorm(100), pch = 21,
             bg = "red", cex = 2 )
       res
     }');
end;
/

Oracle SQL:
select value
from table(rqEval(NULL,'XML','Example6'));

[Diagram label: Oracle BI Publisher]

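To see what the registered function does on its own, its body can be run in an ordinary R session. This is a sketch under assumptions: the name example6 is chosen here for illustration, and a null PDF device is opened so the plot needs no display. It does not exercise rqScriptCreate or rqEval, which run inside the database.

```r
# The function body registered as 'Example6' on the slide.
example6 <- function() {
  res <- 1:10
  plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2)
  res
}

pdf(NULL)             # null graphics device: the plot is drawn but not saved
res <- example6()
invisible(dev.off())
print(res)
```

The function produces two things: a plot as a side effect and the vector res as its return value, and both need to be carried back to the caller.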
Summary
Oracle R Enterprise (ORE)
• A comprehensive, database-centric environment for end-to-end analytical processes in R with immediate deployment to production environments
• Wide range of in-database advanced analytics algorithms exposed through R
• Eliminate R client memory limits

Oracle R Connector for Hadoop (ORCH)
• A collection of R packages enabling Big Data analytics from an R environment
• Allows R users to leverage a Hadoop Cluster with HDFS and MapReduce from R
• Prepackaged advanced analytics algorithms
• Transparent manipulation of HIVE data

• Enable R users to conduct Big Data projects from R
• Eliminate client R engine memory barrier
• Scale to large data sets
• Deploy R-based solutions without translation to other languages or environments

Resources
http://www.oracle.com/goto/R
• Blog: https://blogs.oracle.com/R/
• Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397
• Oracle R Distribution: http://www.oracle.com/technetwork/indexes/downloads/r-distribution-1532464.html
• ROracle: http://cran.r-project.org/web/packages/ROracle
• Oracle R Enterprise: http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise
• Oracle R Connector for Hadoop: http://www.oracle.com/us/products/database/big-data-connectors/overview
