Large scale machine learning projects with r suite
Rscript R/master.R --port=7137
> rsuite install
Detecting repositories ...
Will use repositories:
CRAN.CRAN = https://mran.microsoft.com/snapshot/2017-10-15
CRAN.CRANextra = http://www.stats.ox.ac.uk/pub/RWin
Other = http://wlog-rsuite.s3.amazonaws.com
Installing RSuite(v0.17x) package ...
installing the source package 'RSuite'
All done.
> rsuite proj start -n spmf
Commands:
    update
        Checks if newest version of RSuite CLI is installed. If not
        installer for newest version is downloaded and installation
        is initiated.
    install
        Install RSuite with all the dependencies.
    proj
        Use it to manage project, its dependencies, and build
        project packages.
    repo
        Use to manage repositories. e.g. upload packages.
    pkgzip
        Use to create PKGZIP packages to fillup remove repository.
    version
        Show RSuite CLI version.
    help
        Show this message and exit.

Call 'rsuite [command] help' to get information on acceptable [args].
logs/.gitignore
PARAMETERS
LogLevel: INFO
N_days: 365
solver_max_iterations: 10
solver_opt_horizon: 8
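A minimal sketch of how parameters like these could be read from R, assuming they are stored as plain "key: value" lines. The temporary file, the read_params helper and the integer-conversion rule are illustrative assumptions, not something shown in the slides.

```r
# Sketch: read "key: value" parameters and convert the integer-looking ones.
# The file location and the conversion rule are assumptions for illustration.
config_file <- tempfile(fileext = ".txt")
writeLines(c("LogLevel: INFO",
             "N_days: 365",
             "solver_max_iterations: 10",
             "solver_opt_horizon: 8"),
           config_file)

read_params <- function(path) {
  raw <- read.dcf(path)          # one-row character matrix, columns = keys
  params <- as.list(raw[1, ])    # named list of strings
  # Convert values that look like integers; leave everything else as text.
  lapply(params, function(v) {
    if (grepl("^-?[0-9]+$", v)) as.integer(v) else v
  })
}

params <- read_params(config_file)
str(params)
```

Keeping every value as text until it is explicitly converted avoids surprises when a parameter such as LogLevel is not numeric.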
○ main
○ if __name__ == "__main__":
predmodel
● ==
● >=
● <=
master.R
spmf/libs
packages_import.R
master.R
import_training.R (I)
● import/<session_id>/
● work/<session_id>/
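library(predmodel)

# script_path, args and loginfo() are not defined in this snippet;
# they must be available before this point
import_path <- file.path(script_path, "../import")
work_path <- file.path(script_path, "../work")

# session id (optional; falls back to the default below)
session_id <- args$get(name = "session_id", default = "201711122000", required = FALSE)
loginfo("--> Session id:%s", session_id)

session_work <- file.path(work_path, session_id)
if (!dir.exists(session_work)) {
  # recursive = TRUE also creates work/ itself if it is missing
  dir.create(session_work, recursive = TRUE)
}

import_training_data(file.path(import_path, session_id),
                     session_work)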
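The scripts read their inputs through args$get(name, default, required). As a rough illustration of that call shape, here is a hypothetical stand-in built on base R only. It is not RSuite's implementation; make_args and the --name=value parsing rule are assumptions made for this sketch.

```r
# Hypothetical stand-in for the `args` object used in the scripts.
# It is NOT RSuite's code; it only mimics the shape of the get() call.
make_args <- function(argv = commandArgs(trailingOnly = TRUE)) {
  # Keep only arguments of the form --name=value.
  kv <- regmatches(argv, regexec("^--([^=]+)=(.*)$", argv))
  kv <- Filter(function(m) length(m) == 3, kv)
  values <- setNames(lapply(kv, `[`, 3),
                     vapply(kv, `[`, character(1), 2))

  list(get = function(name, default = NULL, required = TRUE) {
    if (!is.null(values[[name]])) return(values[[name]])
    if (required) stop("Missing required argument: --", name)
    default
  })
}

# Simulate a call such as: Rscript import_training.R --session_id=201711122000
args <- make_args(c("--session_id=201711122000"))
session_id <- args$get(name = "session_id", default = "none", required = FALSE)
fallback   <- args$get(name = "port", default = "7137", required = FALSE)
```

With required = FALSE a missing argument silently falls back to the default, which is convenient during development but can hide a forgotten session id in production.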
import_training.R (II)
devtools
import_training_data
#' @export
import_training_data <- function(import_path, work_path) {
  pkg_loginfo("Importing from %s into %s",
              import_path,
              work_path)

  # synthetic training data: two random features, 30% positive responses
  n <- 10000
  dt <- data.table(feature1 = rnorm(n), feature2 = rnorm(n))
  m <- round(n * 0.3)
  dt[, resp := c(rep(1, m), rep(0, n - m))]

  fwrite(x = dt,
         file = file.path(work_path, "training.csv"),
         sep = ";")
}
estimate_model.R (I)
library(h2o)       # h2o.init()
library(magrittr)  # %>%
library(predmodel)

work_path <- file.path(script_path, "../work")

# session id (optional; falls back to the default below)
session_id <- args$get(name = "session_id", required = FALSE,
                       default = "201710111655")
loginfo("--> Session id:%s", session_id)

session_work <- file.path(work_path, session_id)

h2o.init(max_mem_size = "4g",
         nthreads = 2)
logdebug("---> H2O started")

train_file <- file.path(session_work, "training.csv")
stopifnot(file.exists(train_file))

train_file %>%
  transform_training() %>%
  estimate_model(session_id) %>%
  save_model(session_work)
transform_training
#' @export
transform_training <- function(train_file) {
  dt <- h2o.importFile(path = train_file,
                       destination_frame = "train_dt",
                       parse = TRUE,
                       header = TRUE,
                       sep = ";")
  dt$resp <- as.factor(dt$resp)
  dt <- h2o.assign(data = dt, key = "train_dt")
  return(dt)
}
estimate_model
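#' @export
estimate_model <- function(dt, session_id) {
  # use every column except the response as a predictor
  predictors <- setdiff(colnames(dt), "resp")
  model <- h2o.gbm(x = predictors,
                   y = "resp",
                   training_frame = dt,
                   model_id = sprintf("gbm_%s", session_id),
                   ntrees = 10,
                   learn_rate = 0.1)
  # return the model visibly so it can be piped into save_model()
  model
}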
save_model
#' @export
save_model <- function(model, session_work) {
  h2o.saveModel(model,
                path = session_work,
                force = TRUE)
}
import_test.R (I)
● import/<session_id>/
● work/<session_id>/
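library(predmodel)

# script_path, args and loginfo() are not defined in this snippet;
# they must be available before this point
import_path <- file.path(script_path, "../import")
work_path <- file.path(script_path, "../work")

# session id (optional; falls back to the default below)
session_id <- args$get(name = "session_id", default = "201711122000", required = FALSE)
loginfo("--> Session id:%s", session_id)

session_work <- file.path(work_path, session_id)
if (!dir.exists(session_work)) {
  # recursive = TRUE also creates work/ itself if it is missing
  dir.create(session_work, recursive = TRUE)
}

import_test_data(file.path(import_path, session_id),
                 session_work)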
import_test_data
#' @export
import_test_data <- function(import_path, work_path) {
  pkg_loginfo("Importing from %s into %s",
              import_path,
              work_path)

  # synthetic test data: two random features, no response column
  n <- 1000
  dt <- data.table(feature1 = rnorm(n), feature2 = rnorm(n))

  fwrite(x = dt,
         file = file.path(work_path, "test.csv"),
         sep = ";")
}
score_model.R (I)
● work/<score_session_id>
● work/<train_session_id>
● export/<score_session_id>
score_model.R (II)
library(h2o)
library(magrittr)
library(predmodel)

work_path <- file.path(script_path, "../work")
export_path <- file.path(script_path, "../export")

# session ids (optional; fall back to the defaults below)
train_session_id <- args$get(name = "train_session_id",
                             required = FALSE, default = "201710111655")
score_session_id <- args$get(name = "score_session_id",
                             required = FALSE, default = "201710111655")
loginfo("--> train session id:%s", train_session_id)
loginfo("--> score session id:%s", score_session_id)

score_session_export <- export_path
train_session_work <- file.path(work_path, train_session_id)
score_session_work <- file.path(work_path, score_session_id)

h2o.init(max_mem_size = "4g",
         nthreads = 2)
logdebug("---> H2O started")

test_file <- file.path(score_session_work, "test.csv")
model_file <- file.path(train_session_work,
                        sprintf("gbm_%s", train_session_id))
stopifnot(file.exists(test_file))
stopifnot(file.exists(model_file))

test_dt <- test_file %>%
  transform_test()

score_model(test_dt = test_dt,
            model_path = model_file) %>%
  export_score(export_path = export_path,
               score_session_id = score_session_id)
transform_test
#' @export
transform_test <- function(test_file) {
  h2o.importFile(path = test_file,
                 destination_frame = "test_dt",
                 parse = TRUE,
                 header = TRUE,
                 sep = ";")
}
score_model
#' @export
score_model <- function(test_dt, model_path) {
  model <- h2o.loadModel(model_path)
  pred_dt <- h2o.predict(model, test_dt)
  pred_dt
}
export_score
#' @export
export_score <- function(score_dt, score_session_id, export_path) {
  score_dt <- as.data.table(score_dt)
  score_dt[, score_session_id := score_session_id]
  fwrite(x = score_dt,
         file = file.path(export_path, "score.csv"),
         sep = ";",
         append = TRUE)
}
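The export step appends each scoring session to a single score.csv. The idea can be sketched with base R alone; append_scores and the sample predictions below are made up for illustration, and write.table stands in for fwrite so the example has no package dependencies.

```r
# Sketch: append per-session scores to one file, writing the header only once.
# Function name and sample data are illustrative assumptions.
append_scores <- function(scores, score_session_id, file) {
  scores$score_session_id <- score_session_id
  new_file <- !file.exists(file)
  write.table(scores, file = file, sep = ";", row.names = FALSE,
              col.names = new_file,   # header only for a fresh file
              append = !new_file, quote = FALSE)
  invisible(file)
}

out <- tempfile(fileext = ".csv")
append_scores(data.frame(predict = c(1, 0)), "201711131000", out)
append_scores(data.frame(predict = c(0, 0, 1)), "201711141000", out)

# Read the accumulated file back; keep the session id as text.
all_scores <- read.table(out, sep = ";", header = TRUE,
                         colClasses = c("numeric", "character"))
```

Tagging every row with its session id is what makes a single accumulated file usable later, since rows from different scoring runs can still be told apart.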
Production
spmf_0.1_001.zip
Production/spmf import export
work
Production/spmf/R
a. Rscript import_training.R
b. Rscript estimate_model.R
c. Rscript import_test.R
d. Rscript score_model.R
Production/spmf/export
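The four steps above can be sketched as one helper that builds the command lines for a single training and scoring cycle. The --name=value argument syntax is an assumption based on the --port=7137 example earlier in the deck, pipeline_commands is a hypothetical helper, and the session ids are the ones that appear in the log output.

```r
# Sketch: build the four command lines for one training/scoring cycle.
# The --name=value syntax is assumed, not confirmed by the slides.
pipeline_commands <- function(train_session_id, score_session_id) {
  c(
    sprintf("Rscript import_training.R --session_id=%s", train_session_id),
    sprintf("Rscript estimate_model.R --session_id=%s", train_session_id),
    sprintf("Rscript import_test.R --session_id=%s", score_session_id),
    sprintf("Rscript score_model.R --train_session_id=%s --score_session_id=%s",
            train_session_id, score_session_id)
  )
}

cmds <- pipeline_commands("201711122000", "201711131000")
writeLines(cmds)

# To actually run them from Production/spmf/R, stopping at the first failure:
# for (cmd in cmds) stopifnot(system(cmd) == 0)
```

Training and scoring use different session ids, so one trained model can be reused to score several later batches.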
print
loginfo("Phase 1 passed")
logdebug("Iter %d done", i)
logtrace("Iter %d done", i)
logwarning("Are you sure?")
logerror("I failed :(")
Packages
pkg_loginfo("Phase 1 passed")
pkg_logdebug("Iter %d done", i)
pkg_logtrace("Iter %d done", i)
pkg_logwarning("Are you sure?")
pkg_logerror("I failed :(")
2017-11-13 13:47:03 INFO::--> Session id:201711122000
2017-11-13 13:47:03 INFO:predmodel:Importing from C:/Workplace/Sandbox/Production/spmf/R/../import/201711122000 into C:/Workplace/Sandbox/Production/spmf/R/../work/201711122000
2017-11-13 13:47:14 INFO::--> Session id:201711122000
2017-11-13 13:47:51 INFO::--> Session id:201711131000
2017-11-13 13:47:51 INFO:predmodel:Importing from C:/Workplace/Sandbox/Production/spmf/R/../import/201711131000 into C:/Workplace/Sandbox/Production/spmf/R/../work/201711131000
2017-11-13 13:47:57 INFO::--> train session id:201711122000
2017-11-13 13:47:57 INFO::--> score session id:201711131000
LogLevel: INFO
LogLevel: DEBUG
LogLevel: TRACE
import_training.R
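To show how a LogLevel threshold decides which messages appear, here is a minimal level-filtering logger in base R. It is only an illustration of the idea, not the logging functions used in the slides; LEVELS, make_logger and log_msg are names invented for this sketch, and the line format merely imitates the output shown above.

```r
# Minimal level-filtering logger (illustrative only).
LEVELS <- c(TRACE = 1, DEBUG = 2, INFO = 3, WARNING = 4, ERROR = 5)

make_logger <- function(threshold = "INFO", name = "") {
  function(level, fmt, ...) {
    # Drop messages that are less severe than the configured threshold.
    if (LEVELS[[level]] < LEVELS[[threshold]]) return(invisible(NULL))
    line <- sprintf("%s %s:%s:%s",
                    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                    level, name, sprintf(fmt, ...))
    message(line)
    invisible(line)
  }
}

log_msg <- make_logger(threshold = "INFO")
shown  <- log_msg("INFO", "--> Session id:%s", "201711122000")
hidden <- log_msg("DEBUG", "Iter %d done", 3L)   # below INFO, so suppressed
```

Switching the threshold to DEBUG or TRACE would let the more detailed messages through without touching the calling code.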
tests/test_spmf.R
library(predmodel)
library(testthat)

context("Testing context")

test_that(desc = "Test",
          code = {
            expect_true(5 > 3)
            expect_true(pi < 3)  # pi is about 3.14, so this expectation fails
          })
Slides uploaded by Wit Jakuczun.