Java BigData Full Stack Development as is ... (version 2.0)
Alexey Zinovyev, Java Trainer at EPAM
About
With IT since 2007
With Java since 2009
With Hadoop since 2012
With EPAM since 2015
3Java Big Data Full Stack Development
Contacts
E-mail : Alexey_Zinovyev@epam.com
Twitter : @zaleslaw @BigDataRussia
vk.com/big_data_russia Big Data Russia
vk.com/java_jvm Java & JVM langs
The Good Old Days
HRs & RMs are looking for Java developers
Is the Java Dream Team waiting for you?
Required Skills
• Advanced SQL
• Basic Linux
• Core Java & JVM
• Backend Development Experience
• Basic Computer Science Level
REAL WORLD
Let’s just use JavaScript in the frontend ONLY
In the frontend ONLY?
Cruel world
Do you know any ML library for JS?
Wild animals everywhere
And here is what I tell you
It’s time for a Java Superhero, yeah!
Before discovering patterns you should ..
• Select small samples
• Define default values for missing data
• Remove strange signals (outliers) from the data
• Merge some tables into one if required
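Two of the steps above (default values for missing data, removing strange signals) can be sketched in plain Java. This is only a minimal illustration: the class CleaningSketch, the method clean and the sample readings are assumptions made for this sketch, not part of the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class CleaningSketch {

    /**
     * Replaces missing readings (null) with a default value
     * and drops readings outside the range [min, max].
     */
    static List<Double> clean(List<Double> raw, double defaultValue, double min, double max) {
        List<Double> cleaned = new ArrayList<>();
        for (Double value : raw) {
            // Step 1: fill in a default for missing data
            double v = (value == null) ? defaultValue : value;
            // Step 2: keep only values that look plausible
            if (v >= min && v <= max) {
                cleaned.add(v);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Made-up readings: one missing value and one obvious outlier
        List<Double> raw = Arrays.asList(36.6, null, 37.1, 420.0);
        System.out.println(clean(raw, 36.6, 30.0, 45.0)); // prints [36.6, 36.6, 37.1]
    }
}
```

Choosing the default value and the valid range is a domain decision; the numbers here are placeholders.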
How it really works
• Share your data with us
• Our magic manipulations
• Building an answering machine
• PROFIT!!!
How to start?
WHAT IS BIG DATA?
Joke about Excel
5V
Every 60 seconds…
From Mobile Devices
From Industry
We started to keep and handle stupid new things!
10^6 rows in MySQL
GB->TB->PB->?
Is BigData about PBs?
It’s hard to …
• .. store
• .. handle
• .. search in
• .. visualize
• .. send over the network
Likes in Odnoklassniki (Classmates): how to count them?
Crazy Zoo 2012
Crazy Zoo 2016
What will be covered in this training
NOSQL
What’s the problem with RDBMSs?
• Caching
• Master/Slave
• Cluster
• Table Partitioning
• Sharding
Family
Database party
Spring Data
How to start?
Java MongoDB Driver + Robomongo
BIG DATA TOOL MASTER VS DATA SCIENTIST
TRAIN MODEL
Datasets
• Facebook users, tweets
• Trade transactions
• Government
• Medicine (genomic data)
• Telecommunications
Data Sources
• Relational Databases
• Data warehouses (Historical data)
• Files in CSV or in binary format
• Internet or e-mail
• Scientific and research tools (R, Octave, Matlab)
Hey, man, predict something!
Man or sofa?
Typical questions for DM
• Which loan applicants are high-risk?
• How do we detect phone card fraud?
• What is the revenue prediction for next year?
• Can you recommend music for users?
kNN (k-nearest neighbors)
Is the green circle a blue square or a red triangle? Let’s ask its neighbors!
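The "ask the neighbors" idea can be sketched in plain Java: sort the labeled points by distance to the unknown one, take the k closest and let them vote. This is a minimal sketch, not a production classifier; the names KnnSketch, Point and classify and the sample coordinates are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical classes for illustration only.
class KnnSketch {

    /** A labeled point on the plane. */
    static final class Point {
        final double x;
        final double y;
        final String label;

        Point(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    /** Returns the majority label among the k training points closest to (x, y). */
    static String classify(List<Point> training, double x, double y, int k) {
        // Sort a copy of the training set by Euclidean distance to the query point
        List<Point> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(p -> Math.hypot(p.x - x, p.y - y)));

        // Let the k nearest neighbors vote; on a tie the label that
        // reached the winning count first (the closer one) is kept
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (Point p : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(p.label, 1, Integer::sum);
            if (best == null || count > votes.get(best)) {
                best = p.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> training = Arrays.asList(
                new Point(1, 1, "blue square"),
                new Point(1, 2, "blue square"),
                new Point(2, 1, "blue square"),
                new Point(5, 5, "red triangle"),
                new Point(6, 5, "red triangle"),
                new Point(5, 6, "red triangle"));
        // The "green circle" at (2, 2) sits next to the blue squares
        System.out.println(classify(training, 2, 2, 3)); // prints blue square
    }
}
```

Sorting the whole training set is fine for a toy example; on large datasets the neighbor search is usually the expensive part.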
Collaborative Filtering
Machine Learning vs Traditional Programming
Data Science
Can a Java programmer be a Data Scientist?
Sexy Data Scientist
Real Data Scientist
How to start?
Weka
HADOOP
Hadoop and Data Knights
Hadoop
MapReduce in different languages
MapReduce for WordCount
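The WordCount idea can be sketched in plain Java without a cluster: a map step emits a (word, 1) pair for every word, and a reduce step groups the pairs by word and sums the ones. This sketch does not use the Hadoop API; WordCountSketch and its methods are hypothetical names, and in a real job the map and reduce steps would run on different nodes with a shuffle in between.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of the map and reduce phases.
class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle and reduce phase: group the pairs by word and sum the counts. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    /** Runs the map phase over every line, then reduces all emitted pairs. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be", "To do")));
        // prints {be=2, do=1, not=1, or=1, to=3}
    }
}
```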
Hadoop Jobs
Hadoop frameworks
• Universal (MapReduce, Tez, RDD in Spark)
• Abstract (Pig, Spark Pipeline API)
• SQL-like (Hive, Impala, Spark SQL)
• Graph processing (Giraph, GraphX)
• Machine Learning (Mahout, MLlib)
• Stream processing (Spark Streaming, Storm)
SPARK
SPARK: the bloody son of MR
• MapReduce in memory
• Up to 50x faster than Hadoop MapReduce
• RDD is the basic building block (an immutable distributed collection of objects)
• Pipeline API (no need for Pig)
Spark Family
MLlib supports
• Classification and regression
• Collaborative filtering
• Clustering
• Dimensionality reduction
• Optimization
Code sample MLlib (K-Means)
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;

// Assumes sc is a JavaSparkContext and parsedData is a JavaRDD<Vector> of feature vectors
// Cluster the data into two classes using KMeans
int numClusters = 2;
int numIterations = 20;
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
// Evaluate clustering by computing Within Set Sum of Squared Errors
double WSSSE = clusters.computeCost(parsedData.rdd());
System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
// Save and load model
clusters.save(sc.sc(), "myModelPath");
KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");
MLlib
• .. borrows ideas from scikit-learn (Python lib) and Mahout
• .. runs fully on Spark and supports Spark’s Pipeline API
• .. dataset is represented by Spark SQL’s SchemaRDD (called DataFrame since Spark 1.3)
• .. supports external data sources such as Hive
• .. is well suited to large datasets and parallelized algorithms
It solves all problems!
How to start?
HDP Zoo
Ok, Google!
AWS Amazon
Infrastructure issues are waiting for YOU!
DEEP LEARNING
Deep Learning helps us build a NEW FUTURE
HOW TO LEARN?
DIFFERENT WAYS
1. Read books and write ‘pet’ projects
2. Become a mentee in Mentoring Process
3. MOOC
4. Take a training course
5. Visit conferences
Recommended Books
