Big Data & Security Training Guide
1st Day
Ernesto Damiani
First Day Outline
• Introduction
• Prerequisites
– Data preparation
– Data warehousing
– Data mining
• Processing models and MapReduce paradigm
• Big Data storage basics
– Sharding
– CAP Theorem
Introduction
BIG DATA INITIATIVE
Drive open research & innovation collaboration with UAE and international
institutes and organisations to carry out world-leading research and deliver tangible
value, training, knowledge transfer and skills development in line with the UAE
strategic priorities in the areas of:
smart enterprise, smart infrastructure & smart society
• Batch vs streaming
• Hash vs sketch
Processing models: a change in paradigm (1)
• Traditional data processing model: SWAPPING
– Data are read BLOCK by BLOCK
– Disk -> RAM -> cache -> register
• Unfeasible for current data sizes
– Google indexes 10 billion pages, 200 Tbyte
– Disk bandwidth: 50 Mbyte/sec (for contiguous reads!)
– Reading time > 40 days (!)
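As a quick sanity check of the figures above, here is a minimal Python sketch using the slide's own numbers (200 Tbyte of index data, 50 Mbyte/sec of contiguous disk bandwidth):

```python
# Back-of-the-envelope check of the sequential-read time quoted above.
data_bytes = 200 * 10**12      # 200 Tbyte
bandwidth = 50 * 10**6         # 50 Mbyte/sec, contiguous reads only

seconds = data_bytes / bandwidth
days = seconds / (60 * 60 * 24)
print(f"{seconds:.0f} s  ~=  {days:.1f} days")   # roughly 46 days, i.e. > 40 days
```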
A change in paradigm (2)
• From SWAPPING to CLUSTERING
– Data partitioned in CHUNKS, each going to a different node of the cluster
– Move computation to chunk sites (bring computations to data)
– 10 Tbytes, 1000 nodes -> 10 Gbytes per node -> can use SWAPPING locally
A change in paradigm (3)
• Problem: local servers may need to exchange data prior to processing
– Example: I want to compute the average driving time from/to all Emirates
• Chunks I have: each chunk contains records collected at arrival site A, showing trips from X to A (for all Xs)
• Chunks I need: each chunk containing records showing trips from X to A (for a given X)
– Shuffling is needed -> BUT I need to consider network latency
• Assume we move 10 Tbytes at 1 Gbyte/sec -> hours just to move the data
– Need to do shuffling at collection, NOT at computation time
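A minimal Python sketch of the re-grouping step just described: trip records arrive keyed by arrival site A and are re-keyed (shuffled) by origin X so that each node can average driving times per origin. The record layout and site names are hypothetical.

```python
from collections import defaultdict

# Records as collected: one chunk per ARRIVAL site A, each record = (origin X, minutes).
# Site names and values are made up for illustration.
chunks_by_arrival = {
    "AbuDhabi": [("Dubai", 85), ("Sharjah", 95)],
    "Dubai":    [("AbuDhabi", 90), ("Sharjah", 25)],
}

# "Shuffle": re-key every record by its ORIGIN X.
# This is exactly the data movement we want to do at collection time, not at computation time.
by_origin = defaultdict(list)
for arrival, records in chunks_by_arrival.items():
    for origin, minutes in records:
        by_origin[origin].append(minutes)

# Each origin's average can now be computed locally on the node owning that key.
for origin, times in by_origin.items():
    print(origin, sum(times) / len(times))
```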
A change in paradigm (4)
• Even after I distribute computation to chunk servers, some may fail
• Node failure
– 1000 servers -> roughly 1 failure per day
– Use redundancy for chunk persistence and availability
– Build a redundant storage infrastructure -> multiple replicas of each chunk
Google GFS/HDFS
• Master-slave architecture
• Slave servers: contain large data files, 16/64 Gbytes each
• Master servers: contain metadata on file distribution
– Talk to the master to find the slave servers
• Google equation: slave servers = computation servers
– Avoid moving data
• Killer application: building the global index
After Google: a vision
• Big Data is not just a technological advance but represents a
paradigm shift in extracting value from complex multi-party
scenarios
– Coming up: European Big Data ecosystem of data owners,
transformers, distributors, and analyzers.
• In this ecosystem:
– Data owners/suppliers compete to provide high quality and
timely information. Each supplier has its own privacy and
confidentiality constraints, as well as its level of data
quality, trustworthiness and credibility.
– Data transformers compete to sanitize, prepare and
publish data sets and streams
– Data analyzers compete to collect the “best” data from
data suppliers and transformers and process them to
deliver quality suggestions to decision makers within the
allotted time.
Actors and Roles
• Roles: collector, transformer, user
• Actors:
– Over-The-Top (OTT) operators: global application-level players (Google, Amazon, ...) who collect application-level data and use them for value-added services
– Telcos (Vodafone, Etisalat, ...): transport providers who collect data at the lower stack levels and use them for value-added services (also for themselves)
– Cloud providers doing IaaS and SaaS (Amazon, Aruba, ...): collect data at all levels
Overlapping interests... different questions
TELCO QUESTIONS
• Traffic estimation
– How many bytes were sent between a pair of devices?
– What fraction of network IP addresses are active?
• Traffic analysis
– What is the average duration of a session?
– What is the median of the number of bytes in each session?
• Fraud detection
– List all sessions originating from location X that transmitted more than 1000 bytes
– Identify all sessions started in the last 20 seconds with duration more than twice normal
• Security/Denial of Service
– List all IP addresses that witnessed a sudden spike in traffic
– Identify IP addresses that were in more than 1000 sessions

OTT QUESTIONS
• Traffic estimation
– Apps having at least 5k downloads in the last 20 seconds
– List top 100 locations in terms of purchase transactions
• Traffic analysis
– What is the per-app duration of a user session?
– Median size of users’ personal cloud storage
• Fraud detection
– List all purchases made by user X in the last 20 minutes
– Identify all purchases whose amount was more than twice normal
• Security/Denial of Service
– List all services that have witnessed a sudden spike in usage
– Identify services involved in more than 1000 sessions
Sample Telco-style application:
Network Monitoring
Example NetFlow IP session data:

Source     Destination  Duration  Bytes  Protocol
10.1.0.2   16.2.3.7     12        20K    http
18.6.7.1   12.4.0.3     16        24K    http
13.9.4.3   11.6.8.2     15        20K    http
15.2.2.9   17.1.2.1     19        40K    http
12.4.3.8   14.8.7.4     26        58K    http
10.5.1.3   13.0.0.1     27        100K   ftp
11.1.0.6   10.3.4.5     32        300K   ftp
19.7.1.2   16.5.5.8     18        80K    ftp

[Diagram: a Network Operations Center (NOC) collects SNMP/RMON and NetFlow records from a converged IP/MPLS core connecting enterprise networks (FR, ATM, IP VPN access), the PSTN, mobile networks, Voice over IP and peer networks]

• Before: data stored and shipped off-site to a data warehouse for off-line analysis
• Today: analysis must be done within the network
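To tie this to the Telco-style questions on the previous slide, here is a minimal Python sketch that answers two of them over the sample records above (the rows are simply hard-coded from the table):

```python
from statistics import median

# (source, destination, duration, bytes, protocol) -- the sample NetFlow records above
flows = [
    ("10.1.0.2", "16.2.3.7", 12, 20_000, "http"),
    ("18.6.7.1", "12.4.0.3", 16, 24_000, "http"),
    ("13.9.4.3", "11.6.8.2", 15, 20_000, "http"),
    ("15.2.2.9", "17.1.2.1", 19, 40_000, "http"),
    ("12.4.3.8", "14.8.7.4", 26, 58_000, "http"),
    ("10.5.1.3", "13.0.0.1", 27, 100_000, "ftp"),
    ("11.1.0.6", "10.3.4.5", 32, 300_000, "ftp"),
    ("19.7.1.2", "16.5.5.8", 18, 80_000, "ftp"),
]

# "What is the median of the number of bytes in each session?"
print("median bytes:", median(f[3] for f in flows))

# "List all sessions that transmitted more than 1000 bytes"
# (in this toy sample, every session qualifies)
print([f[0] for f in flows if f[3] > 1000])
```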
Analytics models: static vs. dynamic
Handling batch and streams together:
the lambda architecture
EBTIC Methodology and Reference
Architecture
Open issues
• Enforcement of privacy, trustworthiness and access control
– Fast data granulation techniques for bringing large volumes of data to a
granularity and detail level compatible with the privacy preferences and
non-disclosure requirements of the data owners.
– Fast data filtering techniques to bring data at the desired level of
trustworthiness and credibility, as well as enforcing data owners’ access
authorizations.
• Dynamically adaptable data representation
– Dynamically adaptable semantic enrichment, e.g. by adding cross-references among data items, references to external data vocabularies, or references to other sources.
– Quantitative improvement: (1) by “purchasing” additional data (e.g., publicly available open data) to add new information; (2) by processing available information to make implicit facts explicit.
– Qualitative improvement by turning data into assertions drawn from external formal vocabularies
• Data and analytics co-provisioning
– Dynamic provisioning of data and of analytics tailored to the data
representation and distribution.
Prerequisites
What Data?
• Case management data
– Audits / Dynamic processing
• CRM /ERP data
– SAP, Oracle, …
• Servicedesk ITIL data
– HP service manager
• Internet click STREAMS
• Netflow data
• IoT
– Device hardware events
Data quality
• Data are of high quality if
– they are fit for their intended uses in operations,
decision making and planning.
– they correctly represent the real-world construct
to which they refer
• As data volume increases, the question of
internal consistency within data becomes
paramount, regardless of fitness for use for
any external purpose (J. M. Juran)
Adapted from: http://en.wikipedia.org/wiki/Data_quality
Definitions of data quality
• Degree of excellence exhibited by the data in relation to the
portrayal of the actual scenario.
• The state of completeness, validity, consistency, timeliness
and accuracy that makes data appropriate for a specific
use.
• The totality of features and characteristics of data that
bears on their ability to satisfy a given purpose; the sum of
the degrees of excellence for factors related to data.
• The processes and technologies involved in ensuring the
conformance of data values to business requirements and
acceptance criteria.
• Complete, standards based, consistent, accurate and time
stamped.
Adapted from: http://en.wikipedia.org/wiki/Data_quality
Challenges for data quality in a process
analysis context
General:
• No history (just latest activity, current/last status) OR too
much history, very large datasets
IDs:
• No IDs, across multiple systems OR multiple IDs
(customer_id, update_id, session_id)
Time:
• Unrepresentative or wrong timestamps
Activity:
• System / automated activities vs. human-interaction activities
vs. unregistered activities
What is data preparation?
• Cleaning
• Integration
• Selection
• Transformation
Challenges for data preparation
Cleaning:
• Removing duplicates / correcting date-timestamps / ID
disambiguation
Integration:
• Merging data sources
– How to deal with large blobs full of multimedia/ free text?
Selection:
• Connecting IDs to follow end-to-end process
Transformation:
• Formatting: with activities in columns, you lose loops and assume a pre-specified process is followed
• Which environment is receiving the prepared data?
Sampling
• Population: the set or universe of all entities under study.
• Looking at the entire population may not be feasible, or may be too expensive.
• Instead, one draws a random sample from the population and computes appropriate statistics from the sample, which give estimates of the corresponding population parameters of interest.
What is a sample?
• Sampling is a general technique for tackling massive amounts
of data
• Example: To compute the median packet size of a stream of IP packets, we just sample some packets and use the median of the sample as an estimate of the true median (see the sketch below). Statistical arguments relate the size of the sample to the accuracy of the estimate.
• How big does the sample need to be?
– A poll to predict election outcomes can get by with no more than a couple of thousand respondents to gauge the attitudes of millions of voters.
– Even a massive data source with billions of rows may be just a small portion of the data that could potentially be collected from the real world (e.g., atmospheric temperature or pressure samples)
• How do we know how big a sample we need?
– Classical statistics has methods for that.
– The regression criterion: what you discover from your sample must hold when tested on additional data
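A minimal Python sketch of the packet-size example above: estimate the median of a large "population" of packet sizes from a small random sample. The synthetic size distribution is an assumption made purely for illustration.

```python
import random
from statistics import median

random.seed(42)

# Synthetic "population": one million packet sizes in bytes (distribution chosen only for illustration).
population = [random.choice([64, 128, 512, 1500]) for _ in range(1_000_000)]

# Draw a small random sample and use its median as a point estimate of the population median.
sample = random.sample(population, 1_000)

print("true median   :", median(population))
print("sample median :", median(sample))
```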
Randomness
• “Random sample” means that every case in the
population has an equal opportunity to get in the
sample.
– The most fundamental assumption of statistical
analysis is that samples are random
• If you use all the data in a dataset, you are not
really avoiding sampling.
– You will use your analysis to draw conclusions about future
cases – cases that are not in your data set today. So your Big
Data is still just a very, very big sample from the population that
matters to you
Random Sample and Statistics (II)
• Let Xi denote a random variable (e.g., the waiting time to get a service) corresponding to data point xi (an observed wait). Then a statistic θ is a function θ : (X1, X2, ..., Xn) → R.
• If we use the value of a statistic to estimate a population parameter, this value is called a point estimate of the parameter, and the statistic is called an estimator of the parameter.
• More with Dr. Gabriel later
Not always trivial
• How do you sample a sliding window on a
stream of unknown length?
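One standard building block for streams of unknown length is reservoir sampling; a minimal Python sketch is below. Note that restricting the sample to a sliding window requires additional machinery (e.g., replacing expired items), which is deliberately left out here.

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i is kept with probability k / (i + 1), replacing a random resident.
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(10_000), 5))
```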
Food for thought:
Sampled data vs. full data
• Since data mining began to take hold in the
late nineties, “sampling” has been neglected
• The Big Data frenzy is leading many to
conclude that size always means more
predictive power and value. The more data
the better, the biggest analysis is the best.
• Is this really the case?
Sampled data vs. full data
• Two legitimate reasons for abandoning sampling:
1. The customer base of Big Data data mining tools is not trained in sampling techniques.
2. Going full data allows looking for extreme cases.
– For example, in intelligence or security applications, only a few cases out of millions may exhibit behavior indicative of threatening activity. So analysts in those fields have a good reason to go full data.
Disclaimer 1
• More is not necessarily better.
– Analyzing massive quantities of data consumes a lot of resources: computing power, storage space, and the analyst's effort.
– There's also the matter of data quality. It is easier to ensure that a modest-sized sample has been cleaned up than a huge repository.
Disclaimer 2
[Diagram: a data warehouse built by merging and integrating multiple data sources, with associated metadata]
Star Schemas
Terms
• Fact table: sale (orderId, date, custId, prodId, storeId, qty, amt)
• Dimension tables:
– customer (custId, name, address, city)
– product (prodId, name, price)
– store (storeId, city)
• Measures: qty, amt
Star
product:
prodId  name  price
p1      bolt  10
p2      nut   5

store:
storeId  city
c1       nyc
c2       sfo
c3       la
Cube
dimensions = 2
3-D Cube
dimensions = 3
ROLAP vs. MOLAP
• ROLAP:
Relational On-Line Analytical Processing
• MOLAP:
Multi-Dimensional On-Line Analytical
Processing
Aggregate metrics
• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
Aggregate metrics
• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
Another Example
• Add up amounts by day, product
• In SQL: SELECT date, prodId, sum(amt) FROM SALE
GROUP BY date, prodId
sale (detail):
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

sale (aggregated by prodId, date):
prodId  date  amt
p1      1     62
p2      1     19
p1      2     48
• rollup: aggregate to a coarser granularity (e.g., from (prodId, date) to date only)
• drill-down: go back to a finer granularity (e.g., from (prodId, date) to (prodId, storeId, date))
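A minimal plain-Python sketch of the same aggregation, and of rolling it up one level further, using the sample sale rows above:

```python
from collections import defaultdict

# (prodId, storeId, date, amt) -- the sample sale rows above
sale = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8),  ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]

# GROUP BY prodId, date (the aggregated table shown above)
by_prod_date = defaultdict(int)
for prod, store, date, amt in sale:
    by_prod_date[(prod, date)] += amt
print(dict(by_prod_date))   # {('p1', 1): 62, ('p2', 1): 19, ('p1', 2): 48}

# Rollup: aggregate further, GROUP BY date only
by_date = defaultdict(int)
for (prod, date), amt in by_prod_date.items():
    by_date[date] += amt
print(dict(by_date))        # {1: 81, 2: 48}
```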
Aggregate metrics
• Operators: sum, count, max, min, median, avg
• “Having” clause
• Using dimension hierarchy
– average by region (within store)
– maximum by month (within date)
Data Mining
• Discovery of useful, possibly unexpected,
patterns in data
• Non-trivial extraction of implicit, previously
unknown and potentially useful information
from data
• Exploration & analysis, by automatic or
semi-automatic means, of large quantities of
data in order to discover meaningful patterns
Data Mining Tasks
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Sequential Pattern Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]
• Collaborative Filtering [Predictive]
Classification: Definition
• Given a collection of records (training set)
– Each record contains a set of attributes; one of the attributes is the class.
• Find a model for class attribute as a function of
the values of other attributes.
• Goal: previously unseen records should be
assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the
model. Usually, the given data set is divided into
training and test sets, with training set used to
build the model and test set used to validate it.
Decision Trees
Example:
• Conducted a survey to see which customers were interested in a new car model
• Want to select customers for an advertising campaign
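A minimal sketch of such a classifier using scikit-learn (assumed to be available); the survey attributes and labels below are invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical survey records: [age, income, owns_car] -> interested in the new model (1) or not (0)
X = [[25, 60, 0], [40, 150, 1], [35, 90, 1], [52, 200, 1], [23, 40, 0], [45, 120, 0]]
y = [0, 1, 0, 1, 0, 1]

model = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Score a prospect to decide whether to include them in the advertising campaign
print(model.predict([[38, 130, 1]]))
```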
Clustering
[Scatter plot: customers grouped into clusters along the income, education and age dimensions]
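A minimal clustering sketch on those three dimensions using scikit-learn's KMeans (the customer vectors are made up for illustration):

```python
from sklearn.cluster import KMeans

# Hypothetical customers as [income, years_of_education, age]
customers = [[40, 12, 25], [45, 12, 28], [150, 18, 45],
             [160, 20, 50], [90, 16, 35], [85, 15, 33]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster id assigned to each customer
print(kmeans.cluster_centers_)  # centroid of each cluster
```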
Association Rule Mining
Repeated clustering
• Goal: predict what products/services… a customer may be
interested in, on the basis of
– Past preferences of the person
– Preferences of other people with similar past preferences
• Repeated clustering
– Cluster people based on preferences
– Cluster products liked by the same clusters of people
– Again cluster people based on their preferences for (the newly created
clusters of) products
– Repeat till no more changes occur
Other Types of Mining
• Text mining: application of data mining to textual
documents
– cluster Web pages to find related pages
– cluster pages a user has visited (visit history)
– automatically classify Web pages into a directory
• Graph Mining:
– Deal with graph data
– RDF-style models (more later)
Enters Big Data
• Classic analytics assume:
– Standard data models/formats
– Reasonable volumes
– Loose deadlines
• Problem: the five Vs jeopardise these assumptions (unless we sample or summarize)
Scaling Up vs Scaling Out
• Issues with scaling up when the dataset is just too big
• RDBMS were not designed to be distributed
– Best way to provide ACID and a rich query model is to have
the dataset on a single machine. However, there are limits to
scaling up (Vertical Scaling).
– Past a certain point, an organization will find it cheaper and more feasible to scale out (horizontal scaling) by adding smaller, relatively inexpensive servers rather than investing in a single larger server.
• There are a number of different approaches to scaling out. Two common ones:
– Master-slave replication
– Sharding
Back to Processing Models
Map/Reduce Basics
• Map/Reduce is a programming model for
efficient distributed computing
• It works like a Unix pipeline:
– cat input | grep | sort | uniq -c | cat > output
– Input (GFS/HDFS) | Map | Shuffle & Sort | Reduce | Output (GFS/HDFS)
– Map is called for each <key, value> pair in the GFS input
– No support for import and export from GFS/HDFS
Where does efficiency come from
• Efficiency comes from
– Streaming through data
• No disk seeks
• Disk contents are streamed into HDFS, then HDFS <key, value> pairs are mapped/reduced
– Pipelining
• Mapping is parallel
• A good fit for a lot of applications
– Web index building (Google!)
– Log processing
Example
• Need to build the histogram of English word frequencies, in all
Shakespeare plays
• Solution 1 (swapping): There is enough room in memory for the file of all
Shakespeare plays
– Load file from disk
– Scan file in memory and create <word, count> table
• Solution 2: (scanning) File too big for memory, but the list of all words
used by Shakespeare fits
– Create <word, counter> array (no duplicate words)
– Scan (not load!) file on disk and update counter when you find word
– Faster
• Solution 3 (chunking): Even the list of all words used by Shakespeare does not fit
– Scan file on disk and extract <word, 1> entries (will get duplicate words)
– For each word, send its <word, 1> entries to a “chunk server”
– Compute <word, n> on each chunk server
– Even faster
Map/Reduce Dataflow
• Input: list of <key, value> pairs
• Map task: for each <k,v> pair, produces a set of <k',v'> pairs
• Reduce task: for each set of <k',v'> pairs sharing the same key, computes a result (an integer x in the word-count example)
Map-reduce computation (1)
GFS data: lines from ALL SHAKESPEARE’S PLAYS
• <1, «To be or not to be»>
• <2, «To die, to sleep … perhaps to dream»>
• ...
Lines are mapped to MAP “line servers”. Starting from sets of «word, 1» entries, each node computes «word, k» entries. For lines 1 and 2:
• MAP1: To 1, To 1, Be 1, Be 1, Or 1, Not 1
• MAP2: To 1, To 1, To 1, Die 1, Sleep 1, Perhaps 1, Dream 1
• “To” is now the key: MAP1 computes «To, 2», MAP2 computes «To, 3»
Map-reduce computation (2)
Entries with the same key are sent to the same REDUCE node.
• The REDUCE node for «To» receives «To, k» data from the line servers:
To 2
To 3
Received data with the same key are reduced to a histogram entry:
«To, 5»
MapReduce: Word Count Example
(Pseudocode)
map(String input_key, String input_value):
// input_key: document name
// input_value: document contents
for each word w in input_value:
EmitIntermediate(w, "1");
http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0004.html
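The Google slide above shows only the map side. A minimal, self-contained plain-Python sketch of the whole word-count job (map, shuffle by key, reduce) might look like this; it is an illustration of the programming model, not Hadoop or MapReduce library code:

```python
from collections import defaultdict

def map_fn(doc_name, doc_contents):
    """Emit an intermediate <word, 1> pair for every word in the document."""
    for word in doc_contents.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Sum all the counts emitted for the same word."""
    return word, sum(counts)

docs = {"line1": "To be or not to be", "line2": "To die to sleep perhaps to dream"}

# Shuffle & sort: group intermediate values by key
groups = defaultdict(list)
for name, text in docs.items():
    for word, one in map_fn(name, text):
        groups[word].append(one)

print(dict(reduce_fn(w, counts) for w, counts in groups.items()))
# e.g. {'to': 5, 'be': 2, 'or': 1, 'not': 1, ...}
```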
Combiners
• Problem: too many sources of intermediate data, which would require a massive number of Reduce nodes
• Solution: do the Reduce in two steps
– First: compute the histogram for each single play (at combiner nodes)
– Second: compute the histogram of histograms (at the final reduce node)
– The function used is the same for the combiners and for the final Reduce
When it does not work
• Compute the average of word counts (rather
than histogram)
• With a single Reduce node: no problem
– Just divide each word counter by the total number
of words
• But: can you still use combiners?
When it does not work
• No, you cannot recombine
• For a given word, its frequency across all Shakespeare plays is NOT the average of its per-play frequencies:
(# word_i in all plays) / (# words in all plays) ≠ (1/P) · Σ_j [ (# word_i in Play_j) / (# words in Play_j) ]
Source: https://developer.yahoo.com/hadoop/tutorial/module4.html
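A tiny numeric check of the inequality above (the word counts are invented purely for illustration):

```python
# Hypothetical counts for one word across two plays: (occurrences of word_i, total words)
plays = [(10, 1_000), (1, 10_000)]

overall = sum(w for w, n in plays) / sum(n for w, n in plays)   # ratio of the sums
mean_of_ratios = sum(w / n for w, n in plays) / len(plays)      # average of per-play ratios

print(overall)          # 0.001   (11 / 11000)
print(mean_of_ratios)   # 0.00505 -- not the same, so per-play results cannot simply be recombined
```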
Concept of hashing
• The problem at hand is to define and implement a
mapping from a domain of keys to a domain of
locations
• From the performance standpoint, the goal is to
avoid collisions (A collision occurs when two or
more keys map to the same location)
• From the compactness standpoint, no application
ever stores all keys in a domain simultaneously
unless the size of the domain is small
Concept of hashing (2)
• The information to be retrieved is stored in a hash
table which is best thought of as an array of m
locations, called buckets
• The mapping between a key and a bucket is called
the hash function
• The time to store and retrieve data is proportional
to the time to compute the hash function
Hashing function
• The ideal function, termed a perfect hash function,
would distribute all elements across the buckets such
that no collisions ever occurred
• h(v) = f(v) mod m
• Knuth (1973) suggests using a prime number as the value for m
Hashing function (3)
• It is usually better to treat v as a sequence of bytes
and do one of the following for f(v):
(1) Sum or multiply all the bytes. Overflow can be
ignored
(2) Use the last (or middle) byte instead of the first
(3) Use the square of a few of the middle bytes
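A minimal Python sketch of the h(v) = f(v) mod m scheme above, with f(v) summing the bytes of the key and m a prime number of buckets (m = 97 is just an example choice):

```python
def f(key: str) -> int:
    """Treat the key as a sequence of bytes and sum them (overflow is not an issue in Python)."""
    return sum(key.encode("utf-8"))

def h(key: str, m: int = 97) -> int:
    """Map a key to one of m buckets; m is chosen prime, as Knuth suggests."""
    return f(key) % m

for k in ("10.1.0.2", "18.6.7.1", "13.9.4.3"):
    print(k, "->", h(k))
```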
Hash in MapReduce
• A typical default is to hash the key and use the
hash value modulo the number of reducers.
• It is important to pick a partition function that
gives an approximately uniform distribution of
data per shard for load-balancing purposes,
otherwise the MapReduce operation can be held
up waiting for slow reducers (reducers assigned
more than their share of data) to finish.
• We will come back to hashing on the third day
Example: Hash Join
• Read from two sets of reducer outputs that share the same hashing buckets
• One set is used as the build set and the other as the probe set
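A minimal Python sketch of the build/probe idea: build an in-memory hash table on the smaller input, then probe it with the other one. The relations and join keys are hypothetical.

```python
from collections import defaultdict

# Build side: the smaller relation, e.g. (custId, city)
customers = [("c1", "nyc"), ("c2", "sfo"), ("c3", "la")]

# Probe side: the larger relation, e.g. (orderId, custId, amt)
orders = [("o1", "c1", 12), ("o2", "c3", 50), ("o3", "c2", 8), ("o4", "c1", 44)]

# Build phase: hash the build side on the join key
build = defaultdict(list)
for cust_id, city in customers:
    build[cust_id].append(city)

# Probe phase: for each probe row, look up matching build rows with the same key
joined = [(order_id, cust_id, amt, city)
          for order_id, cust_id, amt in orders
          for city in build[cust_id]]
print(joined)
```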
The sources
• Three major papers were the seeds of the NoSQL movement
– BigTable (Google): http://labs.google.com/papers/bigtable.html
– Dynamo (Amazon): http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html and http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
– Amazon and consistency: http://www.allthingsdistributed.com/2010/02 and http://www.allthingsdistributed.com/2008/12
• Gossip protocol (discovery and error detection)
• Distributed key-value data store
• Eventual consistency
– CAP Theorem
ACID
Atomicity: Either the whole process of a transaction is done or none of it is.
Consistency: Database constraints (application-
specific) are preserved.
Isolation: It appears to the user as if only one process executes at a time. (Two concurrent transactions will not see one another's changes while “in flight”.)
Durability: The updates made to the database in a
committed transaction will be visible to future
transactions. (Effects of a process do not get lost if
the system crashes.)
CAP Theorem
• Proposed fifteen years ago by Eric Brewer (keynote at the ACM Symposium on Principles of Distributed Computing, July 2000).
• Three properties of a shared-data system: consistency, availability and partition tolerance
• You can have at most two of these three properties for any shared-data system
• To scale out, you have to partition. That leaves either consistency or availability to choose from
– In almost all cases, you would choose availability over consistency
• Two kinds of consistency:
– strong consistency – ACID (Atomicity Consistency Isolation
Durability)
– weak consistency – BASE (Basically Available Soft-state Eventual
consistency )
To put it simply
• Many nodes
• Nodes contain replicas of partitions of the data
• Consistency
– all replicas contain the same version of the data
• Availability
– the system remains operational even with failing nodes
• Partition tolerance
– multiple entry points
– the system remains operational even if the system splits
CAP Theorem: satisfying all three at the same time is impossible
CAP properties explained
• Partitionability: divide nodes into small groups that
can see other groups, but they can't see everyone.
• Consistency: if you write a value and then read it, you get the same value back. In a partitioned system there are windows where that's not true.
• Availability: you may not always be able to write or read. The system may refuse a write because it wants to keep itself consistent.
• To scale you have to partition, so for a given system you are left choosing either high consistency or high availability.
– Find the right overlap of availability and consistency.
Consistency Model
• A consistency model determines rules for visibility and apparent
order of updates.
• For example:
– Row R is replicated on nodes W and V
– Client A writes row R to node W
– Some period of time t elapses
– Client B reads row R from node V
– Does client B see the write from client A?
– For NoSQL systems, the answer is: “maybe”
• CAP Theorem again: strict consistency cannot be achieved at the same time as availability and partition tolerance.
Eventual Consistency
• When no updates occur for a long period of
time, eventually all updates will propagate
through the system and all the nodes will be
consistent
• For a given accepted update and a given node,
eventually either the update reaches the node
or the node is removed from service
• As the data is written, the latest version is on at least one node. The data is then versioned/replicated to other nodes within the system. Eventually, the same version is on all nodes.
• Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
http://en.wikipedia.org/wiki/Eventual_consistency
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
From ACID to BASE
• Large systems built around CAP are not ACID, they are BASE (http://queue.acm.org/detail.cfm?id=1394128):
• Basically Available - system seems to work all the
time
• Soft State – not consistent all the time
• Eventually Consistent - becomes consistent at
some later time
• Today, everyone who builds big applications
builds them on CAP and BASE: Google, Yahoo,
Facebook, Amazon, eBay, etc
Advice
• Choose a specific approach based on the needs of the
service.
• Example: in an e-commerce checkout process you always want to honor requests to add items to a shopping cart because they are revenue producing.
– In this case you choose high availability. Errors are hidden
from the customer and sorted out later.
• Instead: when a customer submits an order you favor
consistency
– Several services (credit card processing, shipping and
handling, reporting) are simultaneously accessing the data.
Schema-Less
Pros:
- Schema-less data model is richer than key/value
pairs
- eventual consistency
- many are distributed
- still provide excellent performance and scalability
Cons:
- Again, no ACID transactions or joins
Common Advantages
• Cheap, easy to implement (open source)
• Data are replicated to multiple nodes (therefore
identical and fault-tolerant) and can be
partitioned
– Down nodes easily replaced
– No single point of failure
• Easy to distribute
• No schema required
• Elasticity
• Relax the data consistency requirement (CAP)
What do we lose?
• Joins - group by - order by
• ACID transactions
• SQL as a sometimes frustrating but still
powerful query language
• easy integration with other applications
that support SQL