Big Data
A big step towards innovation, competition and
productivity
Contents
Big Data Definition
Example of Big Data
Big Data Vectors
Cost Problem
Importance of Big Data
Big Data growth
Some Challenges in Big Data
Big Data Implementation
Big Data Definition


The term "big data" describes a volume of structured
and unstructured data so large that it is difficult to
process using traditional database and software
techniques.



In most enterprise scenarios, the data is too big, moves
too fast, or exceeds current processing capacity.



The term big data is believed to have originated with
Web search companies that had to query very large
distributed aggregations of loosely structured data.
An Example of Big Data


An example of big data might be petabytes (1,024
terabytes) or exabytes (1,024 petabytes) of data
consisting of billions to trillions of records of millions of
people—all from different sources (e.g. Web, sales,
customer contact center, social media, mobile data and
so on). The data is typically loosely structured and is
often incomplete and inaccessible.



When dealing with larger datasets, organizations have
difficulty creating, manipulating, and managing the
data. Big data is a particular problem in business
analytics because standard tools and procedures are not
designed to search and analyze massive datasets.
Big Data vectors
Cost problem
Cost of processing 1 petabyte of data with 1,000 nodes?
• 1 PB = 10^15 B = 1 million gigabytes = 1 thousand
terabytes
• Each node takes 9 hours to process 500 GB at a rate of
15 MB/s:
15 * 60 * 60 * 9 = 486,000 MB ≈ 500 GB
• 1,000 nodes * 9 hours * $0.34 per node per hour = $3,060
for a single run
• A single node needs 1,000,000 GB / 500 GB = 2,000 runs to
process 1 PB: 2,000 * 9 h = 18,000 h, and 18,000 / 24 = 750
days
• The cost for 1,000 cloud nodes, each processing 1 PB:
2,000 * $3,060 = $6,120,000
Importance of Big Data
Government: In 2012, the Obama administration
announced the Big Data Research and Development
Initiative, comprising 84 different big data programs
spread across six departments.
Private Sector: Wal-Mart handles more than 1 million
customer transactions every hour, which are imported into
databases estimated to contain more than 2.5 petabytes
of data.
Facebook handles 40 billion photos from its user base.
The Falcon Credit Card Fraud Detection System protects 2.1
billion active accounts worldwide.
Science: The Large Synoptic Survey Telescope will generate
140 terabytes of data every 5 days.
The Large Hadron Collider produced 13 petabytes of data in
2010.
Medical computation, such as decoding the human genome.
Social science revolution
New way of science (microscope example)
Technology Players in This Field
Google
Oracle
Microsoft
IBM
Hadapt
Nike
Yelp
Netflix
Dropbox
Zipdial
Big Data growth
Some Challenges in Big Data
While big data can yield extremely useful information, it
also presents new challenges with respect to:
• How much data to store?
• How much will this cost?
• Will the data be secure?
• How long must it be maintained?
Implementation of Big Data
Platforms for large-scale data analysis:
• The Apache Software Foundation's Java-based Hadoop
programming framework, which can run applications on
systems with thousands of nodes; and
• The MapReduce software framework, which consists of
a Map function that is applied to pieces of the input in
parallel on different nodes and a Reduce function that
gathers the intermediate results and resolves them into
a single value per key.
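The Map and Reduce roles can be illustrated with a word count. This is a minimal single-process sketch, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the loop over chunks stands in for work that a real cluster would spread across nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: resolve all values collected for one key into a single value."""
    return key, sum(values)


def word_count(chunks):
    intermediate = []
    for chunk in chunks:  # on a cluster, each chunk would go to a different node
        intermediate.extend(map_phase(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data moves fast"]))
```

Because each chunk is mapped independently and each key is reduced independently, both phases can run in parallel, which is what lets the model scale to thousands of nodes.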
Thank You!!
By:
Harshita Rachora
Trainee Software Consultant
Knoldus Software LLP
