www.unglobalpulse.org
  @UNGlobalPulse
Download at:

http://www.unglobalpulse.org/BigDataforDevWhitePaper
TABLE OF CONTENTS
Section I: Opportunities
 • DATA INTENT AND CAPACITY
 • SOCIAL SCIENCE AND POLICY APPLICATIONS
Section II: Challenges
 • DATA CHALLENGES
 • ANALYTICAL CHALLENGES
Section III: Applications
 • WHAT NEW DATA STREAMS BRING TO THE TABLE
 • MAKING BIG DATA WORK FOR DEVELOPMENT
Section I: Opportunity
              The Data Revolution
Big data
• The three V’s of the digital data
   deluge:
   • Exponential growth in volume
   • Increasing velocity of data flow
   • Bewildering variety of new data
      types

Real-time operations in the private sector
• Real-time analysis, real-time decision-making,
  real-time customer feedback
What Do We Mean by Real-Time?

Global Pulse Definition:
“Information about a phenomenon available quickly enough to
maintain an accurate reflection of its current state, such that
effective action may be taken in response.”

Timeframe for intervention is relative to context:

                • Malnutrition: months
                • Starvation: weeks
                • Cholera: days
                • Earthquake: hours
Section I: Opportunity
        Relevance to the Developing World

• As of 2010: 4 billion of the world’s 5 billion
  mobile phones are in developing countries
• Mobile Services: money transfers, job
  search, commerce, market prices, social
  media
  Mobile Banking in East Africa, new users per day:
  Kenya 11,000; Tanzania 15,000; Uganda 18,000
  Facebook in Senegal: 100,000 new users per month
Section I: Opportunity
        Intent in an Age of Growing Volatility

• Drivers of Volatility: financial shocks,
  climate change, hyperconnectivity
  2011 OECD Report: “[d]isruptive shocks to the global economy are
  likely to become more frequent and cause greater economic and
  societal hardship. The economic spill-over effect of events like the
  financial crisis or a potential pandemic will grow due to the increasing
  interconnectivity of the global economy and speed with which people,
  goods and data travel”.
• Early Warning Today: local impacts
  invisible or impossible to track as they
  happen.
• Growing Intent: policy makers are
  recognizing both the costs of volatility and
  the need for greater agility.
Section I: Opportunity
           Data Mining and Data Science
• The availability of real-time digital data is
  increasing every second.
• Slowly but surely, intent to leverage it as a
  public good is growing.
• Yet there must also be capacity to understand
  it -- and use it to change outcomes.


 “Data is the new oil. Like oil, it must be refined
 before it can be used.”
                                      - Andreas Weigend
Section I: Opportunity
     Big Data for Development: Getting Started
Illustration: Coping strategies of a hypothetical
household facing rising commodity prices and
unemployment
  OFFLINE BEHAVIORS
 •   Buy cheaper foods
 •   Work longer hours
 •   Reduce energy use
 •   Draw down savings
 •   Sell assets
 •   Borrow from relatives
  DIGITAL SIGNATURES
 •   Depletion of airtime credit
 •   Smaller mobile airtime purchases
 •   Failure to repay microloans via mobile financial services
 •   Changes in calling patterns
 •   Inbound money transfers
 •   Searches for jobs, health
 •   Sales of livestock via mobile trading network
 •   Venting frustrations on social media
Section I: Opportunity
    Big Data for Development: Getting Started
A Loose BD4D Taxonomy:
 1. Data Exhaust. Mobile usage, purchases,
    search, app usage.
 2. Online Information. News stories, blogs,
    Twitter, Facebook, obituaries, job
    postings, ecommerce.
 3. Physical Sensors. Satellite imagery,
    video, traffic sensors, etc.
 4. Crowdsourced Reports. Information
    actively generated by citizens through
    mobile phone-based surveys, hotlines,
    online maps, etc.
Section I: Opportunity
          Capacity: Big Data Analytics
    Data Analytics and “Reality Mining”
1. Stream Analytics: Continuous analysis
   over real-time streaming data (social
   media, calling patterns, online prices,
   search)
2. Data Mining: Online digestion of semi-structured
   and unstructured historical data (news items,
   blog posts)
3. Real-Time Correlation: Integrating fast
   streams with historical records to provide
   context to new data
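Stream analytics and real-time correlation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Global Pulse's method: the stream is assumed to be a sequence of daily counts (for example, hypothetical daily mentions of a food price), and the "context" from historical records is reduced to a historical mean and standard deviation. The class name `StreamMonitor` and the window size are hypothetical choices.

```python
from collections import deque
from statistics import mean, pstdev


class StreamMonitor:
    """Rolling summary of a live stream, scored against a historical baseline.

    Hypothetical helper for illustration: `history` is assumed to be a list of
    past observations (e.g. daily counts) that supplies the historical context.
    """

    def __init__(self, history, window=7):
        self.base_mean = mean(history)
        # Fall back to 1.0 if the history is perfectly flat, to avoid dividing by zero.
        self.base_sd = pstdev(history) or 1.0
        # Only the most recent `window` observations are retained.
        self.recent = deque(maxlen=window)

    def update(self, value):
        """Ingest one new observation and return the z-score of the window mean."""
        self.recent.append(value)
        return (mean(self.recent) - self.base_mean) / self.base_sd


# Usage with made-up numbers: a stable history, then a sudden jump in the stream.
monitor = StreamMonitor(history=[100, 102, 98, 101, 99, 100, 100], window=3)
for count in [101, 99, 130, 140, 150]:
    print(count, round(monitor.update(count), 2))
```

The fixed-size `deque` keeps memory constant however long the stream runs, which is the essential property of continuous analysis over streaming data.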
Data Visualization Matters!
Figure: A word cloud of this whitepaper
Figure: Global legal timber trade: Top 5 exporters and costs
Section I: Opportunity
      Social Science and Policy Applications

A growing body of evidence:
• Mining mobile location data to detect job loss and
   migration
• Mining mobile usage to detect mental illness
• Mining Twitter for misuse of antibiotics and other
   medications
• Mining Facebook for evidence of drinking
   problems among college students
• Remote sensing of nighttime light emissions for
   a real-time estimation of GDP
• Crowdsourcing citizen SMS reports to estimate
   earthquake damage
Tracking Health-Related Behaviour Change:
             Mining Twitter messages
Figures: H1N1 epidemic in the US; Cholera in Haiti
Tracking Health-Related Behaviour Change:
             Mining Google searches
Figure: Volume of real-time searches for symptoms
predicts the official number of dengue cases in Brazil
Section II: Challenges
                Data Privacy
1. Digital Data Privacy as a Human Right
   • Data acquisition
   • Storage
   • Retention
   • Use
   • Presentation

2. Privacy Risks in Big Data
   • Awareness of, and consent to, data collection
   • Reuse of public content
   • Re-identification
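Re-identification risk is often gauged with k-anonymity, a standard technique that the slide does not name: a released table is k-anonymous if every combination of quasi-identifiers (such as district and age band) is shared by at least k records. The sketch below assumes records are plain dictionaries; the field names and values are hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the rarest combination of quasi-identifier values.

    A record in a group of size 1 is unique on those fields, and therefore
    easier to re-identify by linking it with outside data.
    """
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical de-identified subscriber records (no names or phone numbers).
records = [
    {"district": "A", "age_band": "20-29"},
    {"district": "A", "age_band": "20-29"},
    {"district": "B", "age_band": "30-39"},
    {"district": "B", "age_band": "40-49"},
]
print(is_k_anonymous(records, ["district", "age_band"], k=2))  # → False
print(is_k_anonymous(records, ["district"], k=2))              # → True
```

Removing names does not by itself protect anyone: two records here are unique on district plus age band. Coarsening the released fields (dropping the age band) makes every group at least two records large.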
Section II: Challenges
                    Data Access

Private sector barriers to
sharing Big Data:
  • Legal constraints
  • Reputational risk
  • Competitive advantage
  • Culture of secrecy
  • Lack of incentives
  • Technical complexity
  • Level of effort
Data Philanthropy!
Section II: Challenges
                       Analysis
Getting the picture right with
user-generated data
  • Falsification, deliberate distortion
  • Sensor network distribution
  • Perceptions vs. facts: Google Flu Trends detects
      influenza-like illness (ILI), not influenza itself.
  • Sentiment Analysis: sarcasm, irony, hyperbole,
      humor, and the elusiveness of intent.
  • Expressed vs. actual intentions
  • Text mining: context and significance
Figure: Map of tweets in Jakarta
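A naive word-counting sentiment score makes the sarcasm problem concrete. The two tiny word lists below are invented for this example (real lexicons are far larger); the point is only that a sarcastic complaint can score as positive.

```python
# Toy sentiment lexicons, invented for this example.
POSITIVE = {"great", "good", "love", "cheap"}
NEGATIVE = {"bad", "expensive", "hate", "worse"}


def lexicon_score(text):
    """Count positive words minus negative words, ignoring case and punctuation."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


print(lexicon_score("Rice is so expensive, this is bad"))  # → -2
print(lexicon_score("Great, rice prices doubled again!"))  # → 1, though the tweet is a complaint
```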
Section II: Challenges
                        Analysis
Interpreting behavioral data
  • Selection bias: income, education, age,
    gender, technical aptitude, service provider
  • Media coverage drives behaviour change
  • Apophenia: seeing patterns where none exist;
    correlation is not causation
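One common partial correction for selection bias is post-stratification, which is not a method the slide prescribes. It reweights each group in the data to its known share of the population (for example, from a census). It only corrects for the variables you weight on, and it assumes every group appears in the sample. The groups, shares, and responses below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def poststratified_mean(sample, population_shares):
    """Estimate a population mean from a biased sample.

    sample: list of (group, value) pairs, e.g. ("urban", 1) for a respondent
            who reported a price rise.
    population_shares: dict mapping each group to its share of the population
            (shares are assumed to sum to 1).
    """
    by_group = defaultdict(list)
    for group, value in sample:
        by_group[group].append(value)
    missing = set(population_shares) - set(by_group)
    if missing:
        raise ValueError(f"no observations for groups: {sorted(missing)}")
    # Weight each group's mean by its population share, not its sample share.
    return sum(share * mean(by_group[g]) for g, share in population_shares.items())


# Hypothetical online sample: 80% urban, although the population is 40% urban.
sample = [("urban", v) for v in [1, 1, 0, 0, 0, 0, 0, 0]] + [("rural", v) for v in [1, 0]]
naive = mean(v for _, v in sample)  # urban respondents dominate this figure
weighted = poststratified_mean(sample, {"urban": 0.4, "rural": 0.6})
print(naive, weighted)
```

In this made-up sample the unweighted share is 0.3, while reweighting to the population mix gives about 0.4, because the under-represented rural group reports price rises more often.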
Section II: Challenges
                    Analysis

Detecting and defining anomalies in human
ecosystems
  • Establishing a baseline: how stringent is
    your model?
  • Sensitivity vs. specificity: false positives
    undermine credibility; false negatives
    reduce relevance.
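The baseline and the sensitivity/specificity trade-off can be made concrete with a simple z-score detector. This is one of many possible anomaly models, chosen here only for brevity. It assumes a stable baseline period and, for evaluation, a hypothetical list of periods in which a real event is known to have occurred.

```python
from statistics import mean, pstdev


def flag_anomalies(series, baseline, threshold):
    """Flag points whose distance from the baseline mean exceeds `threshold` std devs."""
    mu = mean(baseline)
    sd = pstdev(baseline) or 1.0  # guard against a perfectly flat baseline
    return [abs(x - mu) / sd > threshold for x in series]


def sensitivity_specificity(flags, truth):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(flags, truth))
    tp = sum(f and t for f, t in pairs)
    fn = sum((not f) and t for f, t in pairs)
    tn = sum((not f) and (not t) for f, t in pairs)
    fp = sum(f and (not t) for f, t in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up example: a calm baseline, then six new periods, two of which are real events.
baseline = [10, 12, 9, 11, 10, 8, 10]
series = [10, 13, 20, 11, 18, 9]
truth = [False, False, True, False, True, False]
for threshold in (2, 3, 7):
    flags = flag_anomalies(series, baseline, threshold)
    print(threshold, sensitivity_specificity(flags, truth))
# 2 -> (1.0, 0.75): one false positive
# 3 -> (1.0, 1.0)
# 7 -> (0.5, 1.0): one real event missed
```

A loose threshold raises false alarms and a strict one misses events. How stringent the model should be depends on whether credibility or relevance is the bigger risk in a given setting.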
Section III: Application
 What New Data Streams Bring to the Table
Know your data!
  • Big Data is… just data. However…
    • News organizations have developed
      verification methodologies
    • Perceptual data is useful for detecting
      events
    • False perceptions drive population
      behavior
    • Selection bias can be an advantage: in
      developing countries, online inflation may
      precede offline inflation
Section III: Application
    What New Data Streams Bring to the Table
Applications of Big Data for Development
“Even if all you have got is a contemporaneous correlation, you’ve got a
6-week lead on the reported values. The hope is that as you take the economic
pulse in real time, you will be able to respond to anomalies more quickly.” -
Hal Varian, Chief Economist, Google

• Sometimes correlation suffices: proxy indicators
• Accuracy vs. speed, cost, scale
• Real-time data saves lives
Figure: USGS Twitter Earthquake Detector
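Varian's point is that a contemporaneous correlation already buys lead time when official figures are published late. This can be illustrated with an ordinary least-squares fit. First calibrate a proxy (here, hypothetical weekly counts of symptom searches) against the weeks for which official figures exist. Then apply the fit to the latest proxy value before the official number is released. The numbers are invented and exactly linear so the arithmetic is easy to check; real proxies are noisy and need periodic recalibration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical weekly proxy values; official counts exist only for the first four weeks.
proxy = [10, 20, 30, 40, 50]
official = [105, 205, 305, 405]

a, b = fit_line(proxy[: len(official)], official)
nowcast = a + b * proxy[-1]  # estimate for week 5, ahead of the official release
print(a, b, nowcast)  # → 5.0 10.0 505.0
```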
Section III: Application
What New Data Streams Bring to the Table
 Global Pulse research: real-time proxy indicators
Figure: Tweets about the price of rice vs. official food prices in Indonesia
Section III: Application
What New Data Streams Bring to the Table
  Global Pulse research: real-time proxy indicators
Figure: Correlation of mood changes and emerging topics in social media
with official unemployment figures in the US and Ireland
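To check whether a social-media signal leads an official series, correlate the signal with the official figures shifted by several periods and see which lag correlates best. The sketch below uses Pearson correlation and made-up monthly data in which the official series simply trails the signal by one period. The function names are hypothetical, and a strong lagged correlation still says nothing about causation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlations(signal, official, max_lag):
    """Correlate signal[t] with official[t + lag] for lag = 0 .. max_lag.

    Both series are assumed to cover the same periods and be equal in length.
    """
    return {
        lag: pearson(signal[: len(signal) - lag], official[lag:])
        for lag in range(max_lag + 1)
    }


# Made-up monthly series: the official figure repeats the signal one month later.
signal = [1, 3, 2, 5, 4, 6, 8, 7]
official = [0, 1, 3, 2, 5, 4, 6, 8]
corr = lagged_correlations(signal, official, max_lag=2)
best_lag = max(corr, key=corr.get)
print(best_lag, round(corr[best_lag], 3))  # → 1 1.0
```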
Section III: Application
 What New Data Streams Bring to the Table
A threefold opportunity for development
1. Early warning: Faster detection of anomalies at
   the onset of a crisis allows more agile
   responses to prevent harm.
2. Real-time awareness: A fine-grained and
   current representation of reality informs better
   design and targeting of programmes and
   policies.
3. Real-time feedback: Continuous monitoring for
   behaviour changes following programme
   implementation enables a more adaptive
   approach to development, in which rapid
   adjustments may be made until results are
   achieved.
Section III: Application
   Making Big Data Work for Development
Contextualization is key
1. Data context: Indicators should not be
   interpreted in isolation. Monitor for
   constellations of anomalies, triangulating
   across data sources.
2. Cultural context: Local knowledge of what is
   “normal” in a given population is a prerequisite
   for recognizing anomalies. Cultural practices
   and norms vary widely the world over and
   these differences certainly extend to the use of
   digital services. There is a deeply ethnographic
   dimension to using Big Data for development.
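Monitoring for constellations of anomalies can be as simple as raising an alert only when several independent sources flag the same period. In this sketch, the source names are hypothetical, and each list of per-period flags is assumed to come from some upstream detector (for example, a baseline threshold). The two-source rule is an arbitrary choice for illustration.

```python
def constellation_alerts(source_flags, min_sources=2):
    """Return the periods in which at least `min_sources` data sources flag an anomaly.

    source_flags maps a source name to a list of per-period booleans; all lists
    are assumed to cover the same periods in the same order.
    """
    n_periods = len(next(iter(source_flags.values())))
    alerts = []
    for t in range(n_periods):
        n_flagged = sum(flags[t] for flags in source_flags.values())
        if n_flagged >= min_sources:
            alerts.append(t)
    return alerts


# Hypothetical weekly flags from three independent data streams.
flags = {
    "airtime_topups": [False, True, False, True],
    "rice_price_tweets": [False, False, False, True],
    "job_searches": [False, True, False, False],
}
print(constellation_alerts(flags))                 # → [1, 3]
print(constellation_alerts(flags, min_sources=3))  # → []
```

Requiring agreement across sources trades some sensitivity for fewer alerts driven by a quirk in any single stream. What counts as an anomaly in each stream still depends on local knowledge of what is normal.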
Section III: Application
    Making Big Data Work for Development
Becoming sophisticated users of information
Example: FEMA tracking 2011 US tornado
impacts through Twitter

1. “We aren’t making widgets”: Navigating the
   tradeoff between speed and accuracy.
2. Focus on changing outcomes. How can we
   leverage the real-time nature of the data to
   save lives?

“Disasters are like horseshoes, hand grenades and
thermal nuclear devices, you just need to be close—
preferably more than less.” – Craig Fugate, Administrator,
US Federal Emergency Management Agency
Conclusion

How can Big Data fulfill its potential as a
public good?
1. Institutional and financial support from public
   sector actors
2. Creating incentives for corporations to share
   data
3. Creating opportunities for academic
   researchers to collaborate
4. Developing new models, technologies and
   policies for safe and responsible sharing and
   reuse of data for the public good
5. New types of partnerships
UN Global Pulse
www.unglobalpulse.org

@unglobalpulse




                                         Image credit: Aaron Koblin
                        24 hours of AT&T phone calls and Internet
                             traffic flowing through New York City

"Big Data for Development: Opportunities & Challenges” - UN Global Pulse
