BIG DATA and VERACITY:
A novel approach to data
veracity using crowd-sourcing
techniques
Samarth Bhargav, Bhoomika Agarwal,
Abhiram Ravikumar and Vrishabh DN
April 18, 2014
Presented at BMS Institute of Technology, Bangalore
Introduction
Big Data
● What is Big Data?
● The 3 traditional V’s
o Volume
o Velocity
o Variety
● The fourth V: Veracity
● Crowdsourcing
Figure: Volume, Variety, Velocity, Veracity
The 4 Vs of Big Data
Source: http://well-managed-business-intelligence.blogspot.in/2012/06/big-data-fourth.html
Crowdsourcing - Models in place
GOOGLE MAPS
WIKIPEDIA
DUOLINGO
RECAPTCHA
AMAZON TURK
reCAPTCHA
● Digitizing books one word at a time
● Productively uses the 10 seconds humans spend solving a CAPTCHA
● Digitizing old books is a herculean task for computers
● An efficient alternative to OCR
● Workflow: entry, multiple checks, verify, upload
● 20 years of The New York Times were digitized in just a couple of months
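The "multiple checks, verify" step of the reCAPTCHA workflow above can be illustrated with a simple majority vote over what different people typed for the same scanned word. This is only a sketch under stated assumptions: the function name and both thresholds are hypothetical and are not taken from reCAPTCHA itself.

```python
from collections import Counter


def consensus_transcription(answers, min_votes=3, min_agreement=0.75):
    """Return the agreed transcription of one scanned word, or None.

    answers: strings typed by different people for the same word image.
    min_votes and min_agreement are illustrative thresholds (assumptions).
    """
    # Normalise so differences in case or spacing do not split the vote.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough independent checks yet
    word, votes = Counter(cleaned).most_common(1)[0]
    # Accept the word only when a clear majority typed the same thing.
    return word if votes / len(cleaned) >= min_agreement else None
```

A word that fails the check would simply be shown to more people before it is uploaded.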
Google Maps
● “Enrich Google Maps with your local knowledge”
● The Google Map Maker project
● Data used by Google Maps and Google Earth
● Projects like Photo Sphere and Street View rely on huge contributions from the masses
● Workflow
○ add/edit places
○ verified by a moderator
○ cross-referenced and updated
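The Google Maps workflow above (add/edit, moderator verification, cross-reference and update) can be pictured as a small review pipeline. The sketch below is a toy model: the class, the function and the status names are all hypothetical and do not describe Google's actual systems.

```python
from dataclasses import dataclass, field


@dataclass
class PlaceEdit:
    name: str
    lat: float
    lon: float
    status: str = "pending"
    history: list = field(default_factory=list)


def review(edit, moderator_approves, known_places):
    """Move one crowd-submitted edit through moderation and cross-referencing."""
    edit.history.append("submitted")
    if not moderator_approves:
        edit.status = "rejected"
        edit.history.append("rejected by moderator")
        return edit
    edit.history.append("verified by moderator")
    # Cross-reference: a name already on the map is updated,
    # an unknown name is published as a new place.
    edit.status = "updated" if edit.name in known_places else "published"
    known_places[edit.name] = (edit.lat, edit.lon)
    edit.history.append(edit.status)
    return edit
```

Keeping a history per edit makes it possible to audit who changed what, which is one way such a model supports veracity.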
WIKIPEDIA
● Termed the “mother of all encyclopedias”
● Hosts an immense pool of multilingual, entirely community-driven data
● Run on donations from all over the world (crowdfunding)
● Dynamic and constantly updated, so it scores over traditional encyclopedias
● Unbiased and high-quality information
● Data verification and validation done instantly by both experts and the general public
DUOLINGO
● Learn a language and translate the Web
● Entirely free and crowd-driven
● Luis von Ahn: also behind the ESP Game and reCAPTCHA
● Workflow
o website to be translated is uploaded
o broken into parts and given to students
o students translate the document while learning
o translated document is returned to the owner
● Win-win situation for both students and companies
● Popular on both web and mobile platforms
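The Duolingo workflow above (upload, break into parts, hand out to students, return the translated document) can be sketched as three small steps. This is a simplified illustration: the function names, the sentence splitting on full stops and the round-robin assignment are assumptions, not Duolingo's actual method.

```python
def split_into_parts(document):
    """Break a document into sentence-sized parts (naive split on full stops)."""
    return [s.strip() for s in document.split(".") if s.strip()]


def assign_parts(parts, students):
    """Hand parts out round-robin; each part keeps its index for reassembly."""
    assignments = {student: [] for student in students}
    for index, part in enumerate(parts):
        assignments[students[index % len(students)]].append((index, part))
    return assignments


def reassemble(translated):
    """Rebuild the document from {index: translated sentence} in original order."""
    return ". ".join(translated[i] for i in sorted(translated)) + "."
```

Carrying the part index through every step is what lets the translated document be returned to its owner in the original order, whichever student finishes first.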
Amazon Mechanical Turk
● Uses human workers for tasks that computers cannot yet do well
● HITs (Human Intelligence Tasks) support machine learning, for example by labelling training data
● Workflow
o Requester places a task on the site or through the API
o Provider (worker) picks a suitable task
o Payments made through Amazon gift certificates
● Advantages include
o Quality assurance
o Scalability options
o Lower cost
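The requester/provider workflow above can be modelled as a tiny in-memory marketplace. This sketch does not use the real Mechanical Turk API: every class, method and field name here is hypothetical, and rewards are kept in whole cents purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    hit_id: int
    description: str
    reward_cents: int
    required_skill: str
    assigned_to: Optional[str] = None


class Marketplace:
    def __init__(self):
        self.hits = []
        self.earnings = {}  # provider name -> cents earned

    def post(self, description, reward_cents, required_skill):
        """Requester places a task."""
        hit = Hit(len(self.hits) + 1, description, reward_cents, required_skill)
        self.hits.append(hit)
        return hit

    def pick(self, provider, skills):
        """Provider picks the best-paid open task that matches their skills."""
        open_hits = [h for h in self.hits
                     if h.assigned_to is None and h.required_skill in skills]
        if not open_hits:
            return None
        best = max(open_hits, key=lambda h: h.reward_cents)
        best.assigned_to = provider
        return best

    def approve(self, hit):
        """Requester approves the work, which credits the provider."""
        self.earnings[hit.assigned_to] = (
            self.earnings.get(hit.assigned_to, 0) + hit.reward_cents)
```

Paying only on approval is one simple way to tie payment to quality assurance.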
Analysis
● Handling data IS important
● Google Flu Trends
● Kickstarter and CosmoQuest
● A lot of scope and wide opportunities
Repercussions
● Senator Kennedy’s story
● FCRA (Fair Credit Reporting Act)
● Crowds are often unaware that their data is being acquired
● Confidential data and security leaks must be handled with care
Conclusion
Crowdsourcing model   Volume      Velocity    Variety     Veracity
Google Maps           terabytes   high        low         medium
Duolingo              terabytes   medium      high        high
reCAPTCHA             petabytes   very high   very high   very high
Amazon Turk           petabytes   medium      very high   high
Wikipedia             petabytes   medium      high        very high
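The comparison above can also be held as data, so the models can be ranked on any one V. In the sketch below the ratings are copied from the table, while the numeric scale for low/medium/high/very high and the function name are assumptions made only for sorting.

```python
# Ordinal scale for the qualitative ratings (an assumption for sorting only).
LEVELS = {"low": 1, "medium": 2, "high": 3, "very high": 4}

# Ratings copied from the conclusion table.
MODELS = {
    "Google Maps": {"volume": "terabytes", "velocity": "high",
                    "variety": "low", "veracity": "medium"},
    "Duolingo": {"volume": "terabytes", "velocity": "medium",
                 "variety": "high", "veracity": "high"},
    "reCAPTCHA": {"volume": "petabytes", "velocity": "very high",
                  "variety": "very high", "veracity": "very high"},
    "Amazon Turk": {"volume": "petabytes", "velocity": "medium",
                    "variety": "very high", "veracity": "high"},
    "Wikipedia": {"volume": "petabytes", "velocity": "medium",
                  "variety": "high", "veracity": "very high"},
}


def rank_by(dimension):
    """Return model names from highest to lowest rating on one dimension.

    Works for velocity, variety and veracity; ties keep table order.
    """
    return sorted(MODELS, key=lambda m: LEVELS[MODELS[m][dimension]],
                  reverse=True)
```

For example, rank_by("veracity") puts reCAPTCHA and Wikipedia first and Google Maps last, matching the table.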
Questions & Answers time! :-)
Source: http://2.bp.blogspot.com/
Thank you, UTSAHA 2k’14.
