Microtask Crowdsourcing
Applications for Linked Data
Architecture of
Linked Data Applications
[Figure: three-tier architecture of Linked Data applications.
Tiers: Presentation Tier; Logic Tier; Data Tier.
Data Tier components: Integrated Dataset; Republication Component; Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing); Data Access Component (SPARQL Wrapper, Physical Wrapper, R2R Transf., LD Wrapper).
Data sources: Web Data accessed via APIs (RDF/XML); SPARQL Endpoints; Relational Data; Linked Data.]

EUCLID – Microtask crowdsourcing
applications for Linked Data
Data Tier
Data Integration Component
[Figure: Data Access Component and Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing)]

• Consolidates the data retrieved from heterogeneous sources.
• This component may operate at:
– Schema level: Performs vocabulary mappings in order to translate
data into a single unified schema. Links correspond to RDFS properties
or OWL property and class axioms. (CH 2)
– Instance level: Performs entity linking, e.g., entity resolution via
owl:sameAs links. (CH 3)
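
As a rough illustration of the instance level, the sketch below groups URIs that are connected by owl:sameAs links into equivalence classes with a union-find structure. The helper name `same_as_clusters`, the sample triples, and the `ex:instagram` identifier are hypothetical and only serve the example; this is not code from the EUCLID material.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_clusters(triples):
    """Group URIs linked by owl:sameAs into equivalence classes."""
    parent = {}

    def find(x):
        # Return the representative of x, registering x on first sight.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[root_s] = root_o

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical input: two sameAs links and one unrelated triple.
triples = [
    ("dbpedia:Instagram", OWL_SAME_AS, "fbase:Instagram"),
    ("fbase:Instagram", OWL_SAME_AS, "ex:instagram"),
    ("dbpedia:Facebook", "rdfs:label", "Facebook"),
]
clusters = same_as_clusters(triples)
```

Only URIs that take part in at least one owl:sameAs link appear in the result; everything else is left untouched.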
Data Tier (2)
Data Integration Component
[Figure: Data Access Component and Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing)]

The data integration component can be enhanced by including
microtask crowdsourcing approaches:
• Cleansing or data assessments: Assessment of DBpedia triples
• Vocabulary mapping: CrowdMAP
• Interlinking: ZenCrowd
Other Crowdsourcing-based
Solutions for Linked Data Tasks
• Query understanding: CrowdQ

• Ontology population: OntoGame
• Linked Data curation: Urbanopoly
• …

DBPEDIA QUALITY ASSESSMENT

Assessing DBpedia Triples
[Figure: each triple {s p o .} of the dataset is classified as either Correct or Incorrect + Quality issue]

1. Selecting LD quality issues that are generated by erroneous extraction
mechanisms and that can be detected by the crowd
2. Selecting the appropriate crowdsourcing approaches
3. Designing and generating the interfaces to present the data to the
crowd
Selecting LD Quality
Issues to Crowdsource
Three categories of quality problems occur
pervasively in DBpedia [Zaveri2013]
and can be crowdsourced:
• Incorrect object
– Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3".

• Incorrect data type
– Example: dbpedia:Torishima_Izu_Islands foaf:name "鳥島"@en.

• Incorrect link to "external Web pages"
– Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink
<http://cedarlakedvd.com/>

Selecting Appropriate
Crowdsourcing Approaches
Stage    Approach     Crowd        Task             Reward          Tool
Find     Contest      LD Experts   Difficult task   Final prize     TripleCheckMate [Kontokostas2013]
Verify   Microtasks   Workers      Easy task        Micropayments   MTurk

Adapted from [Bernstein2010]
Presenting the Data
to the Crowd
Microtask interfaces: MTurk tasks

Incorrect object

• Selection of foaf:name or
rdfs:label to extract human-readable descriptions
• Real object values extracted
automatically from Wikipedia
infoboxes

Incorrect data type

• Link to the Wikipedia article via
foaf:isPrimaryTopicOf

Incorrect outlink

• Preview of external pages by
implementing HTML iframe
Results
                          Object values   Data types   Interlinks
Linked Data experts       0.7151          0.8270       0.1525
MTurk (majority voting)   0.8977          0.4752       0.9412


• Both forms of crowdsourcing can be applied to detect
certain LD quality issues
• The effort of LD experts must be applied to those tasks
demanding domain-specific skills
• The MTurk crowd is exceptionally good at
comparing data entries
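
The MTurk row above is aggregated by majority voting. A minimal sketch of such an aggregation follows, assuming each triple is judged by several workers and that a tie yields no decision. The `majority_vote` helper and the worker answers are hypothetical and are not data from the study.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer, or None for no answers or a tie."""
    counts = Counter(answers).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the two most frequent answers
    return counts[0][0]


# Hypothetical worker answers per triple (illustrative only).
assignments = {
    "triple-1": ["incorrect", "incorrect", "correct"],
    "triple-2": ["correct", "correct", "correct"],
    "triple-3": ["correct", "incorrect"],
}
decisions = {triple: majority_vote(a) for triple, a in assignments.items()}
```

Using an odd number of workers per task avoids ties for binary questions, which is one reason microtasks are often assigned to three or five workers.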
ZENCROWD

ZenCrowd: Entity Linking by
the Crowd

• Combine both algorithmic and manual linking
• Automate manual linking via crowdsourcing
• Dynamically assess human workers with a
probabilistic reasoning framework
[Figure: Crowd, Machines, Algorithms]
HTML:
<p>Facebook is not waiting for its initial
public offering to make its first big
purchase.</p><p>In its largest
acquisition to date, the social network
has purchased Instagram, the popular
photo-sharing application, for about $1
billion in cash and stock, the company
said Monday.</p>

Linked entities:
http://dbpedia.org/resource/Facebook
http://dbpedia.org/resource/Instagram owl:sameAs fbase:Instagram
(also shown in the figure: Google, Android)

RDFa enrichment:
<p><span
about="http://dbpedia.org/resource/Facebook"><cite
property="rdfs:label">Facebook</cite> is not
waiting for its initial public offering to make its first
big purchase.</span></p><p><span
about="http://dbpedia.org/resource/Instagram">In
its largest acquisition to date, the social network has
purchased <cite
property="rdfs:label">Instagram</cite> , the popular
photo-sharing application, for about $1 billion in cash
and stock, the company said Monday.</span></p>
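
The enrichment step can be sketched in a few lines, assuming the entity links have already been decided. The `enrich_with_rdfa` helper below is hypothetical and simpler than the slide's markup: it wraps each mention itself in a `span`/`cite` pair instead of wrapping the whole sentence, and it assumes the input is plain text without markup.

```python
import html
import re


def enrich_with_rdfa(text, entity_links):
    """Wrap every known mention in RDFa markup pointing at its entity URI."""
    if not entity_links:
        return text
    # Longest mentions first, so that overlapping mentions prefer the longer one.
    mentions = sorted(entity_links, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in mentions))

    def annotate(match):
        mention = match.group(0)
        uri = entity_links[mention]
        return (
            f'<span about="{html.escape(uri)}">'
            f'<cite property="rdfs:label">{html.escape(mention)}</cite>'
            f"</span>"
        )

    # A single pass avoids re-matching mentions inside inserted markup.
    return pattern.sub(annotate, text)


entity_links = {
    "Facebook": "http://dbpedia.org/resource/Facebook",
    "Instagram": "http://dbpedia.org/resource/Instagram",
}
enriched = enrich_with_rdfa("Facebook has purchased Instagram.", entity_links)
```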

ZenCrowd Architecture
[Figure: ZenCrowd architecture. Input: HTML Pages. Output: HTML + RDFa Pages. Labeled components: Entity Extractors; Algorithmic Matchers; LOD Index (Get Entity); LOD Open Data Cloud; Decision Engine (Probabilistic Network); Micro Matching Tasks; MicroTask Manager; Crowdsourcing Platform; Workers Decisions.]


Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic
Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on
World Wide Web (WWW 2012).
Entity Factor Graphs
• Graph components
– Workers, links, clicks
– Prior probabilities
– Link factors
– Constraints

• Probabilistic inference
– Select all links with posterior prob > τ

[Figure: entity factor graph with 2 workers, 6 clicks, 3 candidate links. Worker priors pw1( ), pw2( ) for workers w1, w2; observed variables (clicks) c11, c12, c13, c21, c22, c23; link factors lf1( ), lf2( ), lf3( ) for links l1, l2, l3; link priors pl1( ), pl2( ), pl3( ); SameAs constraint sa1-2( ); dataset unicity constraint u2-3( ).]
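
To make the thresholding step concrete, here is a deliberately simplified sketch. It is a naive-Bayes-style combination of a link prior with worker reliabilities, not ZenCrowd's factor-graph inference: it ignores the SameAs and unicity constraints and treats worker reliabilities as fixed. All names and numbers are hypothetical.

```python
def link_posterior(prior, clicks, reliability):
    """Posterior probability that a candidate link is valid.

    clicks: worker -> bool (True means the worker marked the link as valid)
    reliability: worker -> probability that the worker answers correctly
    """
    p_valid, p_invalid = prior, 1.0 - prior
    for worker, says_valid in clicks.items():
        r = reliability[worker]
        p_valid *= r if says_valid else 1.0 - r
        p_invalid *= 1.0 - r if says_valid else r
    return p_valid / (p_valid + p_invalid)


# Hypothetical setting mirroring the figure: 2 workers, 6 clicks, 3 links.
reliability = {"w1": 0.9, "w2": 0.6}
link_priors = {"l1": 0.5, "l2": 0.5, "l3": 0.5}
clicks = {
    "l1": {"w1": True, "w2": True},
    "l2": {"w1": False, "w2": True},
    "l3": {"w1": True, "w2": False},
}
tau = 0.7  # assumed threshold

posteriors = {
    link: link_posterior(link_priors[link], clicks[link], reliability)
    for link in link_priors
}
selected = sorted(link for link, p in posteriors.items() if p > tau)
```

With these assumed values the reliable worker w1 dominates: links l1 and l3 pass the threshold, while l2, which only the less reliable worker w2 accepted, does not.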

Lessons Learnt
• Crowdsourcing + probabilistic reasoning works!
• But
– Different worker communities perform differently
– Many low quality workers
– Completion time may vary (based on reward)

• Need to find the right workers for your task
(see WWW13 paper)

ZenCrowd Summary
• ZenCrowd: Probabilistic reasoning over automatic and
crowdsourcing methods for entity linking
• Standard crowdsourcing improves 6% over automatic
• 4% - 35% improvement over standard crowdsourcing
• 14% average improvement over automatic approaches

http://exascale.info/zencrowd/
• Follow-up work (VLDBJ):
– Also used for instance matching across datasets
– 3-way blocking with the crowd
CROWDQ – CROWD-POWERED
QUERY UNDERSTANDING
Motivation
• Web Search Engines can answer simple factual
queries directly on the result page
• Users with complex information needs are
often unsatisfied
• Purely automatic techniques are not enough
• We want to solve it with Crowdsourcing!

CrowdQ
• CrowdQ is the first system that uses
crowdsourcing to
– Understand the intended meaning
– Build a structured query template
– Answer the query over Linked Open Data

Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ:
Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems
Research (CIDR 2013).
CrowdQ Architecture
Off-line: query template generation with the help of the crowd
On-line: query template matching using NLP and search over open data
[Figure: CrowdQ architecture, with an on-line complex query processing part and an off-line complex query decomposition part. Labeled components: User; Keyword Query; Complex query classifier (Y/N); POS + NER tagging; Vertical selection, Unstructured Search, ...; Match with existing query templates; Structured Query; Structured LOD Search; Result Joiner; Answer Composition; SERP; Query Log; Crowd Manager; Crowdsourcing Platform; Template Generation; Queries Templ + Answer Types; Query Template Index (t1, t2, t3); LOD Open Data Cloud.]
Hybrid Human-Machine
Pipeline
Q = birthdate of actors of forrest gump

• Query annotation: Noun, Noun, Named entity
• Verification: Is forrest gump this entity in the query?
• Entity relations: Which is the relation between: actors and forrest gump
– starring
• Schema element: Starring
– <dbpedia-owl:starring>
• Verification: Is the relation between:
Indiana Jones – Harrison Ford
Back to the Future – Michael J. Fox
of the same type as
Forrest Gump – actors

Structured query generation
Q= birthdate of actors of forrest gump
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?y ?x
WHERE { ?y dbpedia-owl:birthdate ?x .
?z dbpedia-owl:starring ?y .
?z rdfs:label 'Forrest Gump' }

Results from BTC09:
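
The idea of a reusable structured query template can be sketched as follows, assuming the attribute, the relation, and the entity label have already been identified by the human-machine pipeline. The `build_query` helper, the template layout, and the use of the standard DBpedia ontology namespace for `dbpedia-owl:` are assumptions made for this illustration; they are not taken from CrowdQ itself.

```python
from string import Template

# Hypothetical template for "<attribute> of <relation> of <entity>" queries.
QUERY_TEMPLATE = Template(
    """PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?y ?x
WHERE { ?y dbpedia-owl:$attribute ?x .
        ?z dbpedia-owl:$relation ?y .
        ?z rdfs:label "$label" }"""
)


def build_query(attribute, relation, label):
    """Fill the template; escape backslashes and quotes in the literal."""
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.substitute(
        attribute=attribute, relation=relation, label=safe_label
    )


query = build_query("birthdate", "starring", "Forrest Gump")
```

The same template could then serve other queries of the same shape by swapping the three slots, which is the point of generating templates off-line.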

CROWDMAP & OTHERS

CrowdMAP
• Experiments using MTurk, CrowdFlower and established benchmarks
• Enhancing the results of automatic techniques
• Fast, accurate, cost-effective
[Sarasua, Simperl, Noy, ISWC2012]

            CartP     100R50P       100R50P       100R50P    100R50P       Imp
            301-304   Edas-Iasted   Ekaw-Iasted   Cmt-Ekaw   ConfOf-Ekaw   301-304
PRECISION   0.53      0.8           1.0           1.0        0.93          0.73
RECALL      1.0       0.42          0.7           0.75       0.65          1.0
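
For readers less familiar with the two measures reported above, the sketch below computes precision and recall of a set of proposed correspondences against a reference alignment. The `precision_recall` helper and the correspondences are hypothetical and unrelated to the benchmark figures above.

```python
def precision_recall(found, reference):
    """Precision and recall of found correspondences against a reference."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical correspondences between two ontologies a and b.
reference = {
    ("a:Paper", "b:Paper"),
    ("a:Author", "b:Writer"),
    ("a:Review", "b:Review"),
    ("a:Person", "b:Person"),
}
found = {
    ("a:Paper", "b:Paper"),
    ("a:Author", "b:Writer"),
    ("a:Review", "b:Review"),
    ("a:Chair", "b:Person"),  # wrong correspondence
}
precision, recall = precision_recall(found, reference)
```

Here three of the four proposed correspondences are correct and three of the four reference correspondences are found, so both measures equal 0.75.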

Taste IT! Try IT!
• Restaurant review Android app developed in the Insemtives project
• Uses DBpedia concepts to generate structured reviews
• Uses mechanism design/gamification to configure incentives
• User study
– 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts

[Figure: bar chart by venue type (CAFE, FASTFOOD, PUB, RESTAURANT) showing the number of reviews, the number of semantic annotations (type of cuisine), and the number of semantic annotations (dishes)]

https://play.google.com/store/apps/details?id=insemtives.android&hl=en
11/11/2013

LODrefine

http://research.zemanta.com/crowds-to-the-rescue/
Ontology Population

Linked Data Curation

Problems and Challenges
• What is feasible and how can tasks be optimally translated into microtasks?
– Examples: data quality assessment for technical and contextual features; subjective vs
objective tasks (also in modeling); open-ended questions

• What to show to users
– Natural language descriptions of Linked Data/SPARQL
– How much context
– What form of rendering
– How about links?

• How to combine with automatic tools
– Which results to validate
  • Low precision (no fun for gamers...)
  • Low recall (vs all possible questions)

• How to embed it into an existing application
– Tasks are fine-grained and perceived as an additional burden on top of the actual functionality

• What to do with the resulting data?
– Integration into existing practices
– Vocabularies!

Web site:
https://sites.google.com/site/microtasktutorial/
SLIDES and EXERCISES:
https://github.com/maribelacosta/crowdsourcingtutorial

Full-day tutorial ISWC2013
Sydney Australia
11/11/2013

For exercises, quiz and further material visit our website:

http://www.euclid-project.eu

Course

eBook

Other channels:

@euclid_project

euclidproject
euclidproject

More Related Content

What's hot (20)

PPTX
Linked data life cycles
Michael Hausenblas
 
PPT
euclid_linkedup WWW tutorial (Besnik Fetahu)
Besnik Fetahu
 
PDF
Semantic Technologies and Triplestores for Business Intelligence
Marin Dimitrov
 
PDF
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
National Information Standards Organization (NISO)
 
PDF
Maximising (Re)Usability of Library metadata using Linked Data
Asuncion Gomez-Perez
 
PDF
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
eswcsummerschool
 
PDF
DBpedia Tutorial - Feb 2015, Dublin
m_ackermann
 
PPTX
NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
National Information Standards Organization (NISO)
 
PPTX
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
Ontotext
 
PDF
The web of interlinked data and knowledge stripped
Sören Auer
 
PDF
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Ontotext
 
PPTX
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
Peter Haase
 
PPTX
April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...
National Information Standards Organization (NISO)
 
PDF
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Ana Roxin
 
PPTX
Introduction to Linked Data Platform (LDP)
Hector Correa
 
PPTX
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Cory Lampert
 
PPTX
Introduction to W3C Linked Data Platform
Nandana Mihindukulasooriya
 
PPTX
Learning W3C Linked Data Platform with examples
Nandana Mihindukulasooriya
 
PPTX
NISO/DCMI Webinar: Metadata for Public Sector Administration
National Information Standards Organization (NISO)
 
PPT
The Power of Semantic Technologies to Explore Linked Open Data
Ontotext
 
Linked data life cycles
Michael Hausenblas
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
Besnik Fetahu
 
Semantic Technologies and Triplestores for Business Intelligence
Marin Dimitrov
 
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
National Information Standards Organization (NISO)
 
Maximising (Re)Usability of Library metadata using Linked Data
Asuncion Gomez-Perez
 
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
eswcsummerschool
 
DBpedia Tutorial - Feb 2015, Dublin
m_ackermann
 
NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
National Information Standards Organization (NISO)
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
Ontotext
 
The web of interlinked data and knowledge stripped
Sören Auer
 
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Ontotext
 
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
Peter Haase
 
April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...
National Information Standards Organization (NISO)
 
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Ana Roxin
 
Introduction to Linked Data Platform (LDP)
Hector Correa
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Cory Lampert
 
Introduction to W3C Linked Data Platform
Nandana Mihindukulasooriya
 
Learning W3C Linked Data Platform with examples
Nandana Mihindukulasooriya
 
NISO/DCMI Webinar: Metadata for Public Sector Administration
National Information Standards Organization (NISO)
 
The Power of Semantic Technologies to Explore Linked Open Data
Ontotext
 

Viewers also liked (17)

PPTX
Querying Linked Data on Android
EUCLID project
 
PDF
Conference Live: Accessible and Sociable Conference Semantic Data
Anna Lisa Gentile
 
PDF
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
Maribel Acosta Deibe
 
PDF
CrowdSem 2013 Workshop @ISWC2013
Lora Aroyo
 
PPTX
Online Learning and Linked Data: An Introduction
EUCLID project
 
PPTX
Best Practices for Linked Data Education
EUCLID project
 
PPTX
Speech Technology and Big Data
EUCLID project
 
PPTX
Crowdsourcing Linked Data Quality Assessment
Maribel Acosta Deibe
 
PPTX
Data Science Curriculum for Professionals
EUCLID project
 
PPTX
Mapping Relational Databases to Linked Data
EUCLID project
 
PPTX
Relational Database to RDF (RDB2RDF)
EUCLID project
 
PPTX
Semantic Data Management in Graph Databases: ESWC 2014 Tutorial
Maribel Acosta Deibe
 
PDF
Comment manager des geeks - Devoxx 2015
Publicis Sapient Engineering
 
PDF
Annotation Processor, trésor caché de la JVM
Raphaël Brugier
 
PPSX
Building and managing a research team %281%29
Deanship of Scientific Research , Umm Al Qura University
 
PDF
Conférence sur les annotations Java par Olivier Croisier (Zenika) au Paris JUG
Zenika
 
PDF
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
SlideShare
 
Querying Linked Data on Android
EUCLID project
 
Conference Live: Accessible and Sociable Conference Semantic Data
Anna Lisa Gentile
 
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
Maribel Acosta Deibe
 
CrowdSem 2013 Workshop @ISWC2013
Lora Aroyo
 
Online Learning and Linked Data: An Introduction
EUCLID project
 
Best Practices for Linked Data Education
EUCLID project
 
Speech Technology and Big Data
EUCLID project
 
Crowdsourcing Linked Data Quality Assessment
Maribel Acosta Deibe
 
Data Science Curriculum for Professionals
EUCLID project
 
Mapping Relational Databases to Linked Data
EUCLID project
 
Relational Database to RDF (RDB2RDF)
EUCLID project
 
Semantic Data Management in Graph Databases: ESWC 2014 Tutorial
Maribel Acosta Deibe
 
Comment manager des geeks - Devoxx 2015
Publicis Sapient Engineering
 
Annotation Processor, trésor caché de la JVM
Raphaël Brugier
 
Building and managing a research team %281%29
Deanship of Scientific Research , Umm Al Qura University
 
Conférence sur les annotations Java par Olivier Croisier (Zenika) au Paris JUG
Zenika
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
SlideShare
 
Ad

Similar to Microtask Crowdsourcing Applications for Linked Data (20)

PDF
Entity-Centric Data Management
eXascale Infolab
 
PDF
Enabling Citizen-empowered Apps over Linked Data
Diego López-de-Ipiña González-de-Artaza
 
PPTX
Entity centric data_management_2013
eXascale Infolab
 
PDF
Linked Data and Semantic Web Application Development by Peter Haase
Laboratory of Information Science and Semantic Technologies
 
PPT
Sem tech 2011 v8
dallemang
 
PPTX
Human Computation for Big Data
eXascale Infolab
 
PPTX
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web...
Till Blume
 
PPTX
Linked Energy Data Generation
Filip Radulovic
 
PPTX
Intro to Spark development
Spark Summit
 
PDF
Introduction to Spark Training
Spark Summit
 
PDF
Poster
Kevin Razavet
 
PDF
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
eswcsummerschool
 
ODP
2011 07 14_fractalperspective
Curran Kelleher
 
PDF
La bi, l'informatique décisionnelle et les graphes
Cédric Fauvet
 
PPT
LOD2 Webinar Series: CubeViz
LOD2 Creating Knowledge out of Interlinked Data
 
PDF
Big Data to SMART Data : Process Scenario
CHAKER ALLAOUI
 
PPT
Cyberinfrastructure and Applications Overview: Howard University June22
marpierc
 
PDF
Hala skafkeynote@conferencedata2021
hala Skaf
 
PDF
Using the Semantic Web Stack to Make Big Data Smarter
Matheus Mota
 
PDF
Koneksys - Offering Services to Connect Data using the Data Web
Koneksys
 
Entity-Centric Data Management
eXascale Infolab
 
Enabling Citizen-empowered Apps over Linked Data
Diego López-de-Ipiña González-de-Artaza
 
Entity centric data_management_2013
eXascale Infolab
 
Linked Data and Semantic Web Application Development by Peter Haase
Laboratory of Information Science and Semantic Technologies
 
Sem tech 2011 v8
dallemang
 
Human Computation for Big Data
eXascale Infolab
 
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web...
Till Blume
 
Linked Energy Data Generation
Filip Radulovic
 
Intro to Spark development
Spark Summit
 
Introduction to Spark Training
Spark Summit
 
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
eswcsummerschool
 
2011 07 14_fractalperspective
Curran Kelleher
 
La bi, l'informatique décisionnelle et les graphes
Cédric Fauvet
 
Big Data to SMART Data : Process Scenario
CHAKER ALLAOUI
 
Cyberinfrastructure and Applications Overview: Howard University June22
marpierc
 
Hala skafkeynote@conferencedata2021
hala Skaf
 
Using the Semantic Web Stack to Make Big Data Smarter
Matheus Mota
 
Koneksys - Offering Services to Connect Data using the Data Web
Koneksys
 
Ad

Recently uploaded (20)

PDF
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PDF
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
PDF
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
PPTX
OpenID AuthZEN - Analyst Briefing July 2025
David Brossard
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
PDF
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
PDF
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
PDF
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
PPTX
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
PDF
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
PDF
Staying Human in a Machine- Accelerated World
Catalin Jora
 
PDF
IoT-Powered Industrial Transformation – Smart Manufacturing to Connected Heal...
Rejig Digital
 
PPTX
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
PDF
Mastering Financial Management in Direct Selling
Epixel MLM Software
 
PDF
What Makes Contify’s News API Stand Out: Key Features at a Glance
Contify
 
PDF
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
PDF
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
OpenID AuthZEN - Analyst Briefing July 2025
David Brossard
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
Staying Human in a Machine- Accelerated World
Catalin Jora
 
IoT-Powered Industrial Transformation – Smart Manufacturing to Connected Heal...
Rejig Digital
 
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
Mastering Financial Management in Direct Selling
Epixel MLM Software
 
What Makes Contify’s News API Stand Out: Key Features at a Glance
Contify
 
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 

Microtask Crowdsourcing Applications for Linked Data

  • 2. Architecture of Linked Data Applications Presentation Tier Logic Tier Data Tier Integrated Dataset Data Access Component Republication Republication Component Data Integration Component Vocabulary Mapping Interlinking SPARQL Wr. Physical Wrapper R2R Transf. Cleansing LD Wrapper RDF/ XML Web Data accessed via APIs SPARQL Endpoints EUCLID – Microtask crowdsourcing applications for Linked Data Relational Data Linked Data 2
  • 3. Data Tier Data Integration Component Data Access Component Data Integration Component Vocabulary Mapping Interlinking Cleansing • Consolidates the data retrieved from heterogeneous sources. • This component may operate at: – Schema level: Performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties CH 2 or OWL property and class axioms. – Instance level: Performs entity linking, e.g., entity resolution via owl:sameAs links CH 3 EUCLID – Microtask crowdsourcing applications for Linked Data 3
  • 4. Data Tier (2) Data Integration Component Data Access Component Data Integration Component Vocabulary Mapping Interlinking Cleansing The data integration component can be enhanced by including microtask crowdsourcing apporaches: • Cleansing or data assessments: Assessment of DBpedia triples • Vocabulary mapping: CrowdMAP • Interlinking: ZenCrowd EUCLID – Microtask crowdsourcing applications for Linked Data 4
  • 5. Other Crowdsourcing-based Solutions for Linked Data Tasks • Query understanding: CrowdDQ • Ontology population: OntoGame • Linked Data curation: Urbanopoly • … EUCLID – Microtask crowdsourcing applications for Linked Data 5
  • 6. DBPEDIA QUALITY ASSESSMENT EUCLID – Microtask crowdsourcing applications for Linked Data
  • 7. Assessing DBpedia Triples Correct {s p o .} Dataset {s p o .} Incorrect + Quality issue 1. Selecting LD quality issues generated by erroneous extraction mechanisms and that can be detected by the crowd 2. Selecting the appropriate crowdsourcing approaches 3. Designing and generating the interfaces to present the data to the crowd EUCLID – Microtask crowdsourcing applications for Linked Data
  • 8. Selecting LD Quality Issues to Crowdsource Three categories of quality problems occur pervasively in DBpedia [Zaveri2013] and can be crowdsourced: • Incorrect object  Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth “3”. • Incorrect data type  Example: dbpedia:Torishima_Izu_Islands foaf:name “鳥島”@en. • Incorrect link to “external Web pages”  Example: dbpedia:John-Two-Hawks dbpediaowl:wikiPageExternalLink <http://cedarlakedvd.com/> EUCLID – Microtask crowdsourcing applications for Linked Data
  • 9. Selecting Appropriate Crowdsourcing Approaches Verify Find Contest Microtasks LD Experts Difficult task Final prize Workers Easy task Micropayments TripleCheckMate MTurk [Kontoskostas2013] Adapted from [Bernstein2010] EUCLID – Microtask crowdsourcing applications for Linked Data
  • 10. Presenting the Data to the Crowd Microtask interfaces: MTurk tasks Incorrect object • Selection of foaf:name or rdfs:label to extract humanreadable descriptions • Real object values extracted automatically from Wikipedia infoboxes Incorrect data type • Link to the Wikipedia article via foaf:isPrimaryTopicOf Incorrect outlink • Preview of external pages by implementing HTML iframe EUCLID – Microtask crowdsourcing applications for Linked Data
  • 11. Results Object values Data types Interlinks Linked Data experts 0.7151 0.8270 0.1525 MTurk 0.8977 0.4752 0.9412 (majority voting) • Both forms of crowdsourcing can be applied to detect certain LD quality issues • The effort of LD experts must be applied on those tasks demanding specific-domain skills • MTurk crowd are exceptionally good at performing comparison of data entries EUCLID – Microtask crowdsourcing applications for Linked Data 11
  • 12. ZENCROWD EUCLID – Microtask crowdsourcing applications for Linked Data
  • 13. ZenCrowd: Entity Linking by the Crowd • Combine both algorithmic and manual linking • Automate manual linking via crowdsourcing • Dynamically assess human workers with a probabilistic reasoning framework Crowd Machines EUCLID – Microtask crowdsourcing applications for Linked Data Algorithms 13
  • 14. http://dbpedia.org/resource/Facebook HTML: <p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p> http://dbpedia.org/resource/Instagram owl:sameAs fbase:Instagram Google RDFa enrichment Android <p><span about="http://dbpedia.org/resource/Facebook"><cit e property=”rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property=”rdfs:label">Instagram</cite> , the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p> EUCLID – Microtask crowdsourcing applications for Linked Data 14
  • 15. ZenCrowd Architecture HTML Pages Input Z enCrowd Micro Matching Tasks MicroTask Manager Entity Extractors Crowdsourcing Platform HTML+ RDFa Pages Output Algorithmic Matchers Decision Engine Probabilistic Network LOD Index Get Entity Workers Decisions LOD Open Data Cloud Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012). EUCLID – Microtask crowdsourcing applications for Linked Data 15
  • 16. Entity Factor Graphs • Graph components pw1( ) w1 – Workers, links, clicks Observed variables – Prior probabilities c11 c21 – Link Factors Link – Constraints factors w2 c12 lf1( ) • Probabilistic Inference SameAs l1 constraints c22 c13 lf2( ) sa1-2( ) pl1( ) – Select all links with posterior prob >τ Worker priors pw2( ) l2 pl2( ) c23 lf3( ) u2-3( ) l3 Dataset Unicity constraints pl3( ) Link priors 2 workers, 6 clicks, 3 candidate links EUCLID – Microtask crowdsourcing applications for Linked Data 16
  • 17. Lessons Learnt • Crowdsourcing + Prob reasoning works! • But – Different worker communities perform differently – Many low quality workers – Completion time may vary (based on reward) • Need to find the right workers for your task (see WWW13 paper) EUCLID – Microtask crowdsourcing applications for Linked Data 17
  • 18. ZenCrowd Summary • ZenCrowd: Probabilistic reasoning over automatic and crowdsourcing methods for entity linking • Standard crowdsourcing improves 6% over automatic • 4% - 35% improvement over standard crowdsourcing • 14% average improvement over automatic approaches http://exascale.info/zencrowd/ • Follow up-work (VLDBJ): – Also used for instance matching across datasets – 3-way blocking with the crowd EUCLID – Microtask crowdsourcing applications for Linked Data 18
  • 19. CROWDQ – CROWD-POWERED QUERY UNDERSTANDING
  • 20. Motivation • Web search engines can answer simple factual queries directly on the result page • Users with complex information needs are often unsatisfied • Purely automatic techniques are not enough • We want to solve this with crowdsourcing!
  • 21. CrowdQ • CrowdQ is the first system that uses crowdsourcing to – Understand the intended meaning – Build a structured query template – Answer the query over Linked Open Data Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013).
  • 23. CrowdQ Architecture • Off-line: query template generation with the help of the crowd • On-line: query template matching using NLP and search over open data [Diagram. On-line complex query processing: user keyword query, complex query classifier (Y/N), POS + NER tagging, match with existing query templates, structured query, structured LOD search, result joiner, answer composition, SERP; vertical selection, unstructured search, ... Off-line complex query decomposition: query log, crowd manager, crowdsourcing platform, template generation (queries, templates + answer types), query template index (t1, t2, t3). LOD Open Data Cloud.]
  • 24. Hybrid Human-Machine Pipeline • Q = birthdate of actors of forrest gump • Query annotation: noun, noun, named entity • Verification: Is forrest gump this entity in the query? • Entity relations: Which is the relation between actors and forrest gump? Answer: starring • Schema element: <dbpedia-owl:starring> • Verification: Is the relation between Indiana Jones – Harrison Ford and Back to the Future – Michael J. Fox of the same type as Forrest Gump – actors?
  • 25. Structured query generation • Q = birthdate of actors of forrest gump • PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?y ?x WHERE { ?y dbpedia-owl:birthDate ?x . ?z dbpedia-owl:starring ?y . ?z rdfs:label "Forrest Gump" } • Results from BTC09: (shown as a figure on the slide)
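The query on this slide follows a fixed pattern (an attribute of the entities related to a labelled entity), so it can be generated from a template. The Python sketch below is an assumption-laden illustration, not CrowdQ's code: `build_query` and its parameter names are hypothetical, the prefix IRIs are the standard DBpedia ontology and RDFS namespaces, and matching the label as a plain literal is an assumption (labels in DBpedia typically carry a language tag, so a real query may need `"Forrest Gump"@en`).

```python
import re

# Conservative check for the local part of a prefixed name (illustrative only).
_LOCAL_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(attribute: str, relation: str, entity_label: str) -> str:
    """Instantiate the template 'attribute of things related to entity'.

    Hypothetical helper; `attribute` and `relation` are local names in the
    dbpedia-owl namespace, e.g. 'birthDate' and 'starring'.
    """
    for name in (attribute, relation):
        if not _LOCAL_NAME.match(name):
            raise ValueError(f"not a simple local name: {name!r}")
    return (
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?y ?x WHERE {\n"
        f"  ?y dbpedia-owl:{attribute} ?x .\n"
        f"  ?z dbpedia-owl:{relation} ?y .\n"
        f'  ?z rdfs:label "{escape_literal(entity_label)}" .\n'
        "}"
    )


query = build_query("birthDate", "starring", "Forrest Gump")
print(query)
```

Validating the local names and escaping the label keeps crowd- or user-supplied values from changing the structure of the generated query.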
  • 26. CROWDMAP & OTHERS
  • 27. CrowdMAP • Experiments using MTurk, CrowdFlower and established benchmarks • Enhancing the results of automatic techniques • Fast, accurate, cost-effective [Sarasua, Simperl, Noy, ISWC2012] • Results (precision / recall): CartP 301-304: 0.53 / 1.0; 100R50P Edas-Iasted: 0.8 / 0.42; 100R50P Ekaw-Iasted: 1.0 / 0.7; 100R50P Cmt-Ekaw: 1.0 / 0.75; 100R50P ConfOf-Ekaw: 0.93 / 0.65; Imp 301-304: 0.73 / 1.0
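To compare the six configurations with a single number, one option is the harmonic mean of precision and recall (F1). The slide itself reports only precision and recall, so the F1 values are derived here for illustration and are not figures from the CrowdMAP paper; the pairing of labels with values assumes the two rows of numbers align with the six column labels in order.

```python
from typing import Dict, Tuple

# (precision, recall) per configuration, as read from the slide.
results: Dict[str, Tuple[float, float]] = {
    "CartP 301-304": (0.53, 1.0),
    "100R50P Edas-Iasted": (0.8, 0.42),
    "100R50P Ekaw-Iasted": (1.0, 0.7),
    "100R50P Cmt-Ekaw": (1.0, 0.75),
    "100R50P ConfOf-Ekaw": (0.93, 0.65),
    "Imp 301-304": (0.73, 1.0),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_scores = {name: f1(p, r) for name, (p, r) in results.items()}
for name, score in f1_scores.items():
    print(f"{name}: F1 = {score:.2f}")
```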
  • 28. Taste IT! Try IT! • Restaurant review Android app developed in the Insemtives project • Uses DBpedia concepts to generate structured reviews • Uses mechanism design/gamification to configure incentives • User study – 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts [Bar chart by venue type (CAFE, FASTFOOD, PUB, RESTAURANT), scale 0 to 2500: number of reviews, number of semantic annotations (type of cuisine), number of semantic annotations (dishes)] https://play.google.com/store/apps/details?id=insemtives.android&hl=en 11/11/2013
  • 30. Ontology Population
  • 31. Linked Data Curation
  • 32. Problems and Challenges • What is feasible and how can tasks be optimally translated into microtasks? – Examples: data quality assessment for technical and contextual features; subjective vs objective tasks (also in modeling); open-ended questions • What to show to users – Natural language descriptions of Linked Data/SPARQL – How much context – What form of rendering – How about links? • How to combine with automatic tools – Which results to validate: low precision (no fun for gamers...), low recall (vs all possible questions) • How to embed it into an existing application – Tasks are fine-grained and perceived as an additional burden on top of the actual functionality • What to do with the resulting data? – Integration into existing practices – Vocabularies!
  • 33. Web site: https://sites.google.com/site/microtasktutorial/ Slides and exercises: https://github.com/maribelacosta/crowdsourcingtutorial Full-day tutorial, ISWC2013, Sydney, Australia, 11/11/2013
  • 34. For exercises, quiz and further material visit our website: http://www.euclid-project.eu • Course • eBook • Other channels: @euclid_project, euclidproject

Editor's Notes

  • #8: What quality issues are humans able to detect?
  • #17: embarrassingly parallelizable