Werner Leyh
Homero Fonseca Filho
University of Sao Paulo, Brazil
Human Factors and Systems Interaction (SYSI),
Human Factors on Crises Management (S60),
Wednesday July 19, 13:30-15:30,
Los Angeles, CA, USA
http://www.ahfe2017.org/program1.html
Related work and publication:
https://link.springer.com/chapter/10.1007/978-3-319-60366-7_9
INTERLINKING STANDARDIZED
OPENSTREETMAP DATA AND CITIZEN
SCIENCE DATA IN THE OPENDATA CLOUD
Overview: The great picture
Summarizing our contributions to
INTERLINKING STANDARDIZED OPENSTREETMAP DATA AND CITIZEN SCIENCE DATA
in the OPENDATA CLOUD
Overview: The great picture –
What, Why, How, When and Where
What The aim of this work is to explore the OPPORTUNITIES offered by
SEMANTIC STANDARDIZATION to interlink primary spatial data from
OPENSTREETMAP (OSM) with the LINKED OPEN DATA Cloud (LOD).
Why One way to think about ENVIRONMENTAL DATA is as a NETWORK
OF CONNECTED ENTITIES, such as physical features, events,
publications, people, species, sequences, images, and data collections that
form the ENVIRONMENTAL KNOWLEDGE GRAPH. Many questions
in environmental informatics can be framed as paths in this graph.
How WIKIDATA is a world readable and writable COMMUNITY-DRIVEN
KNOWLEDGE BASE. It offers the opportunity to collaboratively
construct an open access knowledge graph that spans biology, medicine, and
all other domains of knowledge. IN THIS STUDY we discuss the
OPPORTUNITIES AND CHALLENGES provided by exploring Wikidata
as a CENTRAL INTEGRATION FACILITY by interlinking it with OSM, a
COMMUNITY-DRIVEN SPATIAL DATA COLLECTION.
When 2017
Where International
Overview: Outline and Content
➢ Part 0: Overview: The great picture
  ➢ What, Why, How, When and Where
➢ Part 1: Motivation: Exploring geodata in the semantic web
➢ Part 2: Introduction: Context and Former work
➢ Part 3: Main Challenges
  ➢ Amount of data + producers;
  ➢ Integration of data;
  ➢ Language in common for Semantic Interoperability;
➢ Part 4: Main Contributions
  ➢ Wikidata: Opportunities & challenges for Citizen Science;
➢ Part 4: Applied approaches
  ➢ The “Five Stars of Linked Data Vocabulary Use”
  ➢ Wikidata's SPARQL Query Service
➢ Part 5: Results
  ➢ Interlinking Wikidata and OpenStreetMap using SPARQL queries: Try it!
  ➢ Overpass-based Text-Mining of OpenStreetMap data
➢ Part 6: Discussions, Current Limits and Conclusions
  ➢ Citation needed!, Text-Mining
➢ Annex
Motivation: Exploring geodata in the semantic web
Great work has already been done:
Motivation: LinkedGeoData Strengths
➢ LinkedGeoData uses the comprehensive OpenStreetMap spatial data collection
to create a large spatial knowledge base.
➢ It consists of more than 3 billion nodes and 300 million ways and the resulting
RDF data comprises approximately 20 billion triples.
➢ The data is available according to the LINKED DATA PRINCIPLES and
INTERLINKED with WIKIDATA, DBPEDIA, WIKIMAPIA, GEONAMES, and
NATURAL EARTH.
➢ Please compare: http://linkedgeodata.org/About
The LINKEDGEODATA (LGD) data set is a work of the Agile Knowledge
Engineering and Semantic Web (AKSW) research group at the University of
Leipzig, a group mostly known for DBPEDIA, that uses the GeoSPARQL
vocabulary to represent OpenStreetMap data.
Motivation: LinkedGeoData Limitations
LinkedGeoData aims at TRIPLIFYING OPENSTREETMAP DUMPS EVERY
SIX MONTHS by mapping OSM tags to a publicly available ontology.
This is a very useful resource because it provides classes that map the keys and tags
used in OpenStreetMap nodes, but:
➢ The LinkedGeoData approach cannot be used in scenarios where
TIMELINESS AND FRESHNESS OF INFORMATION are a must, e.g. in
DISASTER RECOVERY, where information needs to be available as soon as
possible.
➢ Although very useful and structured, a STATIC ONTOLOGY such as the one modelled
within the LinkedGeoData project cannot follow continuous data variations due to
USERS' FREEDOM IN INSERTING NEW TAGS AND VALUES.
Please compare:
Anelli et al.: https://link.springer.com/chapter/10.1007/978-3-319-42007-3_29
Motivation: LinkedGeoData and Standardization
While there is an abundance of 5 star Linked Data available today, and while
the literature has largely focused on describing datasets (e.g., by adding
provenance information, or interlinking them),
➢ finding,
➢ querying, and
➢ integrating and interlinking
these data, within the datasets, is, to say the least, DIFFICULT.
➢ In practice, querying Linked Data that do not refer to a vocabulary is difficult
and understanding whether the results reflect the intended query is almost
impossible.
➢ LinkedGeoData, as regards OpenStreetMap classes, considers only already
popular OSM keys (tags) and does not promote the use of CONTROLLED
SCIENTIFIC VOCABULARIES.
Please compare:
➢ Janowicz et al.: http://corescholar.libraries.wright.edu/cse/158/
➢ Anelli et al.: https://link.springer.com/chapter/10.1007/978-3-319-42007-3_29
➢ Hall et al.: http://dl.acm.org/citation.cfm?doid=3025453.3025940
➢ Leyh et al.: https://link.springer.com/chapter/10.1007/978-3-319-60642-2_39
Introduction: Context and Former work
IN THE ANNEX WE ARE INTRODUCING
➢ Tim Berners Lee 5 star-schema of Linked Open Data
➢ The Five Stars of Linked Data Vocabulary Use
➢ Linked Data: RDF Data Model & SPARQL
➢ Wikipedia, Wikidata and OpenStreetMap
➢ Wikidata: Establishing computable trust
PLEASE CONSIDER ALSO OUR MAIN REFERENCES:
Good et al. http://icbo.cgrb.oregonstate.edu/node/331
Rafes and Germain https://hal.inria.fr/hal-01168496v1
Janowicz et al. http://corescholar.libraries.wright.edu/cse/158/
Erxleben et al. www.gtn-h.info/wp-content/uploads/2015/10/GTNH-7_Report.pdf
Hall et al. http://dl.acm.org/citation.cfm?doid=3025453.3025940
Anelli et al. https://link.springer.com/chapter/10.1007/978-3-319-42007-3_29
Berners-Lee https://www.w3.org/DesignIssues/LinkedData.html
Senaratne et al. http://www.tandfonline.com/doi/full/10.1080/13658816.2016.1189556
Main Challenges: Amount of data + producers,
Integration of data, Semantic Interoperability
I Amount of data + producers;
E.g.: According to a new study published by a top industry trade group,
Americans used nearly 10 billion gigabytes of mobile data last year.
That's more than double what they used the year before.
https://www.washingtonpost.com/news/the-switch/wp/2016/05/23/americans-are-using-a-whopping-amount-of-data-these-days/
And:
Data consumers are increasingly becoming data producers
II Integration of this data:
III A Language in common for Semantic Interoperability
Main Contributions: Opportunities & challenges of
Wikidata + Five Stars of Linked Data Vocabulary Use …
… for Citizen Science: Good et al. outline questions on OPPORTUNITIES AND
CHALLENGES that WIKIDATA provides to the broad BIO-CURATION
COMMUNITY:
➢ These questions are applied and discussed in the present work to improve
INTEROPERABILITY between OFFICIAL and COMMUNITY-based
KNOWLEDGE by reusing controlled vocabularies.
➢ The present work can be regarded as a CITIZEN SCIENCE - ORIENTED
response to the FIVE STARS OF LINKED DATA VOCABULARY USE
system put forward by Janowicz et al. (references).
Applied approaches:
Summary
➢ (MA1) The FIVE STARS OF LINKED DATA VOCABULARY USE.
➢ (MA2) Literature Review.
➢ (MA3) WIKIDATA´S SPARQL QUERY SERVICE (WDQS).
➢ (MA4) Mapping against OPPORTUNITIES and CHALLENGES of WIKIDATA.
➢ (MA5) We explored the so-called TALK-PAGES in COMMUNITY-WIKI.
Applied approaches:
Ontologies to make your data more usable
Results - Wikidata-based integration:
Information gathering (queries): Try it!
Interlinking Wikidata and OpenStreetMap:
World map of hospitals (http://tinyurl.com/k46vxg2)
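A query of this kind can be sketched in a few lines of Python. This is a minimal sketch and not necessarily the query behind the short link above. It assumes the public Wikidata Query Service endpoint and the Wikidata identifiers Q16917 (hospital), P31 (instance of), P279 (subclass of), P625 (coordinate location) and P402 (OpenStreetMap relation ID). The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service (WDQS) SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def hospital_map_query(limit: int = 1000) -> str:
    """Build a SPARQL query for hospitals that have coordinates.

    Assumed Wikidata identifiers: Q16917 (hospital), P31 (instance of),
    P279 (subclass of), P625 (coordinate location),
    P402 (OpenStreetMap relation ID).
    """
    return f"""#defaultView:Map
SELECT ?item ?itemLabel ?coord ?osmRelation WHERE {{
  ?item wdt:P31/wdt:P279* wd:Q16917 ;
        wdt:P625 ?coord .
  OPTIONAL {{ ?item wdt:P402 ?osmRelation . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}"""


def request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Printing only; no network request is made here.
    print(hospital_map_query(limit=10))
    print(request_url(hospital_map_query(limit=10)))
```

Pasting the printed query into the query service's web interface should render the results on a map, because of the `#defaultView:Map` comment on the first line.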
Discussion - Current Limits:
Wikidata trustworthiness: Citation needed!
A KEY FEATURE OF WIKIDATA is the capacity to provide PROVENANCE
FOR ITS CLAIMS (the triples that compose the knowledge graph) through
references.
➢ Each claim can be supported by any number of REFERENCES TO
SUPPORTING SOURCES of information.
➢ Unfortunately, MANY CLAIMS currently HAVE NO
REFERENCES.
Please compare:
Good et al.: http://icbo.cgrb.oregonstate.edu/node/331
Katie Mika: https://library.mcz.harvard.edu/blog/role-librarians-wikidata-and%C2%A0wikicite
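One way to inspect this limitation is to ask the query service for statements that carry no reference. The sketch below only builds the query text. It assumes the Wikidata RDF model, in which a statement node is linked to its references via `prov:wasDerivedFrom`. The function name and the default property are illustrative choices, not part of the original study.

```python
import re


def unreferenced_claims_query(property_id: str = "P625", limit: int = 100) -> str:
    """Build a SPARQL query listing statements of one property without references.

    Assumption: in the Wikidata RDF model, `p:Pxxx` leads from an item to a
    statement node and `prov:wasDerivedFrom` leads from there to a reference.
    """
    if not re.fullmatch(r"P[1-9]\d*", property_id):
        raise ValueError(f"not a Wikidata property id: {property_id!r}")
    return f"""SELECT ?item ?statement WHERE {{
  ?item p:{property_id} ?statement .
  FILTER NOT EXISTS {{ ?statement prov:wasDerivedFrom ?reference . }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(unreferenced_claims_query("P402", limit=20))
```

On popular properties such a query may be slow or time out, so a small `limit` and a narrow property are advisable.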
Discussion - Current Limits:
when linking between Wikidata and OpenStreetMap
Linking from OSM to Wikidata
➢ In OpenStreetMap, Wikidata entities can be linked from every kind of OSM object
using the wikidata key.
➢ Another way to link objects is by using Key:wikipedia
but (!!!)
Linking from Wikidata to OSM
➢ This is problematic, since OSM's IDs are NOT stable.
➢ A very limited option is to link from Wikidata entities to OpenStreetMap relations
by using Property P402
Please compare: http://wiki.openstreetmap.org/wiki/Wikidata
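Both link directions can be illustrated with a short Python sketch. It assumes Overpass QL syntax (`nwr` selects nodes, ways and relations) for the OSM-to-Wikidata direction and the usual openstreetmap.org relation URL pattern for the Wikidata-to-OSM direction. The function names are hypothetical, and no request is sent.

```python
import re
from typing import Dict, Optional


def overpass_query_for_item(qid: str) -> str:
    """Overpass QL: every node, way or relation tagged wikidata=<qid>."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return f'[out:json];nwr["wikidata"="{qid}"];out center;'


def wikidata_id_from_tags(tags: Dict[str, str]) -> Optional[str]:
    """OSM -> Wikidata: read the item id from an object's wikidata key."""
    return tags.get("wikidata")


def osm_relation_url(relation_id: int) -> str:
    """Wikidata -> OSM: turn a P402 value into a relation URL.

    Caveat from the slide: OSM ids are not stable, so the link can break.
    """
    return f"https://www.openstreetmap.org/relation/{relation_id}"


if __name__ == "__main__":
    print(overpass_query_for_item("Q42"))
    print(wikidata_id_from_tags({"amenity": "hospital", "wikidata": "Q42"}))
    print(osm_relation_url(123))
```

The ids `Q42` and `123` are placeholders chosen for the example only.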
Discussion - Complementary approach:
OverPass-based Text-Mining of OpenStreetMap data
(Geographic) Datasets are typically integrated (“joined”) exploring
➢ Descriptive Context Dimensions
➢ “who, what, where, when, why, and how”
provided by the attributes of (geographic) features.
In the case of OpenStreetMap this can be done by
➢ Exploring “KEY” and “VALUE” pairs
provided by OSM tags as attributes that describe an OSM node representing, for
example, a Point of Interest (POI)
➢ A particularly powerful solution when applied with protocols
based on internationally accepted metadata standards
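Exploring key and value pairs can be sketched as a filter over elements in the shape of Overpass JSON output, where each element carries an optional `tags` dictionary. The sample elements below are invented for illustration and are not real OSM data. The helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional

Element = Dict[str, Any]

# Invented sample in the shape of Overpass JSON "elements" (not real data).
SAMPLE: List[Element] = [
    {"type": "node", "id": 1, "lat": -23.55, "lon": -46.63,
     "tags": {"amenity": "hospital", "name": "Hospital A"}},
    {"type": "node", "id": 2, "lat": -23.56, "lon": -46.64,
     "tags": {"amenity": "school"}},
    {"type": "way", "id": 3,
     "tags": {"amenity": "hospital", "emergency": "yes"}},
    {"type": "node", "id": 4, "lat": -23.57, "lon": -46.65},  # untagged node
]


def filter_by_tag(elements: Iterable[Element], key: str,
                  value: Optional[str] = None) -> List[Element]:
    """Keep elements that have `key`, optionally with exactly `value`."""
    matches = []
    for element in elements:
        tags = element.get("tags", {})
        if key in tags and (value is None or tags[key] == value):
            matches.append(element)
    return matches


def value_counts(elements: Iterable[Element], key: str) -> Counter:
    """Count how often each value occurs for `key` (a simple tag survey)."""
    return Counter(e["tags"][key] for e in elements if key in e.get("tags", {}))


if __name__ == "__main__":
    print([e["id"] for e in filter_by_tag(SAMPLE, "amenity", "hospital")])
    print(value_counts(SAMPLE, "amenity"))
```

Counting the values of a key is a quick way to see which free-form tags users actually enter before mapping them to a controlled vocabulary.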
Conclusions:
Summary
This study shows how “standardized” WIKIDATA and OPENSTREETMAP
“interlinking” can lead to LOD-integration,
particularly when the capacity of WIKIDATA is explored as a POWERFUL
INTEGRATOR THROUGHOUT the Semantic Web.
The study provides a description of conceptual considerations as well as the
WIKIDATA-BASED INTEGRATION of
“PRIMARY COMMUNITY DATA” (e.g. OpenStreetMap) with
“AUTHORITATIVE LINKED-OPEN-DATA” (e.g. Wikidata).
Werner Leyh
https://wiki.osgeo.org/wiki/User:WernerLeyh
Grupo de Pesquisa CNPq/USP
INFRAESTRUTURA DE DADOS ESPACIAIS (GEPIDE)
http://dgp.cnpq.br/buscaoperacional/detalhegrupo.jsp?grupo=0067107HRY8KT0
Questions ?
Interested in linking Wikidata, Openstreetmap and
scientific Datasets?
Join us !
Annex
Annex:
Tim Berners Lee 5 star-schema of Linked Open Data
☆ Data is available on the Web, in whatever format.
☆☆ Available as machine-readable structured data (e.g., not a scanned image).
☆☆☆ Available in a non-proprietary format (e.g., CSV, not Microsoft Excel).
☆☆☆☆ Published using open standards from the W3C (RDF and SPARQL).
☆☆☆☆☆ All of the above and links to other Linked Open Data.
Annex:
The Five Stars of Linked Data Vocabulary Use
The Five Stars of Linked Data Vocabulary Use:
Interestingly, the original “Tim Berners-Lee 5-star schema”
does not make any assumptions about the use of vocabularies.
In practice, however,
➢ querying Linked Data that do not refer to a vocabulary is difficult, and
➢ understanding whether the results reflect the intended query is
almost impossible.
See Five Stars of Linked Data Vocabulary Use:
http://semantic-web-journal.net/content/five-stars-linked-data-vocabulary-use
Annex:
Linked Data: RDF Data Model & SPARQL
RDF breaks every piece of information down into triples:
➢ Subject – a resource, which is identified with a URI.
➢ Predicate – a URI-identified, reused specification of the relationship.
➢ Object – a resource or literal to which the subject is related.
<Bob> <is a> <person>.
<Bob> <is interested in> <the Mona Lisa>.
<the Mona Lisa> <was created by> <Leonardo da Vinci>.
Source: https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/#section-triple
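The triple model, and the idea that a question can be framed as a path in the graph, can be illustrated without any RDF library. This is a minimal sketch that stores the three example triples as plain Python tuples and matches them against a pattern, loosely in the spirit of a SPARQL basic graph pattern. Real RDF would use URIs instead of these readable strings, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The example triples from the slide, with plain strings standing in for URIs.
TRIPLES: List[Triple] = [
    ("Bob", "is a", "person"),
    ("Bob", "is interested in", "the Mona Lisa"),
    ("the Mona Lisa", "was created by", "Leonardo da Vinci"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples that agree with every given position (None = wildcard)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def creators_of_interests(triples: List[Triple], person: str) -> List[str]:
    """Follow a two-step path: person -> interest -> creator."""
    interests = [o for _, _, o in match(triples, s=person, p="is interested in")]
    return [o for interest in interests
            for _, _, o in match(triples, s=interest, p="was created by")]


if __name__ == "__main__":
    print(match(TRIPLES, s="Bob"))
    print(creators_of_interests(TRIPLES, "Bob"))
```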
Annex:
Wikipedia’s mission
Annex:
Wikipedia’s scale
➢ 30 million articles
➢ 286 languages
➢ 2 billion edits
➢ 8000 views per second
➢ 500 million monthly visitors
➢ 5th most popular website
➢ 2000 x larger than Britannica
Annex:
Wikidata’s mission
Wikidata is the community-created knowledge base of Wikipedia.
One reason for this strong community participation
is the tight integration with Wikipedia:
➢ as of today, almost EVERY Wikipedia page in EVERY language
incorporates content from Wikidata
(compare Erxleben et al. in the references)
Annex:
WikiData characteristics
➢ An item for any notable subject
➢ “Claims” (or “statements”) about each item,
using clearly-defined “properties”
➢ Properties may have “qualifiers”
➢ Every item is an “instance of” or “subclass
of” another
➢ Claims show relationships between items
➢ Multi-lingual labels and descriptions
➢ Claims may include identifiers
Annex:
WikiData characteristics
Annex:
OpenStreetMap mission
"OpenStreetMap is a project aimed squarely at creating and providing free
geographic data such as street maps to anyone who wants them."
-- www.openstreetmap.org
Annex:
OpenStreetMap characteristics
OpenStreetMap Basic Definitions:
➢ It’s a very large database of XML data
➢ Each feature is of a certain basic type, and
is defined by Tags (key-value pairs)
➢ Basic types: Nodes (points), Ways (lines),
Areas (polygons, stored as closed ways or multipolygon relations), Relations (groups)
➢ Tags: (domain.)key (concept) = value (instance)
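These definitions map directly onto the OSM XML exchange format, where each feature element holds `<tag k="..." v="..."/>` children. The sketch below parses a small hand-written snippet with Python's standard library. The snippet and its values are invented for illustration, and `read_features` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# Hand-written snippet in OSM XML style (values are invented).
OSM_XML = """<osm version="0.6">
  <node id="1" lat="-23.5505" lon="-46.6333">
    <tag k="amenity" v="hospital"/>
    <tag k="name" v="Example Hospital"/>
  </node>
  <way id="2">
    <nd ref="1"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def read_features(xml_text: str) -> List[Dict[str, Any]]:
    """Collect type, id and key-value tags for each node, way and relation."""
    root = ET.fromstring(xml_text)
    features = []
    for element in root:
        if element.tag not in ("node", "way", "relation"):
            continue  # skip metadata elements such as <bounds>
        tags = {t.get("k"): t.get("v") for t in element.findall("tag")}
        features.append({"type": element.tag,
                         "id": int(element.get("id")),
                         "tags": tags})
    return features


if __name__ == "__main__":
    for feature in read_features(OSM_XML):
        print(feature)
```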

More Related Content

PPTX
URI Disambiguation in the Context of Linked Data
butest
 
PPTX
FAIR Signposting: A KISS Approach to a Burning Issue
Herbert Van de Sompel
 
PPT
Semantic Web Good News
Frank van Harmelen
 
PPT
Linked Data Overview - AGI Technical SIG
Chris Ewing
 
PDF
Open Data and Data Journalism
Irina Radchenko
 
PPTX
Towards a Unified PageRank for DBpedia and Wikidata
Andreas Thalhammer
 
PPTX
An Initial Analysis Of The Contextual Information Available Within Auction Po...
Thomas Lancaster
 
PDF
Information Extraction from Web-Scale N-Gram Data
Gerard de Melo
 
URI Disambiguation in the Context of Linked Data
butest
 
FAIR Signposting: A KISS Approach to a Burning Issue
Herbert Van de Sompel
 
Semantic Web Good News
Frank van Harmelen
 
Linked Data Overview - AGI Technical SIG
Chris Ewing
 
Open Data and Data Journalism
Irina Radchenko
 
Towards a Unified PageRank for DBpedia and Wikidata
Andreas Thalhammer
 
An Initial Analysis Of The Contextual Information Available Within Auction Po...
Thomas Lancaster
 
Information Extraction from Web-Scale N-Gram Data
Gerard de Melo
 

What's hot (20)

PDF
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
Martin Hepp
 
PDF
How links can make your open data even greater
Cristina Sarasua
 
PPTX
How much does $1.7 billion buy?
Martin Klein
 
PDF
Semantic Web Applications in Libraries: The Road to BIBFRAME
National Information Standards Organization (NISO)
 
PPTX
Introduction to Linked Data
Juan Sequeda
 
PDF
Data Journalism at HSE conference
Irina Radchenko
 
PDF
Exploration, visualization and querying of linked open data sources
Laura Po
 
PPT
Data, data, data
andrewxhill
 
PDF
Linked Open Data
Laura Hollink
 
PDF
Open hpi semweb-06-part5
Nadine Ludwig
 
PPTX
Hack a LOD Schalk Clariah WP4
Ruben Schalk
 
PDF
Introduction to linked data
Laura Po
 
PDF
Data Journalism - Finding Data
Bahareh Heravi
 
PDF
Integrating Covid-19 Bioassays in the Open Research Knowledge Graph
Jennifer D'Souza
 
PPTX
Discovering Scholarly Orphans Using ORCID
Martin Klein
 
PDF
Open Data Challenges and Opportunities
Jieh-Shan YEH
 
PPTX
Linking Data with sameAs: Challenges and Solutions - Workshop
Adrian Stevenson
 
DOC
Done reread detecting phrase-level duplication on the world wide we
James Arnold
 
PDF
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Mark Matienzo
 
PPTX
Keystone summer school 2015 paolo-missier-provenance
Paolo Missier
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
Martin Hepp
 
How links can make your open data even greater
Cristina Sarasua
 
How much does $1.7 billion buy?
Martin Klein
 
Semantic Web Applications in Libraries: The Road to BIBFRAME
National Information Standards Organization (NISO)
 
Introduction to Linked Data
Juan Sequeda
 
Data Journalism at HSE conference
Irina Radchenko
 
Exploration, visualization and querying of linked open data sources
Laura Po
 
Data, data, data
andrewxhill
 
Linked Open Data
Laura Hollink
 
Open hpi semweb-06-part5
Nadine Ludwig
 
Hack a LOD Schalk Clariah WP4
Ruben Schalk
 
Introduction to linked data
Laura Po
 
Data Journalism - Finding Data
Bahareh Heravi
 
Integrating Covid-19 Bioassays in the Open Research Knowledge Graph
Jennifer D'Souza
 
Discovering Scholarly Orphans Using ORCID
Martin Klein
 
Open Data Challenges and Opportunities
Jieh-Shan YEH
 
Linking Data with sameAs: Challenges and Solutions - Workshop
Adrian Stevenson
 
Done reread detecting phrase-level duplication on the world wide we
James Arnold
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Mark Matienzo
 
Keystone summer school 2015 paolo-missier-provenance
Paolo Missier
 
Ad

Similar to Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the OpenData Cloud (20)

PDF
A Conceptual Building-Block and Practical OpenStreetMap-Interface for Sharing...
Werner Leyh
 
PPTX
Upgrading maps with Linked Data
Francisco J. Lopez-Pellicer
 
PPSX
Linked Data to Improve the OER Experience
The Open Education Consortium
 
PDF
Open Government Data on the Web - A Semantic Approach
Peter Krantz
 
PDF
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Humphrey Southall
 
PPTX
Ontologies for Emergency & Disaster Management
Stephane Fellah
 
PPTX
Linked Open Data Utrecht University Library
Ruben Schalk
 
PDF
Linked Open Government Data: What’s Next?
Li Ding
 
PPTX
The Semantic Web Exists. What Next?
Anna Fensel
 
PDF
Metadata as Linked Data for Research Data Repositories
andrea huang
 
PDF
Linked Open (Geo)Data and the Distributed Ontology Language – a perfect match
Christoph Lange
 
PPTX
2014 10 23 (fie2014) emadrid upm roadmap towards the openness of educational ...
eMadrid network
 
DOCX
Linked data migrational framework
Muthu Kumaar Thangavelu
 
PPT
Core Geospatial Ontologies
Stephane Fellah
 
PPTX
Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques
Prateek Jain
 
PPTX
Spatial Semantics for Better Interoperability and Analysis: Challenges and Ex...
Artificial Intelligence Institute at UofSC
 
PDF
Planetdata simpda
Elena Simperl
 
PDF
PlanetData: Consuming Structured Data at Web Scale
PlanetData Network of Excellence
 
PDF
Methodological Guidelines for Publishing Linked Data
Boris Villazón-Terrazas
 
PDF
Implementing Linked Data in Low-Resource Conditions
AIMS (Agricultural Information Management Standards)
 
A Conceptual Building-Block and Practical OpenStreetMap-Interface for Sharing...
Werner Leyh
 
Upgrading maps with Linked Data
Francisco J. Lopez-Pellicer
 
Linked Data to Improve the OER Experience
The Open Education Consortium
 
Open Government Data on the Web - A Semantic Approach
Peter Krantz
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Humphrey Southall
 
Ontologies for Emergency & Disaster Management
Stephane Fellah
 
Linked Open Data Utrecht University Library
Ruben Schalk
 
Linked Open Government Data: What’s Next?
Li Ding
 
The Semantic Web Exists. What Next?
Anna Fensel
 
Metadata as Linked Data for Research Data Repositories
andrea huang
 
Linked Open (Geo)Data and the Distributed Ontology Language – a perfect match
Christoph Lange
 
2014 10 23 (fie2014) emadrid upm roadmap towards the openness of educational ...
eMadrid network
 
Linked data migrational framework
Muthu Kumaar Thangavelu
 
Core Geospatial Ontologies
Stephane Fellah
 
Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques
Prateek Jain
 
Spatial Semantics for Better Interoperability and Analysis: Challenges and Ex...
Artificial Intelligence Institute at UofSC
 
Planetdata simpda
Elena Simperl
 
PlanetData: Consuming Structured Data at Web Scale
PlanetData Network of Excellence
 
Methodological Guidelines for Publishing Linked Data
Boris Villazón-Terrazas
 
Implementing Linked Data in Low-Resource Conditions
AIMS (Agricultural Information Management Standards)
 
Ad

More from Werner Leyh (8)

PDF
Bim manager-d77-effizientes planen und verwalten von bauwerken (ubersicht)
Werner Leyh
 
PDF
Bim manager-d07-informationsmanagement und qualitatssicherung
Werner Leyh
 
PDF
Bim manager-d06-modell- und objektinformationen, objektgeometrie
Werner Leyh
 
PDF
Bim manager-d04-die funf bim-faktoren
Werner Leyh
 
PDF
Bim manager-d03-aktueller stand der entwicklung und anwendung
Werner Leyh
 
PDF
Bim manager-d02-bim definition, bedeutung und auswirkung
Werner Leyh
 
PDF
Bim manager-d01-effizientes planen und verwalten von bauwerken
Werner Leyh
 
PDF
Citizen Science Involving Collections of Standardized Community Data
Werner Leyh
 
Bim manager-d77-effizientes planen und verwalten von bauwerken (ubersicht)
Werner Leyh
 
Bim manager-d07-informationsmanagement und qualitatssicherung
Werner Leyh
 
Bim manager-d06-modell- und objektinformationen, objektgeometrie
Werner Leyh
 
Bim manager-d04-die funf bim-faktoren
Werner Leyh
 
Bim manager-d03-aktueller stand der entwicklung und anwendung
Werner Leyh
 
Bim manager-d02-bim definition, bedeutung und auswirkung
Werner Leyh
 
Bim manager-d01-effizientes planen und verwalten von bauwerken
Werner Leyh
 
Citizen Science Involving Collections of Standardized Community Data
Werner Leyh
 

Recently uploaded (20)

PPT
1. Basic Principles of Medical Microbiology Part 1.ppt
separatedwalk
 
PDF
study of microbiologically influenced corrosion of 2205 duplex stainless stee...
ahmadfreak180
 
PPTX
Hydrocarbons Pollution. OIL pollutionpptx
AkCreation33
 
PPTX
Introduction to biochemistry.ppt-pdf_shotrs!
Vishnukanchi darade
 
PDF
Bacteria, Different sizes and Shapes of of bacteria
Vishal Sakhare
 
PPTX
Home Garden as a Component of Agroforestry system : A survey-based Study
AkhangshaRoy
 
PPTX
How to Add SBCGlobal.net Email to MacBook Air in Minutes
raymondjones7273
 
PDF
Multiwavelength Study of a Hyperluminous X-Ray Source near NGC6099: A Strong ...
Sérgio Sacani
 
PDF
Gamifying Agent-Based Models in Cormas: Towards the Playable Architecture for...
ESUG
 
PPTX
General Characters and classification up to Order Level of Sub Class Pterygot...
Dr Showkat Ahmad Wani
 
PDF
Vera C. Rubin Observatory of interstellar Comet 3I ATLAS - July 21, 2025.pdf
SOCIEDAD JULIO GARAVITO
 
PDF
Renewable Energy Resources (Solar, Wind, Nuclear, Geothermal) Presentation
RimshaNaeem23
 
PDF
The Cosmic Symphony: How Photons Shape the Universe and Our Place Within It
kutatomoshi
 
PDF
A water-rich interior in the temperate sub-Neptune K2-18 b revealed by JWST
Sérgio Sacani
 
PDF
Directing Generative AI for Pharo Documentation
ESUG
 
PDF
Sujay Rao Mandavilli Multi-barreled appraoch to educational reform FINAL FINA...
Sujay Rao Mandavilli
 
PPTX
METABOLIC_SYNDROME Dr Shadab- kgmu lucknow pptx
ShadabAlam169087
 
PPTX
Unit 4 - Astronomy and Astrophysics - Milky Way And External Galaxies
RDhivya6
 
PDF
Identification of Bacteria notes by EHH.pdf
Eshwarappa H
 
PPTX
fghvqwhfugqaifbiqufbiquvbfuqvfuqyvfqvfouiqvfq
PERMISONJERWIN
 
1. Basic Principles of Medical Microbiology Part 1.ppt
separatedwalk
 
study of microbiologically influenced corrosion of 2205 duplex stainless stee...
ahmadfreak180
 
Hydrocarbons Pollution. OIL pollutionpptx
AkCreation33
 
Introduction to biochemistry.ppt-pdf_shotrs!
Vishnukanchi darade
 
Bacteria, Different sizes and Shapes of of bacteria
Vishal Sakhare
 
Home Garden as a Component of Agroforestry system : A survey-based Study
AkhangshaRoy
 
How to Add SBCGlobal.net Email to MacBook Air in Minutes
raymondjones7273
 
Multiwavelength Study of a Hyperluminous X-Ray Source near NGC6099: A Strong ...
Sérgio Sacani
 
Gamifying Agent-Based Models in Cormas: Towards the Playable Architecture for...
ESUG
 
General Characters and classification up to Order Level of Sub Class Pterygot...
Dr Showkat Ahmad Wani
 
Vera C. Rubin Observatory of interstellar Comet 3I ATLAS - July 21, 2025.pdf
SOCIEDAD JULIO GARAVITO
 
Renewable Energy Resources (Solar, Wind, Nuclear, Geothermal) Presentation
RimshaNaeem23
 
The Cosmic Symphony: How Photons Shape the Universe and Our Place Within It
kutatomoshi
 
A water-rich interior in the temperate sub-Neptune K2-18 b revealed by JWST
Sérgio Sacani
 
Directing Generative AI for Pharo Documentation
ESUG
 
Sujay Rao Mandavilli Multi-barreled appraoch to educational reform FINAL FINA...
Sujay Rao Mandavilli
 
METABOLIC_SYNDROME Dr Shadab- kgmu lucknow pptx
ShadabAlam169087
 
Unit 4 - Astronomy and Astrophysics - Milky Way And External Galaxies
RDhivya6
 
Identification of Bacteria notes by EHH.pdf
Eshwarappa H
 
fghvqwhfugqaifbiqufbiquvbfuqvfuqyvfqvfouiqvfq
PERMISONJERWIN
 

Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the OpenData Cloud

  • 1. W E R N E R L E Y H H O M E R O F O N S E C A F I L H O U N I V E R S I T Y O F S A O P A U L O , B R A Z I L H U M A N F A C T O R S A N D S Y S T E M S I N T E R A C T I O N ( S Y S I ) , H U M A N F A C T O R S O N C R I S E S M A N A G E M E N T ( S 6 0 ) , W E D N E S D A Y J U L Y 1 9 , 1 3 : 3 0 - 1 5 : 3 0 , L O S A N G E L E S , C A , U S A h t t p : / / w w w . a h f e 2 0 1 7 . o r g / p r o g r a m 1 . h t m l R e l a t e d w o r k a n d p u b l i c a t i o n : h t t p s : / / l i n k . s p r i n g e r . c o m / c h a p t e r / 1 0 . 1 0 0 7 / 9 7 8 - 3 - 3 1 9 - 6 0 3 6 6 - 7 _ 9 INTERLINKING STANDARDIZED OPENSTREETMAP DATA AND CITIZEN SCIENCE DATA IN THE OPENDATA CLOUD
  • 2. Overview: The great picture Summarizing Our Contributions To INTERLINKING STANDARDIZED OPENSTREETMAP DATA AND CITIZEN SCIENCE DATA In the OPENDATA CLOUD
  • 3. Overview: The great picture – What, Why, How, When and Where What The aim of this work is to explore the OPPORTUNITIES offered by SEMANTIC STANDARDIZATION to interlink primary spatial data from OPENSTREETMAP(OSM) with the LINKEDOPENDATA Cloud(LOD) Why One way to think about ENVIRONMENTAL DATA is as a NETWORK OF CONNECTED ENTITIES, such as physical features, events, publications, people, species, sequences, images, and data collections that form the ENVIRONMENTAL KNOWLEDGE GRAPH. Many questions in environmental informatics can be framed as paths in this graph. How WIKIDATA is a world readable and writable COMMUNITY-DRIVEN KNOWLEDGE BASE. It offers the opportunity to collaboratively construct an open access knowledge graph that spans biology, medicine, and all other domains of knowledge. IN THIS STUDY we discuss the OPPORTUNITIES AND CHALLENGES provided by exploring Wikidata as a CENTRAL INTEGRATION FACILITY by interlink it with OSM, a COMMUNITY DRIVEN SPATIAL DATA COLLECTION. When 2017 Where International
  • 4. Overview: Outline and Content  Part 0: Overview: The great picture  What, Why, How, When and Where  Part 1: Motivation: Exploring geodata in the semantic web  Part 2: Introduction: Context and Former work  Part 3: Main Challenges  Amount of data + producers;  Integration of data;  Language in common for Semantic Interoperability;  Part 4: Main Contributions  Wikidata: Opportunities & challenges for Citizen Science;  Part 4: Applied approaches  The “Five stars of Linked Data Vocabulary Use”  Wikidata´s Sparql Query Service  Part 5: Results  Interlinking Wikidata and Openstreetmap using SPARQL-queries: Try it !!!  OverPass-based Text-Mining of OpenStreetMap data  Part 6: Discussions, Current Limits and Conclusions  Citation needed!, Text-Mining  Annex
  • 5. Motivation: Exploring geodata in the semantic web Great work has already be done:
  • 6. Motivation: LinkedGeoData Strengths ➢ LinkedGeoData uses the comprehensive OpenStreetMap spatial data collection ➢ to create a large spatial knowledge base. ➢ It consists of more than 3 billion nodes and 300 million ways and the resulting RDF data comprises approximately 20 billion triples. ➢ The data is available according to the LINKED DATA PRINCIPLES and INTERLINKED with WIKIDATA, DBPEDIA, WIKIMAPIA, GEONAMES, NATURAL EARTH ➢ Please compare: http://linkedgeodata.org/About The LINKEDGEODATA (LGD) data set is a work of the Agile Knowledge Engineering and Semantic Web (AKSW) research group at the University of Leipzig, a group mostly known for DBPEDIA, that uses the GeoSPARQL vocabulary to represent OpenStreetMap data.
  • 7. Motivation: LinkedGeoData Limitations LinkedGeoData aims at TRIPLIFYING OPENSTREETMAP DUMPS EVERY SIX MONTHS by map-ping OSM tags with reference to a publicly available ontology. This is a very useful resource because it makes available classes that map keys and tags used in Open Street Map nodes, but: ➢ LinkedGeoData approach cannot be used for all those scenarios where TIMELINESS AND FRESHNESS OF INFORMATION is a must have. e.g. in DISASTER RECOVERY where information needs to be available as soon as possible. ➢ Although very useful and structured, a STATIC ONTOLOGY as the one modelled within the LinkedGeoData project, cannot follow continuous data variations due to USERS FREEDOM IN INSERTING NEW TAGS AND VALUES. Please compare: Anelli et al.: https://link.springer.com/chapter/10.1007/978-3-319-42007-3_29
  • 8. Motivation: LinkedGeoData and Standardization While there is an abundance of 5 star Linked Data available today, and while the literature has largely focused on describing datasets (e.g., by adding provenance information, or interlinking them), ➢ finding, ➢ querying, and ➢ integrating and interlinking these data, within the datasets, is, to say the least, DIFFICULT. ➢ In practice, querying Linked Data that do not refer to a vocabulary is difficult and understanding whether the results reflect the intended query is almost impossible. ➢ LinkedGeoData, as it regards Openstreetmap classes, do only consider already popular OSM-keys (tags) and do not promote the use of CONTROLLED SCIENTIFIC VOCABULARIES. Please compare: ➢ Janowicz et al.: http://corescholar.libraries.wright.edu/cse/158/ ➢ Anelli et al.: https://link.springer.com/chapter/10.1007/978-3-319-42007-3_29 ➢ Hall et al.: http://dl.acm.org/citation.cfm?doid=3025453.3025940 ➢ Leyh et al.: https://link.springer.com/chapter/10.1007/978-3-319-60642-2_39
  • 9. Introduction: Context and Former work IN THE ANNEX WE ARE INTRODUCING ➢ Tim Berners Lee 5 star-schema of Linked Open Data ➢ The Five Stars of Linked Data Vocabulary Use ➢ Linked Data: RDF Data Model & SPARQL ➢ Wikipedia, Wikidata and OpenStreetMap ➢ Wikidata: Establishing computable trust PLEASE CONSIDER ALSO OUR MAIN REFERENCES: Good et al. http://icbo.cgrb.oregonstate.edu/node/331 Rafes and Germain https://hal.inria.fr/hal-01168496v1 Janowicz et al. http://corescholar.libraries.wright.edu/cse/158/ Erxleben et al. www.gtn-h.info/wp-content/uploads/2015/10/GTNH-7_Report.pdf Hall et al. http://dl.acm.org/citation.cfm?doid=3025453.3025940 Anelli et al. https://link.springer.com/chapter/10.1007/978-3-319-42007-3_29 Berners-Lee https://www.w3.org/DesignIssues/LinkedData.html Senaratne et al. http://www.tandfonline.com/doi/full/10.1080/13658816.2016.1189556
  • 10. Main Challenges: Amount of data + producers, Integration of data, Semantic Interoperability I. Amount of data + producers. E.g.: According to a new study published by a top industry trade group, Americans used nearly 10 billion gigabytes of mobile data last year, more than double what they used the year before. https://www.washingtonpost.com/news/the-switch/wp/2016/05/23/americans-are-using-a-whopping-amount-of-data-these-days/ And: Data consumers are increasingly becoming data producers. II. Integration of this data. III. A common language for Semantic Interoperability.
  • 11. Main Contributions: Opportunities & challenges of Wikidata + Five Stars of Linked Data Vocabulary Use for Citizen Science: Good et al. outline questions on the OPPORTUNITIES AND CHALLENGES that WIKIDATA provides to the broad BIO-CURATION COMMUNITY: ➢ These questions are applied and discussed in the present work to improve INTEROPERABILITY between OFFICIAL and COMMUNITY-based KNOWLEDGE by reusing controlled vocabularies. ➢ The present work can be regarded as a CITIZEN SCIENCE-ORIENTED response to the FIVE STARS OF LINKED DATA VOCABULARY USE system put forward by Janowicz et al. (see references).
  • 12. Applied approaches: Summary ➢ (MA1) The FIVE STARS OF LINKED DATA VOCABULARY USE. ➢ (MA2) Literature Review. ➢ (MA3) WIKIDATA'S SPARQL QUERY SERVICE (WDQS). ➢ (MA4) Mapping against OPPORTUNITIES and CHALLENGES of WIKIDATA. ➢ (MA5) Exploration of the so-called TALK PAGES in the COMMUNITY WIKI.
  • 13. Applied approaches: Ontologies to make your data more usable
  • 14. Results - Wikidata-based integration: Information gathering (queries): Interlinking Wikidata and OpenStreetMap – World-Map of hospitals (http://tinyurl.com/k46vxg2): Try it!
  • 15. Results - Wikidata-based integration: Information gathering (queries): Interlinking Wikidata and OpenStreetMap – World-Map of hospitals (http://tinyurl.com/k46vxg2): Try it!
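A query of the kind behind such a world map can be sketched as follows. The shortened link on the slide is not expanded here, so the SPARQL text is an assumption and not necessarily the authors' query: it asks for items that are an instance of hospital (Q16917) or of one of its subclasses, together with their coordinate location (P625) and, where present, their OSM relation ID (P402). The names `HOSPITAL_QUERY` and `build_wdqs_url` are hypothetical; the code only builds the request URL for the public WDQS endpoint and does not send it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public endpoint of the Wikidata SPARQL Query Service (WDQS).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed query (not taken from the slides): hospitals with coordinates
# and, if available, the linked OpenStreetMap relation.
HOSPITAL_QUERY = """
SELECT ?item ?itemLabel ?coord ?osmRelation WHERE {
  ?item wdt:P31/wdt:P279* wd:Q16917 ;
        wdt:P625 ?coord .
  OPTIONAL { ?item wdt:P402 ?osmRelation . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
"""


def build_wdqs_url(query: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


url = build_wdqs_url(HOSPITAL_QUERY)
print(url[:60] + "...")
```

Pasting the query text into the WDQS web interface gives the same result set interactively; the `LIMIT` is only there to keep a first test run small.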
  • 16. Discussion - Current Limits: Wikidata trustworthiness: Citation needed! A KEY FEATURE OF WIKIDATA is the capacity to provide PROVENANCE FOR ITS CLAIMS (the triples that compose the knowledge graph) through references. ➢ Each claim can be supported by any number of REFERENCES TO SUPPORTING SOURCES of information. ➢ Unfortunately, MANY CLAIMS currently HAVE NO REFERENCES ASSIGNED. Please compare: ➢ Good et al.: http://icbo.cgrb.oregonstate.edu/node/331 ➢ Katie Mika: https://library.mcz.harvard.edu/blog/role-librarians-wikidata-and%C2%A0wikicite
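How many claims of an item lack references can be checked mechanically. The sketch below assumes the entity is available in Wikidata's JSON layout, where `claims` maps each property ID to a list of statements and a statement may carry a `references` list. The function name `reference_coverage` and the sample entity (ID, values) are made up for illustration and are not real Wikidata content.

```python
def reference_coverage(entity: dict) -> tuple:
    """Return (referenced, total) statement counts for one entity."""
    total = 0
    referenced = 0
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            total += 1
            # A missing or empty "references" list means no provenance.
            if statement.get("references"):
                referenced += 1
    return referenced, total


# Illustrative entity only: the ID and values are placeholders.
sample_entity = {
    "id": "Q0",
    "claims": {
        "P31": [{"mainsnak": {"property": "P31"},
                 "references": [{"snaks": {"P248": []}}]}],
        "P625": [{"mainsnak": {"property": "P625"}}],
        "P402": [{"mainsnak": {"property": "P402"}, "references": []}],
    },
}

referenced, total = reference_coverage(sample_entity)
print(f"{referenced} of {total} statements carry at least one reference")
```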
  • 17. Discussion - Current Limits: when linking between Wikidata and OpenStreetMap Linking from OSM to Wikidata ➢ In OpenStreetMap, Wikidata entities can be linked on every kind of OSM object using the wikidata key. ➢ Another way to link objects is by using Key:Wikipedia but (!!!) Linking from Wikidata to OSM ➢ This is problematic, since OSM's IDs are NOT stable. ➢ A very limited way is to link from Wikidata entities to OpenStreetMap relations by using Property P402. Please compare: http://wiki.openstreetmap.org/wiki/Wikidata
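Both link directions can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the OSM object's tags are given as a plain dictionary, the `wikidata` key holds a single item ID, and the helper names `wikidata_link` and `osm_relation_url` are hypothetical. The IDs in the usage lines are placeholders, not real links.

```python
import re
from typing import Optional

# Wikidata item IDs consist of "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def wikidata_link(tags: dict) -> Optional[str]:
    """OSM -> Wikidata: return the item ID under the `wikidata` key, if well-formed."""
    value = tags.get("wikidata", "").strip()
    return value if QID_PATTERN.fullmatch(value) else None


def osm_relation_url(relation_id: str) -> str:
    """Wikidata -> OSM: turn a P402 value into an OpenStreetMap relation URL.

    The URL only resolves while the relation keeps its ID, which is
    exactly the stability problem noted above.
    """
    return f"https://www.openstreetmap.org/relation/{relation_id}"


# Placeholder values for illustration only.
tags = {"amenity": "hospital", "name": "Example Hospital", "wikidata": "Q12345"}
print(wikidata_link(tags))
print(osm_relation_url("62422"))
```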
  • 18. Discussion - Complementary approach: OverPass-based Text-Mining of OpenStreetMap data (Geographic) datasets are typically integrated (“joined”) by exploring ➢ Descriptive Context Dimensions ➢ “who, what, where, when, why, and how” provided by the attributes of (geographic) features. In the case of OpenStreetMap this can be done by ➢ Exploring the “KEY” and “VALUE” pairs provided by OSM TAGS as attributes that describe an “OSM node” representing, for example, a Point of Interest (POI) ➢ A particularly powerful solution when applied with protocols based on internationally accepted metadata standards
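The key/value exploration described above can be sketched with the Overpass API. The example assumes the Overpass QL text format with JSON output; the helper names, the bounding box, and the sample response are illustrative and not taken from the slides. Nothing is sent over the network: the code builds the query text and tallies tag values from a hand-written response of the same shape.

```python
import json
from collections import Counter

# A public Overpass instance; the query would be POSTed here.
OVERPASS_ENDPOINT = "https://overpass-api.de/api/interpreter"


def build_overpass_query(key: str, value: str, bbox: tuple) -> str:
    """Overpass QL for nodes tagged key=value inside (south, west, north, east)."""
    south, west, north, east = bbox
    return (
        "[out:json][timeout:25];\n"
        f'node["{key}"="{value}"]({south},{west},{north},{east});\n'
        "out body;"
    )


def tag_value_counts(overpass_json: str, key: str) -> Counter:
    """Count the values of one tag key across all returned elements."""
    elements = json.loads(overpass_json).get("elements", [])
    return Counter(e["tags"][key] for e in elements if key in e.get("tags", {}))


# Hand-written sample in the shape of an Overpass JSON response.
SAMPLE_RESPONSE = json.dumps({"elements": [
    {"type": "node", "id": 1, "lat": -23.55, "lon": -46.63,
     "tags": {"amenity": "hospital", "operator:type": "public"}},
    {"type": "node", "id": 2, "lat": -23.56, "lon": -46.64,
     "tags": {"amenity": "hospital", "operator:type": "private"}},
    {"type": "node", "id": 3, "lat": -23.57, "lon": -46.65,
     "tags": {"amenity": "hospital", "operator:type": "public"}},
    {"type": "node", "id": 4, "lat": -23.58, "lon": -46.66,
     "tags": {"amenity": "hospital"}},
]})

query = build_overpass_query("amenity", "hospital", (-23.7, -46.8, -23.4, -46.4))
print(query)
print(tag_value_counts(SAMPLE_RESPONSE, "operator:type"))
```

Counting values per key is the simplest form of such text mining; it shows which values are in use before any mapping to a controlled vocabulary is attempted.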
  • 19. Conclusions: Summary This study shows how “standardized” “interlinking” of WIKIDATA and OPENSTREETMAP can lead to LOD integration, particularly when the capacity of WIKIDATA as a POWERFUL INTEGRATOR THROUGHOUT the Semantic Web is exploited. The study provides a description of conceptual considerations as well as of the WIKIDATA-BASED INTEGRATION of “PRIMARY COMMUNITY DATA” (e.g. OpenStreetMap) with “AUTHORITATIVE LINKED OPEN DATA” (e.g. Wikidata).
  • 20. Werner Leyh https://wiki.osgeo.org/wiki/User:WernerLeyh Grupo de Pesquisa CNPq/USP INFRAESTRUTURA DE DADOS ESPACIAIS (GEPIDE) http://dgp.cnpq.br/buscaoperacional/detalhegrupo.jsp?grupo=0067107HRY8KT0 Questions? Interested in linking Wikidata, OpenStreetMap and scientific datasets? Join us!
  • 21. Annex
  • 22. Annex: Tim Berners-Lee's 5-star scheme of Linked Open Data ☆ Data is available on the Web, in whatever format. ☆☆ Available as machine-readable structured data (e.g., not a scanned image). ☆☆☆ Available in a non-proprietary format (e.g., CSV, not Microsoft Excel). ☆☆☆☆ Published using open standards from the W3C (RDF and SPARQL). ☆☆☆☆☆ All of the above, plus links to other Linked Open Data.
  • 23. Annex: The Five Stars of Linked Data Vocabulary Use Interestingly, the original “Tim Berners-Lee 5-star scheme” does not make any assumptions about the use of vocabularies. In practice, however, ➢ querying Linked Data that do not refer to a vocabulary is difficult and ➢ understanding whether the results reflect the intended query is almost impossible. See Five Stars of Linked Data Vocabulary Use: http://semantic-web-journal.net/content/five-stars-linked-data-vocabulary-use
  • 24. Annex: Linked Data: RDF Data Model & SPARQL RDF breaks every piece of information down into triples: ➢ Subject – a resource, which is identified with a URI. ➢ Predicate – a URI-identified reused specification of the relationship. ➢ Object – a resource or literal to which the subject is related. <Bob> <is a> <person>. <Bob> <is interested in> <the Mona Lisa>. <the Mona Lisa> <was created by> <Leonardo da Vinci>. Source: https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/#section-triple
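The three example triples can be held in memory and queried by pattern, which is the idea behind a SPARQL basic graph pattern. The sketch below uses plain Python tuples instead of an RDF library, and plain strings instead of URIs, purely to keep it self-contained; `match` and `creators_of_interests` are hypothetical helper names.

```python
# The example triples from the RDF 1.1 Primer, as (subject, predicate, object).
TRIPLES = [
    ("Bob", "is a", "person"),
    ("Bob", "is interested in", "the Mona Lisa"),
    ("the Mona Lisa", "was created by", "Leonardo da Vinci"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that agree with the given pattern; None is a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def creators_of_interests(triples, person):
    """Follow a two-step path: person -> is interested in -> was created by."""
    creators = []
    for _, _, work in match(triples, subject=person, predicate="is interested in"):
        for _, _, creator in match(triples, subject=work, predicate="was created by"):
            creators.append(creator)
    return creators


print(creators_of_interests(TRIPLES, "Bob"))
```

The two-step lookup corresponds to a query that joins two triple patterns on a shared variable, which is how questions are framed as paths in a knowledge graph.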
  • 26. Annex: Wikipedia’s scale ➢ 30 million articles ➢ 286 languages ➢ 2 billion edits ➢ 8000 views per second ➢ 500 million monthly visitors ➢ 5th most popular website ➢ 2000 x larger than Britannica
  • 27. Annex: Wikidata’s mission Wikidata is the community-created knowledge base of Wikipedia. One reason for the strong community participation is the tight integration with Wikipedia: ➢ as of today, ➢ almost EVERY Wikipedia page ➢ in EVERY language ➢ incorporates content from Wikidata (compare Erxleben et al. in the references).
  • 28. Annex: WikiData characteristics ➢ An item for any notable subject ➢ “Claims” (or “statements”) about each item, using clearly-defined “properties” ➢ Properties may have “qualifiers” ➢ Every item is an “instance of” or “subclass of” another ➢ Claims show relationships between items ➢ Multi-lingual labels and descriptions ➢ Claims may include identifiers
  • 30. Annex: OpenStreetMap mission "OpenStreetMap is a project aimed squarely at creating and providing free geographic data such as street maps to anyone who wants them." -- www.openstreetmap.org
  • 31. Annex: OpenStreetMap characteristics OpenStreetMap Basic Definitions: ➢ It’s a very large database of XML data ➢ Each feature is of a certain basic type and is described by Tags (key-value pairs) ➢ Basic types: Nodes (points), Ways (lines), Relations (groups); Areas (polygons) are modelled as closed ways or multipolygon relations ➢ Tags: (domain.)key (concept) = value (instance)
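The basic types and their tags are easy to see in OSM's XML form. The following sketch parses a small hand-written OSM XML snippet (IDs, coordinates and names are invented) with Python's standard library and lists each element with its key-value pairs; `elements_with_tags` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in OSM XML layout; all values are illustrative.
SAMPLE_OSM_XML = """<osm version="0.6">
  <node id="1" lat="-23.55" lon="-46.63">
    <tag k="amenity" v="hospital"/>
    <tag k="name" v="Example Hospital"/>
  </node>
  <node id="2" lat="-23.56" lon="-46.64"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def elements_with_tags(osm_xml: str) -> list:
    """Return (type, id, tags) for every node, way and relation in the document."""
    root = ET.fromstring(osm_xml)
    result = []
    for element in root:
        if element.tag not in ("node", "way", "relation"):
            continue
        # Each <tag k="..." v="..."/> child is one key-value pair.
        tags = {t.get("k"): t.get("v") for t in element.findall("tag")}
        result.append((element.tag, element.get("id"), tags))
    return result


for kind, osm_id, tags in elements_with_tags(SAMPLE_OSM_XML):
    print(kind, osm_id, tags)
```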