Europeana as a Linked Data
(Quality) case
Antoine Isaac
with slides from Hugo Manguinhas, Valentine Charles, Juliane Stiller,
Mónica Marrero and other colleagues
3rd Workshop on Humanities in the Semantic Web (WHiSe)
Co-located with the 17th Extended Semantic Web Conference (ESWC 2020)
June 2, 2020
Outline
CC BY-SA
• Brief intro to Europeana
• Metadata quality challenges
• Using Linked Data technology to make data richer
• Encouraging data enhancements across the board
• How all this fits Research-related efforts
Who is Europeana?
● A non-profit foundation
● A community of 2400 experts in digital heritage: the
Europeana Network
● A mission: improve access to Europe's digital cultural
heritage
What is Europeana?
● The European Commission's digital platform for cultural
heritage
● Providing access to over 58M objects from over 3500
museums, libraries, archives
What is Europeana?
● An Open Data platform providing several services
● Europeana portal: https://europeana.eu
● Europeana APIs: https://pro.europeana.eu/resources/apis
How does it work?
France, Public Domain
1914, National Library of France
Agence de presse Meurisse
Concours de cycles nautiques sur le lac
d’Enghien : Berregent piloté par Austerling
What’s inside Europeana?
Europeana Essentials
● Descriptive and technical metadata: title, creator, subject,
rights…
● Editorial content like virtual exhibitions
● (recently started) user-generated metadata, incl.
transcriptions, semantic annotations
● Thumbnails
As a rule, digitized content is served on our partners’ websites,
except for some specific projects:
● Newspapers
● WWI user-generated content
Data flow in Europeana’s network
Data providers: cultural institutions that provide metadata and links to
digitized content
Aggregators: organizations or projects
that gather data from a specific country
or domain (music, fashion,
archaeology…)
France, Public Domain
1932, National Library of France
Agence de presse Mondial Photo-Presse.
Tournoi royal de motos à Londres :
changement d'une roue de side-car en
marche
Data Quality Issues
in Cultural Heritage
Caveat: some examples have already been cleaned
Sparseness of (meta)data
Heterogeneity
58M objects, from 3,500 institutions
● Many different themes and types of objects
Books, newspapers, letters, diaries, archival papers, paintings, maps, drawings, photographs,
music, spoken word, radio broadcasts, film, newsreels, fashion, sculpture, 3D objects, and
more
● Libraries, archives, museums have different ways to describe objects.
Even within a sector, big differences can be observed
Multilingualism
58M objects, from 44 countries
● Officially we get metadata in 38 languages
● But there are more languages used in individual metadata
fields
Multilingualism
● Officially we get metadata in 38 languages
● But there are more languages used in individual metadata
fields
• Over 400 language codes
e.g., 6 values in x-aramaic-latn - not a valid code by the way
• The most common case is lack of language information!
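The language-tag findings above can be reproduced with a simple tally. The sketch below is only an illustration, not Europeana's actual tooling: the sample values are hypothetical, and the two-or-three-letter pattern is an assumed stand-in for a real check against ISO 639 codes.

```python
import re
from collections import Counter

# Assumption: a "plain" language code is two or three lowercase letters,
# optionally followed by a two-letter region (e.g. "fr", "deu", "pt-BR").
# This is a simplification, not a full BCP 47 or ISO 639 validation.
PLAIN_CODE = re.compile(r"^[a-z]{2,3}(-[A-Za-z]{2})?$")


def tally_language_tags(values):
    """Count (text, lang) pairs by status: 'missing', 'plain' or 'other'."""
    counts = Counter()
    for _text, lang in values:
        if not lang:
            counts["missing"] += 1
        elif PLAIN_CODE.match(lang):
            counts["plain"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical sample of (value, language tag) pairs.
sample = [
    ("Concours de cycles nautiques", "fr"),
    ("Tournoi royal de motos", None),
    ("Clavecin", ""),
    ("a transliterated title", "x-aramaic-latn"),
]
report = tally_language_tags(sample)
```

Values counted as "other" are candidates for normalization, and values counted as "missing" are the most common case mentioned on the slide.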
How to get more
homogeneous, richer &
multilingual data?
France, Public Domain
1914, National Library of France
Agence de presse Meurisse
Concours de cycles nautiques sur le lac
d’Enghien : Berregent piloté par Austerling
Data modeling for interoperability
and richer metadata
● Like many aggregators, we ask our providers to give metadata using
one metadata model: the Europeana Data Model (EDM)
● But we cannot do whatever we like: we do not operate in isolation!
● Our approach must be
○ easy and rewarding for our partners
○ based on community-agreed best practices
A community sport
• Involving (technical) experts from libraries, archives, museums and
academics – the EuropeanaTech community
• Adopting a collaborative, softer form of standardization
http://pro.europeana.eu/europeana-tech
Europeana Assembly General Meeting, Rijksmuseum,
Amsterdam, 2015
Prior to EDM: flat metadata
records
● No links between objects and persons, places…
● Mixing data on real object and digital content
● Causing a lot of mapping quality problems
Following Best Practices, such as the
Linked Open Data principles
http://vimeo.com/36752317
Massive re-use of vocabularies in EDM
Plus
• Web Annotation
• RDA
• WGS84
• EBUcore
• ccRel
• ODRL
• DOAP
• SVCS
• DCAT
• ADMS
…
(sometimes only for one property!)
http://pro.europeana.eu/edm-documentation
EDM in Linked Open Vocabularies (LOV)
OAI-ORE FOAF
Data modeling for interoperability
and richer metadata
Clavecin, Bartolomeo Cristofori
Cité de la Musique,
MIMO - Musical Instruments Museums Online | CC BY-NC-SA
http://pro.europeana.eu/edm-documentation
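To make the difference with flat records concrete, here is a minimal sketch of how EDM separates the real object, its digital representation and a contextual entity. The class and property names (edm:ProvidedCHO, edm:WebResource, ore:Aggregation, edm:aggregatedCHO, edm:isShownBy, edm:Agent) come from EDM; the example.org identifiers and the plain-tuple representation are assumptions made for illustration.

```python
# Hypothetical identifiers under example.org; only the EDM class and
# property names are taken from the model itself.
CHO = "http://example.org/item/clavecin"
IMAGE = "http://example.org/media/clavecin.jpg"
AGGREGATION = "http://example.org/aggregation/clavecin"
MAKER = "http://example.org/agent/cristofori"

triples = [
    # The real-world object and its description.
    (CHO, "rdf:type", "edm:ProvidedCHO"),
    (CHO, "dc:title", "Clavecin"),
    (CHO, "dc:creator", MAKER),  # a link to a resource, not just a string
    # The digital representation, described separately from the object.
    (IMAGE, "rdf:type", "edm:WebResource"),
    # The aggregation ties the object to its digital representation.
    (AGGREGATION, "rdf:type", "ore:Aggregation"),
    (AGGREGATION, "edm:aggregatedCHO", CHO),
    (AGGREGATION, "edm:isShownBy", IMAGE),
    # The maker is a contextual entity with its own description.
    (MAKER, "rdf:type", "edm:Agent"),
    (MAKER, "skos:prefLabel", "Bartolomeo Cristofori"),
]


def describe(subject, graph):
    """Return the (predicate, object) pairs stated about one resource."""
    return [(p, o) for s, p, o in graph if s == subject]


def resources_of_type(rdf_type, graph):
    """Return all resources declared with the given rdf:type."""
    return [s for s, p, o in graph if p == "rdf:type" and o == rdf_type]
```

Because the object, the image and the maker are distinct resources, statements about one (for example the rights on the image) no longer get mixed with statements about another.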
Enriching metadata
• EDM gives a base for (linking to) multilingual, semantic metadata
• data as resources with web URIs, not only strings
• We encourage data providers to contribute their own links/data to
local or external vocabularies
https://pro.europeana.eu/page/europeana-semantic-enrichment
LOD Vocabularies currently recognized by Europeana in providers'
metadata
Vocabulary URL
MIMO Concepts http://www.mimo-db.eu/
MIMO Instrument makers http://www.mimo-db.eu/
The Getty - Art & Architecture Thesaurus (AAT) http://vocab.getty.edu/
The Getty - Union List of Artist Names (ULAN) http://vocab.getty.edu/
Virtual International Authority File (VIAF) http://viaf.org/viaf/
Geonames http://sws.geonames.org/
IconClass http://iconclass.org/
Gemeinsame Normdatei (GND) http://d-nb.info/gnd
Israel Museum Jerusalem Concepts http://www.imj.org.il/imagine/thesaurus/objects/
Partage Plus concepts http://partage.vocnet.org/
data.europeana.eu WWI Concepts from Library of Congress Subject Headings (LCSH) http://data.europeana.eu/concept/loc
Europeana Sounds Genres http://data.europeana.eu/concept/soundgenres/
EAGLE Material & Object Type http://www.eagle-network.eu/voc/
DISMARC Formats & Genres http://purl.org/dismarc/ns/
UDC http://udcdata.info/rdf/
UNESCO Thesaurus http://vocabularies.unesco.org/thesaurus/
YSO General Finnish Ontology https://finto.fi/yso/en/
https://pro.europeana.eu/page/europeana-semantic-enrichment
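Recognizing a vocabulary in provider metadata can be pictured as a prefix check on the value of a field. The sketch below is an assumption about how such a check could look, not a description of Europeana's ingestion code. The prefixes are a subset of the table above, written in plain http form, and the Geonames identifier in the usage note is illustrative.

```python
# A subset of the vocabulary prefixes from the table, in plain http form.
RECOGNIZED_PREFIXES = {
    "http://vocab.getty.edu/": "The Getty (AAT, ULAN)",
    "http://viaf.org/viaf/": "VIAF",
    "http://sws.geonames.org/": "Geonames",
    "http://iconclass.org/": "IconClass",
    "http://d-nb.info/gnd": "GND",
}


def recognize(value):
    """Return the vocabulary name if value is a URI under a known prefix."""
    for prefix, name in RECOGNIZED_PREFIXES.items():
        if value.startswith(prefix):
            return name
    return None  # a plain string or a URI from an unknown source
```

For instance, `recognize("http://sws.geonames.org/2988507/")` returns `"Geonames"`, while the plain string `"Paris"` returns `None` and would have to go through automatic enrichment instead.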
Enriching metadata
• EDM gives a base for (linking to) multilingual, semantic metadata
• data as resources with web URIs, not only strings
• We encourage data providers to contribute their own links/data to
local or external vocabularies
• We are going to further develop crowdsourcing/"nichesourcing" of
metadata
• In parallel, we apply automatic enrichment to link object metadata to
reference datasets for places, persons, concepts
https://pro.europeana.eu/page/europeana-semantic-enrichment
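Automatic enrichment can be sketched as a label lookup against a reference dataset. The example below is a deliberately naive illustration under stated assumptions: the two entities, their example.org URIs and the exact-match rule are hypothetical, and the real enrichment process is not described in these slides. It shows why knowing the language of a metadata value matters.

```python
# Hypothetical reference entities: URI -> {language: preferred label}.
ENTITIES = {
    "http://example.org/concept/poison": {"en": "poison", "de": "Gift"},
    "http://example.org/concept/gift": {"en": "gift", "de": "Geschenk"},
}


def enrich(value, lang=None):
    """Return sorted URIs whose label equals value, ignoring case.

    If lang is given, only labels in that language are compared.
    """
    needle = value.casefold()
    matches = []
    for uri, labels in ENTITIES.items():
        for label_lang, label in labels.items():
            if lang is not None and label_lang != lang:
                continue
            if label.casefold() == needle:
                matches.append(uri)
                break  # one matching label per entity is enough
    return sorted(matches)
```

Without language information, `enrich("Gift")` matches both concepts; with `lang="de"` it only matches the poison concept. Since missing language tags are the most common case, this kind of ambiguity is hard to avoid.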
Enriching metadata
Enriching metadata –
Contextual Entities
We are building an "Entity Collection"
• Centralized point of reference and access to data about contextual
entities: places, agents (persons and organizations), concepts...
• Caching and curating data from the wider Linked Open Data cloud
• A sort of Europeana knowledge graph
• With a dedicated API
https://pro.europeana.eu/page/entity#entity-collection
Data currently in the Entity Collection
• Places
a subset of Geonames, corresponding to places which are part of
European countries and of some specific feature classes.
• Agents
a subset of DBpedia corresponding to most of the instances of
dbp:Artist with some exceptions, and integrated from 49 DBpedia
language editions.
• Concepts
a subset of DBpedia and Wikidata corresponding to a selection of
concepts matching our needs, e.g., WWI battles, music genres
(Europeana Sounds aggregator) and a photography vocabulary
(Europeana Photography aggregator)
• Organizations
Extracted from Europeana's CRM and aligned to Wikidata when
possible
216,302
resources
1,572
resources
165,005
resources
1,077
resources
https://pro.europeana.eu/page/entity#entity-collection
Selecting data sources
• Availability and access: open license, published as linked data
• Granularity, size and coverage: multilingual data, with a rather generic
scope. But too generic or too large datasets can create too much ambiguity
for the simple processes we have (e.g., enrichment)
• Quality: intrinsic aspects like correctness of representation
• Connectivity: good data sources are well-connected internally and
externally to other datasets
An example
DBpedia resource for “Mozart” in the Entity Collection
Coreference links to 6 other
datasets
(e.g. Freebase, Wikidata)
Inter-linking information
Preferred labels for 48
languages
An enrichment example
Links to contextual entities
And what it allows
Multilingual enrichment is not easy!
Poisonous India or the Importance of a Semantic and Multilingual
Enrichment Strategy
Marlies Olensky, Juliane Stiller, Evelyn Dröge, MTSR 2012
http://link.springer.com/chapter/10.1007%2F978-3-642-35233-1_25
Encouraging everyone
on the way to improve
their data
University Of Edinburgh, CC BY
Roslin Glass Slides, creator unknown
Photograph of two men step cutting on the ice face of
the Tasman Glacier, New Zealand in the late 19th or
early 20th century.
Challenges for working on
quality improvement
● Methodological frameworks are not easy to apply
● Getting stakeholders interested is hard for us
● Communication lines are rather long
● It’s a sensitive area
● It’s hard to get users involved
A general effort on quality
We have set up a Data Quality Committee to analyze
quality issues and make recommendations to the
Europeana community about:
○ Mandatory metadata elements
○ Metadata checking and normalization
○ Multilingualism
…
http://pro.europeana.eu/get-involved/europeana-tech/data-quality-committee
https://pro.europeana.eu/post/publishing-framework
Convincing by impact
Europeana Publishing Framework: Metadata
● Language attributes: happy users
(using the Europeana portal in their native language)
● Links to vocabularies: context
(for users browsing the Europeana portal by persons, places, or concepts)
● Enabling elements: visibility
(collections being findable along various dimensions: by subject, type, creator, date)
A community sport, again!
• Involving (technical) experts from libraries, archives, museums and
academics – the EuropeanaTech community
• Adopting a collaborative, softer form of standardization
http://pro.europeana.eu/europeana-tech
Europeana Assembly General Meeting, Rijksmuseum,
Amsterdam, 2015
France, Public Domain
1932, National Library of France
Agence de presse Mondial Photo-Presse.
Tournoi royal de motos à Londres :
changement d'une roue de side-car en
marche
Europeana and the
Research
community
Europeana Research
Diagram labels: Partnerships, Expertise, Research Grants Programme, Community, Europeana portal, Connections, Europeana APIs, Europeana R&D Projects
https://pro.europeana.eu/page/europeana-research
Europeana & CLARIN
• 180K Europeana sources loaded into
CLARIN’s Virtual Language Observatory (VLO);
Europeana is now the largest provider of
individual metadata records in the VLO
• Selection based on quality, accessibility,
processability and reusability
• Full case study at https://bit.ly/2J5w8jc
• Challenge for SW (not new!): generic &
rich models/formats vs. community-specific
& easier to consume
Building partnerships with research infrastructures
Semantic Web technology can
help too, here
Europeana is involved in initiatives that can help bridge gaps
● International Image Interoperability Framework (IIIF)
● Not only images: representation of document structures, (linking to)
metadata, etc.
● With a strong focus on research cases (manuscripts, newspapers)
Cf. https://www.slideshare.net/antoineisaac/iiif-and-the-europeana-mission
● Linked Art
● Shared Model based on LOD to describe Art
● Re-using a (LOUD) subset of CIDOC CRM
https://iiif.io
https://linked.art
Semantic Web technology can
help too, here
● The SW approach makes it possible to create links between
underlying models and vocabularies
● W3C Web Annotation, CIDOC CRM, EDM
● Vocabularies expressed using SKOS
● Heavy reliance on JSON-LD
● Importance of data patterns
● Linked Open Usable Data - Rob Sanderson (Getty)
● See for example “The Importance of being LOUD”
https://www.slideshare.net/azaroth42/the-importance-of-being-loud
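As an illustration of the JSON-LD data patterns mentioned above, the sketch below builds a minimal W3C Web Annotation that tags an object with an entity URI. The `@context` URL, `type` and `motivation` values follow the Web Annotation Data Model; the helper name `make_tag_annotation` and all example.org identifiers are hypothetical and do not describe Europeana's own annotation service.

```python
import json


def make_tag_annotation(annotation_id, target_uri, entity_uri):
    """Build a minimal Web Annotation that tags a target with an entity URI."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "tagging",
        "body": entity_uri,    # the semantic tag, given as a URI
        "target": target_uri,  # the annotated object
    }


# Hypothetical identifiers, for illustration only.
annotation = make_tag_annotation(
    "http://example.org/annotation/1",
    "http://example.org/item/clavecin",
    "http://example.org/agent/cristofori",
)
serialized = json.dumps(annotation, indent=2)
```

The result is plain JSON that a developer can consume without an RDF toolkit, while the `@context` keeps it interpretable as Linked Data, which is the point of the LOUD approach.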
https://twitter.com/jbaiter_/status/1267553133942751232
It can work!
A community sport, again…
Helping FAIRification
of Cultural Data
University Of Edinburgh, CC BY
Roslin Glass Slides, creator unknown
Photograph of two men step cutting on the ice face of
the Tasman Glacier, New Zealand in the late 19th or
early 20th century.
How do Europeana's data and services
meet the FAIR requirements?
Findable
● The Europeana aggregation network partially homogenizes its data via
a shared data model
● Providers and Europeana seek to enrich the data with multilingual,
semantic resources
● We promote persistent identifiers and links across them
● Europeana provides a search engine
● Data is made findable through other platforms (e.g., CLARIN)
https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
How do Europeana's data and services
meet the FAIR requirements?
Accessible
● Data is published as (Linked Data) web resources
● Freely available, standard web APIs
Interoperable
● Europeana uses a community-based model
● Following best practices, such as mixing and re-using existing data
models and vocabularies
● We promote more open and richer content access protocols (IIIF)
https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
How do Europeana's data and services
meet the FAIR requirements?
Re-usable
● The conditions for re-using digitized content are made clear, using
shared vocabularies (Creative Commons, RightsStatements.org)
● Metadata is fully open – CC0
● Data model seeks to bridge with other communities’ models, such as
W3C Web Annotation, Schema.org
https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
• Active in 2014-2016
• To develop the open data ecosystem, facilitating better
communication between developers and publishers;
• To provide guidance to publishers, promoting the re-use
of data;
• To foster trust in the data among developers
• Linked Data, but not only!
Data on the Web
Best Practices
Working Group
https://www.w3.org/2013/dwbp/
• Use terms from shared vocabularies, preferably standardized ones
• Check that classes, properties, terms, elements or attributes used to
represent a dataset do not replicate those defined by vocabularies used for
other datasets.
• e.g. using the Linked Open Vocabularies repository
• Or if you have to replicate, indicate mappings clearly
Best Practice 15: Reuse vocabularies,
preferably standardized ones
Data on the Web Best Practices W3C Recommendation
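One way to act on this best practice is to list the properties in a dataset that do not come from a shared vocabulary. The sketch below is a minimal illustration: the namespace list is a small, assumed subset of widely used vocabularies, and the function name `local_properties` is hypothetical.

```python
# Namespaces of a few widely shared vocabularies (illustrative subset).
SHARED_NAMESPACES = (
    "http://purl.org/dc/elements/1.1/",
    "http://purl.org/dc/terms/",
    "http://www.w3.org/2004/02/skos/core#",
    "http://xmlns.com/foaf/0.1/",
)


def local_properties(property_uris):
    """Return the distinct properties outside the shared namespaces, sorted."""
    return sorted(
        uri
        for uri in set(property_uris)
        if not uri.startswith(SHARED_NAMESPACES)
    )


# Hypothetical properties found in a dataset.
used = [
    "http://purl.org/dc/elements/1.1/title",
    "http://example.org/schema#maker",
    "http://xmlns.com/foaf/0.1/name",
    "http://example.org/schema#maker",
]
candidates = local_properties(used)
```

Each property returned is a candidate either for replacement by an existing term or for an explicit mapping, as the slide recommends.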
• Accept that (OWL) semantics establish precise specs and can enable
automated reasoning but that complex vocabularies require more effort to
produce and hamper reuse of data
• Minimize ontological commitment of your vocabulary – or seek to minimize the
commitment of others’ vocabularies
• Check that inference does not produce too many statements that are
unnecessary for target applications
• Check examples of “softer” specs, e.g. Schema.org or SKOS
Best Practice 16: Choose the right
formalization level
Data on the Web Best Practices W3C Recommendation
Is it perfect?
No. In particular, we would always like to get more input
from users and researchers (the perspective is very
CH-focused).
But we’re working on it, and we hope the situation is better
than if we had done nothing!
Has Semantic Web technology helped?
YES
Want to engage?
Do you want to hear more about these issues? Check the upcoming “Enriching
research – enriching metadata” webinars
Europeana Research has a grants programme to fund events that bring
together cultural heritage and researchers. Check future calls!
Join the Europeana Network
and (one of) its communities!
https://www.raa.se/in-english/events-seminars-and-cultural-experiences/workshop-on-digitised-collections-enriching-research-enriching-metadata/
https://pro.europeana.eu/page/grants-programme
https://pro.europeana.eu
antoine.isaac@europeana.eu
@antoine_isaac

Europeana as a Linked Data (Quality) case

  • 1.
    Europeana as aLinked Data (Quality) case Antoine Isaac with slides from Hugo Manguinhas, Valentine Charles, Juliane Stiller, Mónica Marrero and other colleagues 3rd Workshop on Humanities in the Semantic Web (WHiSe) Co-located with the 15th Extended Semantic Web Conference (ESWC 2020) June 2, 2020
  • 2.
    Outline CC BY-SA • Briefintro to Europeana • Metadata quality challenges • Using Linked Data technology to make data richer • Encouraging data enhancements across the board • How all this fits Research-related efforts
  • 3.
    Who is Europeana? CCBY-SA ● A non-profit foundation ● A community of 2400 experts in digital heritage: the Europeana Network ● A mission: improve access to Europe's digital cultural heritage
  • 4.
    What is Europeana? CCBY-SA ● The European Commission's digital platform for cultural heritage ● Providing access to over 58M objects from over 3500 museums, libraries, archives
  • 6.
    What is Europeana? CCBY-SA ● An Open Data platform providing several services ● Europeana portal: https://europeana.eu ● Europeana APIs: https://pro.europeana.eu/resources/apis
  • 7.
    How does itwork? France, Public Domain 1914, National Library of France Agence de presse Meurisse Concours de cycles nautiques sur le lac d’Enghien : Berregent piloté par Austerling
  • 8.
    Title here CC BY-SACCBY-SA What’s inside Europeana? Europeana Essentials CC BY-SACC BY-SA ● Descriptive and technical metadata: title, creator, subject, rights… ● Editorial content like virtual exhibitions ● (recently started) user-generated metadata, incl. transcriptions, semantic annotations ● Thumbnails As a rule, digitized content is served on our partners’ websites Except for some specific projects ● Newspapers ● WWI user-generated content
  • 9.
    Data flow inEuropeana’s network Data providers: cultural institutions that provide metadata and links to digitized content Aggregators: organizations or projects that gather data from a specific country or domain (music, fashion, archaeology…)
  • 10.
    France, Public Domain 1932,National Library of France Agence de presse Mondial Photo-Presse. Tournoi royal de motos à Londres : changement d'une roue de side-car en marche Data Quality Issues in Cultural Heritage Caveat: some examples have been already cleaned 
  • 11.
    Title here CC BY-SA Sparsenessof (meta)data CC BY-SA
  • 12.
    Title here CC BY-SA Heterogeneity EuropeanaEssentials CC BY-SACC BY-SA 58M objects, from 3,500 institutions ● Many different themes and types of objects Books, newspapers, letters, diaries, archival papers, paintings, maps, drawings, photographs, music, spoken word, radio broadcasts, film, newsreels, fashion, sculpture, 3D objects, and more ● Libraries, archives, museums have different ways to describe objects. Even within a sector, big differences can be observed
  • 13.
    Title here CC BY-SA Multilinguism EuropeanaEssentials CC BY-SACC BY-SA 58M objects, from 44 countries ● Officially we get metadata in 38 languages ● But there are more languages used in individual metadata fields
  • 14.
    Title here CC BY-SA Multilinguism EuropeanaEssentials CC BY-SACC BY-SA ● Officially we get metadata in 38 languages ● But there are more languages used in individual metadata fields • Over 400 language codes e.g., 6 values in x-aramaic-latn - not a valid code by the way • The most common case is lack of language information!
  • 15.
    How to getmore homogeneous, richer & multilingual data? France, Public Domain 1914, National Library of France Agence de presse Meurisse Concours de cycles nautiques sur le lac d’Enghien : Berregent piloté par Austerling
  • 16.
    Title here CC BY-SA Datamodeling for interoperability and richer metadata CC BY-SA ● Like many aggregators, we ask our providers to give metadata using one metadata model: the Europeana Data Model (EDM) ● But we cannot do whatever we like: we do not operate in isolation! ● Our approach must be ○ easy and rewarding for our partners ○ based on community-agreed best practices
  • 17.
    A community sport •Involving (technical) experts from libraries, archives, museums and academics – the EuropeanaTech community • Adopting a collaborative, softer form of standardization http://pro.europeana.eu/europeana-tech Europeana Assembly General Meeting, Rijksmuseum, Amsterdam, 2015
  • 18.
    Title here CC BY-SA Priorto EDM: flat metadata records CC BY-SA ● No links between objects and persons, places… ● Mixing data on real object and digital content ● Causing a lot of mapping quality problems
  • 19.
    Title here CC BY-SACCBY-SA Following Best Practices, such as the Linked Open Data principles http://vimeo.com/36752317
  • 20.
    Massive re-use ofvocabularies in EDM CC BY-SA Plus • Web Annotation • RDA • WGS84 • EBUcore • ccRel • ODRL • DOAP • SVCS • DCAT • ADMS … (sometimes only for one property!) http://pro.europeana.eu/edm-documentation EDM in Linked Open vocabularies (LOV) OAI-ORE FOAF
  • 21.
    Title here CC BY-SA Titlehere CC BY-SA Europeana Essentials CC BY-SA Data modeling for interoperability and richer metadata CC BY-SA Clavecin, Bartolomeo Cristofori Cite de la Musique, MIMO - Musical Instruments Museums Online|CC BY-NC-SA http://pro.europeana.eu/edm-documentation
  • 22.
    Enriching metadata CC BY-SA •EDM gives a base for (linking to) multilingual, semantic metadata • data as resources with web URIs, not only strings • We encourage data providers to contribute their own links/data to local or external vocabularies https://pro.europeana.eu/page/europeana-semantic-enrichment
  • 23.
    CC BY-SA LOD Vocabulariescurrently recognized by Europeana in providers' metadata Vocabulary URL MIMO Concepts http://www.mimo-db.eu/ MIMO Instrument makers http://www.mimo-db.eu/ The Getty - Art & Architecture Thesaurus (AAT) http://vocab.getty.edu/ The Getty - Union List of Artist Names (ULAN) http://vocab.getty.edu/ Virtual International Authority File (VIAF) http://viaf.org/viaf/ Geonames http://sws.geonames.org/ IconClass http://iconclass.org/ Gemeinsame Normdatei (GND) http://d-nb.info/gnd Israel Museum Jerusalem Concepts http://www.imj.org.il/imagine/thesaurus/objects/ Partage Plus concepts http://partage.vocnet.org/ data.europeana.eu WWI Concepts from Library of Congress Subject Headings (LCSH) http://data.europeana.eu/concept/loc Europeana Sounds Genres http://data.europeana.eu/concept/soundgenres/ EAGLE Material & Object Type http://www.eagle-network.eu/voc/ DISMARC Formats & Genres http://purl.org/dismarc/ns/ UDC http://udcdata.info/rdf/ UNESCO Thesaurus http://vocabularies.unesco.org/thesaurus/ YSO General Finnish Ontology https://finto.fi/yso/en/ 
https://pro.europeana.eu/page/europeana-semantic-enrichment
  • 24.
    Title here CC BY-SACCBY-SA Enriching metadata CC BY-SA • EDM gives a base for (linking to) multilingual, semantic metadata • data as resources with web URIs, not only strings • We encourage data providers to contribute their own links/data to local or external vocabularies • We are going to further develop crowdsourcing/"nichesourcing" of metadata • In parallel, we apply automatic enrichment to link object metadata to reference datasets for places, persons, concepts https://pro.europeana.eu/page/europeana-semantic-enrichment
  • 25.
    Title here CC BY-SACCBY-SA Enriching metadata CC BY-SA
  • 26.
    Title here CC BY-SACCBY-SA Enriching metadata – Contextual Entities CC BY-SA We are building an "Entity Collection" • Centralized point of reference and access to data about contextual entities: places, agents (persons and organizations), concepts... • Caching and curating data from the wider Linked Open Data cloud • A sort of Europeana knowledge graph • With a dedicated API https://pro.europeana.eu/page/entity#entity-collection
  • 27.
    Data currently inthe Entity Collection CC BY-SA • Places a subset of Geonames, corresponding to places which are part of European countries and of some specific feature classes. • Agents a subset of DBpedia corresponding to most of the instances of dbp:Artist with some exceptions, and integrated from 49 DBpedia language editions. • Concepts a subset of DBpedia and Wikidata corresponding to a selection of concepts matching our needs, e.g., WWI battles, music genres (Europeana Sounds aggregator) and a photography vocabulary (Europeana Photography aggregator) • Organizations Extracted from Europeana's CRM and aligned to Wikidata when possible 216,302 resources 1,572 resources 165,005 resources 1,077 resources https://pro.europeana.eu/page/entity#entity-collection
  • 28.
    Selecting data sources CCBY-SA • Availability and access: open license, published as linked data • Granularity, size and coverage: multilingual data, with a rather generic scope. But too generic or too large datasets can create too much ambiguity for the simple processes we have (e.g., enrichment) • Quality: intrinsic aspects like correctness of representation • Connectivity: good data sources are well-connected internally and externally to other datasets
  • 29.
    An example DBpedia resourcefor “Mozart” in the Entity Collection CC BY-SA Coreference links to 6 other datasets (e.g. Freebase, Wikidata) Inter-linking information Preferred labels for 48 languages
  • 30.
    An enrichment example Linksto contextual entities
  • 31.
  • 32.
  • 33.
    Title here CC BY-SACCBY-SA Multilingual enrichment is not easy! Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy Marlies Olensky, Juliane Stiller, Evelyn Dröge, MTSR 2012 http://link.springer.com/chapter/10.1007%2F978-3-642-35233- 1_25
  • 34.
    Encouraging everyone on theway to improve their data University Of Edinburgh, CC BY Roslin Glass Slides, creator unknown Photograph of two men step cutting on the ice face of the Tasman Glacier, New Zealand in the late 19th or early 20th century.
  • 35.
    Title here CC BY-SA Challengesfor working on quality improvement ● Methodological frameworks are not easy to apply ● Getting stakeholders interested is hard for us ● Communication lines are rather long ● It’s a sensitive area ● It’s hard to get users involved CC BY-SA
  • 36.
    Title here CC BY-SA Ageneral effort on quality CC BY-SA We have set up a Data Quality Committee to analyze quality issues and make recommendations to the Europeana community about: ○ Mandatory metadata elements ○ Metadata checking and normalization ○ Multilingualism … http://pro.europeana.eu/get-involved/europeana-tech/data-quality-committee
  • 37.
  • 38.
    Title here CC BY-SA CCBY-SA Convincing by impact
  • 39.
    CC BY-SA Europeana PublishingFramework: Metadata languages attributes happy users (using Europeana portal in their native language) links to vocabularies context (for users browsing Europeana portal by persons, places, or concepts) enabling elements visibility (collections being findable along various dimensions: by subject, type, creator, date)
  • 40.
    A community sport,again! • Involving (technical) experts from libraries, archives, museums and academics – the EuropeanaTech community • Adopting a collaborative, softer form of standardization http://pro.europeana.eu/europeana-tech Europeana Assembly General Meeting, Rijksmuseum, Amsterdam, 2015
  • 41.
    Europeana and the Research community
    Agence de presse Mondial Photo-Presse. Tournoi royal de motos à Londres : changement d'une roue de side-car en marche. 1932, National Library of France. France, Public Domain
  • 42.
    Europeana Research
    [Diagram labels: Partnerships, Expertise, Research Grants Programme, Community, Europeana portal, Connections, Europeana APIs, Europeana R&D Projects]
    https://pro.europeana.eu/page/europeana-research
    CC BY-SA
  • 43.
    Building partnerships with research infrastructures
    Europeana & CLARIN
    • 180K Europeana sources have been loaded into CLARIN’s Virtual Language Observatory (VLO); Europeana is now the largest provider of individual metadata records in the VLO
    • Selection based on quality, accessibility, processability and reusability
    • Full case study at https://bit.ly/2J5w8jc
    • Challenge for SW (not new!): generic & rich models/formats vs. community-specific & easier to consume
    Europeana Research CC BY-SA
  • 44.
    Semantic Web technology can help too, here
    Europeana is involved in initiatives that can help bridge gaps
    ● International Image Interoperability Framework (IIIF) https://iiif.io
        ○ Not only images: representation of document structures, (linking to) metadata, etc.
        ○ With a strong focus on research cases (manuscripts, newspapers)
        ○ Cf. https://www.slideshare.net/antoineisaac/iiif-and-the-europeana-mission
    ● Linked Art https://linked.art
        ○ Shared model based on LOD to describe art
        ○ Re-using a (LOUD) subset of CIDOC CRM
    CC BY-SA
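To make the IIIF point more concrete: the IIIF Image API addresses an image, or a region of it, through a URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The following is a minimal sketch of building such a URL. The server base and the identifier are hypothetical, and the defaults assume Image API 3.0 (where `max` requests the largest available size).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL from its path components."""
    # The identifier is a single path segment, so any "/" inside it
    # has to be percent-encoded.
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
BASE = "https://iiif.example.org/image"
PAGE = "newspapers/issue-42/page-1"

full_page = iiif_image_url(BASE, PAGE)
# Region given as x,y,w,h in pixels: the top-left 500x500 square.
detail = iiif_image_url(BASE, PAGE, region="0,0,500,500")

print(full_page)
print(detail)
```

Because regions are plain URL parameters, a researcher can cite a detail of a manuscript or a newspaper page without downloading or cropping the image.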
  • 45.
    Semantic Web technology can help too, here
    ● The SW approach makes it possible to create links between underlying models and vocabularies
    ● W3C Web Annotation, CIDOC CRM, EDM
    ● Vocabularies expressed using SKOS
    ● Heavy reliance on JSON-LD
    ● Importance of data patterns
    ● Linked Open Usable Data (LOUD), Rob Sanderson (Getty)
    ● See for example “The Importance of being LOUD” https://www.slideshare.net/azaroth42/the-importance-of-being-loud
    CC BY-SA
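As an illustration of how SKOS, W3C Web Annotation and JSON-LD can fit together, here is a small sketch that describes a concept with multilingual labels and then tags an object with it. All `example.org` URIs and the helper names are hypothetical; only the SKOS namespace and the Web Annotation JSON-LD context are the published ones. This is not Europeana's actual data, just one possible pattern.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def skos_concept(uri, labels, exact_matches=()):
    """Describe a concept as JSON-LD; `labels` maps language tag -> label."""
    return {
        "@context": {"skos": SKOS},
        "@id": uri,
        "@type": "skos:Concept",
        # A dict keyed by language gives one preferred label per language.
        "skos:prefLabel": [
            {"@value": text, "@language": lang}
            for lang, text in sorted(labels.items())
        ],
        # Links to equivalent concepts in other vocabularies.
        "skos:exactMatch": [{"@id": match} for match in exact_matches],
    }


def tagging_annotation(target_uri, concept_uri):
    """A Web Annotation that tags a target resource with a concept URI."""
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "motivation": "tagging",
        "body": concept_uri,
        "target": target_uri,
    }


concept = skos_concept(
    "http://example.org/concept/glacier",
    {"en": "glacier", "fr": "glacier", "de": "Gletscher"},
    exact_matches=["http://example.org/other-vocabulary/glaciers"],
)
annotation = tagging_annotation("http://example.org/item/123", concept["@id"])
print(json.dumps(annotation, indent=2))
```

The annotation only carries the concept's URI; the labels in every language come along by following that link, which is what makes the enrichment multilingual.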
  • 46.
  • 47.
  • 48.
    Helping FAIRification of Cultural Data
    University of Edinburgh, CC BY. Roslin Glass Slides, creator unknown. Photograph of two men step cutting on the ice face of the Tasman Glacier, New Zealand, in the late 19th or early 20th century.
  • 49.
    How do Europeana's data and services meet the FAIR requirements?
    Findable
    ● The Europeana aggregation network partially homogenizes its data via a shared data model
    ● Providers and Europeana seek to enrich the data with multilingual, semantic resources
    ● We promote persistent identifiers and links across them
    ● Europeana provides a search engine
    ● Data is made findable through other platforms (e.g., CLARIN)
    https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
    Europeana Essentials CC BY-SA
  • 50.
    How do Europeana's data and services meet the FAIR requirements?
    Accessible
    ● Data is published as (Linked Data) web resources
    ● Freely available, standard web APIs
    Interoperable
    ● Europeana uses a community-based model
    ● Following best practices, such as mixing and re-using existing data models and vocabularies
    ● We promote more open and richer content access protocols (IIIF)
    https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
    Europeana Essentials CC BY-SA
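Publishing data as Linked Data web resources usually means that a client can ask for a machine-readable representation of a resource through HTTP content negotiation. The sketch below only prepares such a request and does not send it. The resource URI is hypothetical, and whether a given server honours the `Accept` header is an assumption to verify against its documentation.

```python
from urllib.request import Request


def linked_data_request(resource_uri, media_type="application/ld+json"):
    """Prepare (but do not send) a request asking for an RDF serialization."""
    # The Accept header tells the server which representation we prefer,
    # here JSON-LD instead of the default HTML page.
    return Request(resource_uri, headers={"Accept": media_type})


# Hypothetical resource URI, for illustration only.
request = linked_data_request("http://example.org/item/123")
print(request.full_url, request.get_header("Accept"))

# The same resource could be requested as Turtle instead.
turtle_request = linked_data_request("http://example.org/item/123", "text/turtle")
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, so the sketch runs without network access.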
  • 51.
    How do Europeana's data and services meet the FAIR requirements?
    Re-usable
    ● The conditions for re-using digitized content are made clear, using shared vocabularies (Creative Commons, RightsStatements.org)
    ● Metadata is fully open – CC0
    ● Data model seeks to bridge with other communities’ models, such as W3C Web Annotation, Schema.org
    https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
    Europeana Essentials CC BY-SA
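Expressing re-use conditions with URIs from shared vocabularies makes them easy to check by machine. As a rough sketch, the function below recognizes which of the two vocabularies named above a rights value comes from, based on its host name. The function name and the lookup table are illustrative assumptions, not part of any Europeana tool.

```python
from urllib.parse import urlparse

# Host names of the two shared rights vocabularies mentioned above.
RIGHTS_VOCABULARIES = {
    "creativecommons.org": "Creative Commons",
    "rightsstatements.org": "RightsStatements.org",
}


def rights_vocabulary(rights_value):
    """Return the vocabulary a rights URI belongs to, or None if unknown."""
    host = urlparse(rights_value).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Free-text values such as "All rights reserved" have no host
    # and therefore fall through to None.
    return RIGHTS_VOCABULARIES.get(host)


for value in (
    "http://creativecommons.org/publicdomain/zero/1.0/",
    "http://rightsstatements.org/vocab/InC/1.0/",
    "All rights reserved",
):
    print(value, "->", rights_vocabulary(value))
```

A check like this can flag records whose rights information is free text and therefore hard for re-users to interpret.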
  • 52.
    Data on the Web Best Practices Working Group
    • Active in 2014-2016
    • To develop the open data ecosystem, facilitating better communication between developers and publishers
    • To provide guidance to publishers, promoting the re-use of data
    • To foster trust in the data among developers
    • Linked Data, but not only!
    https://www.w3.org/2013/dwbp/
    CC BY-SA
  • 53.
    Best Practice 15: Reuse vocabularies, preferably standardized ones
    • Use terms from shared vocabularies, preferably standardized ones
    • Check that classes, properties, terms, elements or attributes used to represent a dataset do not replicate those defined by vocabularies used for other datasets
        • e.g. using the Linked Open Vocabularies repository
    • Or if you have to replicate, indicate mappings clearly
    Data on the Web Best Practices, W3C Recommendation
    CC BY-SA
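A simple way to apply this practice is to audit the terms a dataset uses: which ones come from shared vocabularies, and which local ones still lack a declared mapping. The sketch below assumes a small hand-picked list of shared namespaces and a hypothetical local vocabulary under `example.org`; a real check would rather consult a registry such as Linked Open Vocabularies.

```python
# A few widely shared namespaces (an illustrative selection, not a registry).
SHARED_NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "https://schema.org/",
}


def split_by_reuse(term_uris):
    """Separate terms taken from shared vocabularies from local ones."""
    reused, local = [], []
    for uri in term_uris:
        if any(uri.startswith(ns) for ns in SHARED_NAMESPACES.values()):
            reused.append(uri)
        else:
            local.append(uri)
    return reused, local


def unmapped_terms(local_terms, mappings):
    """Local terms for which no mapping to a shared term is declared."""
    return [term for term in local_terms if term not in mappings]


# Hypothetical dataset: two reused properties and two local ones.
terms = [
    "http://purl.org/dc/terms/title",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://example.org/vocab/maker",
    "http://example.org/vocab/shelfmark",
]
# Declared mappings: local term -> (mapping property, shared term).
mappings = {
    "http://example.org/vocab/maker":
        ("rdfs:subPropertyOf", "http://purl.org/dc/terms/creator"),
}

reused, local = split_by_reuse(terms)
print("reused:", reused)
print("local without mapping:", unmapped_terms(local, mappings))
```

Here `shelfmark` is reported as a local term without a mapping, which is exactly the case the best practice asks publishers to document.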
  • 54.
    Best Practice 16: Choose the right formalization level
    • Accept that (OWL) semantics establish precise specs and can enable automated reasoning, but that complex vocabularies require more effort to produce and hamper reuse of data
    • Minimize the ontological commitment of your vocabulary, or seek to minimize the commitment of others’ vocabularies
    • Check that inference does not produce too many statements that are unnecessary for target applications
    • Check examples of “softer” specs, e.g. Schema.org or SKOS
    Data on the Web Best Practices, W3C Recommendation
    CC BY-SA
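The point about inference producing too many statements can be shown with a toy calculation. The sketch below takes a chain of "broader" links between five hypothetical concepts and materializes the statements that a transitive reading of the property would entail. The concept names and the naive closure algorithm are illustrative assumptions, not how any particular reasoner works.

```python
def transitive_closure(pairs):
    """All (a, c) pairs entailed by treating the relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Join every pair (a, b) with every pair (b, d) to derive (a, d).
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# A chain of five hypothetical concepts: c0 -> c1 -> c2 -> c3 -> c4.
asserted = [(f"c{i}", f"c{i + 1}") for i in range(4)]
entailed = transitive_closure(asserted)

print(len(asserted), "asserted statements")
print(len(entailed), "statements after inference")
```

For a chain of n concepts, n - 1 asserted links grow to n(n - 1)/2 entailed ones, so deep hierarchies can multiply the data well beyond what a target application needs.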
  • 55.
    Is it perfect?
    No. In particular, we would always like to get more input from users and researchers (the perspective is very focused on cultural heritage, CH). But we’re working on it, and we hope the situation is better than if we had not done anything!
    Has Semantic Web technology helped? YES
    Europeana Essentials CC BY-SA
  • 56.
    Want to engage?
    Do you want to hear more about these issues? Check the upcoming “Enriching research – enriching metadata” webinars: https://www.raa.se/in-english/events-seminars-and-cultural-experiences/workshop-on-digitised-collections-enriching-research-enriching-metadata/
    Europeana Research has a grants programme to fund events that bring together cultural heritage and researchers. Check future calls! https://pro.europeana.eu/page/grants-programme
    Join the Europeana Network and (one of) its communities! https://pro.europeana.eu
    CC BY-SA
  • 57.
    [email protected] @antoine_isaac
    CC BY-SA