@openaire_eu
OpenAIRE-Connect Review
23rd of April, 2018 - Brussels
The OpenAIRE Research Graph
Bringing scholarly communication back into the hands of scientists
Paolo Manghi
Institute of Information Science and Technologies
Consiglio Nazionale delle Ricerche
Materializing the Open Science Graph
Diagram labels:
• Graph entities: Project, Community, Funder, Funding, Organization, Source, Product (Publication, Research Data, Software, Other res. products)
• Processes: Harvesting, Mining, Deduplication, End-user feedback
• Other labels: Scientific product catalogue, Guidelines, Research Infrastructures, Publishing, IT
OpenAIRE Advance 1st Review | Luxembourg | 10 Oct 2019
The OpenAIRE research graph
Providing an open metadata research graph of interlinked scientific products, with Open Access information, linked to funding information and research communities
• Open
• Complete
• De-duplicated
• Transparent
• Participatory
• Decentralized
• Trusted
De-duplicated
Metadata records corresponding to equivalent objects are merged, for:
• Scientific products
• Organizations
More information about the de-duplication framework used by OpenAIRE can be found by searching on Zenodo for:
• “De-duplicating the OpenAIRE Scholarly Communication Big Graph” (poster)
• “GDup: De-Duplication of Scholarly Communication Big Graphs”
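The merge step on this slide can be illustrated with a minimal sketch. It assumes that an earlier matching step has already produced pairs of record ids judged equivalent; the function name, the record fields and the DOI value are hypothetical, and the union-find grouping is a generic technique, not a description of the GDup implementation.

```python
from collections import defaultdict


def merge_equivalent(records, similar_pairs):
    """Group records connected by similarity pairs and merge each group.

    records: dict mapping record id -> dict of metadata fields.
    similar_pairs: iterable of (id_a, id_b) judged equivalent (assumed input).
    Returns a dict mapping the smallest id of each group -> merged record.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent pointers to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as root so results are deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for rid in records:
        groups[find(rid)].append(rid)

    merged = {}
    for root, members in groups.items():
        representative = {"merged_from": sorted(members)}
        for rid in sorted(members):
            for field, value in records[rid].items():
                # The first record that carries a field wins.
                representative.setdefault(field, value)
        merged[root] = representative
    return merged


# Made-up example records; "10.1/x" is a placeholder, not a real DOI.
records = {
    "a": {"title": "Graph paper"},
    "b": {"title": "Graph paper", "doi": "10.1/x"},
    "c": {"title": "Other"},
}
merged = merge_equivalent(records, [("b", "a")])
```

Keeping a `merged_from` list on the representative record preserves which original records were combined, which matters for statistics computed on the de-duplicated graph.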
Complete: community-trusted sources
Academic Graph … and more
Participatory
• Rely on quality scholarly communication sources of different kinds
• Include solutions and content from any interested and known content provider in scholarly communication
Kinds of sources:
• Institutional repositories
• Aggregators
• Data archives
• Software repositories
• Research infrastructure sources
• Funder grant databases
• Authors & Orgs entity registries
• Publishers & journals
Transparent
• Metadata in the graph includes provenance when harvested and reliability indicators when obtained from mining
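One way to picture provenance and reliability indicators is a link record that carries both. The sketch below is an illustration only: the class, field names, provenance labels and the 0.8 threshold are all assumptions, not the OpenAIRE data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    source_id: str
    target_id: str
    relation: str
    provenance: str                        # "harvested" or "mined" (hypothetical labels)
    collected_from: Optional[str] = None   # data source name, for harvested links
    trust: Optional[float] = None          # estimated probability of being correct, for mined links


def reliable(links, min_trust=0.8):
    """Keep harvested links, plus mined links whose trust meets the threshold."""
    return [
        link for link in links
        if link.provenance == "harvested"
        or (link.trust is not None and link.trust >= min_trust)
    ]


# Made-up example links.
links = [
    Link("p1", "d1", "supplementedBy", "harvested", collected_from="repo-x"),
    Link("p1", "d2", "references", "mined", trust=0.95),
    Link("p1", "d3", "references", "mined", trust=0.4),
]
kept = reliable(links)
```

Storing the indicator next to the link lets each consumer choose its own threshold instead of relying on a single cut-off applied upstream.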
Decentralized
• Preservation and ownership beyond OpenAIRE
    • Exchanged with other graph initiatives
    • Broker Service: redistributed via subscription and notification to contributing data sources (provide.openaire.eu)
• Openly accessible via APIs (develop.openaire.eu)
Trusted (November 2019)
• Authors in the loop to enrich their ORCID record
• Validation of end-user “claims”
Populating the Graph
Harvesting: Revised Classification of Research Products
Publications
• Article
• Preprint
• Report
• …
Datasets
• Dataset
• Collection
• Clinical Trials
• …
Software
• Research Software
• …
Other Research Products
• Service
• Workflow
• Interactive Resource
• …
Harvested from: institutional/publication repositories, journals/publishers, data repositories, software repositories, other products repositories
Workshop Técnico OpenAIRE / LA Referencia | 29-30 October, 2019 | Costa Rica
Open Science publishing
Bridging RIs and Scholarly Communication
Transparency and reproducibility
Diagram labels:
• e-Infrastructures and Research Infrastructures: Dataset, Method, Thematic Service, Experiment
• Publishing the experiment: Input Dataset, Input Method, Output Dataset, Experiment product, Thematic Service Parameters
• Scholarly Communication infrastructure: Experiment repo, Data repo, Method repo, Publications (Research data, Software, Workflows, Publications)
• IT, Harvesting
Open Science publishing workflows
• EPOS Research Infrastructure
Reproducibility, Transparency, Seamless publishing
Pre-processed sources
• Article-dataset links: 480Mi links
• CrossRef enriched (DOIBoost): 85Mi publication records
• Academic Graph
Published every 6 months (new versions to be published next week)
Context Propagation
Diagram labels:
• Entities: Product, Project, Source, Organization, Funder, Country, Community
• Relationships: supplementedBy (Product to Product), fundedBy (Product to Project), hostedBy (Product to Source, e.g. an institutional repository), funds (Funder, e.g. a National Funder), located, jurisdiction, ofInterest
• Figures shown: 157K, 8Mi, 10K
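Context propagation can be sketched as copying a context (here, research community tags) along a relationship until nothing changes. Which relationships actually drive propagation is not stated on the slide; the sketch assumes, for illustration only, that a product linked through supplementedBy inherits the communities of the product it supplements. All names and identifiers are hypothetical.

```python
def propagate_communities(communities, supplemented_by):
    """Propagate community tags along supplementedBy links.

    communities: dict mapping product id -> set of community ids.
    supplemented_by: list of (product, supplement) pairs.
    Returns a new dict; the input mapping is left unchanged.
    """
    result = {product: set(tags) for product, tags in communities.items()}
    changed = True
    while changed:
        changed = False
        for product, supplement in supplemented_by:
            target = result.setdefault(supplement, set())
            inherited = result.get(product, set()) - target
            if inherited:
                target |= inherited
                changed = True
    return result


# Made-up example: a publication tagged with one community,
# supplemented by a dataset that is itself supplemented by another dataset.
communities = {"pub1": {"marine"}, "data1": set()}
links = [("pub1", "data1"), ("data1", "data2")]
propagated = propagate_communities(communities, links)
```

The loop repeats until a full pass adds nothing, so tags travel across chains of links regardless of the order in which the pairs are listed.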
Production: Open Access CAPs (content acquisition policies)
BETA: Open Science CAPs
Four bar charts compare record counts under the Old CAP (PROD) and the New CAP (BETA):
• literature: 30Mi (PROD), 110Mi (BETA)
• research data: 1Mi (PROD), 10Mi (BETA)
• software: 100K (PROD), 180K (BETA)
• other: 3Mi (PROD), 7.5Mi (BETA)
Harvested content
• Data sources: 10K+
• Records: ~480Mi
• Publication full-texts: ~12Mi (Springer N. coming)
• Links (also text-mined): ~960Mi
Liaisons
• Microsoft Research, Academic Graph (being drafted)
• Unpaywall (ongoing)
• ORCID membership (November 2019)
• RDA IG Open Science Graphs for FAIR Data
    • FREYA, ResearchGraph, OpenCitations, Open Research Knowledge Graph
    • IG Session at RDA Helsinki 2019 (15th of October 2019)
BETA Graph Open Consultation
• October-November 2019: OpenAIRE Research Graph open for consultation
    • Collecting feedback via Trello (operational end of September)
• December 2019: OpenAIRE Research Graph in production
http://beta.explore.openaire.eu
Trello for feedback
Thank you!
Paolo Manghi
paolo.manghi@isti.cnr.it
Architecture, technologies, and infrastructure
Architecture and technologies: today
Diagram of the graph processing pipeline, with these labelled parts:
• Data Sources
• Aggregation subsystem: Collect, Transform, Clean; metadata records, files, cleaned records, Full-text cache
• Populate: Native graph
• De-duplication subsystem: Identify equivalent products and organisations; Action Sets (similarity rels); Merge equivalent objects; Deduped graph
• Information Inference subsystem: Extract full-text; Copy of deduped graph; Text-mining of the full-texts and the graph to derive new semantic links; Action Set (inferred links); Propagation; Enrich graphs with links; Enriched graph
• Data provision subsystem: Native graph “slices”
• Publishing subsystem: Front-end
• Data Monitoring
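The flow named in the diagram (clean, de-duplicate, enrich) can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: every function name is hypothetical, exact title matching stands in for real similarity matching, and the inferred links are passed in rather than mined from full-texts.

```python
def clean(record):
    """Normalise whitespace and case in the title (stand-in for Transform/Clean)."""
    return {**record, "title": " ".join(record["title"].split()).lower()}


def deduplicate(records):
    """Merge records with identical cleaned titles (stand-in for similarity matching)."""
    merged = {}
    for record in records:
        entry = merged.setdefault(record["title"], {"title": record["title"], "ids": []})
        entry["ids"].append(record["id"])
    return list(merged.values())


def enrich(products, inferred_links):
    """Attach each inferred (source_id, target_id) link to the product holding source_id."""
    enriched = []
    for product in products:
        targets = [target for source, target in inferred_links if source in product["ids"]]
        enriched.append({**product, "links": targets})
    return enriched


def build_graph(raw_records, inferred_links):
    native = [clean(record) for record in raw_records]   # native graph
    deduped = deduplicate(native)                         # deduped graph
    return enrich(deduped, inferred_links)                # enriched graph


# Made-up example input.
raw = [
    {"id": "r1", "title": "  Open  Science Graph"},
    {"id": "r2", "title": "open science graph"},
    {"id": "r3", "title": "Other"},
]
graph = build_graph(raw, [("r2", "d1")])
```

Keeping each stage as a pure function mirrors the diagram's separation into subsystems: each stage reads one graph and produces the next.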
Task 9.1. System administration - infrastructure
System            Before Jan 2018                      After
Public            20 srv, 122 CPU, 320 GB, 8 TB        44 srv, 274 CPU, 905 GB, 20 TB
Mining            21 srv, 406 CPU, 2 TB, 385 TB        22 srv, 414 CPU, 2.2 TB, 388 TB
Data provision    23 srv, 154 CPU, 430 GB, 23 TB       23 srv, 154 CPU, 430 GB, 24 TB
Testing           5 srv, 30 CPU, 100 GB, 3 TB          14 srv, 86 CPU, 302 GB, 9 TB
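The overall growth of the infrastructure can be worked out from the server and CPU counts on the slide. The sketch assumes that the first block of figures is the configuration before January 2018 and the second block is the later one; the variable and function names are illustrative.

```python
# (servers, CPUs) per system, copied from the slide.
before = {
    "Public": (20, 122),
    "Mining": (21, 406),
    "Data provision": (23, 154),
    "Testing": (5, 30),
}
after = {
    "Public": (44, 274),
    "Mining": (22, 414),
    "Data provision": (23, 154),
    "Testing": (14, 86),
}


def totals(config):
    """Sum servers and CPUs over all systems."""
    servers = sum(s for s, _ in config.values())
    cpus = sum(c for _, c in config.values())
    return servers, cpus


def growth(old, new):
    """Ratio new/old for each total, rounded to two decimals."""
    return tuple(round(n / o, 2) for o, n in zip(old, new))


total_before = totals(before)   # 69 servers, 712 CPUs
total_after = totals(after)     # 103 servers, 928 CPUs
ratio = growth(total_before, total_after)
```

Most of the increase comes from the Public and Testing systems; the Mining and Data provision systems are nearly unchanged.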


20191119_The OpenAIRE Research Graph


Editor's Notes

  • #3: How does OpenAIRE materialize the graph? By collecting records (dedup), collecting full-texts for OA publications, and mining the full-texts of publications to find links to data, software, other products, projects, research communities and infrastructures, and to enhance metadata with affiliation information and subjects/keywords. Article-data and data-data links are around 120Mi; article-article similarity links are around 300Mi.
  • #4: GOAL: a high-quality open graph that is Open (because it must be), Complete (all «trusted»/known sources), De-duplicated (records must be disambiguated for statistics), Transparent (provenance), Participatory (not a closed network), Decentralised (ownership and redistribution), and Trusted (manual curation).
  • #5: Supported entity types. People to come with the ORCID collaboration. The algorithm can be improved, but some cases can be handled only manually.
  • #7: Any interested content provider can join the network to provide content. Not a closed network. Interoperability guidelines help in the process.
  • #8: Mining trust: the probability that the mined information is correct.
  • #9: OpenAIRE DOES NOT own the graph
  • #16: Supported entity types. People to come with the ORCID collaboration. The algorithm can be improved, but some cases can be handled only manually.
  • #17: In production today we acquire content according to an Open Access-driven CAP: 30Mi publications with links to other objects (e.g. 1Mi datasets). In BETA we acquire content according to an Open Science-driven CAP: this means we collect EVERYTHING that is in a trusted source, meaning also non-OA content.