Let's do data research work: the
creation of a portal with research
information from Catalan
Universities
Ramon Ros i Gorné
also Lluís M. Anglada i de Ferrer, Sandra Reoyo i Tudó and
Ricard de la Vega i Sivera
(CSUC)
Open Repositories 2014
Helsinki, June 13th
Outline
1. Who we are
2. What we have (DSpace repositories)
3. The PRC project and first decisions
– Identifiers
– Software
– Data mapping
– Data flow
– Data exchange format
4. Current status
5. Work to be done
New merged consortium in 2014
for Catalan universities, with more services and projects
• The current CBUC ones
• The current CESCA ones
• Joint purchases (electricity, printing, cleaning, facilities, etc.)
• Common data center
• Portal for the research output (PRC)
• Electronic administrative procedures
• Etc.
CSUC’s DSpace repositories
Coming soon in 2014
www.tdx.cat (from 2001)
www.mdx.cat (from 2009)
repositori.filmoteca.cat (from 2012)
Coming soon in 2014
www.recercat.cat (from 2005)
calaix.gencat.cat (from 2010)
Pilot in 2012
www.cirax.cat (from 2013)
Situation in 2012
– CBUC has promoted institutional repositories (IR) since 1999
– Some universities (UPC & UPF) already have research portals
– New standards and protocols help interoperability between IR and CRIS
– Research output is becoming more important for university managers
Decision in 2012
What
• To create a portal for finding the research outputs of the Catalan research system
Why
• To increase the visibility of the research done in Catalonia
• To foster open access (OA)
• To increase interoperability between data
How
• Building on work previously done
– In IR, CRIS and statistical data (UNEIX)
• The central idea: the work done for the portal will also improve local IR and CRIS
• Following international best practices
– NARCIS (Netherlands); HKU Scholars Hub (Hong Kong)
PRC building. First decisions
– Identifiers -> ORCID
– Software -> DSpace + CINECA CRIS
– Data mapping
– Data flow -> from local CRIS systems
– Data exchange format -> CERIF XML
ORCID as researcher identifier
1. Selection of identifier
– Decision based on a CBUC report: Sistemes d’identificació unívoca d’investigadors (Systems for the unique identification of researchers) / Àngel Borrego
2. Technical work
– Modify all the local CRIS so that they can store the ORCID identifier
– Promotion of the ORCID iD in other working groups: repositories, CCUC, Mendeley…
3. ORCID diffusion
– We studied the ORCID API for creating ORCID iDs automatically, but decided not to use it
– Merchandising, translations, videos, ‘good practices’ document...
– UB (the biggest university) has mandated an ORCID iD in some processes related to research assessment
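Since every local CRIS has to store the ORCID identifier, a simple check at load time may help catch typing errors. The following is a minimal sketch, not part of the PRC software: it assumes identifiers arrive as 16-character strings with optional hyphens and verifies the final check character, which ORCID defines using ISO 7064 MOD 11-2. The function names are illustrative.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the identifier has 16 characters and a matching checksum."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base = compact[:15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == compact[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` accepts the sample identifier used in ORCID's own documentation, while changing its last digit makes the check fail.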
Evolution of ORCID registered researchers
* Data provided by ORCID. Number of researchers registered with their university email.
[Bar chart: researchers registered per university in each period, axis from 0 to 1800]

University  Oct-13  Feb-14  Apr-14  Jun-14  TOTAL
UB             206     106    1263     128   1703
UAB            176      90      36     287    589
UPC            368      59      39     196    662
UPF            135      75     299     119    628
UdG             69      38      16      20    143
UdL              6       7       1       2     16
URV            102      48      42      25    217
UOC             43      11      11      14     79
UVic            18     150       2      24    194
UIC             11       2       5      41     59
URL             30      33      78      22    163
TOTAL         1164     619    1792     878   4453
Software
• Based on CINECA's DSpace-CRIS (as used by the University of Hong Kong)
• Main challenges (to adapt/develop)
– From one institution to multiple institutions
– From submitting content to harvesting it from local CRIS instances
– Bulk import mechanisms are needed (CERIF-XML…)
PRC entities
• Universities
• Departments & Institutes
• Research groups
• Researchers
• Research projects
• Publications (Articles + Books + ETDs)
Lots of discussion on data mapping...
DSpace with the CRIS module. Main entities
DSpace
Publication
CRIS module
Person
Organization
Project
DSpace with the CRIS module. Detailed entities
DSpace
Publication
CRIS module
Person. Researcher
Organization. Research group
Organization. University -> communities
Organization. Department -> collections
Author
Project
Data flow, protocols, sources and formats
Target: PRC, based on DSpace + CINECA CRIS
Sources:
• 12 university CRIS systems (from 4 different vendors): DRAC, Universitas XXI, GREC, SIGMA, other
– Protocol: OAI-PMH
– Format: CERIF-XML
• Local and consortia repositories, mainly DSpace
– Protocol: OAI-PMH/SWORD
– Format: DC
• UNEIX, the Catalan government data warehouse
– Protocol: XLS files
– Format: UNEIX defined
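To make the OAI-PMH part of the data flow concrete, here is a minimal harvesting sketch. It is an illustration under stated assumptions, not the PRC implementation: the endpoint URL is a placeholder, `oai_dc` is the standard Dublin Core prefix, and the prefix a local CRIS would use for CERIF-XML is not given in the slides. The helpers only build a `ListRecords` request and read record identifiers and the `resumptionToken` from a response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (header identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(f"{OAI_NS}identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would request the first URL, parse the response, and keep requesting with the returned token until no token comes back.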
CERIF model
[Diagram of the CERIF entities]
• cfPerson, cfProject, cfOrganisationUnit
• cfResultPublication, cfResultPatent, cfResultProduct
• cfExpertiseAndSkills, cfEquipment, cfFunding, cfFacility, cfService
• cfCitation, cfEvent, cfLanguage, cfCurrency, cfCountry
• cfCurriculumVitae, cfPrize, cfQualification
• cfGeographicBoundingBox, cfPostalAddress, cfElectronicAddress
• cfIndicator, cfMeasurement
• cfFederatedIdentifier
Simplified CERIF subset for PRC
• cfPerson
• cfProject
• cfOrganisationUnit
• cfResultPublication
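One way to picture the simplified subset is as four linked record types. The sketch below is illustrative only: the class names mirror the four CERIF entities, but every field name and the linking by identifier lists are assumptions made for the example, not the CERIF schema or the PRC data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OrganisationUnit:  # cfOrganisationUnit
    unit_id: str
    name: str
    parent_id: Optional[str] = None  # e.g. a department pointing at its university


@dataclass
class Person:  # cfPerson
    person_id: str
    name: str
    orcid: Optional[str] = None  # ORCID is the agreed researcher identifier
    unit_ids: List[str] = field(default_factory=list)


@dataclass
class Project:  # cfProject
    project_id: str
    title: str
    member_ids: List[str] = field(default_factory=list)


@dataclass
class ResultPublication:  # cfResultPublication
    publication_id: str
    title: str
    author_ids: List[str] = field(default_factory=list)
    project_ids: List[str] = field(default_factory=list)


def publications_of(person_id: str, publications: List[ResultPublication]):
    """Return the publications that list the given person as an author."""
    return [p for p in publications if person_id in p.author_ids]
```

Keeping the links as plain identifier lists makes it easy to load each entity type separately and resolve the relations afterwards.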
Main achievements
• Good working team
– People from different universities and different services
• Agreement: to use ORCID for researchers
• Already done
– We succeeded in exporting 20 complete data records from 11 universities (using 5 different CRIS)
– All the CRIS systems already have a field for ORCID
– Good software selected: adopted by euroCRIS as its repository because of its CERIF compliance
Implementation steps
Step 1: prototype
– Sample data
– Manual entry
Step 2: first batch load
– Data sample from all universities
– CSV/XLS format
Step 3: full batch load
– All data from all universities
– CSV/XLS format
Step 4: CERIF-XML ingest
– First manual CERIF-XML ingest
Step 5: OAI-PMH automatic ingest
– Full synchronization with local CRIS systems
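The batch loads in steps 2 and 3 could look roughly like the following sketch. It is an assumption-laden illustration: the slides do not describe the CSV layout, so the column names `name`, `university` and `orcid` are hypothetical, and the only rule applied is that rows missing a name or a university are set aside for review.

```python
import csv
import io


def load_researchers(csv_text):
    """Parse researcher rows; return (accepted records, rejected line numbers)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records, rejected = [], []
    for row in reader:
        name = (row.get("name") or "").strip()
        university = (row.get("university") or "").strip()
        if not name or not university:
            # reader.line_num is the number of source lines read so far
            rejected.append(reader.line_num)
            continue
        orcid = (row.get("orcid") or "").strip() or None
        records.append({"name": name, "university": university, "orcid": orcid})
    return records, rejected
```

Reporting rejected line numbers rather than silently dropping rows gives each university something concrete to correct before the next load.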
Work to be done & challenges
• Organizational
– More meetings with the expert group
– ORCID iD implementation
– MoU for personal data
• External adaptation
– Adapt local CRIS systems to export data wrapped in CERIF-XML
• Portal implementation
– Ingest the full data of all institutions
– Design and build the user interfaces
– Develop the CERIF-XML import mechanisms
– Think about data cleaning & deduplication mechanisms
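As a starting point for the deduplication question, one simple approach is to build a comparison key per publication. The sketch below is only one possible technique, not a decision taken by the project: it assumes records are dictionaries with hypothetical `doi`, `title` and `year` fields, prefers the DOI when present, and otherwise falls back to a normalised title plus year.

```python
import unicodedata


def normalise_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() else " " for c in without_marks.lower())
    return " ".join(kept.split())


def dedup_key(record):
    """Prefer the DOI as key; fall back to normalised title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", normalise_title(record.get("title", "")), record.get("year"))


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = {}
    for record in records:
        seen.setdefault(dedup_key(record), record)
    return list(seen.values())
```

Exact-key matching like this will miss near-duplicates with differing titles, so it is best seen as a first pass before any manual or fuzzy review.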
Thanks!
Ramon Ros i Gorné
(CSUC) 
ramon.ros@csuc.cat
http://www.csuc.cat
