Scaling Usage Statistics across
Repositories as an OpenAIRE Analytics
Service
Dimitris Pierrakos, ATHENA Research & Innovation Center
Jochen Schirrwagen, Bielefeld University
Pedro Miguel Oliveira Bento Príncipe, University of Minho
Ricardo Saraiva, University of Minho
OR2016 Conference – Dublin – June 2016
Outline
• Introduction
• Methodology
• Pilots & Preliminary Results
• Conclusions & The Future
OpenAIRE 2020
• A pan-European Research Information platform to
monitor OA research outcomes from EC and other
national funders.
• Research analytics tools to promote new scientific
metrics & support evidence-based decision-making.
• Implementation of an OpenAIRE usage analytics
service for usage data collected from data providers
Usage Analysis Service: Aims
• Align heterogeneous data providers on common standards for gathering usage data & sharing statistics.
• Respect data privacy policies in the EU and member states.
• Collect, measure and analyze usage data (downloads and views).
• Correlate with other altmetrics.
Methodology
• Tracking phase
• Tier 1 approach: direct tracking
• Tier 2 approach: use the SUSHI Lite API
• Analysis phase
• Import
• Process (COUNTER4 compliance)
• Analyze
Using Piwik in OpenAIRE
• An Open Source analytics platform
• Tracking via JavaScript embedded in Web pages
• Usage parameters: VisitorID, SessionID, Visitor IP, Timestamp, Country, and many more
• IP anonymization enabled
• Bots handling
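
The "IP anonymization" bullet can be illustrated with a minimal sketch of byte masking, where the trailing bytes of a visitor address are zeroed before storage. The function name `anonymize_ip` and the `masked_bytes` parameter are hypothetical and chosen for illustration; the slides do not say how many bytes the OpenAIRE Piwik instance masks.

```python
import ipaddress


def anonymize_ip(ip: str, masked_bytes: int = 2) -> str:
    """Zero the trailing `masked_bytes` bytes of an IPv4 or IPv6 address.

    Illustrative only: `masked_bytes` is an assumed setting, not a value
    taken from the presentation.
    """
    addr = ipaddress.ip_address(ip)
    total_bits = addr.max_prefixlen  # 32 for IPv4, 128 for IPv6
    mask_bits = masked_bytes * 8
    if not 0 <= mask_bits <= total_bits:
        raise ValueError("masked_bytes is out of range for this address type")
    # Keep only the leading bits; the network address has the rest zeroed.
    kept_bits = total_bits - mask_bits
    network = ipaddress.ip_network(f"{addr}/{kept_bits}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("192.168.100.57", masked_bytes=2))
```

Masking more bytes gives stronger privacy at the cost of coarser geolocation, so the right setting depends on the privacy policy that applies to the repository.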
Counter Code of Practice
• An international, extendible Code of Practice for e-Resources.
• Measures usage information in a credible, consistent and compatible way using vendor-generated data.
• Specifications for:
• Data Collection & Processing
• Usage Analysis Reports
• Currently in Release 4
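
One data-processing rule commonly associated with COUNTER compliance is double-click filtering: repeated requests for the same item in the same session within a short window count once. The sketch below assumes a 10-second window for views and a 30-second window for downloads; these values, the `Event` record and the function name are assumptions for illustration and should be checked against the Code of Practice itself.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Event(NamedTuple):
    session_id: str
    item_id: str
    kind: str         # "view" or "download"
    timestamp: float  # seconds since some epoch


# Assumed double-click windows in seconds (verify against the Code of Practice).
WINDOW_SECONDS = {"view": 10.0, "download": 30.0}


def count_requests(events: Iterable[Event]) -> Dict[Tuple[str, str], int]:
    """Count requests per (item, kind), collapsing double-clicks."""
    counts: Dict[Tuple[str, str], int] = {}
    last_seen: Dict[Tuple[str, str, str], float] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.session_id, ev.item_id, ev.kind)
        previous = last_seen.get(key)
        last_seen[key] = ev.timestamp
        # Skip a request that follows the previous one inside the window.
        if previous is not None and ev.timestamp - previous <= WINDOW_SECONDS[ev.kind]:
            continue
        count_key = (ev.item_id, ev.kind)
        counts[count_key] = counts.get(count_key, 0) + 1
    return counts
```

Each request is compared with the one immediately before it, so a rapid chain of clicks collapses into a single counted request.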
Tier 1 Tracking Workflow
[Diagram labels: Repository, JavaScript event trackers, Data Anonymization, Usage Data, Import Process]
Tier 2: Aggregated Statistics Workflow using SUSHI Lite
[Diagram labels: Repository 1, Repository 2, Aggregator service, Anonymization, Usage Data, Import Process]
Usage Data Analysis
[Diagram labels: Usage Data, OpenAIRE, Usage Statistics]
Deduplication Process
[Diagram: Item_xxx (rep1_id) in Repository 1 and Item_xxx (rep2_id) in Repository 2 are deduplicated into Dedup_xxx (rep1_id, rep2_id)]
• Enhances the calculation of usage statistics by having a single id for common records
• Disseminates cross-repository usage statistics
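
A minimal sketch of how a single deduplicated id could be used to sum usage across repositories. The data layout, the function name `merge_usage` and the numbers in the usage example are hypothetical; the slides do not describe the actual OpenAIRE data model.

```python
from collections import defaultdict
from typing import Dict, List


def merge_usage(
    dedup_map: Dict[str, List[str]],
    usage: Dict[str, Dict[str, int]],
) -> Dict[str, Dict[str, int]]:
    """Sum views and downloads of repository records under their dedup id.

    dedup_map: dedup id -> repository-local ids it represents (assumed layout).
    usage:     repository-local id -> {"views": n, "downloads": m}.
    Records without a dedup entry keep their own id.
    """
    local_to_dedup = {
        local_id: dedup_id
        for dedup_id, local_ids in dedup_map.items()
        for local_id in local_ids
    }
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"views": 0, "downloads": 0}
    )
    for local_id, stats in usage.items():
        key = local_to_dedup.get(local_id, local_id)
        totals[key]["views"] += stats.get("views", 0)
        totals[key]["downloads"] += stats.get("downloads", 0)
    return dict(totals)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    dedup = {"Dedup_xxx": ["rep1_id", "rep2_id"]}
    stats = {
        "rep1_id": {"views": 10, "downloads": 4},
        "rep2_id": {"views": 7, "downloads": 1},
    }
    print(merge_usage(dedup, stats))
```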
Pilots: 1st phase
[Diagram labels: 3 Repositories, OpenAIRE Portal]
Pilots: 2nd phase
[Diagram labels: 31 Repositories, OpenAIRE Portal, IRUS-UK, 1 Repository, 1 Repository, 1 Repository, 2 Repositories]
Preliminary Results
Metadata Views – Downloads on Pilot Repositories
[Bar chart: views and downloads for UMINHO, UEVORA and UCOIMBRA; vertical axis from 0 to 400000]
Preliminary Results
Metadata Views – Downloads Duplicate Information
[Bar chart: Duplicate Articles, Views and Downloads; series UMINHO downloads, UEVORA downloads, UMINHO views, UEVORA views; vertical axis from 0 to 30]
Conclusions
✓ Usage Analysis in OpenAIRE
✓ Methodology
✓ Pilot Results
✓ Challenges
✓ Better handling of bot tracking and “gaming” activity in usage data
✓ Tackling direct downloads
The Future
✓ Collaboration with National Open Access Desks (NOADs) for usage service dissemination
✓ Beta release in 2016
✓ Production release in 2017
http://openaire.eu
@openaire_eu
https://www.facebook.com/groups/openaire/
https://www.linkedin.com/groups/3893548/profile
info@openaire.eu
