Cloud Dataverse

Mercè Crosas (IQSS, Harvard University)
with Orran Krieger, Piyanai Saowarattitada, Ata Turk, Anuj Thakur, Gustavo Durand, Leonid Andreev

Massachusetts Open Cloud (MOC) Workshop, December 6-7, 2016, Boston University
  
Dataverse Incentivizes Data Sharing

•  A widely used open-source platform for building data repositories
•  Gives essential incentives to data authors:
  – get attribution and credit through data citation
  – retain control over data published in the repository
•  Fosters a community to:
  – build new standards and best practices
  – increase research in data sharing
  
Dataverse repositories are installed in 21 sites around the world

Harvard Dataverse repository:
•  63,000 datasets; 12 new datasets published per day
•  2 million data downloads; 1,500 downloads per day
•  15,000 registered users
•  5,000 data authors from 500 institutions
  
Dataverse Now, with Cloud Dataverse

[Diagram labels: Data depositor; Publish dataset; Data + metadata; Repository; Metadata; Data files; Data Replication; Swift Object Store; Data users; download; Access object in Swift + Compute with Sahara/Hadoop]
  
Cloud Dataverse will be available to any Dataverse repository

Each Dataverse repository can choose to enable the Cloud Dataverse option
  
Dataverse + MOC are being expanded to support Cloud Dataverse

2016 Summer Pilot:
✓ Dataverse supports an external object store
✓ Data are replicated from a Dataverse repo to MOC

Working on:
☐ From Dataverse, user can access object in Swift/S3 + compute with Sahara and Hadoop
☐ Single authentication to data access and compute
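
As a rough illustration of what "access object in Swift" could look like from a client's side, the sketch below builds the URL of a replicated data file using the OpenStack Swift v1 path layout (/v1/{account}/{container}/{object}). The endpoint, account, container, and object names are hypothetical placeholders; the slides do not say how Cloud Dataverse maps datasets to containers.

```python
from urllib.parse import quote


def swift_object_url(endpoint, account, container, object_name):
    """Build an object URL following the Swift v1 path layout.

    All argument values used below are illustrative assumptions,
    not names taken from Dataverse or the MOC.
    """
    base = endpoint.rstrip("/")
    parts = [
        quote(account, safe=""),
        quote(container, safe=""),
        # Object names may contain "/" as a pseudo-folder separator.
        quote(object_name, safe="/"),
    ]
    return base + "/v1/" + "/".join(parts)


# Hypothetical endpoint and names, for illustration only.
url = swift_object_url(
    "https://swift.example.org",
    "AUTH_demo",
    "dataverse-datasets",
    "example-dataset/data file.csv",
)
print(url)
```

A compute job or an authenticated HTTP client could then read the file from a URL of this shape instead of downloading it through the repository.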
  
Cloud Access + Compute

This dataset has been enabled in Cloud Dataverse
  
