The Data Management
     Ecosystem
              4 April 2013

University of California Curation Center
       California Digital Library
The research data problem

• Journal article                                       • Research data
  – Uniquely and persistently identified                  – Nope
  – Concept of “publish”                                  – Not really
  – Multiple copies                                       – Typically one
  – Easily findable                                       – Difficult
  – Services: impact metrics, citation tracking, etc.     – Nope
                    Research data is seen as a second-
                   class citizen in the scholarly record.
An ecosystem of inter-dependent partners
 Besides data repository and publisher partners...
 • researchers
 • educators
 • citizen science groups
 • funders
 • tenure and promotion committees


  Libraries as neutral connection partners
Where can libraries make a difference?
     Research & Scholarship Lifecycle
     [Cycle diagram: Research > Collect > Publish > Share > Save, with “Create Knowledge” at the center]
Collect > Publish > Share > Save > Research

 Create, edit, share, and save data
                management plans

  Open source curation add-in for
                 Microsoft Excel

       Capture today’s web; build
             tomorrow’s archives
Collect > Publish > Share > Save > Research

     Create and manage persistent
        identifiers: ARKs, DOIs, etc.


An infrastructure to publish and get
    credit for sharing research data
Collect > Publish > Share > Save > Research

                Curation repository:
store, manage, preserve, and share
                       research data
        Open deposit, open access
    repository for spreadsheet data

Data Observation Network for Earth
Collect > Publish > Share > Save > Research

What’s missing to complete the “incentive” circuit?
• Impact measures, citation tracking

    “Connecting the data to the
           research it informs”


Altmetrics tools to measure non-traditional products and uses
Stable storage: Merritt repository
          • Curation repository open to the UC
            community and beyond
          • Discipline / content agnostic
          • Micro-services architecture
          • Easy-to-use UI or API
          • Hosted or locally deployed
EZID: Long term identifiers made easy
• Precise identification of a
  dataset (DOI or ARK)
• Credit to data producers and
  data publishers
• A link from the traditional
  literature to the data (DataCite)
• Exposure and research metrics
  for datasets
  (Web of Knowledge, Google)

                                      Take control of the
                                      management and distribution
                                      of your research, share and get
                                      credit for it, and build your
                                      reputation through its collection
                                      and documentation
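
As a small illustration of what "precise identification of a dataset (DOI or ARK)" can look like in practice, here is a minimal Python sketch that turns an identifier string into a resolvable link. The function name `resolver_url` and the sample identifiers are hypothetical and not part of EZID; the sketch only assumes the public resolvers at doi.org (for DOIs) and n2t.net (for ARKs).

```python
def resolver_url(identifier: str) -> str:
    """Map a 'doi:' or 'ark:' identifier string to a resolver URL.

    Hypothetical helper for illustration; it is not an EZID API call.
    """
    ident = identifier.strip()
    lowered = ident.lower()
    if lowered.startswith("doi:"):
        # DOIs resolve under doi.org once the 'doi:' label is dropped.
        return "https://doi.org/" + ident[len("doi:"):]
    if lowered.startswith("ark:"):
        # ARKs keep their 'ark:' label when appended to the resolver.
        return "https://n2t.net/" + ident
    raise ValueError(f"unrecognized identifier scheme: {identifier!r}")


if __name__ == "__main__":
    # Made-up identifiers, used only to show the shape of the output.
    for sample in ("doi:10.5072/FK2EXAMPLE", "ark:/99999/fk4example"):
        print(sample, "->", resolver_url(sample))
```

A link built this way can be placed in a paper's reference list, which is the "link from the traditional literature to the data" mentioned above.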
Discovery: DataCite consortium
•   Technische Informationsbibliothek (TIB), Germany
•   Australian National Data Service (ANDS)
•   The British Library
•   California Digital Library, USA
•   Canada Institute for Scientific and Technical Information (CISTI)
•   L’Institut de l’Information Scientifique et Technique (INIST), France
•   Library of the ETH Zürich
•   Library of TU Delft, The Netherlands
•   Office of Scientific and Technical Information, US Department of Energy
•   Purdue University, USA
•   Technical Information Center of Denmark
New distributed framework
    Flexible, scalable, sustainable network

    Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data

    Coordinating Nodes
• retain complete metadata catalog
• subset of all data
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services
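
The division of labor between member nodes and coordinating nodes can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: the class and method names (`MemberNode`, `CoordinatingNode`, `harvest`, `replicate`, `search`) are hypothetical and do not correspond to the real network's API, and the coordinating node here keeps metadata only, not the "subset of all data" listed above.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Holds data and metadata for a local community (toy model)."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> (metadata, data)

    def deposit(self, identifier, metadata, data):
        self.objects[identifier] = (metadata, data)


@dataclass
class CoordinatingNode:
    """Keeps a network-wide metadata catalog and tracks replicas (toy model)."""
    catalog: dict = field(default_factory=dict)    # identifier -> metadata
    locations: dict = field(default_factory=dict)  # identifier -> set of node names

    def harvest(self, node):
        # Copy metadata (not data) from a member node into the catalog.
        for identifier, (metadata, _data) in node.objects.items():
            self.catalog[identifier] = metadata
            self.locations.setdefault(identifier, set()).add(node.name)

    def replicate(self, identifier, source, target):
        # Copy an object to a second member node and record the new location.
        target.objects[identifier] = source.objects[identifier]
        self.locations.setdefault(identifier, set()).add(target.name)

    def search(self, term):
        # Basic indexing: case-insensitive match on the metadata title.
        term = term.lower()
        return sorted(
            identifier
            for identifier, metadata in self.catalog.items()
            if term in metadata.get("title", "").lower()
        )


if __name__ == "__main__":
    node_a, node_b = MemberNode("node-a"), MemberNode("node-b")
    node_a.deposit("id-1", {"title": "Stream temperature 2012"}, b"csv bytes")
    cn = CoordinatingNode()
    cn.harvest(node_a)
    cn.replicate("id-1", node_a, node_b)
    print(cn.search("temperature"), sorted(cn.locations["id-1"]))
```

In this sketch, search works network-wide because the catalog is complete, while availability comes from having the same object on more than one member node.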
The rest of the story


        www.cdlib.org/uc3


      John.Kunze@ucop.edu
uc3@ucop.edu for service questions


Editor's Notes

  • #2 Panel: Partnerships between institutional repositories, domain repositories, and publishers. 20-25 mins, 9:30-11am. The 'data management ecosystem' angle seems appropriate for the panel, but feel free to share some of the technical aspects with the audience, too. Partnerships via conventions and APIs. Data citation conventions. Libraries are chipping away on several fronts to try to shrink this "data curation" problem to a more manageable size, and they are offering a great deal of support for data management planning, data citation, identifier and repository services, repository federation, and “data publication”.
  • #4 Research data can be seen to fit in a kind of ecosystem of inter-dependent stakeholder niches. Each niche depends on other niches. In a broad sense, partnerships are about dependencies. Besides explicit partnerships between publishers and institutional and domain repositories, there are other critical inter-dependencies – essentially implicit partnerships. Libraries as neutral connectors to sub-partners in system development and collection building; linking with museums and archives.
  • #6 Development partners: DMPTool: U Va, Smithsonian, DCC, et al. DataUp: MSRC, GBMF, D1. WAS: LC, UNT, NYU, et al. User partners (clients, patrons, customers): any.
  • #7 Partners: JISC/EDINA, paying customers on two continents
  • #8 D1 network partners all over the world
  • #10 Partnering with eScholarship and UC campuses for collection building
  • #11 Partnering with JISC/EDINA, DataCite, the Research Data Alliance
  • #12 Each member partners with regional data repositories. DataCite partners with publishers (e.g., T-R) for a data citation index. Credit. Discovery. Impact tracking. Helping data authors verify use of their data and helping identify how others have used the data. With archiving: re-use and reproducibility.