DSHR's Blog: institutional repositories


Thursday, April 18, 2019

Personal Pods and Fatcat

Sir Tim Berners-Lee's Solid project envisages a decentralized Web in which people control their own data stored in personal "pods":

The basic idea of Solid is that each person would own a Web domain, the "host" part of a set of URLs that they control. These URLs would be served by a "pod", a Web server controlled by the user that implemented a whole set of Web API standards, including authentication and authorization. Browser-side apps would interact with these pods, allowing the user to:

Export a machine-readable profile describing the pod and its capabilities.

Write content for the pod.

Control others' access to the content of the pod.

Pods would have inboxes to receive notifications from other pods so that, for example, if Alice writes a document and Bob writes a comment in his pod that links to it in Alice's pod, a notification announcing that event appears in the inbox of Alice's pod. Alice can then link from the document in her pod to Bob's comment in his pod. In this way, users are in control of their content which, if access is allowed, can be used by Web apps elsewhere.
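To make the Alice-and-Bob flow concrete, here is a toy, in-memory model of it in Python. It is not the Solid API. Real pods exchange notifications over HTTP, and Solid builds its inboxes on the W3C Linked Data Notifications recommendation. The `Pod` class, the `comment_on` helper, and the `https://<owner>.example/` URL scheme are all hypothetical. The sketch shows only the sequence: Alice grants Bob read access, Bob writes a comment in his own pod that links to Alice's document, a notification lands in Alice's inbox, and Alice links back.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy stand-in for a personal pod: documents, per-document readers, an inbox."""
    owner: str
    documents: dict = field(default_factory=dict)  # path -> {"text": str, "links": [url, ...]}
    readers: dict = field(default_factory=dict)    # path -> set of agents allowed to read
    inbox: list = field(default_factory=list)      # notifications received from other pods

    def url(self, path):
        # Hypothetical URL scheme: each owner controls their own host.
        return f"https://{self.owner}.example/{path}"

    def write(self, path, text, links=None):
        # Write (or overwrite) a document; the owner can always read it.
        self.documents[path] = {"text": text, "links": list(links or [])}
        self.readers.setdefault(path, {self.owner})

    def grant_read(self, path, agent):
        # Access control: the owner decides who else may read a document.
        self.readers[path].add(agent)

    def read(self, path, agent):
        if agent not in self.readers.get(path, set()):
            raise PermissionError(f"{agent} may not read {self.url(path)}")
        return self.documents[path]["text"]

    def receive(self, notification):
        # Inbox: other pods append notifications here.
        self.inbox.append(notification)


def comment_on(commenter, comment_path, text, target, target_path):
    """Write a comment in the commenter's own pod that links to the target
    document, then notify the target pod's inbox."""
    target.read(target_path, commenter.owner)  # raises if access was not granted
    commenter.write(comment_path, text, links=[target.url(target_path)])
    target.receive({"actor": commenter.owner,
                    "object": commenter.url(comment_path),
                    "target": target.url(target_path)})


alice, bob = Pod("alice"), Pod("bob")
alice.write("paper", "Draft results")
alice.grant_read("paper", "bob")
comment_on(bob, "comment-1", "Nice result", alice, "paper")

# Alice sees the notification and links from her document to Bob's comment.
note = alice.inbox[0]
alice.documents["paper"]["links"].append(note["object"])
print(alice.documents["paper"]["links"])  # ['https://bob.example/comment-1']
```

Bob's comment stays in Bob's pod, and Alice's pod holds only a link to it. Each party therefore keeps control of what they wrote.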

In his Paul Evan Peters Award Lecture, my friend Herbert Van de Sompel applied this concept to scholarly communication, envisaging a world in which access, for both humans and programs, to all the artifacts of research would be greatly enhanced.

In Herbert's vision, institutions would host their researchers' "research pods", which would be part of their personal domain but would have extensions specific to scholarly communication, such as automatic archiving upon publication.

Follow me below the fold for an update to my take on the practical possibilities of Herbert's vision.

The Demise Of The Digital Preservation Network

Now that I've had a chance to read the Digital Preservation Network (DPN): Final Report, I feel the need to add to my initial reactions in Digital Preservation Network Is No More, which were based on Roger Schonfeld's Why Is the Digital Preservation Network Disbanding?. Below the fold, my second thoughts.

Cloud For Preservation

Imagine you're responsible for preserving the long-established digital collection at a large research or national library. It is currently preserved in home-grown software, or off-the-shelf software that's been extensively customized, that you are responsible for running on hardware run by your institution's IT department. You are probably not a large customer of theirs. They are probably laying down the law, saying "cloud first", especially as you are looking at a looming hardware refresh. Below the fold, I examine a set of issues that need to be clarified in the decision-making process.

Digital Preservation Network Is No More

In Why Is the Digital Preservation Network Disbanding? Roger Schonfeld examines the demise of the Digital Preservation Network, which was announced last month:

An initial announcement said directly that "After careful analysis of the Digital Preservation Network's membership, operating model, and finances, the Board of Trustees of DPN passed a resolution to affect an orderly wind-down of DPN," including committing to consultations with each member to ensure that content would not be lost in the wind-down. Shortly thereafter, messages came out from DPN's hubs, both individually including HathiTrust, and collectively, characterizing their operating and financial strength and ability to provide for an orderly transition. Because DPN was not itself directly preserving anything but rather a broker for preservation services by underlying repositories, it does not appear that any content will be put at risk.

Below the fold, I look at various views of the lessons to be learned.

IPRES 2017

Kyoto Railway Museum

Much as I love Kyoto, now that I'm retired with daily grandparent duties (and no-one to subsidize my travel) I couldn't attend iPRES 2017.

I have now managed to scan both the papers, and the very useful "collaborative notes" compiled by Micky Lindlar, Joshua Ng, William Kilbride, Euan Cochrane, Jaye Weatherburn and Rachel Tropea (thanks!). Below the fold I have some notes on the papers that caught my eye.

Sustaining Open Resources

Cambridge University Office of Scholarly Communication's Unlocking Research blog has an interesting trilogy of posts looking at the issue of how open access research resources can be sustained for the long term:

Dr. Lauren Cadwallader's Open Resources, who should pay
David Carr's Sustaining open research resources – a funder perspective
Dave Gerrard's Sustaining long-term access to open research resources – a university library perspective

Below the fold I summarize each of their arguments and make some overall observations.

EU report on Open Access

The EU's ambitious effort to provide immediate open access to scientific publications as the default by 2020 continues with the publication of Towards a competitive and sustainable open access publishing market in Europe, a report commissioned by the OpenAIRE 2020 project. It contains a lot of useful information and analysis, and concludes that:

Without intervention, immediate OA to just half of Europe's scientific publications will not be achieved until 2025 or later.

The report:

considers the economic factors contributing to the current state of the open access publishing market, and evaluates the potential for European policymakers to enhance market competition and sustainability in parallel to increasing access.

Below the fold, some quotes, comments, and an assessment.

The Exception That Proves The Rule

Chris Bourg, who moved from the Stanford Libraries to be library director at MIT, gave a thoughtful talk at Educause entitled Libraries and future of higher education. Below the fold, my thoughts on how it provides the exception that proves the rule I described in Why Did Institutional Repositories Fail?.

Why Did Institutional Repositories Fail?

Richard Poynder has a blog post introducing a PDF that contains both a lengthy introduction, which expands on the blog post, and a Q&A with Cliff Lynch on the history and future of Institutional Repositories (IRs). Richard and Cliff agree that IRs have failed to achieve the hopes that were placed in them at their inception at a 1999 meeting in Santa Fe, NM. But they disagree about what those hopes were. Below the fold, some commentary.

What took so long?

More than ten months ago I wrote Be Careful What You Wish For which, among other topics, discussed the deal between Elsevier and the University of Florida:

And those public-spirited authors who take the trouble to deposit their work in their institution's repository are likely to find that it has been outsourced to, wait for it, Elsevier! The ... University of Florida, is spearheading this surrender to the big publishers.

Only now is the library community starting to notice that this deal is part of a consistent strategy by Elsevier and other major publishers to ensure that they, and only they, control the accessible copies of academic publications. Writing on this recently we have:

Ellen Finnie and Greg Eow from the MIT Library.
The Coalition of Open Access Policy Institutions steering committee.
And Barbara Fister.

Barbara Fister writes:

librarians need to move quickly to collectively fund and/or build serious alternatives to corporate openwashing. It will take our time and money. It will require taking risks. It means educating ourselves about solutions while figuring out how to put our values into practice. It will mean making tradeoffs such as giving up immediate access for a few who might complain loudly about it in order to put real money and time into long-term solutions that may not work the first time around. It means treating equitable access to knowledge as our primary job, not as a frill to be worked on when we aren’t too busy with our “real” work of negotiating licenses, fixing broken link resolvers, and training students in the use of systems that will be unavailable to them once they graduate.

Amen to all that, even if it is 10 months late. If librarians want to stop being Elsevier's minions, they need to pay close, timely attention to what Elsevier is doing, such as buying SSRN. How much would arXiv.org cost them?

Tuesday, June 23, 2015

Future of Research Libraries

Bryan Alexander reports on a talk by Xiaolin Zhang, the head of the National Science Library at the Chinese Academy of Sciences (CAS), on the future of research libraries.

Director Zhang began by surveying the digital landscape, emphasizing the rise of ebooks, digital journals, and machine reading. The CAS decided to embrace the digital-first approach, and canceled all print subscriptions for Chinese-language journals. Anything they don’t own they obtain through consortial relationships ...

This approach works well for a growing proportion of the CAS constituency, which Xiaolin referred to as “Generation Open” or “Generation Digital”. This group benefits from – indeed, expects – a transition from print to open access. For them, and for our presenter, “only ejournals are real journals. Only smartbooks are real books… Print-based communication is a mistake, based on historical practicality.” It’s not just consumers, but also funders who prefer open access.

Below the fold, some thoughts on Director Zhang's vision.

Potemkin Open Access Policies

Last September Cameron Neylon had an important post entitled Policy Design and Implementation Monitoring for Open Access that started:

We know that those Open Access policies that work are the ones that have teeth. Both institutional and funder policies work better when tied to reporting requirements. The success of the University of Liege in filling its repository is in large part due to the fact that works not in the repository do not count for annual reviews. Both the NIH and Wellcome policies have seen substantial jumps in the proportion of articles reaching the repository when grantees final payments or ability to apply for new grants was withheld until issues were corrected.

He points out that:

Monitoring Open Access policy implementation requires three main steps. The steps are:

Identify the set of outputs are to be audited for compliance

Identify accessible copies of the outputs at publisher and/or repository sites

Check whether the accessible copies are compliant with the policy

Each of these steps are difficult or impossible in our current data environment. Each of them could be radically improved with some small steps in policy design and metadata provision, alongside the wider release of data on funded outputs.

He makes three important recommendations:

Identification of Relevant Outputs: Policy design should include mechanisms for identifying and publicly listing outputs that are subject to the policy. The use of community standard persistable and unique identifiers should be strongly recommended. Further work is needed on creating community mechanisms that identify author affiliations and funding sources across the scholarly literature.
Discovery of Accessible Versions: Policy design should express compliance requirements for repositories and journals in terms of metadata standards that enable aggregation and consistent harvesting. The infrastructure to enable this harvesting should be seen as a core part of the public investment in scholarly communications.
Auditing Policy Implementation: Policy requirements should be expressed in terms of metadata requirements that allow for automated implementation monitoring. RIOXX and ALI proposals represent a step towards enabling automated auditing but further work, testing and refinement will be required to make this work at scale.

What he is saying is that defining policies that mandate certain aspects of Web-published materials without mandating that they conform to standards that make them enforceable over the Web is futile. This should be a no-brainer. The idea that, at scale, without funding, conformance will be enforced manually is laughable. The idea that researchers will voluntarily comply when they know that there is no effective enforcement is equally laughable.
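Neylon's three monitoring steps lend themselves to exactly the kind of automated check he argues for, provided the metadata exists. The Python sketch below illustrates this under stated assumptions:

- outputs are identified by DOI;
- the funder's records map each DOI to a funder;
- each harvested copy record carries two fields loosely modeled on the NISO ALI `free_to_read` and `license_ref` elements;
- the single accepted license stands in for whatever a real policy requires.

The DOIs, funder names, record format, and helper functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One harvested copy of an output (hypothetical record format)."""
    doi: str            # persistent identifier of the output
    location: str       # "publisher" or "repository"
    free_to_read: bool  # loosely modeled on the NISO ALI free_to_read element
    license_url: str    # loosely modeled on the NISO ALI license_ref element


# Assumption: this policy accepts only CC BY 4.0. Real policies differ.
ACCEPTED_LICENSES = {"https://creativecommons.org/licenses/by/4.0/"}


def outputs_in_scope(funding, funder):
    """Step 1: identify the set of outputs (by DOI) subject to the policy."""
    return {doi for doi, f in funding.items() if f == funder}


def accessible_copies(harvested, doi):
    """Step 2: find copies of one output that are free to read, wherever they live."""
    return [c for c in harvested if c.doi == doi and c.free_to_read]


def is_compliant(copies):
    """Step 3: check whether any accessible copy meets the license requirement."""
    return any(c.license_url in ACCEPTED_LICENSES for c in copies)


def audit(funding, harvested, funder):
    """Run all three steps for one funder; returns {doi: compliant?}."""
    return {doi: is_compliant(accessible_copies(harvested, doi))
            for doi in sorted(outputs_in_scope(funding, funder))}


# Hypothetical data: DOI -> funder, plus harvested copy records.
funding = {"10.0000/a": "FunderX", "10.0000/b": "FunderX", "10.0000/c": "FunderY"}
harvested = [
    Copy("10.0000/a", "repository", True, "https://creativecommons.org/licenses/by/4.0/"),
    Copy("10.0000/b", "publisher", False, "https://creativecommons.org/licenses/by/4.0/"),
    Copy("10.0000/b", "repository", True, "https://example.org/all-rights-reserved"),
]
result = audit(funding, harvested, "FunderX")
print(result)  # {'10.0000/a': True, '10.0000/b': False}
```

Once the identifiers and access metadata are mandated, the audit reduces to a few set and list operations. Without them, each of the three steps falls back to manual work that nobody is funded to do.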

Tuesday, April 21, 2015

The Ontario Library Research Cloud

One of the most interesting sessions at the recent CNI was on the Ontario Library Research Cloud (OLRC). It is a collaboration between universities in Ontario to provide a low-cost, distributed, mutually owned private storage cloud with adequate compute capacity for uses such as text-mining. Below the fold, my commentary on their presentations.

The Mystery of the Missing Dataset

I was interviewed for an upcoming news article in Nature about the problem of link rot in scientific publications, based on the recent Klein et al. paper in PLoS One. The paper is full of great statistical data but, as would be expected in a scientific paper, lacks the personal stories that would improve a news article.

I mentioned the interview over dinner with my step-daughter, who was featured in the very first post to this blog when she was a grad student. She immediately said that her current work is hamstrung by precisely the kind of link rot Klein et al. investigated. She is frustrated because the dataset from a widely cited paper has vanished from the Web. Below the fold, a working post that I will update as the search for this dataset continues.

DSHR's Blog

Thursday, April 18, 2019

Personal Pods and Fatcat

Tuesday, April 16, 2019

The Demise Of The Digital Preservation Network

Thursday, February 7, 2019

Cloud For Preservation

Thursday, January 10, 2019

Digital Preservation Network Is No More

Tuesday, October 10, 2017

IPRES 2017

Tuesday, September 26, 2017

Sustaining Open Resources

Tuesday, March 28, 2017

EU report on Open Access

Tuesday, November 8, 2016

The Exception That Proves The Rule

Tuesday, October 18, 2016