Go To Hellman: RA21

Tuesday, December 3, 2019

Your Identity, Your Library

Today, your identity on the Internet is essentially owned by the big email providers and social networks. Google, Yahoo, Facebook, Twitter - chances are you use one of these services to conveniently log into other services as YOU. You don't need to remember a new password for each service, and the service providers don't have to verify your "identity". What you gain in convenience, you lose in privacy, and that's turned out really well, hasn't it?

The "flow" you use to take advantage of this single sign-in is a "dance" that takes you from website to website and back to the site you're logging into. A similar dance occurs to secure access to resources licensed on your behalf by libraries, institutions, corporations, etc. I wrote a bunch of articles about "RA21" (now rebranded as the vaguely NSFW "SeamlessAccess"), an effort spearheaded by STM publishers to improve the user experience of that dance. (It can be complicated and confusing because there are lots of potential dance partners!)

Henri Matisse, La danse (first version) 1909

These dance partners style themselves as "identity providers". That label makes me uncomfortable. Identity can't be something that can be stripped from you on the whim of a megacorporation. Instead, internet identity should be woven from a web of relationships. These can be formed digitally or face-to-face, global or local, business or personal.

You'd have thunk that the whole identity-on-the-internet thing would have improved in the 13 years since that login dance was first rolled out. And you'd be almost right, because a new architecture for internet identity is now on the horizon. Made possible by many of the same technologies that are securing the internet and inflating the blockchain bubble, massively distributed and even "self-sovereign identity" are becoming real-ish.

These technologies will inevitably be applied to the access authorization problem. Access via distributed identity replaces the website-to-website dance with the presentation of some sort of signed credential. A service provider verifies the signature against the signer's public key. It's like showing a passport that can't be forged. A tricky bit is that the credential also needs to be checked against a list of revoked credentials. This would have been cumbersome even ten years ago, but distributed databases are now a mature technology, versions of which underpin the internet itself.
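To make the passport analogy concrete, here is a minimal sketch of the two checks a service provider would make: verify the signature with the issuer's public key, then consult a revocation list. Everything in it is an assumption for illustration: the credential fields, the `issue` and `accept` helpers, and the toy textbook-RSA key built from two small primes, which is far too weak for real use. A real system would rely on a vetted cryptography library and a standard credential format.

```python
import hashlib
import json

# Toy textbook-RSA key built from two Mersenne primes. It is unpadded and far
# too small for real use; it only illustrates the verify-with-public-key step.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # issuer's private exponent (Python 3.8+)


def digest(credential: dict) -> int:
    """Hash a credential deterministically and reduce it below the modulus."""
    payload = json.dumps(credential, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def issue(credential: dict) -> int:
    """Issuer side: sign the credential with the private key."""
    return pow(digest(credential), D, N)


def accept(credential: dict, signature: int, revoked_ids: set) -> bool:
    """Service provider side: check the signature, then the revocation list."""
    signature_ok = pow(signature, E, N) == digest(credential)
    return signature_ok and credential["id"] not in revoked_ids


# Hypothetical credential issued by a library to one of its members.
card = {"id": "cred-001", "issuer": "example-library", "claim": "member"}
sig = issue(card)

print(accept(card, sig, revoked_ids=set()))                        # genuine
print(accept({**card, "claim": "admin"}, sig, revoked_ids=set()))  # tampered
print(accept(card, sig, revoked_ids={"cred-001"}))                 # revoked
```

Note that the service provider never contacts the issuer during the check; it needs only the public key and an up-to-date revocation list, which is why no website-to-website dance is required.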

Interlinked with the concept of distributed identity is the notion that users of the web should be able to securely control their data, and that decisions about what a web site gets to know about you should not be delegated to advertising networks.

Unfortunately, we're not quite ready for distributed identity, in the sense that implementation for today's web would require users to install plugin software, which has its own set of usability, privacy and security issues. The ideal situation would be for some sort of standardized distributed identity and secure data management capability to be installed in browser software - Chrome, Firefox, Safari, etc.

There's a lot of work going on to make this happen.

ID2020 has put out an identity manifesto that starts with the declaration that "The ability to prove one’s identity is a fundamental and universal human right."
Tim Berners-Lee is leading the Solid Project, which lets you "move freely between services, reuse data across apps, connect with anyone, and select what you share precisely".
The W3C Verifiable Claims Working Group has published Technical Recommendations for "Verifiable Credential Use Cases" and a "Verifiable Credential Data Model". They observe that "from educational records to payment account access, the next generation of web applications will authorize entities to perform actions based on rich sets of credentials issued by trusted parties."
The Sovrin Network is a "new standard for digital identity – designed to bring the trust, personal control, and ease-of-use of analog IDs – like driver’s licenses and ID cards – to the Internet."
Kaliya Young, Doc Searls and Phil Windley have been convening the Internet Identity Workshop twice a year since 2005 to create a community centered around internet identity. A glance at prior year proceedings gives a flavor of how much is happening in the field.

The common thread here is that users, not unaccountable third parties, should be able to manage their identity on the internet, while at the same time creating a global chain of trust.

It seems to me that there's a last-mile problem with all these schemes. If identity is really a universal human right, how do we create a chain of trust that can include every human? That problem becomes a lot easier to solve if there were some sort of organization with a physical presence in communities all over, trusted by the community and by other organizations. A sort of institution experienced in managing information access and privacy, and devoted to the needs of all sorts of users.

In other words, what if "libraries" existed?

The federated authentication systems used by libraries today (Shibboleth, Athens, and related systems) use a dance similar to what you do with Google or Facebook. It's a big step that moves your internet identity away from "surveillance capitalists" towards community institutions. But you still don't have control over what data your institution gives away, as you will in the next-generation internet identity systems I describe here. (RA21 is no different from Shib or Athens in this respect.)

What might libraries do to prepare for the age of distributed identity? The first step is not about technology, it's about mission. I believe libraries should start to think of themselves as internet relationship providers for their communities. When I get access to a resource through my library, I won't be "logging in", I'll be asserting a relationship with a library community, and the library will be standing behind me. Joining an identity federation is a good next step for libraries. But the library community needs to advocate for user identity as a basic human right and prepare their systems to support a future where no dancing is required.

Update 12/5/2019: revised last two paragraphs to be less mystifying.

Thursday, May 30, 2019

Responding to Critical Reviews

The first scientific paper I published was submitted to Physical Review B, the world's leading scientific journal in condensed matter physics. Mailing in the manuscript felt like sending my soul into a black hole, except not even Hawking radiation would come back. A seemingly favorable review returned a miraculous two months later:

"I found this paper interesting, and I think it probably eventually it should be published - but only after Section II is revamped and section III clarified."

I made a few minor revisions and added some computations that had been left out of the first version, then confidently resubmitted the paper. But another two months later, I received the second review. The referee hadn't appreciated that I had deflected the review's description of "fundamental logic flaws and careless errors" that made my paper "extremely confusing". The reviewer went on to say "I do not think the authors' new variational calculation is correct" and suggested that my approach was completely wrong.

My thesis advisor suggested that I go and talk to Bob Laughlin in the Physics department about how to deal with the stubborn referee. I had been collaborating with Bob and one of his students on a related project, and he had become a surrogate advisor for my theoretical endeavors. During that time, Bob had acquired a reputation among my fellow students for asking merciless questions at oral exams; many of us were scared of him.

Bob's lesson on how to deal with a difficult referee turned out to be one of the most useful things I learned in grad school. Referees, he told me, come in two varieties: complete idiots, and not-complete-idiots. (Yes, Bob was merciless.) If your referee is a complete idiot, all you can do is ask for a different referee. If your referee has the least bit of sense, then you have to take one of two attitudes. Either the referee is somewhat correct, and you think YES-SIR MISTER REFEREE SIR! (Bob had been in the Army) and do whatever the referee says to do, or you have explained something so poorly that the referee, who is an excellent representative of your target audience, had no hope of understanding it. Either way, there was a lot of work to do. We decided that this referee was not an idiot, and I needed to go back to the drawing board and re-do my calculation, figuring out how to be clearer and more correct in my exposition.

A third review came back with the lovely phrase "The significance of the calculation of section II, which is neither fish nor fowl, remains unclear." Using Bob's not-idiot rule, I recognized that my explanation was still unclear and I worked even harder to improve the paper.

My third revised version was accepted and published. Bob later won the Nobel Prize. I'm here writing blog posts for you about RA21.

RA21 received 120 mostly critical reviews from a cross-section of referees, not a single one of whom is the least bit an idiot. Roughly half the issues fell into the badly-explained category, while the other half fell in the "fundamental flaws and careless errors" category. RA21 needs to go back to the chalkboard and rethink even their starting assumptions before they can move forward with this much-needed effort.

Friday, May 17, 2019

RA21: Technology is not the problem.

RA21 vows to "improve access to institutionally-provided information resources". The barriers to access are primarily related to the authorization of such access in the context of licensing agreements. In a perfect world, trust and consensus between licensors and licensing communities would render authorization technology irrelevant. In the real world, technological controls need to build upon good-faith agreements and the consent of community members. Also in the real world, poorly implemented technology erodes that good-faith and consent.

The RA21 draft recommended practice focuses on technology and technology implementations, all the while failing to consider how to build the trust that underpins good-faith and consent. Service providers need to trust that identity providers faithfully facilitate authorized users and that the communities that identity providers serve will adhere to licensing agreements; users of information resources need to trust that their usage data will not be tracked and sold to the highest bidder.

Trust is not created out of thin air and certainly not by software. Technology can provide tools that facilitate trust, but shared values and communication between parties are the raw material of trust. An effective program to improve access must include processes and procedures that develop shared values and promote cooperation.

I recognize that RA21 has chosen to consider only the authentication intercourse as in-scope. But the draft recommendation has identified several areas of "further work". Included in this further work should be areas where community standards and best practices can enhance trust around authentication and authorization. To name two examples:

A set of best practices around "incident response" would in practice work much better than a "guiding principle" of "end-to-end traceability".
A set of best practices around auditing of security and privacy procedures and technology at service providers and identity providers would materially address the privacy and security concerns that the draft recommendation punts over to cited reports and studies.

This is the fifth and last of my comments submitted as part of the NISO standards process. The 102+ comments that have been submitted so far represent a great deal of expertise and real-world experience. My previous comments were about secure communication channels, potential phishing attacks, the incompatibility of the recommended technical approach with privacy-enhancing browser features, and the need for radical inclusiveness. I've posted the comments here so you can easily comment.

Update July 22, 2019:

RA21's official response to this comment is:

We agree that technology is not the primary problem. There are two core issues that RA21 is seeking to address - firstly the current user experience of federated authentication needs to be improved, and this comprises the bulk of our recommendations. Secondly, considerable trust has been established between identity providers and service providers through their mutual particpation in identity federations and we are recommending broader particpation in identity federations where they do not exist. The understanding and acceptance of this trust model is not universal among all stakeholder groups particularly withing IdP organisations and through ongoing dialog and outreach during the implementation phase, RA21 hopes to address this deficit. Finally, we have added a section to the recommendations addressing security incident response and adoption of an operational security baseline by particpants.

OK.

Monday, May 13, 2019

RA21 doesn't address the yet-another-WAYF problem. Radical inclusiveness would.

The fundamental problem with standards is captured by XKCD 927.

XKCD https://xkcd.com/927/

Single sign-on systems have the same problem. The only way for a single sign-on system to deliver a seamless user experience is to be backed by a federated identity system that encompasses all use cases. For RA-21 to be the single button that works for everyone, it must be radically inclusive. It must accommodate a wide variety of communities and use cases.

Unfortunately, the draft recommended practice betrays no self-awareness about this problem. Mostly, it assumes that there will be a single "access through your institution" button. While it is certainly true that end-users have more success when presented with a primary access method, the draft does not address how RA-21 might reach that state.

Articulating a radical inclusiveness principle would put the goal of single-button access within reach. Radical inclusiveness means bringing IP-based authentication, anonymous access, and access for walk-ins into the RA-21 tent. Meanwhile the usability and adoption of SAML-based systems would be improved; service providers who require "end-to-end traceability" could achieve this in the context of their customer agreements; it needn't be a requirement for the system as a whole.

Radical inclusiveness would also broaden the user base and thus financial support for the system as a whole. We can't expect a 100,000 student university library in China to have the same requirements or capabilities as a small hospital in New Jersey or a multinational pharmaceutical company in Switzerland, even though all three might need access to the same research article.

This is my fourth comment on the RA-21 draft "Recommended Practices for Improved Access to Institutionally-Provided Information Resources". The official comment period ends Friday. This comment, 57 others, and the add-comment form can be read here. My comments so far are about secure communication channels, potential phishing attacks, and the incompatibility of the recommended technical approach with privacy-enhancing browser features. I'm posting the comments here so you can easily comment. I'll have one more comment, and then a general summary.

Update July 10, 2019:

RA21's official response to this comment is:

RA21 envisages supporting the anonymous and walk-in use cases via federated authentication. It is anticpated that federated authentication and IP authentication will exist side-by-side during a transition phase. The specifics of the User Experience during the transition phase will need to be determined during implementation; however it is likely that the RA21 button will simply not need to be displayed to users who are IP authenticated.

I suppose self-awareness was a big ask. The revised recommendation includes some "envisaging" of use cases that was glaringly absent from the draft recommendation. The added section 2.1.1., Employ appropriate authentication mechanisms for specific use cases, is an improvement on the draft; but the revised recommendation has not retreated from its end-to-end traceability "guiding principle".

RA21 used the same response for a comment by Ohio State's Jennifer Vinopal:

I want to reiterate a point that a number of commenters have already mentioned: there is no discussion of how public or walk-in (or other unauthenticated/unauthenticating) users will get access to resources through RA21. Public libraries, as well as many college and research libraries, negotiate our e-resource licenses to provide access to walk-in users who aren't represented in our IdM systems.

Don't forget, EZProxy was supposed to be a transition phase!

Monday, May 6, 2019

RA21 Draft RP session timeout recommendation considered harmful

Hey everybody, I implemented RA21 for access to the blog!

Well, that was fun.

I'm contributing comments about the recently published NISO draft "Recommended Practice" (RP) on "Improved Access to Institutionally-Provided Information Resources" a. k. a. "Resource Access in the 21st Century" (RA21). Official comments can be submitted until May 17th. The draft has much to recommend it, but it appears to have flaws that could impair the success of the effort. My first comment concerned the use of secure communication channels. I expect to write two more. I'm posting the comments here so you can easily comment.

RA21 Draft RP session timeout recommendation considered harmful

RA21 hopes to implement a user authentication environment which allows seamless single sign-on to a large number of service provider websites. Essential to RA21's vision is to replace a hodge-podge of implementations with a uniform, easily recognizable user interface.

While a uniform sign-in flow will be a huge benefit to end users, it introduces an increased vulnerability to an increasingly common type of compromise, credential phishing. A credential phishing attack exploits learned user behavior by presenting the user with a fraudulent interface cloned from a legitimate service. The unsuspecting user enters credentials into the fraudulent website without ever being aware of the credential theft. RA21 greatly reduces the difficulty of a phishing attack in three ways:

Users will learn and use the same sign-in flow for many, perhaps hundreds, of websites. Most users will occasionally encounter the RA21 login on websites they have never used before.
The uniform visual appearance of the sign-in button and identity provider selection step will be trivial to copy. Similarly, a user's previously selected identity provider will often be easy for an attacker to guess, based on the user's IP address.
If successful, RA21 may be used by millions of authorized users, making it difficult to detect unauthorized use of stolen credentials.

If users are trained to enter password credentials even once per day, they are unlikely to notice when they are asked for identity provider credentials by a website crafted to mimic a real identity provider.

For this very reason, websites commonly used for third party logins, such as Google and Facebook, use timeouts much longer than the 24 hour timeouts recommended by the RA21 draft RP. To combat credential theft, they add tools such as multi-factor authentication and insert identity challenges based on factors such as user behavior and the number of devices used by an account.

Identity providers participating in RA21 need to be encouraged to adopt these and other anti-phishing security measures; the RA21 draft's recommended identity provider session timeout (section 2.7) is not in alignment with these measures and is thus counterproductive. Instead, the RP should encourage long identity provider session timeouts, advanced authentication methods, and should clearly note the hazard of phishing attacks on the system. Long-lived sessions will result in better user experience and promote systemic security. While the RP cites default values used in Shibboleth, there is no published evidence that these parameters have suppressed credential theft; the need for RA21 suggests that the resulting user experience has been far from "seamless".

Update July 3, 2019:

RA21's official response to this comment is:

We disagree with premise that consumer websites adopt long sign-in timeouts as a Phishing protection measure. That said, IdPs should follow best practices such as HTTPS so users can verify that they are on a valid sign in page. Length of validity of sign-in is also by necessity context dependent.

Well, yeah. I wasn't expecting them to actually consult real people who battle identity theft on consumer websites. I was mostly amazed that sign-in timeouts would be considered in-scope for RA21 while HTTPS, which will be essential to RA21's success or failure, was not. But the RA21 recommendation will have no effect whatsoever on what identity providers do, unless perhaps existing identity providers are making timeouts ridiculously short. Identity providers know their context much better than any committee and they will do what they want to do. And they should!

Interestingly, a section (2.8. Establish Security Incident Reporting Frameworks) has been added to the revised recommendation that acknowledges credential phishing as a motivation for RA21! So, yay RA21!

Sunday, May 5, 2019

RA21 RP does not require secure protocols. It should.

As I've written, "RA21" could be a Good Thing, or it could be a disaster. The RA21 working group has released its "Recommended Practice" draft for comments, until May 17. The draft has much to like, but also has significant flaws. I will be contributing comments to address the flaws I see, which I will also publish here so we can discuss and comment. My official comments, and many others worth reading are here.

Here's my first comment, perhaps the most predictable:

RA21 RP does not require secure protocols. It should.

RA21 envisions the creation of a widely deployed authentication and authorization system for resources and tools serving the research community. In such an ecosystem, the health and security of the entire system can be degraded by a small number of weak implementations. In particular, delivering resources over insecure unencrypted channels will be harmful.

In this context it is surprising that the RA21 recommended practice (RP) fails to directly address the need for service providers and identity providers to use secure channels such as HTTPS for websites. The recommended practice makes indirect reference to this need by citing another document, "WAYF Cloud and P3W Security & Privacy Recommendations". This document fails to treat secure channels as a requirement, saying in analyzing the pilot implementations (italics added):

"All browser traffic should use secured protocols, such as https, to prevent unauthorized access and to preserve confidentiality." (WAYF cloud, page 13)

"All browser traffic should use secured protocols such as https to prevent unauthorized access and to preserve confidentiality." (P3W, page 18)

In contrast to the "should" used for secure communications, the analysis uses the stronger "must" in other places, for example,

"Therefore, applications must include strong controls to prevent user ID tampering and abuse" (Information Disclosure, page 7)

Security and privacy issues essential to the success of RA21 should not be buried in technical analyses of uncertain normativity. Secure channels should not be optional, they must be required.
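The difference between "should" and "must" is easy to express in code. The sketch below is only an illustration of treating secure channels as a hard requirement; the `require_https` helper and the endpoint URLs are hypothetical and do not come from the RA21 documents.

```python
from urllib.parse import urlsplit


def insecure_endpoints(endpoints):
    """Return the endpoints that are not served over HTTPS."""
    return [url for url in endpoints if urlsplit(url).scheme != "https"]


def require_https(endpoints):
    """A 'must': refuse to proceed if any endpoint lacks a secure channel."""
    bad = insecure_endpoints(endpoints)
    if bad:
        raise ValueError("insecure endpoints: " + ", ".join(bad))


# Hypothetical identity provider and service provider endpoints.
endpoints = [
    "https://idp.university.example/sso",
    "http://sp.publisher.example/login",
]
print(insecure_endpoints(endpoints))
```

Under a "should", the list printed above is merely a warning; under a "must", `require_https(endpoints)` raises and the insecure participant is kept out of the system.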

Update July 2, 2019:

RA21's official response to this comment is:

We agree that HTTPS everywhere is a good idea for tools and resources serving the research community. However, a specific recommendation on this would be outside of the scope of RA21.

This response strikes me as uninformed, considering that the recommendation promotes a technical solution that will likely require publishers to adopt HTTPS. Either the committee is unaware of the technical ramifications of their recommendations (very likely), or they're trying to hide from the publishing community the inconvenient fact that RA21 will require all of them to go HTTPS (I wish).

Really, all I was hoping for was some bland indication that RA21 will not compromise system privacy and security to accommodate the laggards of the service provider community. Since that didn't happen, I'll do some shouting here:

ATTENTION PUBLISHERS! RA21 WON'T TELL YOU IT'S GOING TO REQUIRE HTTPS ON YOUR SITE. I JUST DID.

Friday, May 18, 2018

The Shocking Truth About RA21: It's Made of People!

Useful Utilities
logo from 2004

When librarian (and programmer) Chris Zagar wrote a modest URL-rewriting program almost 20 years ago, he expected the little IP authentication utility would be useful to libraries for a few years and would be quickly obsoleted by more sophisticated and powerful access technologies like Shibboleth. He started selling his program to other libraries for a pittance, naming this business "Useful Utilities", fully expecting that it would not disrupt his chosen profession of librarianship.

He was wrong. IP address authentication and EZProxy, now owned and managed by OCLC, are still the access management mainstays for libraries in the age of the internet. IP authentication allows for seamless access to licensed resources on a campus, while EZProxy allows off-campus users to log in just once to get similar access. Meanwhile, Shibboleth, OpenAthens and similar solutions remain feature-rich systems with clunky UIs and little mainstream adoption outside big rich publishers, big rich universities and the UK, even as more distributed identity technologies such as OAuth and OpenID have become ubiquitous thanks to Google, Facebook, Twitter etc.
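For readers who haven't seen it up close, IP authentication boils down to checking whether a request's source address falls inside a range a customer has registered. Here is a minimal sketch using Python's standard `ipaddress` module; the institution names and address ranges are made up (they use reserved documentation ranges), and this is not how any particular publisher implements it.

```python
import ipaddress

# Hypothetical customer table: institution name -> registered address ranges.
LICENSED_RANGES = {
    "Example University": ["192.0.2.0/24", "198.51.100.0/25"],
    "Example Hospital": ["203.0.113.16/28"],
}


def institution_for(client_ip: str):
    """Return the institution whose registered range contains client_ip, or None."""
    address = ipaddress.ip_address(client_ip)
    for institution, ranges in LICENSED_RANGES.items():
        if any(address in ipaddress.ip_network(cidr) for cidr in ranges):
            return institution
    return None


print(institution_for("192.0.2.57"))      # on campus at the university
print(institution_for("198.51.100.200"))  # outside every registered range
```

The user never sees a login screen, which is the whole appeal. The table itself is where the maintenance work hides: every range change at every customer has to find its way into it.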

from My Book House, Vol. I: In the Nursery, p. 197.

So how long will the little engines that could keep chugging? Not long, if the folks at RA21 have their way. Here are some reasons why the EZProxy/IP authentication stack needs replacement:

IP authentication imposes significant administrative burdens on both libraries and publishers. On the library side, EZProxy servers need a configuration file that knows about every publisher supplying the library. It contains details about the publisher's website that the publisher itself is often unaware of! On the publisher side, every customer's IP address range must be accounted for and updated whenever changes occur. Fortunately, this administrative burden scales with the size of the publisher and the library, so small publishers and small institutions can (and do) implement IP authentication with minimal cost. (For example, I wrote a Django module that does it.)
IP Addresses are losing their grounding in physical locations. As IP address space fills up, access at institutions increasingly uses dynamic IP addresses in local, non-public networks. Cloud access points and VPN tunnels are now common. This has caused publishers to blame IP address authentication for unauthorized use of licensed resources, such as that by Sci-Hub. IP address authentication will most likely get leakier and leakier.
~~Men~~ Monsters in the middle are dangerous, and the web is becoming less tolerant of them. EZProxy acts as a "~~Man~~ Monitor in the Middle", intercepting web traffic and inserting content (rewritten links) into the stream. This is what spies and hackers do, and unfortunately the threat environment has become increasingly hostile. In response, publishers that care about user privacy and security have implemented website encryption (HTTPS) so that users can be sure that the content they see is the content they were sent.

In this environment, EZProxy represents an increasingly attractive target for hackers. A compromised EZProxy server could be a potent attack vector into the systems of every user of a library's resources. We've been lucky that (as far as is known) EZProxy is not widely used as a platform for system compromise, probably because other targets are softer.

Looking into the future, it's important to note that new web browser APIs, such as service workers, are requiring secure channels. As publishers begin to make use of these APIs, it's likely that EZProxy's rewriting will irreparably break new features.
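To see why link rewriting puts a proxy in the middle, consider this rough sketch of the general technique. It is not EZProxy's actual algorithm or configuration format; the proxy domain, the publisher hostname, and the `proxied` and `rewrite_links` helpers are all assumptions for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

PROXY_SUFFIX = "proxy.library.example"                  # hypothetical proxy domain
KNOWN_PUBLISHER_HOSTS = {"journals.publisher.example"}  # hypothetical configuration


def proxied(url: str) -> str:
    """Route a URL on a configured publisher host through the proxy's hostname."""
    parts = urlsplit(url)
    if parts.hostname not in KNOWN_PUBLISHER_HOSTS:
        return url  # relative links and unknown hosts pass through untouched
    return urlunsplit(parts._replace(netloc=f"{parts.hostname}.{PROXY_SUFFIX}"))


def rewrite_links(html: str) -> str:
    """Rewrite every double-quoted href in an HTML fragment."""
    return re.sub(r'href="([^"]+)"', lambda m: f'href="{proxied(m.group(1))}"', html)


page = '<a href="http://journals.publisher.example/article/1">Full text</a>'
print(rewrite_links(page))
```

After rewriting, the browser talks to the proxy's hostname rather than the publisher's, so the proxy has to read and modify every page on the way through. A regular expression like the one above sees only links written into the HTML, not URLs that scripts construct in the browser, which hints at how fragile the approach becomes as sites lean on newer browser features.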

So RA21 is an effort to replace IP authentication with something better. Unfortunately, the discussions around RA21 have been muddled because it's being approached as if RA21 is a product design, complete with use cases, technology pilots, and abstract specifications. But really, RA21 isn't a technology, or a product. It's a relationship that's being negotiated.

What does it mean that RA21 is a relationship? At its core, the authentication function is an expression of trust between publishers, libraries and users. Publishers need to trust libraries to "authenticate" the users for whom the content is licensed. Libraries need to trust users that the content won't be used in violation of their licenses. So for example, users are trusted to keep their passwords secret. Publishers also have obligations in the relationship, but the trust expressed by IP authentication flows entirely in one direction.

I believe that IP Authentication and EZProxy have hung around so long because they have accurately represented the bilateral, asymmetric relationships of trust between users, libraries, and publishers. Shibboleth and its kin imperfectly insert faceless "Federations" into this relationship while introducing considerable cost and inconvenience.

What's happening is that publishers are losing trust in libraries' ability to secure IP addresses. This is straining and changing the relationship between libraries and publishers. The erosion of trust is justified, if perhaps ill-informed. RA21 will succeed only if it creates and embodies a new trust relationship between libraries, publishers, and their users. Where RA21 fails, solutions from Google/Twitter/Facebook will succeed. Or, heaven help us, Snapchat.

Whatever RA21 turns out to be, it will add capability to the user authentication environment. IP authentication won't go away quickly - in fact the shortest path to RA21 adoption is to slide it in as a layer on top of EZProxy's IP authentication. But capability can be good or bad for parties in a relationship. An RA21 beholden to publishers alone will inevitably be used for their advantage. For libraries concerned with privacy, the scariest prospect is that publishers could require personal information as a condition for access. Libraries don't trust that publishers won't violate user privacy, nor should they, considering how most of their websites are rife with advertising trackers.

It needn't be that way. RA21 can succeed by aligning its mission with that of libraries and earning their trust. It can start by equalizing representation on its steering committee between libraries and publishers (currently there are 3 libraries, 9 publishers, and 5 other organizations represented; all three of the co-chairs represent STEM publishers.) The current representation of libraries omits large swaths of libraries needing licensed resources. MIT, with its ~~Class A~~ huge IP address block, has little in common with my public library, the local hospital, or our community colleges. RA21 has no representation of Asia, Africa, or South America, even on the so-called "outreach" committee. The infrastructure that RA21 ushers in could exert a great deal of power; it will need to do so wisely for all to benefit.

To learn more...

Aaron Tay has written a very good overview of the technology and issues surrounding authentication.
The RA21 website news page has a list of RA21 posts.

Thanks to Lisa Hinchliffe and Andromeda Yelton for very helpful background.

Would you let your kids see an RA21 movie?
_______________

Update 5/17/2019: A year later, the situation is about the same.
