In the past two years, there has been an explosion of interest in the possibilities of artificial intelligence tools, from deep learning methods in scientific research to, particularly in STM publishing, large language models for automated text generation. The publishing world faces a growing list of issues, from copyright infringement to the pressure that increased bandwidth demands place on open systems, and from questions of authorship and research integrity to the second-order effects of an accelerating pace of content generation. The Scholarly Kitchen has covered many of these issues regularly as the community comes to grips with this growing list of challenges; this blog alone has published 159 posts that touch on artificial intelligence (AI). At times, it seems the number of challenges is growing faster than the number of solutions. Several initiatives are underway to address these challenges. Notably, last month NISO (the National Information Standards Organization; full disclosure, my employer) hosted a series of workshops for scholarly publishing leadership to identify and prioritize efforts to address some of the challenges around AI and interoperability. Such collective action can help reduce the number of open issues so that we can all benefit from the opportunities AI provides.

The two workshops brought together two dozen leaders from large publishers, aggregators, and suppliers in our community. During the meetings, participants brainstormed interoperability concerns and potential standards-related efforts that could improve interaction with AI tools and systems across the scholarly community. The purpose of these meetings was to identify key issues facing publishers, synthesize these problems into achievable solutions, and then prioritize a plan of work for NISO to improve the efficiency of working with AI systems across the industry. The goal of this effort, and of the resulting report, is to define a small set of priority projects around which NISO can advance collective action at a network level, rather than at an internal or product level.
A report detailing the output of these workshops was published by NISO this week. It identifies more than two dozen potential projects that could be undertaken to address various issues related to AI tools and systems in our community. The participants also prioritized these ideas, but wider feedback is being sought. Among the ideas highlighted as efforts the community should advance were:
- Usage tracking and auditing to assess impact
  The need to understand, measure, and report on the usage of content by AI tools is critical to communicating the value of published content. Participants suggested that a COUNTER-like framework for assessing and reporting on AI agents and their outputs could be created, with these actions tracked as they are for licensed aggregators of content.
- Communicating to technology companies about the structure of scientific information
  Technology companies should understand scientific publications and the vetting processes used to produce various types of output, so that the existing structure of scientific content can be used to give each item appropriate weight.
- Attribution and provenance standards for AI outputs
  Participants stressed the importance of recognizing and preserving citation structure in AI-generated responses. Outputs should reflect versioned sources, publishers, and contextual weight (e.g., peer-reviewed vs. preprint). Standards are needed for labeling, attribution granularity, and provenance.
- Standardized licensing models and legal terminology for AI usage
  There is a need for shared, flexible licensing templates or consistent contractual terms that define rights for training, grounding, and inference. Model licenses could facilitate consistent interpretation, reduce negotiation costs, and help ensure fair compensation.
- Transparency and disclosure frameworks
  Strong interest was expressed in requiring vendors to disclose model composition and training data origin via standardized “model cards” or tool declarations. Such transparency would enable better assessment of output reliability and bias.
- Interoperable metadata and access infrastructure
  The need for machine-readable metadata schemas, harmonized APIs, and standard access mechanisms (e.g., for bot traffic) was raised repeatedly. Shared metadata vocabularies would enhance cross-platform discoverability and reduce service friction, and standards could govern how access for AI bots and AI agents is provisioned so that it does not degrade functionality for human users.
- Versioning and updating of AI training data
  The lack of version tracking in AI ingest pipelines presents risks around retractions, corrections, and evolving knowledge. Participants recommended standardized protocols for data refreshes, retraction alerts, and model audit trails.
- AI-enhanced accessibility deployment guidance
  There was enthusiasm for collaborative guidelines on using AI to generate accessibility aids (e.g., alt text). Efforts could also include frameworks for auditing AI tools and assessing readiness across platforms and institutions.
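To make the attribution and provenance idea concrete, here is a purely illustrative sketch of what a machine-readable provenance record for a single AI-generated answer might look like. No NISO schema exists yet; every field name below is a hypothetical assumption, chosen only to show versioned sources, review status, and contextual weight traveling together.

```python
# Hypothetical provenance record for one AI-generated answer.
# All field names are illustrative; no NISO standard defines them yet.
import json

provenance = {
    "answer_id": "example-001",
    "generated": "2025-07-01",
    "sources": [
        {
            "doi": "10.1234/example.5678",     # persistent identifier
            "version": "VoR",                  # e.g., Version of Record
            "review_status": "peer-reviewed",  # vs. "preprint"
            "publisher": "Example Press",
            "weight": 0.8,                     # contextual weight applied
        },
        {
            "doi": "10.1234/example.9012",
            "version": "preprint-v2",
            "review_status": "preprint",
            "publisher": "Example Archive",
            "weight": 0.2,
        },
    ],
}

# A downstream tool could, for instance, flag answers that lean
# mostly on unreviewed sources.
preprint_share = sum(
    s["weight"] for s in provenance["sources"]
    if s["review_status"] == "preprint"
)
print(json.dumps(provenance, indent=2))
print(f"Share of weight from preprints: {preprint_share:.1f}")
```

The point of such a record is not any particular field set, but that labeling, attribution granularity, and provenance become auditable once they are expressed in a shared, machine-readable form.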
Certainly, many more ideas were generated that could become a focus of interest; all of them are covered in the full report. The list above represents only a sample of the ideas highlighted by participants that the community could prioritize. Given the modest number of workshop participants, more voices in the prioritization will be critical: NISO’s community is significantly broader than the organizations represented in these two meetings. Input from NISO leadership and members, along with the wider community, is being sought, and the broader community is now invited to share its perspectives via a feedback consultation survey form.
Some of these ideas might find homes in other organizations, which would certainly be helpful, since NISO is unlikely to have the resources to advance work on all, or even most, of the ideas identified. Discussions about this project and its draft outputs with other organizations revealed that several community initiatives tied to some of these topics are already underway, and a few are worth highlighting.
For several years, COUNTER has been focused on quantifying and reporting on machine use of content; COUNTER Release 5.1 includes guidance on handling text and data mining (TDM) harvesting. Assessing the impact of AI traffic, and looking ahead to the potential usage impact of AI agents accessing, “reading,” and summarizing content, will be an important topic in our community. COUNTER is now organizing a focus group specifically to explore how AI systems are affecting usage and related usage-based topics.
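As a hedged illustration of what agent-level usage reporting could involve: COUNTER Release 5.1 does not define an AI-agent report, so the report name and fields below are invented for this sketch. The underlying mechanic, though, is the familiar COUNTER one of aggregating raw access events into periodic per-consumer counts.

```python
# Illustrative only: a COUNTER-style usage summary for AI-agent access.
# "Agent_Report" and its fields are hypothetical, not part of COUNTER 5.1.
from collections import Counter

# Raw access events as a platform's logs might record them.
events = [
    {"agent": "example-bot/1.0", "item_doi": "10.1234/a", "action": "summarize"},
    {"agent": "example-bot/1.0", "item_doi": "10.1234/b", "action": "retrieve"},
    {"agent": "other-agent/2.1", "item_doi": "10.1234/a", "action": "retrieve"},
]

# Aggregate events into per-agent totals, much as COUNTER aggregates
# human usage into monthly item reports.
by_agent = Counter(e["agent"] for e in events)
report = {
    "report_name": "Agent_Report (hypothetical)",
    "period": "2025-06",
    "rows": [
        {"agent": agent, "total_requests": n}
        for agent, n in sorted(by_agent.items())
    ],
}
for row in report["rows"]:
    print(row["agent"], row["total_requests"])
```

Whatever shape a community standard eventually takes, the design question is the same one COUNTER already answered for human usage: which events count, at what granularity, and over what reporting period.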
The STM Association’s Standards and Technology Committee (STeC) is also focusing on several areas of work related to AI systems. Whether it is acceptable for authors to use AI tools in writing scholarly articles has been a topic of much debate: a recent survey in Nature found significant divergence in opinions on AI usage in the publication process, but broad agreement on the need for disclosure. This came up in the NISO workshop as an area of need within the publishing community. Fortunately, STM’s STeC group has already made progress in this area. In April, the group released a draft report, “Classifying AI Use in Manuscript Preparation,” recommending ways to describe how authors are using AI tools in the authoring process. The draft classification offers a clear framework to help publishers define, evaluate, and guide the transparent use of AI in manuscript preparation. The STeC group is also exploring other issues around AI, as well as related questions of research integrity.
Related to the strain that AI bots place on open content systems, earlier this year the Confederation of Open Access Repositories (COAR) surveyed repositories around the world to assess the impact of AI bot traffic. Unsurprisingly, the report found that “open repositories are being profoundly impacted by AI bots and other crawlers.” In the news release about the survey results, COAR announced that it will be launching a “Repositories and AI Bots Task Force” in July 2025.
Following the open consultation on the report, NISO will organize working groups to develop community consensus solutions to some of these topics. In addition to providing guidance on their priorities for efforts to advance, survey respondents may also volunteer to engage in one of these efforts. Ideally, projects will be launched by the fall and will have a one-year development timeline. The environment around AI systems is evolving rapidly, and we need to advance solutions as quickly as possible. These workshops and the resulting report should help guide where the community focuses its efforts.
Discussion
5 Thoughts on "We Need AI Standards for Scholarly Publishing: A NISO Workshop Report"
Thanks for the shout-out, Todd!
COUNTER’s Advisory Committee has generative and agentic AI on the agenda for their meeting at the end of this month. The AC is open to all COUNTER members, and we’ll be following our usual consultative process so non-members can have a say once the best practice guidance is drafted.
Thank you for the update! It’s great to hear that COUNTER’s Advisory Committee is actively addressing generative and agentic AI. Looking forward to seeing how the best practice guidance evolves — especially the opportunities for wider community input during the consultation phase. Transparency and inclusivity in this process will be key as AI continues to reshape content usage and reporting.
The survey has some good points and some alarming ones. Regarding the latter, it seems commercial publishers are eager to join the hype bandwagon and feed scientists’ outputs as training materials, potentially reselling their outputs to AI companies. Nowhere in the survey was there a question about a need to update copyright and publishing agreements. But I suppose it will not be scientists who will get the credit and attribution:
“Fair compensation and credit for content owners in the age of Al.”
Regarding the good points:
“Develop a process for updating training data when content changes (e.g. retractions etc.).”
“Communication (to technology community) about structure of scientific information (e.g. preprints vs peer reviewed article – not all content is equally validated. Use existing structure of scientific content to give it appropriate weight.”
In general, I am a little disappointed by the state of affairs, because there are so many things that could be improved easily with small investments. Take references as an example: why, by 2025, do we still not have a system that automatically checks each and every reference? You wouldn’t really need the “AI” label to do that, and having that functionality would already catch much of the AI slop that the hype bandwagon people (and fraudsters) nowadays submit. The slop also feeds into the peer review crisis, because the backlogs are filled with increasing amounts of garbage no one is willing to peer review.
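The commenter has a point that such a check need not be sophisticated. As a minimal, purely illustrative sketch: a submission system could at least screen every cited DOI for syntactic plausibility before a manuscript reaches reviewers. (Resolving each DOI against a registry such as Crossref’s public REST API would be the obvious next step; that network lookup is omitted here to keep the sketch self-contained.)

```python
# Minimal sketch: syntactic screening of DOIs in a reference list.
# A real checker would also resolve each DOI against a registry
# (e.g., Crossref's public REST API) to confirm the record exists
# and matches the citation.
import re

# DOIs begin with a "10." prefix, a numeric registrant code,
# a slash, and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(doi: str) -> bool:
    """Return True if the string is a syntactically plausible DOI."""
    return bool(DOI_PATTERN.match(doi.strip()))

references = [
    "10.1038/nature12373",      # plausible
    "10.99/too-short-prefix",   # registrant code too short
    "not-a-doi-at-all",         # fabricated citation
]
flagged = [r for r in references if not looks_like_doi(r)]
print("Flagged for manual check:", flagged)
```

Even this crude filter would surface fabricated citations, which are among the most common artifacts of AI-generated manuscripts, before they consume reviewer time.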
This is an incredibly comprehensive overview—thank you for sharing! One question I had while reading: Given the urgent need for interoperability and transparency in AI usage across STM publishing, how can smaller publishers or institutions with limited technical resources effectively participate in these standardization efforts without falling behind larger organizations? Are there specific support mechanisms or collaborative opportunities being discussed to ensure inclusivity in shaping these emerging AI standards?
The growing complexity around AI access, licensing, and content usage brings to mind platforms like Sci-Hub, which have long challenged traditional publishing models by offering free access to scientific papers. While controversial, Sci-Hub highlights the ongoing tension between open access and proprietary control—an issue that AI tools now further complicate. As discussions around AI interoperability and equitable access move forward, will the community also revisit broader questions of accessibility and who gets to benefit from scientific knowledge?