From quality checks to detecting research misconduct to matching manuscripts with expert reviewers, AI (artificial intelligence) is reshaping how we approach peer review. As AI technologies become more sophisticated, governance and policy evolve, and human understanding deepens, how can we harness AI’s potential while preserving the human judgment so critical to peer review?

To start, where is AI currently being used in peer review?

AI is already making an impact on the peer review process across several areas for many publishers. We’re seeing AI tools used for technical and integrity checks, language enhancement to peer review reports, reviewer suggestions, and even summarization and literature recommendations to help editors and reviewers quickly grasp a manuscript’s key points. These aren’t futuristic concepts — they’re in production today and being integrated into editorial workflows at scale.

The greatest potential lies in publishing papers both better and faster. Peer review today faces two big challenges: the rise of research misconduct and the difficulty of finding qualified reviewers quickly. AI is proving valuable in both areas.

Logo image for Peer Review Week 15 - 19 September 2025.

AI-powered integrity tools have become widely adopted across the publishing industry, with more than 50 vendors now offering specialized services. These tools excel at detecting patterns that would be nearly impossible for humans to spot manually. One example is Research Exchange, Wiley’s comprehensive scholarly publishing platform, which automatically flags suspicious patterns to editors for review (full disclosure, I am employed by Wiley). At the same time, AI can also power the reviewer invitation process — helping to identify qualified reviewers from vast reviewer databases and improve diversity and quality of reviewer pools. Image manipulation detection represents another area where AI provides clear value. While editors might miss subtle alterations with the naked eye, AI can identify suspicious modifications and duplication, even across publications, in seconds. Similarly, AI can screen thousands of manuscripts for plagiarism, data anomalies, reference quality, or detect anomalies in references and author networks in minutes — a task that would take human reviewers far longer to complete.

However, while AI capabilities are improving rapidly, significant limitations remain. AI still struggles in areas that require human judgment, such as:

  • Nuance and context: AI may miss subtle but critical distinctions that human experts readily identify and can have big impacts for research. For instance, determining whether a methodological choice is truly innovative or potentially problematic requires deep domain knowledge and critical thinking that current AI systems cannot fully replicate. AI also lacks the lived experience, world knowledge, and domain-specific perspective necessary to judge whether a study is genuinely novel or ethical, or how it fits into broader scholarly debates.
  • Ethical and cultural considerations: Responsible use of sensitive data and ethical consent require cultural awareness, ethical reasoning, and application of professional guidelines — areas where AI is not equipped to assist.
  • The ‘black box’ problem: AI models often operate as ‘black boxes,’ or closed systems, making it difficult to understand how they arrive at their recommendations. Editors and reviewers need to understand not just what AI recommends, but why — and current systems often fall short in providing meaningful explanations.

Several additional risks accompany AI applications for peer review:

  • Bias reinforcement: If AI is trained on biased data, training data and opaque algorithms can unintentionally reinforce historical inequalities — for example, under-recommending early-career researchers or scholars from underrepresented regions.
  • Under- and over-reliance: Reviewers or editors may place too much trust in AI outputs, potentially weakening editorial oversight, critical thinking, and human judgment. On the other hand, some may dislike and/or mistrust AI and rely solely on human oversight, underusing tools that could save time and flag issues unapparent to the human eye — such as readability, quality, and research completeness.
  • Hesitation leads to a lack of transparency: A great deal of misunderstanding, and even fear, around AI in research still exists for researchers and reviewers. For example, Cactus Communications’ recent survey found that many researchers still equate the use of AI with misconduct, which creates hesitation and prevents transparency.
  • Manipulation vulnerabilities: Generative AI introduces new risks, such as hidden prompt injection attacks, where adversarial text is embedded to manipulate or influence LLM-assisted review tools.
  • Beyond AI tools: AI is only one piece of the puzzle. Even the most advanced tools won’t perfect the peer review process; real impact also requires good user experience, clear guidance, effective training, and strong customer support.

AI or Human Peer Review?

So, given the ongoing risks and challenges involved with AI and peer review, will it ever replace the human element? At its core, peer review is about judgment and accountability — and those remain uniquely human. AI can provide powerful assistance, but it cannot be held responsible for decisions, nor does it have the legal or ethical standing to take ownership of outcomes, such as assigning copyright or bearing accountability for errors.

Equally important, humans still bring the critical thinking, contextual awareness, nuance, and deep subject-matter expertise that AI has yet to match. AI is improving rapidly, but it’s best viewed today as a smart assistant — excellent at the heavy lifting, like technique checks, flagging out-of-scope submissions, helping to draft papers and reports, flagging potential misconduct, or recommending suitable reviewers.

Recent survey data also supports a cautious approach. According to Wiley’s 2024 global ExplanAItions survey of nearly 5,000 scholars, researchers currently prefer human judgment over AI for four out of five peer review-related tasks. This reinforces the view that AI is best positioned as a powerful assistant while humans provide nuance, ethical reasoning, and contextual expertise.

Implementing AI responsibly: A suggested guide for editors and journals

For those editors and journals looking to incorporate AI into their peer review workflows, success depends on thoughtful implementation that addresses both opportunities and risks. I’d like to highlight five key areas:

  1. Be vigilant about AI-generated content. AI can generate misinformation (false or inaccurate outputs due to errors or limitations in training data), hallucinations (contrived information), and disinformation (deliberately misleading content) — because large language models generate responses based on patterns, not facts.
  2. Protect copyright and confidentiality. Editors and reviewers must consider whether institutional policies allow the use of unpublished manuscripts in AI tools. Uploading confidential work without safeguards could violate trust and privacy. Publishers should provide guidance on the privacy and security risks of the use of AI tools.
  3. Judge the research, not the tool. A paper should be accepted or rejected based on quality, methodology, soundness, novelty, and originality — not simply whether AI was involved.
  4. Build on the pillars of responsible adoption (as outlined in my previous post for The Scholarly Kitchen):
  • Trust: AI must be transparent, explainable, and consistent to earn the confidence of the community. Humans should also develop a good understanding of AI so that they will know when and where its use is acceptable.
  • Collaboration: We should view AI as a partner, not a replacement. Future peer review strategy and process design should take into consideration AI and humans as collaborators.
  • Governance: Strong policies, oversight, and clear boundaries are critical to ensure AI operates within its intended scope — and not beyond. AI use should align with guidelines and policies at the national and international level, as well as relevant institutional or industry standards.
  1. Provide more training and support for AI literacy. In the same Wiley survey, approximately 70% of study participants are looking to publishers to guide them on the safe and responsible use of AI. For both editors and reviewers, training should go beyond how to use AI to address when to use it, and how to use it responsibly. That means understanding AI’s limitations, setting the right expectations, and recognizing that while AI isn’t perfect, it’s a valuable tool. Its value depends heavily on how and where it’s applied.

Training for researchers should also emphasize how to interpret and assess AI outputs, as well as what feedback to provide to improve AI. For evaluating and adopting AI tools, I find the L-O-C-A-D framework provides a useful structure:

  • Limitations: Understanding what tools can and cannot do
  • Ownership: Considering copyright and intellectual property implications of input, output, and training data
  • Confidentiality: Avoiding unintentionally disclosing sensitive information, such as manuscript content, data, or prompts, when using AI tools.
  • Accuracy: Assessing output reliability and taking responsibility for results
  • Disclosure: Following institutional and publisher guidelines for transparency and application.

The importance of disclosure

Transparency around AI use is becoming increasingly important — not just for authors but also for reviewers and editors. The movement toward disclosure reflects a growing recognition that hiding AI use creates distrust, while transparency builds confidence.

Frameworks are emerging to support this cultural shift. The COPE AI guidelines, the STM’s GenAI guidelines, European Association of Science Editors (EASE) AI guidelines, and, most recently, the European Commission’s Living Guidelines on the Responsible Use of Generative AI in Research (published in March) are all examples of how our community is codifying best practices.

In the recent COPE AI Forum, one of the key themes was moving beyond simply detecting AI-generated text toward verifying compliance with AI-use disclosures and editorial standards. Many publishers are now publishing detailed guidelines on AI use — not only for authors, but increasingly for editors and reviewers as well.

Looking ahead: The future of AI in peer review

From a technology perspective, next-generation AI systems promise improvements in deep reasoning (GPT-5 thinking, Gemini 2.5 Pro-Deep thinking, Claude Opus 4), multimodal analysis (Google Veo3, MS Voice-1, OpenAI Sora), and agency (Manus, OpenAI Codex). These advances may enable AI to better assess novelty, soundness, reproducibility, and much richer research work, while automatically handling a high volume of routine functions, as well as complex review tasks.

Almost all LLM providers use deep research models (such as OpenAI DeepResearch, Google Gemini DeepResearch, and Claude Research). These will also help editors and reviewers more completely and quickly understand topics and research areas so that they can provide a fairer and accurate review.

From a product perspective, there are about 50 AI-powered integrity detection products and 20 dedicated AI-powered peer review products on the market. Integrity remains a major part of peer review. Many integrity detection products are shifting from single detection to multi-signal detection by integrating tools from other vendors to provide a more complete and accurate check. Most integrity detection products will properly converge in terms of capability in the future. Seamless integration and distribution, human- and machine-friendly design, and earning user trust will be key to the success of these products.

Looking ahead, AI is evolving beyond integrity checks and reviewer matching. Many editorial checks and analyses — such as scope and relevance checks, quality and reproducibility checks, citation analysis, and statistical and methodological reviews — will increasingly be applied earlier at the submission stage, enabled by automation and AI. Providing this kind of early feedback will not only help researchers strengthen their work sooner but also streamline the process and reduce the burden on editors and reviewers later. AI will also be used to support editors more directly, for example, by inviting relevant authors to submit papers and helping editors identify gaps and emerging topics to expand or create new journals. It will also be used to automate and carry out initial and in-depth reviews by following specific journal or domain requirements/guidelines such as CONSORT and PRISMA.

We’re also seeing the emergence of AI review assistants — tools such as xPeerd, WorldBrain Scholar’s Eliza, Hum’s Alchemist Review, Cactus Paperpal Review, and Enago & Charlesworth’s Review Assistant— that can help reviewers structure their reports, highlight key strengths and weaknesses, assess methodologies and novelty, or even help editors interpret reviewer feedback. I also anticipate that AI agents will become widely applied in peer review workflow to automate routine and operational work, such as project planning, resource allocation/assignment, communication management, and reviewer coordination, to reduce the manual workload.

For researchers, AI capabilities are moving from AI discovery to AI assistant to AI scientist. In the future, researchers could have their own personal assistant to help them not only conduct the research, but also provide an initial review of research outputs, like a digital twin based on their styles, expertise, and preferences.

The success of AI adoption in peer review depends on how we design and responsibly use it, who we include in the process, and both where and when we use it — all underpinned by transparency and open communication. If we get this right, AI can genuinely strengthen peer review. But we must stay grounded in the values of research integrity and community trust.

The future of peer review isn’t about choosing between humans and AI, or between speed and quality, but about combining the strengths of both to enable speed with quality. AI can filter, flag, and surface insights; humans provide the judgment, accountability, and contextual understanding that ensure quality, ethics, and trust in the scholarly record.

Hong Zhou

Hong Zhou

Hong is a product and AI innovation leader in scholarly publishing and former Senior Director of AI Product & Innovation at Wiley, where he set AI vision and strategy, drive the roadmap, and delivered award-winning services that automate research and publishing workflows. He helped define Wiley’s AI ethics principles, drove innovation for the Wiley Research Exchange platform and Atypon experience platform, and led creation of Wiley’s first AI-driven papermill detection tool—now a leading integrity solution. A former head of AI R&D, he built collaborations with top labs worldwide and earned multiple AI awards, including an honorable mention for the 2024 APE Award for Innovation. He holds a PhD in 3D Modeling with AI (Aberystwyth) and an MBA in Digital Transformation (Oxford). His personal research passion is supporting publishers’ success in their transition to Open Science and helping global researchers know more, do more, and achieve more in the era of digital transformation. Hong is widely published on computer science and AI topics and presents regularly at prominent industry events. He also serves as a COPE Advisor, Scholarly Kitchen Chef, Co-Chair of ALPSP’s AI Special Interest Group, member of STM’s Future Lab, and Distinguished Expert at China’s National Key Laboratory of Knowledge Mining for Medical Journals

Discussion