Most generative-AI tools (ChatGPT, Claude, Gemini) excel when the data is already structured and neatly packaged. Give them a spreadsheet, a protocol, a dataset, and they analyze brilliantly. But in R&D, most knowledge isn't in a dataset. It lives in PDFs, methods sections, toxicology tables, ELNs, and internal reports. Here, generic LLMs fail: they can't index full-text papers, they can't retrieve evidence across 200-page documents, and they can't reliably generate traceable citations. That's why purpose-built scientific retrieval and reasoning systems matter. Real research requires auditability, grounding, and governed multi-agent reasoning, not just chat.
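To make "traceable citations" concrete, here is a minimal sketch of evidence retrieval with provenance, in Python. This is not the system described in the post: the corpus, document IDs, and keyword scorer are all hypothetical stand-ins (a real system would use dense embeddings and proper PDF chunking), but it shows the core idea that every passage handed to the model carries a source tag the answer can cite.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str  # hypothetical source identifier, e.g. "lee2024_tox"
    page: int    # page in the source PDF
    text: str    # extracted passage text

def score(query: str, passage: Passage) -> float:
    """Crude relevance score: fraction of query terms found in the passage."""
    terms = set(query.lower().split())
    hits = sum(1 for t in terms if t in passage.text.lower())
    return hits / len(terms) if terms else 0.0

def retrieve(query: str, corpus: list[Passage], k: int = 3) -> list[Passage]:
    """Return the top-k passages, each keeping its provenance (doc_id, page)."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

def grounded_prompt(query: str, corpus: list[Passage]) -> str:
    """Build a prompt where every evidence snippet carries a citation tag,
    so claims in the model's answer can be traced back to a source page."""
    cited = "\n".join(
        f"[{p.doc_id}, p.{p.page}] {p.text}" for p in retrieve(query, corpus)
    )
    return (
        f"Answer using ONLY the evidence below; cite tags like [doc_id, p.N].\n"
        f"Question: {query}\nEvidence:\n{cited}"
    )

# Hypothetical two-passage corpus, for illustration only.
corpus = [
    Passage("lee2024_tox", 142, "No hepatotoxicity was observed at doses below 30 mg/kg."),
    Passage("kim2022_methods", 7, "Compound X was administered orally for 28 days."),
]
print(grounded_prompt("hepatotoxicity dose threshold", corpus))
```

The design point is the citation tag, not the scorer: because provenance travels with each passage, the final answer is auditable rather than a free-floating chat response.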
110% agree, Yiannis. I've wrestled with this challenge at some of the biggest STM providers and data aggregators. Glad you're raising awareness (and going after it!).
Absolutely, context and traceability are everything in R&D.
Context optical compression, like DeepSeek-OCR, could be a game changer in this domain: https://arxiv.org/html/2510.18234v1
Senior Director of Portfolio Marketing | Elsevier Life Sciences
Completely agree, this really highlights the core challenge of applying AI to science. Most generative models perform well when information is already structured, but scientific knowledge rarely comes that way. In research, insight hides in unstructured evidence: experimental methods, results tables, and full-text papers that demand both context and traceability. That's why the real opportunity isn't in general chat models but in domain-specialized systems that can understand, retrieve, and reason across the scientific record while providing transparent citations and grounding in source data. At Elsevier we are focused on exactly that: building AI agents that work with scientific evidence, not around it. Our goal is to help researchers ask complex questions and receive auditable, evidence-based responses that support real decision-making in R&D.