Moving the needle on open data

A new study from Taylor & Francis and DataSeer

Publishing open access (OA) increases the visibility and impact of research. It encourages innovation and collaboration and boosts public engagement.

In addition, data sharing enables greater transparency in scholarly communications, increases the reproducibility of experimental results, and hastens the pace of important discoveries.

When data generated as a result of academic research is well managed and shared openly, it enables ideas to flourish, and allows for stronger, more rigorous academic debate.

Seeing the complete picture

Going beyond open access, open research supports the sharing of research methods, activities, and outputs that are not surfaced through the traditional publishing workflow, enhancing the visibility and impact of research and increasing innovation, collaboration, and public engagement.

At Taylor & Francis we support the sharing of data generated during the course of a research project, regardless of discipline, content, or publication format. Our authors may also choose to share code and software connected to their articles, share preprints, submit ORCIDs, and clarify authors’ roles using the CRediT taxonomy.

We wanted to learn more about our authors’ open research publishing activities. For example, which data repositories do our authors prefer to use? How many publish preprints? How many share code or data, or include ORCIDs in their submissions?

Author engagement with Open Research exceeds our expectations

The “State of Open Data 2024”, the latest in a series of Open Data surveys from figshare, Digital Science, and Springer Nature, found that open data practices are indeed gaining traction, but incentives and support are needed for further adoption by researchers.

Projects such as the Open Science Monitoring Initiative, which Taylor & Francis participates in, highlight the challenge of tracking and monitoring open research progress over time.

Our analysis of open research activity in Taylor & Francis journals, supported by the AI open research solution provider DataSeer, uncovered much higher adoption than we had anticipated. Announced at the end of 2024, this pilot created an AI-driven landscape analysis of open research indicators (metrics) across a sample of over 8,000 published T&F articles.

This article highlights the key findings of this pilot, led by our Director of Open Science Strategy & Innovation, Rebecca Taylor-Grant, and Implementation Manager, Eilise Norris.

"We know that adoption of open science practice varies widely by discipline, geographic location and even career stage – this new DataSeer dataset allows us to identify trends and open science activity that will help us to better support our authors. Open science has a key role in promoting the integrity, reproducibility and reusability of research and it’s wonderful to see so much evidence of good practice from across our portfolio."

Rebecca Taylor-Grant, Director of Open Science Strategy & Innovation, Taylor & Francis

Key findings and stats

52% of the articles in the sample, published in 2023, included a data availability statement. That is 1.5 times the level we anticipated.

One-third of the authors in the study choose to share data regardless of the journal’s policy.

The US and China are the top two countries for data sharing.

More than half of articles in the sample included a data availability statement

Data availability statements (DAS) are short sections of a research paper which describe the datasets underpinning the study and how data can be accessed by readers.

Depending on the journal and its policy requirements, authors may share their data only on request, or publicly in a data repository, and the data sharing method should be outlined in the DAS.

Based on the requirements of each journal’s data sharing policy, we’d expected to see around a third of articles with a DAS included.

The landscape analysis found that more than half (52%) of articles in the sample included a DAS. That is more than 1.5 times the level we anticipated and well above what the policies of the journals in our corpus require. This result is positive: it indicates that authors are motivated to share data by factors other than policy mandates.
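
To make the comparison concrete, here is a minimal sketch of the arithmetic behind the "1.5 times" figure. It assumes the expected level was exactly one third (the text only says "around a third"), and the variable names are illustrative.

```python
# Assumption: "around a third" is taken as exactly 1/3 for illustration.
expected_share = 1 / 3
# Share of articles in the sample that included a DAS, as reported.
observed_share = 0.52

# How many times higher the observed share is than the expected share.
ratio = observed_share / expected_share
print(f"Observed share is {ratio:.2f} times the expected share")
```

Under that assumption the ratio comes out a little above 1.5, consistent with the figure quoted.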

Among the top five regions for data sharing across the corpus, authors in the US had the highest proportion of openly shared data (that is, data shared in repositories and supplementary files), with China a close second.

Learn how to share your data here.

Chart shows figures from the US, China, Canada, the UK and Italy, the 5 regions which demonstrated the highest proportion of data sharing.

Figure 1: Data sharing by country, Taylor & Francis research, 2023

8% of authors shared code and software openly despite lack of mandates

As code and software policies are less mature and less widely implemented than data sharing policies, we were not sure whether code and software sharing would be evident in our corpus. However, DataSeer found that 18% of the articles in the sample generated new code or software, and of those, 8% of authors chose to share the code or software openly in a repository.

Unsurprisingly, nearly half of the authors sharing software and code were publishing in the computer sciences, with a smaller proportion in Biosciences and Earth Sciences. Unexpectedly, figshare proved to be the most popular location for sharing code, ahead of GitHub (a dedicated platform for code and software sharing). This may be because it is possible to store code in GitHub and archive snapshots in figshare or Zenodo for the purposes of publication, and our authors are following best practice in doing so.

Learn how to share your research code here.

Chart compares numbers from figshare (54), GitHub (39), Zenodo (6), OSF (3), Mendeley Data (1) and Other repositories (2)

Figure 2: The repositories which are used most frequently by authors in the corpus to share software and code.
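
The chart figures can be turned into shares with a few lines of Python. This is a sketch only: it assumes the numbers in the chart are counts rather than percentages, and the dictionary and variable names are our own.

```python
# Values as listed for Figure 2. Assumption: these are counts, not
# percentages (the chart description does not state the unit).
repository_counts = {
    "figshare": 54,
    "GitHub": 39,
    "Zenodo": 6,
    "OSF": 3,
    "Mendeley Data": 1,
    "Other repositories": 2,
}

total = sum(repository_counts.values())

# Share of the total for each repository, printed largest first.
shares = {name: count / total for name, count in repository_counts.items()}
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

On that reading, figshare accounts for just over half of the listed total, with GitHub making up most of the remainder.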

"The results of our collaboration with DataSeer allow us much greater insight into the activities of researchers who publish with us. I am delighted to see how positively researchers are embracing open science, over and above the policies we have put in place to set a minimum level of engagement. This insight enables us to ensure our future plans to develop open science at Taylor & Francis are meeting our communities where they are or giving them the encouragement they need."

Matthew Cannon, Associate Director Open Science Programmes, Taylor & Francis

DataSeer identified that 8% of the authors in the sample had published preprints

A preprint, also known as the Author’s Original Manuscript (AOM), is the version of an article before the researcher has submitted it to a journal for peer review. For many researchers, posting a preprint is a way to share results quickly and receive useful feedback from colleagues before submitting to a journal for formal peer review.

Preprints increasing post COVID-19

Although there are no numbers showing how many researchers create preprints, there is data showing that preprints have increased since 2019, as researchers aimed to share results rapidly due to COVID-19. One study suggests that releasing a preprint is associated with more attention and citations for the peer-reviewed article.

Authors are encouraged to share preprints ahead of submitting to Taylor & Francis journals, and we know that some authors choose to share their manuscripts this way to provide early access and gain feedback from their communities.

DataSeer identified that 8% of the authors in the sample had published preprints, with the most popular preprint servers being ResearchSquare (149 preprints), bioRxiv (73 preprints), and SSRN (61 preprints).

What are preprints and preprint servers? Find out here.

"We were really pleased to work with Taylor & Francis on this project – large publishers cover so many research communities and it was fascinating to see how author behaviours around open science varied across the articles we analyzed. It’s also clear from the data that the efforts T&F have made around open science are paying off, and we’re very interested to see how this progress unfolds in the coming years."

Tim Vines, Founder and CEO, DataSeer

Uptake of ORCIDs is nearing universal adoption

An ORCID is a free, unique, persistent identifier (PID) for individuals to use as they engage in research, scholarship, and innovation activities.

ORCIDs are used to persistently identify and disambiguate researchers. People can use your ORCID to see what else you have published and what other research activities you have been involved in (such as grant submissions or book chapters), showcasing your professional activities.

While authors can choose to include ORCIDs for themselves and their co-authors, they are not mandatory for most of our 2,700 journals.

Our DataSeer analysis shows, however, that ORCID use by our authors is extremely high: 98% of the articles in our corpus included at least one ORCID (rising to 99% for our Dove journals!).

Learn more about ORCIDs.

Further reflection

This report highlights that Taylor & Francis authors are engaging significantly with open research practices, including open data sharing, code and software sharing, preprinting, and the use of ORCIDs.

We hypothesized that we might see more engagement in open research activity in some disciplines (for example Earth Sciences and Psychology) and less in others (for example Humanities).

This prediction held up: we found, for example, that 88% of all preprinting activity was from Medicine & Health (53%), Life, Earth & Environmental Sciences (25%) and Physical Sciences and Engineering (10%). Using what we have learned, we will continue to develop discipline-specific support for authors across our journal portfolio.

We are committed to working collaboratively and creatively with the academic community to drive a sustainable shift towards open research.

Having access to this dataset, created by DataSeer, has provided us with new insights into our authors’ open science engagement. We look forward to using this analysis to continue to develop and deliver solutions, approaches, and resources that benefit everyone.