
Monday, 2 October 2023

Spitting out the AI Gobbledegook sandwich: a suggestion for publishers

 


The past couple of years have been momentous for some academic publishers. As documented in a preprint this week, they have dramatically increased the number of articles they publish, largely via "special issues" of journals, and at the same time made enormous profits. A recent guest post by Huanzi Zhang, however, showed that this growth has not been without problems. Unscrupulous operators of so-called "papermills" saw an opportunity to boost their own profits by selling authorship slots and placing fraudulent articles in special issues controlled by complicit editors. Gradually, publishers realised they had a problem and started to retract fraudulent articles: Hindawi alone has retracted over 5000 articles since 2021*. As described in Huanzi's blogpost, this has made shareholders nervous and dented the profits of parent company Wiley.

 

There are numerous papermills, and we only know about the less competent ones whose dodgy articles are relatively easy to detect. For a deep dive into papermills in Hindawi journals see this blogpost by the anonymous sleuth Parashorea tomentella.  At least one papermill is the source of a series of articles that follow a template that I have termed the "AI gobbledegook sandwich".  See for instance my comments here on an article that has yet to be retracted. For further examples, search the website PubPeer with the search term "gobbledegook sandwich". 

 

After studying a number of these articles, my impression is that they are created as follows. You start with a genuine article. Most of these look like student projects: the topics vary, but in general they are weak on scientific content. They may be a review of an area or, if data are gathered, some kind of simple survey. In some cases, reference is made to a public dataset. To create a paper for submission, the following steps are taken:

 

·      The title is changed to include terms that relate to the topic of a special issue, such as "Internet of Things" or "Big data".

·      Phrases are scattered in the Abstract and Introduction mentioning these terms.

·      A technical section describing the method to be used is embedded in the middle of the original piece. Typically this is full of technical equations. I suspect these are usually correct, in that they use standard formulae from areas such as machine learning, and in some cases they can be traced to Wikipedia or another source. It is not uncommon to see very basic definitions, e.g. formulae for the sensitivity and specificity of a prediction (the standard definitions are shown just after this list).

·      A results section is created showing figures that purport to demonstrate how the AI method has been applied to the data. This often reveals that the paper is problematic, as plots are at best unclear and at worst bear no relationship to anything that has gone before.  Labels for figures and axes tend to be vague. A typical claim is that the prediction from the AI model is better than results from other, competing models. It is usually hard to work out what is being predicted from what.

·      The original essay resumes for a Conclusions section, but with a sentence added to say how AI methods have been useful in improving our understanding.

·      An optional additional step is to sprinkle irrelevant citations in the text: we know that papermills collect further income by selling citations, and new papers can act as vehicles for these.
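
For reference, these are the kinds of textbook definitions I mean: the standard formulae for sensitivity and specificity, expressed in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). Their presence is not in itself suspicious, but in these papers they add an air of technicality without connecting to any actual analysis.

```latex
% Standard textbook definitions of sensitivity and specificity,
% where TP, FP, TN, FN are counts of true/false positives and negatives.
\[
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
\]
```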


Papermills have got away with this because the content of these articles is sufficiently technical and complex that the fraud may only be detectable on close reading. Where I am confident there is fraud, I will use the term "gobbledegook sandwich" in my report on PubPeer, but there are many, many papers where my suspicions are raised yet it would take more time than it is worth for me to comb through the article for compelling evidence.

 

For a papermill, the beauty of the AI gobbledegook sandwich is that you can apply AI methods to almost any topic, and there are so many different algorithms available that a potentially infinite number of papers can be written to this template.  The ones I have documented cover topics as varied as educational methods, hotel management, sports, art, archaeology, Chinese medicine, music, building design, mental health and the promotion of Marxist ideology. In none of these papers did the application of AI methods make any sense, and they would not get past a competent editor or reviewers, but once a complicit editor is planted in a journal, they can accept numerous articles. 

 

Recently, Hindawi has ramped up its integrity operations and is employing many more staff to try to shut this particular stable door.  But Hindawi is surely not the only publisher infected by this kind of fraud, and we need a solution that can be used by all journals. My simple suggestion is to focus on prevention rather than cure, by requiring that all articles reporting work that uses AI/ML methods adopt the reporting standards being developed for machine-learning-based science, as described on this website.  These require computational reproducibility, i.e. data and scripts must be provided so that all results can be reproduced.  That would be a logical impossibility for AI gobbledegook sandwiches.
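
To make concrete what computational reproducibility would demand, here is a minimal sketch (in Python, with a hypothetical dataset and column names) of the kind of analysis script such reporting standards would require authors to deposit alongside their data, so that every reported number can be regenerated by rerunning the code:

```python
# Minimal sketch of a computationally reproducible ML analysis:
# shared data in, reported numbers out. "shared_data.csv" and the
# "outcome" column are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

SEED = 42  # fixed seed so every rerun gives identical results

data = pd.read_csv("shared_data.csv")               # the deposited dataset
X, y = data.drop(columns="outcome"), data["outcome"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=SEED, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

# Sensitivity = recall for the positive class; specificity = recall for the negative class
sensitivity = recall_score(y_test, pred, pos_label=1)
specificity = recall_score(y_test, pred, pos_label=0)
print(f"Sensitivity: {sensitivity:.3f}  Specificity: {specificity:.3f}")
```

A gobbledegook sandwich, whose results are not derived from any real analysis, could not supply a script like this: running the deposited code on the deposited data would simply fail to produce the published figures.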

 

Open science practices were developed with the aim of improving the reproducibility and credibility of science, but, as I've argued elsewhere, they could also be highly effective in preventing fraud.  Mandating reporting standards could be an important step which, if accompanied by open peer review, would make life much harder for the papermillers.



*Source: a spreadsheet maintained by the anonymous sleuth Parashorea tomentella.

 

N.B. Comments on this blog are moderated, so there may be a delay before they appear. 






Tuesday, 12 April 2022

Book Review. Fiona Fox: Beyond the Hype

If you're a scientist reading this, you may well think, as I used to, that running a Science Media Centre (SMC) would be a worthy but rather dull existence. Surely, it's just a case of getting scientists to explain things clearly in non-technical language to journalists. The fact that the SMC was created in part as a response to the damaging debacle of the MMR scandal might suggest that it would be a straightforward job of providing journalists with input from experts rather than mavericks, and helping them distinguish between the two. 

I now know it's not like that, after being on the Science Media Centre's panel of experts for many years, and having also served on their advisory committee for a few of them. The reality is described in this book by SMC's Director, Fiona Fox, and it's riveting stuff.

In part this is because no science story is simple. People will disagree about the quality of the science, the meaning of the results, and the practical implications. Topics such as climate change, chronic fatigue syndrome/ME and therapeutic cloning elicit highly charged responses from those who are affected by the science. More recently, we have found that when a pandemic descends upon the world, some of the bitterest disagreements are not between scientists and the media, but between well-respected, expert scientists. The idea that scientists can hand down tablets of stone inscribed with the truth to the media is a fiction that is clearly exposed in this book.

Essentially, the SMC might be seen as acting like a therapist in the midst of a seriously dysfunctional family where everyone misunderstands everyone else, and everyone wants different things out of life. On the one hand we have the scientists. They get frustrated because they feel they should be able to make exciting new discoveries, with the media then helping communicate these to the world. Instead, they complain that the media has two responses: either they're not interested in the science, or they want to sensationalise it. If you find a mild impact of grains on sexual behaviour in rats, you'll find it translated into the headline 'Cornflakes make you impotent'.

On the other hand, we have the media. They want a good story, but find that the scientists are reluctant to talk to them, or want total control of how the story is presented. In the worst case, scientists are prima donnas who want days or weeks to prepare for a media interview and will then shower the journalist with detailed information that is incomprehensible, irrelevant, or both. When the public desperately needs a clear, simple message, the scientists will refuse to deliver it, hedging every statement.

Fox has worked over the years to challenge these stereotypes: journalists do want a good story, but the serious science journalists want a true story, and are glad of the opportunity to pose questions directly to scientists. And many scientists do a fantastic job of explaining their subject matter to a non-specialist audience. In the varied chapters of the book, Fox is an irrepressible optimist, who keeps coming back to the importance of having scientists communicating directly with the media. Her optimism is not founded in ignorance: she knows exactly how messy and complicated science can be. But she persists in believing that more good is done by communicating what we know, warts and all, rather than pretending that uncertainties and disagreements do not exist.

The role of the SMC is, however, complicated by further factions. The dramatis personae includes two other groups. First, there are science press officers, who are appointed by institutions to help scientists promote their work, and then there are government officials and civil servants, who are concerned with policy implications of science.

In her penultimate chapter, Fox bemoans the fact that the traditional press officer - passionate about science and viewing themselves as "purveyors of truth and accuracy" - is a dying breed. There remain notable exceptions, but all too often science communication has become conflated with a public relations role: pushing a corporate message, defending the institutional reputation, and even using scientific discoveries as a marketing tool. Fox notes a 2014 survey of exaggerated science reports in the media that concluded: "Exaggeration in news is strongly associated with exaggeration in press releases." I had been one of those scientists who thought the media were mostly to blame for over-hyped science reporting, but this study showed that journalists are often recycling exaggerated accounts handed to them by those speaking for the scientists.

But the problems posed by scientists, journalists and press officers are trivial compared to the obstacles created by those involved in policy. They want to use science when convenient, but also want to exert control over which aspects of science policy get talked about. Scientists working for government-funded organisations are often muzzled, with explicit instructions not to talk to the media. One can see how this cautious approach, attempting to control the message and keep things simple, puts many civil servants and government scientists on a collision course with Fox, whose view is: "Explaining preliminary and contradictory science is messy: that should not be seen as a failure of communications".

A refreshing aspect of Fox's account is that she does not brush aside the occasions when the SMC - or she personally - may have handled a situation badly. Of course, it's easy to point the finger of blame when something does go horribly wrong, and Fox has come under fire on many occasions. Rather than being defensive, she accepts that things might have been done differently, while at the same time explaining the logic of the decisions that were taken. This is in line with my memories of meetings of the SMC advisory committee, where there were frequent post mortems - "this is how we handled it; this is how it turned out; should we have done it differently?" - with frank discussions from the committee members. When you are working in contentious areas where things are bound to blow up every now and again, this is a sensible strategy that helps the organisation learn and develop. I'm glad that after 20 years, the ethos of the SMC is still very much on the side of open, transparent communication between scientists and the media.  


Fox, Fiona (2022) Beyond the Hype: The Inside Story of Science's Biggest Media Controversies. London: Elliott and Thompson Ltd.


Saturday, 23 January 2021

Time to ditch relative risk in media reports

The Winton Centre for Risk and Evidence Communication at the University of Cambridge has done some sterling work in developing guidelines for communicating risk to the general public. In a short video,  David Spiegelhalter explains how relative risk can be misleading when the baseline for a condition is not reported. For instance, he noted that many women stopped taking a contraceptive pill after hearing media reports that it was associated with a doubling in the rate of thrombo-embolism. In terms of absolute risk the increase sounds much less alarming, going from 1 in 7000 to 2 in 7000. 
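
To spell out the arithmetic behind that example (using the figures quoted above):

```latex
% Relative vs absolute risk for the contraceptive pill example
\[
\text{Relative risk} = \frac{2/7000}{1/7000} = 2 \quad \text{("the risk doubles")}
\]
\[
\text{Absolute increase} = \frac{2}{7000} - \frac{1}{7000} = \frac{1}{7000} \approx 0.014\%
\]
```

The same data can thus be reported as an alarming doubling of risk or as one extra case per 7000 women; only the second framing tells people how likely they are to be affected.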

One can understand how those who aren't scientifically trained can get this wrong. But we might hope that,  in a pandemic, where public understanding of risk is so crucial, particular care would be taken to be realistic without being alarmist. It was, therefore, depressing to see a report on Channel 4 news last night where two scientists clearly explained the evidence on Covid variants in terms of absolute risk, impeccably reflecting the Winton Centre's advice, only to have the reporters translate the numbers into relative risk. I have transcribed the relevant sections: 

0:44 Reporter: "The latest evidence from the government's advisers is that this new variant is more deadly. And this is what it means:"

Patrick Vallance: "If you took somebody in their sixties, a man in their sixties, the average risk is that for 1000 people who got infected, roughly ten would be expected to unfortunately die with the virus. With the new variant, with 1000 people infected, roughly 13 or 14 people might be expected to die." 

Reporter: "That’s a thirty to forty per cent increase in mortality." 

5:15 Reporter (Krishnan Guru-Murthy): "But this is a high percent of increase, isn't it. Thirty to forty percent increase of mortality, on a relatively small number." 

Neil Ferguson: "Yes. For a 60-year-old at the current time, there's about a one in a hundred risk of dying. So that means 10 in 1000 people who get the infection are likely to die, despite improvements in treatment. And this new variant might take that up to 13 or 14 in a 1000." 
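
For clarity, the reporters' "thirty to forty per cent" comes from converting those absolute figures into a relative increase:

```latex
% Converting the quoted absolute risks (deaths per 1000 infections) into a relative increase
\[
\text{Relative increase} = \frac{13 - 10}{10} = 30\% \quad \text{to} \quad \frac{14 - 10}{10} = 40\%
\]
\[
\text{Absolute increase} = \frac{13 \text{ to } 14}{1000} - \frac{10}{1000} = 3 \text{ to } 4 \text{ extra deaths per } 1000 \text{ infections}
\]
```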

The reporters are likely to stoke anxiety when they translate clear communication by the scientists into something that sounds a lot more scary. I hope this is not their intention: Channel 4 is one of the few news outlets that I regularly watch and in general I find it well-researched and informative. I would urge the reporters to watch the Winton Centre video, which in less than 5 minutes makes a clear, compelling case for dropping relative risk altogether in media reports. 

 

This blogpost has been corrected to remove the name of Anja Popp as first reporter. She confirmed that she was not in this segment. My apologies.