The People Behind
Research Software
crediting from the informatics,
technical point of view
Professor Carole Goble,
University of Manchester, UK
Software Sustainability Institute UK
ELIXIR, ISBE, FAIRDOM
Views are my own
Science Europe LEGS Committee: Career Pathways in Multidisciplinary Research:
How to Assess the Contributions of Single Authors in Large Teams, 1-2 Dec 2015, Brussels.
Team Science: Ego-System
• Experimental scientists
• Theoretical scientists
• Modellers
• Social scientists
• Computer scientists
• Computational Scientists
• Scientific informaticians
• Specialist Tool developers
• Research Software Engineers
• Data engineers and curators
• Service & resource providers
• Infrastructure developers
• System Administrators
Many software, services
and public data resources
are team based
collaborations
Service vs Science in Projects
teams within teams
Biologists
Software frameworks
Tools, Infrastructure
Data platforms
Public data archives
Bioinformaticians
Comp Biologists
Local data curators
Informatics contribution to team
Reputation, Recognition, Productivity, Respect
Contribution to the informatics
– Technical publications in their own right
– Software publications: citation proxies
• Fossilise snapshot of authors as
contributors
– Specific code and curation tracking
– Usage metrics (downloads, reuse)
– Comp Sci - Conferences matter
– IMPACT
Compound, collaborative, living nature of
data and software
Acknowledgement by research teams
– “We are not the janitors” It’s not “free”.
– The Craftsmen of Science
– Careers, credibility and sustainability
– Recognised career role of Research Software
Engineer and BioCurator
– Recognition of professionalism, software and
data quality.
– Reward for LABOUR.
Informatics contribution to team
Reputation, Recognition, Productivity, Respect
*Survey of researchers from 15 UK Russell Group universities conducted by SSI between August and October
2014. 406 respondents covering a representative range of funders, disciplines and seniority.
Credit
Biologists
Bioinformaticians
Cite
Local tool providers
Public data set providers
Service vs Science
Background vs Foreground
Data [and software] in
foreground most likely cited.
Same data [and software] viewed
as background not explicitly
cited, though equally essential
Wynholds, et al (2012) Data, data use, and scientific inquiry: two case studies of data practices
10.1145/2232817.2232822
25% of publications that used
the public ArrayExpress
Archive cited it*
The invisibility of software
esp software that is widely
used, infrastructural,
components or cross-discipline
*Rung, Brazma. Reuse of public genome-wide gene expression data. Nature Reviews Genetics 2012
What is a Team? Credit drift
[Diagram: immediate team, background team, “Foreground” informatics and “Background” informatics, labelled with forms of credit: Authorship, Authorship?, Cited?, Acknowledged, Mentioned, Ignored, Cited]
The Currency of Recognition
Person Career
Peers
Funders
Institutions
Public
Resource Sustainability
Software mentions in the
biology literature (90 articles)
Howison and Bullard 2015 The visibility of software in the scientific literature: how
do scientists mention software and how effective are those mentions? J Assoc for
Info Science and Technology DOI: 10.1002/asi.23538
37% citations formal
87% software could be found
informal mentions very common
-> poor at providing crediting information
18% software author offered preferred citation
-> 32% who cited it ignored it
24% journals had a citation policy
Legal License attribution obligations ignored
Team reciprocity rules
Download and Go. No.
Jam for Everyone.
sciencecodemanifesto.org
1. Software and Data Research Objects
into the Publishing Workflow
informal mentions replaced by formal
http://ivory.idyll.org/blog/2015-authorship-on-software-papers.html
3. Credit networks & credit currency
• Research Object-specific credit models
– Software, data, models…
– Credit based on use: downloads, reusability, reuse, FAIR
• Contribution: Credit distribution, propagation, dividends
– Transitive credit maps (Katz and Smith)*, CRediT**
• Use: Credit trajectories: tracing, tracking, mining
– Recovery from literature, identifier and provenance infrastructure,
standards, data/software level metrics services (DataCite),
repositories, machine readable and processable metadata.
*http://arxiv.org/pdf/1407.5117v3.pdf
**http://casrai.org/CRediT
http://depsy.org/
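As an illustration of the transitive credit idea named on this slide, here is a minimal Python sketch. It assumes each product carries a credit map whose weights sum to 1 and may point either to people or to other products. The product names, people and weights are hypothetical, and this is a simplified reading of the idea, not Katz and Smith's own implementation.

```python
from collections import defaultdict

# Hypothetical credit maps: each product splits a total weight of 1.0
# between people and the other products it builds on.
CREDIT_MAPS = {
    "paper": {"alice": 0.5, "bob": 0.25, "tool": 0.25},
    "tool": {"carol": 0.75, "dataset": 0.25},
    "dataset": {"dave": 1.0},
}


def transitive_credit(product, maps, weight=1.0, totals=None, path=()):
    """Push the credit of `product` down to the people behind it.

    Any key that has its own credit map is treated as a product and
    expanded recursively; everything else is treated as a person.
    """
    if totals is None:
        totals = defaultdict(float)
    if product in path:
        raise ValueError(f"cyclic credit map involving {product!r}")
    for contributor, share in maps[product].items():
        if contributor in maps:
            # A dependency: pass its share on to its own contributors.
            transitive_credit(contributor, maps, weight * share,
                              totals, path + (product,))
        else:
            # A person: accumulate the weighted share.
            totals[contributor] += weight * share
    return dict(totals)


credit = transitive_credit("paper", CREDIT_MAPS)
print(credit)
# {'alice': 0.5, 'bob': 0.25, 'carol': 0.1875, 'dave': 0.0625}
```

In this toy example the tool developer and the data provider receive a share of the paper's credit even though neither appears on its author list, and the shares still sum to 1.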
2. Stop conflating credit with
Authorship
Contribution
Roles
Usage
Liz Allen: CRediT
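One way to picture keeping these notions apart is to record contribution roles separately from the author list. The sketch below is a hypothetical data model in Python. The role names are a subset of the CRediT taxonomy, while the people, the field names and the `is_author` flag are assumptions made for illustration.

```python
from dataclasses import dataclass

# A subset of CRediT role names; the full taxonomy has more roles.
CREDIT_ROLES = frozenset(
    {"Conceptualization", "Data curation", "Investigation", "Software"}
)


@dataclass(frozen=True)
class Contribution:
    """One person's roles on a piece of work (hypothetical model)."""
    person: str
    roles: frozenset
    is_author: bool  # authorship is recorded separately from roles


def people_by_role(contributions):
    """Group people under each role they held, rejecting unknown roles."""
    by_role = {}
    for c in contributions:
        unknown = c.roles - CREDIT_ROLES
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        for role in sorted(c.roles):
            by_role.setdefault(role, []).append(c.person)
    return by_role


team = [
    Contribution("alice", frozenset({"Conceptualization", "Investigation"}), True),
    Contribution("carol", frozenset({"Software"}), False),
    Contribution("dave", frozenset({"Data curation"}), False),
]

roles = people_by_role(team)
# People who contributed in a named role but are not on the author list.
non_author_contributors = [c.person for c in team if c.roles and not c.is_author]
print(roles["Software"], non_author_contributors)
```

Keeping `roles` and `is_author` as separate fields means software and curation work can be credited even when it does not lead to authorship.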
4. Research units and credit models
that reflect software
Not Publish. Release paradigm. Portfolio paradigm.
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Evolving Multi-stewarded
Multi-authored
Multi-platform
Reproducible
Executable papers
Connected
Body of work
Compound, Aggregated
https://dx.doi.org/10.1111/febs.13237
https://doi.org/10.15490/seek.1.investigation.56
http://www.fair-dom.org
An “evolving manuscript” would begin with a pre-
publication, pre-peer review “beta 0.9” version of an
article, followed by the approved published article itself, [
… ] “version 1.0”.
Subsequently, scientists would update this paper with
details of further work as the area of research develops.
Versions 2.0 and 3.0 might allow for the “accretion of
confirmation [and] reputation”.
Ottoline Leyser […] assessment criteria in science revolve
around the individual. “People have stopped thinking
about the scientific enterprise”.
http://www.timeshighereducation.co.uk/news/evolving-manuscripts-the-future-of-scientific-communication/2020200.article
Ramps vs Revolutions
Technical ramps
• Machinery, tools, platforms,
repositories
Process ramps
• Research processes and
Publisher workflows
Social ramps
• Rules and policies
• Adoption by stakeholders
– interventions & automations
• Recognition by stakeholders
Credit is like love not money
Citations and across discipline boundaries.
Within discipline more like dividends.
All research products and all scholarly
labour are equally valued
(except by institutional promotion,
funding review and REF committees)
Public software and data resources
are not free.
Stewardship costs and needs crediting
Publishers adapt to “Publications”
that are dynamic Research Objects
(still need to snapshot)
http://www.software.ac.uk/software-credi
https://www.force11.org/group/software-citation-working-group
Links
• FAIRDOM
– http://www.fair-dom.org
• SEEK Platform
– http://www.seek4science.org
• Research Objects
– http://www.researchobject.org
• Software Sustainability Institute
– http://www.software.ac.uk
• Software Carpentry
– http://www.software-carpentry.org
• Force11
– http://www.force11.org

Crediting informatics and data folks in life science teams
