# FAIR Guide

This section provides information and resources to help ensure that digital products are ***F***indable, ***A***ccessible, ***I***nteroperable, ***R***eusable, and Reproducible[^1]. A general [Metadata Checklist](Metadata-Checklist.md) is provided to stimulate thinking about the type of information to be collected. Additionally, we include checklists for [code](Code-Checklist.md), [data](Data-Checklist.md), and [model](Model-Checklist.md) repositories. The code checklist focuses on the contents of a well-documented GitHub repository, while the data and model checklists cover the content of the [data](HF_DatasetCard_Template_mkdocs.md/) and [model](HF_ModelCard_Template_mkdocs.md/) card templates, respectively.

Each checklist was developed following the FAIR principles (as defined by the [Go-FAIR Initiative](https://www.go-fair.org/fair-principles/)). Together, the checklists provide a detailed outline of tasks and files to include to ensure alignment with the FAIR principles, and are complementary to the descriptions provided within the [GitHub](GitHub-Repo-Guide.md) and [Hugging Face](Hugging-Face-Repo-Guide.md) Guides presented on this site. As with the contents of these Guides, these checklists are based on a combination of existing guides (e.g., [The Turing Way](https://book.the-turing-way.org/), the [Model Card Guidebook](https://huggingface.co/docs/hub/en/model-card-annotated), and the [Dataset Card Creation Guide](https://github.com/huggingface/datasets/blob/main/templates/README_guide.md)) and the experiences of our team. Following these checklists helps align digital products with the FAIR principles and represents a best effort toward reproducibility.[^2]

!!! tip "Pro tip"

    Use the eye icon at the top of any checklist page to access the source, then copy the markdown for the checklist into an issue on your GitHub [Repo](GitHub-Repo-Guide.md) or [Project](Guide-to-GitHub-Projects.md) so you can check the boxes as you add each component. When the checklist is added to the main description of the issue, the issue summary will show _x_ out of the total number of components completed for that issue.
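
If you keep a checklist somewhere other than a GitHub issue (for example, in a local Markdown file), a similar "_x_ out of total" summary can be computed with a few lines of Python. This is only a minimal sketch: the function name `checklist_progress` and the example items are hypothetical, and it assumes the checklist uses the common `- [ ]` / `- [x]` task-list syntax.

```python
import re

# Matches task-list items such as "- [ ] item" or "* [x] item" (assumed syntax).
TASK_ITEM = re.compile(r"^[ \t]*[-*+][ \t]+\[( |x|X)\][ \t]+", re.MULTILINE)


def checklist_progress(markdown: str) -> tuple[int, int]:
    """Return (completed, total) task-list items found in the Markdown text."""
    marks = TASK_ITEM.findall(markdown)
    completed = sum(1 for mark in marks if mark.lower() == "x")
    return completed, len(marks)


# Hypothetical checklist excerpt, for illustration only.
example = """\
- [x] README
- [ ] LICENSE
- [x] CITATION.cff
"""

done, total = checklist_progress(example)
print(f"{done} of {total} items completed")
```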

The last topic in this section discusses different methods of [DOI Generation](DOI-Generation.md) for digital products (code, data, and models). It focuses on our selected method for dataset publication, [Hugging Face](https://huggingface.co/), with some guidance on using [Zenodo](https://zenodo.org/) to archive code (specifically, a GitHub repository). For more information about other common data publication venues, and to see the thought process behind our selection, see the [Data Archive Options Comparative Overview](../pdfs/Data_Archive-Publication-Options-Comparative-Overview.pdf).[^3] A DOI gives a digital product a globally unique and persistent identifier that can be used to cite and refer back to it, which is an important component of FAIR and Reproducible principles.
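
To make the "persistent identifier" idea concrete, the sketch below turns a bare DOI into its resolver link at `https://doi.org/`, the same form used in the references on this page. The function name `doi_to_url` is hypothetical, and the format check is an assumption: it is a widely used pattern for modern DOIs, not an official or exhaustive definition.

```python
import re

# Widely used pattern for modern DOIs; an assumption for this sketch,
# since it does not cover every DOI ever issued.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI, or raise ValueError."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a recognizable DOI: {doi!r}")
    return f"https://doi.org/{doi}"


# DOI of the FAIR Guiding Principles paper cited on this page.
print(doi_to_url("10.1038/sdata.2016.18"))
```

Because the DOI stays fixed even if the hosting platform changes its URLs, citing the resolver link rather than a platform-specific address keeps the reference stable.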

!!! info "References and Background"
    If you want to learn more about FAIR and Reproducible principles, explore these resources that we used when developing this guide:

    - [The Turing Way](https://book.the-turing-way.org/): an open-source, community data science handbook. It provides a strong foundation on the guiding principles for _this_ Guide, providing accessible explanations and overviews of topics from [reproducibility](https://book.the-turing-way.org/reproducible-research/reproducible-research), to [collaboration](https://book.the-turing-way.org/collaboration/collaboration) and [communication](https://book.the-turing-way.org/communication/communication), to [project design](https://book.the-turing-way.org/project-design/project-design), to [ethical research](https://book.the-turing-way.org/ethical-research/ethical-research). 
        
        _This is a particularly good resource for those [just starting to use `git` and GitHub](https://book.the-turing-way.org/reproducible-research/vcs/vcs-git). It builds motivation for use of version control through the lens of reproducibility._
    - Go-FAIR Initiative: [The FAIR Principles](https://www.go-fair.org/fair-principles/)
    - Ozoani, Ezi and Gerchick, Marissa and Mitchell, Margaret. Model Card Guidebook. Hugging Face, 2022. [https://huggingface.co/docs/hub/en/model-card-guidebook](https://huggingface.co/docs/hub/en/model-card-guidebook). 
        
        _The authors also provide a nice [summary of related work](https://huggingface.co/docs/hub/en/model-card-landscape-analysis), including [Datasheets for Datasets (Gebru, et al., 2018)](https://www.fatml.org/media/documents/datasheets_for_datasets.pdf) and The Dataset Nutrition Label ([label](https://datanutrition.org/labels/), [paper](https://arxiv.org/abs/1805.03677))._
    - Wilkinson, M., Dumontier, M., Aalbersberg, I. _et al._ The FAIR Guiding Principles for scientific data management and stewardship. _Sci Data_ **3**, 160018 (2016). [10.1038/sdata.2016.18](https://doi.org/10.1038/sdata.2016.18)
    - Barker, M., Chue Hong, N.P., Katz, D.S. _et al._ Introducing the FAIR Principles for research software. _Sci Data_ **9**, 622 (2022). [10.1038/s41597-022-01710-x](https://doi.org/10.1038/s41597-022-01710-x)
    - Balk, M. A., Bradley, J., Maruf, M., Altintaş, B., Bakiş, Y., Bart, H. L. Jr, Breen, D., Florian, C. R., Greenberg, J., Karpatne, A., Karnani, K., Mabee, P., Pepper, J., Jebbia, D., Tabarin, T., Wang, X., & Lapp, H. (2024). A FAIR and modular image-based workflow for knowledge discovery in the emerging field of imageomics. _Methods in Ecology and Evolution_, 15, 1129–1145. [10.1111/2041-210X.14327](https://doi.org/10.1111/2041-210X.14327)
    - The [FARR Research Coordination Network](https://www.farr-rcn.org/) has a number of interesting resources and events.
    - The [Research Data Alliance for Interdisciplinary Research](https://www.rd-alliance.org/disciplines/rda-for-interdisciplinary-research/) also provides links to resources and events particularly focused on considerations in interdisciplinary research.

!!! question "[Questions, Comments, or Concerns?](https://github.com/Imageomics/Collaborative-distributed-science-guide/issues)"

[^1]: While "Reproducible" is not part of the original FAIR principles as defined by the [Go-FAIR Initiative](https://www.go-fair.org/fair-principles/), we include it here to emphasize the importance of computational reproducibility alongside data stewardship. This extension reflects emerging practice in data-intensive science, where code, models, and workflows must be reusable and verifiable to support robust scientific claims. It is not part of the formal FAIR acronym, but aligns with broader community goals for open and transparent research.
[^2]: Full reproducibility is difficult to achieve; this [presentation](https://drive.google.com/file/d/1BFqZ00zMuyVHaD9A8PvzRDEg7aV0kp3W/view?usp=drive_link) by Odd Erik Gundersen provides a discussion of the varying degrees of reproducibility and useful references when considering the level of reproducibility achieved by a given project.
[^3]: The [Data Archive Options Comparative Overview](../pdfs/Data_Archive-Publication-Options-Comparative-Overview.pdf) was created in May 2023 as part of developing archive recommendations for the Institute, so it does not include information about newer features such as [Hugging Face's dataset viewer](https://huggingface.co/docs/hub/en/datasets-viewer), which greatly simplifies previewing datasets for downstream users.