Bioconductor Code: crumblr

Name	Mode	Size
..
articles	040000
deps	040000
news	040000
reference	040000
404.html	100644	6 kb
apple-touch-icon-120x120.png	100644	22 kb
apple-touch-icon-152x152.png	100644	33 kb
apple-touch-icon-180x180.png	100644	41 kb
apple-touch-icon-60x60.png	100644	7 kb
apple-touch-icon-76x76.png	100644	10 kb
apple-touch-icon.png	100644	41 kb
authors.html	100644	6 kb
favicon-16x16.png	100644	1 kb
favicon-32x32.png	100644	2 kb
index.html	100644	11 kb
katex-auto.js	100644	1 kb
lightswitch.js	100644	2 kb
link.svg	100644	1 kb
logo.png	100644	141 kb
pkgdown.js	100644	5 kb
pkgdown.yml	100644	0 kb

README.md

### Count ratio uncertainty modeling based linear regression  <div style="text-align: justify;"> The `crumblr` package enables analysis of count ratio data using precision-weighted linear (mixed) models, PCA and clustering. `crumblr`'s fast, normal approximation of transformed count data from a Dirichlet-multinomial model allows use of standard workflows to analyze count ratio data while modeling heteroskedasticity. ![](man/figures/Figure_crumblr_workflow.png) __Preprint:__ Hoffman and Roussos. 2025. Fast, flexible analysis of differences in cellular composition with crumblr. [biorxiv](https://www.biorxiv.org/content/10.1101/2025.01.29.635498v1) ### Details Analysis of count ratio data (i.e. fractions) requires special consideration since data is non-normal, heteroskedastic, and spans a low rank space. While counts can be considered directly using Poisson, negative binomial, or Dirichlet-multinomial models for simple regression applications, these can be problematic since they 1) can be very computationally expensive, 2) can produce poorly calibrated hypothesis tests, and 3) are challenging to extend to other applications. The widely used centered log-ratio (CLR) transform from [compositional data analysis](https://link.springer.com/book/10.1007/978-3-642-36809-7) makes count ratio data more normal and enables use the linear models, and other standard methods. Yet CLR-transformed data is still highly heteroskedastic: the precision of measurements varies widely. This important factor is not considered by existing methods. `crumblr` uses a fast asymptotic normal approximation of CLR-transformed counts from a Dirichlet-multinomial distribution to model the sampling variance of the transformed counts. `crumblr` enables incorporating the sampling variance as precision weights to linear (mixed) models in order to increase power and control the false positive rate. `crumblr` also uses a variance stabilizing transform (vst) based on the precision weights to improve performance of PCA and clustering. </div> ### Install ```r # 1) Make sure Bioconductor is installed if (!require("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } # 2) Install crumblr and dependencies BiocManager::install('DiseaseNeurogenomics/crumblr') ``` ### Introduction to compositional data analysis - Brief intro for bioinformatics [Quinn, et al. 2018](https://doi.org/10.1093/bioinformatics/bty175) - Book for analysis in R [van den Boogaart and Tolosana-Delgado, 2013](https://link.springer.com/book/10.1007/978-3-642-36809-7)