Description
Date accepted: 2025-02-03
Submitting Author Giovanni Saraceno
Submitting Author Github Handle: @giovsaraceno
Other Package Authors Github handles: @rmj3197
Repository: https://github.com/giovsaraceno/QuadratiK-package§
Version submitted:1.1.1
Submission type: Stats
Badge grade: gold
Editor: @emitanaka
Reviewers: @kasselhingee, @emitanaka
Archive: TBD
Version accepted: TBD
- DESCRIPTION file:
Type: Package
Package: QuadratiK
Title: A Collection of Methods Using Kernel-Based Quadratic Distances for
Statistical Inference and Clustering
Version: 1.0.0
Authors@R: c(
person("Giovanni", "Saraceno", , "[email protected]", role = c("aut", "cre"),
comment = "ORCID 000-0002-1753-2367"),
person("Marianthi", "Markatou", role = "aut"),
person("Raktim", "Mukhopadhyay", role = "aut"),
person("Mojgan", "Golzy", role = "ctb")
)
Maintainer: Giovanni Saraceno <[email protected]>
Description: The package includes test for multivariate normality, test for
uniformity on the Sphere, non-parametric two- and k-sample tests,
random generation of points from the Poisson kernel-based density and a
clustering algorithm for spherical data. For more information see
Saraceno, G., Markatou, M., Mukhopadhyay, R., Golzy, M. (2024)
<arXiv:2402.02290>, Ding, Y., Markatou, M., Saraceno, G. (2023)
<doi:10.5705/ss.202022.0347>, and Golzy, M., Markatou, M. (2020)
<doi:10.1080/10618600.2020.1740713>.
License: GPL (>= 3)
URL: https://cran.r-project.org/web/packages/QuadratiK/index.html,
https://github.com/giovsaraceno/QuadratiK-package
BugReports: https://github.com/giovsaraceno/QuadratiK-package/issues
Depends:
R (>= 3.5.0)
Imports:
cluster,
clusterRepro,
doParallel,
foreach,
ggplot2,
ggpp,
ggpubr,
MASS,
mclust,
methods,
moments,
movMF,
mvtnorm,
Rcpp,
RcppEigen,
rgl,
rlecuyer,
rrcov,
sn,
stats,
Tinflex
Suggests:
knitr,
rmarkdown,
roxygen2,
testthat (>= 3.0.0)
LinkingTo:
Rcpp,
RcppEigen
VignetteBuilder:
knitr
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown=TRUE, roclets=c("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.2.3
Scope
- Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):
Data Lifecycle Packages
- data retrieval
- data extraction
- data munging
- data deposition
- data validation and testing
- workflow automation
- version control
- citation management and bibliometrics
- scientific software wrappers
- field and lab reproducibility tools
- database software bindings
- geospatial data
- text analysis
Statistical Packages
-
Bayesian and Monte Carlo Routines
-
Dimensionality Reduction, Clustering, and Unsupervised Learning
-
Machine Learning
-
Regression and Supervised Learning
-
Exploratory Data Analysis (EDA) and Summary Statistics
-
Spatial Analyses
-
Time Series Analyses
-
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
This category is the most suitable due to QuadratiK's clustering technique, specifically designed for spherical data. The package's clustering algorithm falls within the realm of unsupervised learning, where the focus is on identifying groupings in the data without pre-labeled categories. The two- and k-sample tests serve as additional tools for testing the differences between the identified groups.
Following the link https://stats-devguide.ropensci.org/standards.html we noticed in the "Table of contents" that category 6.9 refers to Probability Distribution. We are unsure how we fit and if we fit this category. Can you please advise?
- If submitting a statistical package, have you already incorporated documentation of standards into your code via the srr package?
Yes, we have incorporated documentation of standards into our QuadratiK package by utilizing the srr package, considering the categories "General" and "Dimensionality Reduction, Clustering, and Unsupervised Learning", in line with the recommendations provided in the rOpenSci Statistical Software Peer Review Guide.
- Who is the target audience and what are scientific applications of this package?
The QuadratiK package offers robust tools for goodness-of-fit testing, a fundamental aspect in statistical analysis, where accurately assessing the fit of probability distributions is essential. This is especially critical in research domains where model accuracy has direct implications on conclusions and further research directions. Spherical data structures are common in fields such as biology, geosciences and astronomy, where data points are naturally mapped to a sphere. QuadratiK provides a tailored approach to effectively handle and interpret these data. Furthermore, this package is also of particular interest to professionals in health and biological sciences, where understanding and interpreting spherical data can be crucial in studies ranging from molecular biology to epidemiology. Moreover, its implementation in both R and Python broadens its accessibility, catering to a wide audience accustomed to these popular programming languages.
- Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
Yes, there are other R packages that address goodness-of-fit (GoF) testing and multivariate analysis. Notable among these are the energy package for energy statistics-based tests. The function kmmd in the kernlab package offers a kernel-based test which has similar mathematical formulation. The package sphunif provides all the tests for uniformity on the sphere available in literature. The list of implemented tests includes the test for uniformity based on the Poisson kernel. However, there are fundamental differences between the methods encoded in the aforementioned packages and those offered in the QuadratiK package.
QuadratiK uniquely focuses on kernel-based quadratic distances methods for GoF testing, offering a comprehensive set of tools for one-sample, two-sample, and k-sample tests. This specialization provides more nuanced and robust methodologies for statistical analysis, especially in complex multivariate contexts. QuadratiK is optimized for high-dimensional datasets, employing efficient C++ implementations. This makes it particularly suitable for contemporary large-scale data analysis challenges. The package introduces advanced methods for kernel centering and critical value computation, as well as optimal tuning parameter selection based on midpower analysis. QuadratiK includes a unique clustering algorithm for spherical data. These innovations are not covered in other available packages. With implementations in both R and Python, QuadratiK appeals to a wider audience across different programming communities. We also provide a user-friendly dashboard application which further enhances accessibility, catering to users with varying levels of statistical and programming expertise.
In summary there are fundamental differences between QuadratiK and all existing R packages:
- The goodness-of-fit tests are U-statistics based on centered kernels. The concept and methodology of centering is novel and unique to our methods and is not part of the methods of other existing packages.
- An algorithm for connecting the tuning parameter with the statistical properties of the test, namely power and degrees of freedom of the kernel (DOF) is provided. This feature differentiates our novel methods from all encoded methods in the aforementioned R packages.
- A new clustering algorithm for data that reside on the sphere is offered. This aspect is not a feature of existing packages.
- We also offer algorithms for generating random samples from Poisson kernel-based densities. This capability is also unique to our package.
- (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
Yes, our package, QuadratiK, is compliant with the rOpenSci guidelines on Ethics, Data Privacy, and Human Subjects Research. We have carefully considered and adhered to ethical standards and data privacy laws relevant to our work.
- Any other questions or issues we should be aware of?:
Please see the question posed in the first bullet.