Bioconductor Code: RAIDS

Revision 5e34f05

@@ -66,10 +66,13 @@ sequences of:
                      * targeted gene panels
                      * RNA,
                     -including those from cancer-derived nucleic acids. The **RAIDS** package implements a data synthesis method that, for any given
                     -molecular profile, enables, on the one hand, profile-specific inference
                     +including those from cancer-derived nucleic acids. The **RAIDS** package implements a
                     +data synthesis method that, for any given
                     +molecular profile of an individual, enables, on the one hand, profile-specific inference
                      parameter optimization and, on the other hand, a profile-specific inference
                     -accuracy estimate.
                     +accuracy estimate. By molecular profile, we mean a table of the individual's
                     +germline genotypes at genome positions that have sufficient read coverage in the
                     +individual's input data and at which sequence variants are frequent in the population reference data.
                      <br>
                      <br>
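As an illustration of the molecular profile defined above, the base R sketch below keeps positions that have enough reads and a frequent population variant, then assigns a germline genotype from the allelic read counts. The coverage and frequency thresholds, the 0.1/0.9 allele-fraction cut-offs, the `popAF` vector and the helper `callGenotype()` are assumptions chosen for the example; they are not the rules **RAIDS** applies internally.

```r
# Toy read counts at candidate positions for one individual (illustrative values)
reads <- data.frame(
    Chromosome = c("chr1", "chr1", "chr2", "chr3"),
    Position   = c(1000L, 2500L, 3000L, 4200L),
    Count      = c(25L, 4L, 40L, 18L),  # total read count
    File1R     = c(12L, 4L, 0L, 18L),   # reads with the reference nucleotide
    File1A     = c(13L, 0L, 40L, 0L)    # reads with the alternative nucleotide
)
# Hypothetical alternative-allele frequencies in the population reference
popAF <- c(0.35, 0.20, 0.002, 0.48)

minCoverage <- 10    # assumed threshold for "sufficient read coverage"
minPopAF    <- 0.01  # assumed threshold for "frequent in the population"

# Keep positions that are both well covered and common in the population
keep <- reads$Count >= minCoverage & popAF >= minPopAF
profile <- reads[keep, ]

# Hypothetical genotype call from the alternative-allele read fraction;
# homozygous reference (0/0) calls are kept as part of the profile
callGenotype <- function(alt, total, low = 0.1, high = 0.9) {
    frac <- alt / total
    ifelse(frac < low, "0/0", ifelse(frac > high, "1/1", "0/1"))
}
profile$Genotype <- callGenotype(profile$File1A, profile$Count)
print(profile[, c("Chromosome", "Position", "Genotype")])
```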
@@ -298,33 +301,30 @@ data are used to optimize the inference parameters and, with these, the
                      ancestry is inferred from the input sequence profile.
                      According to the type of input data (RNA or DNA sequence), a specific function
                     -should be called. The *inferAncestry()* function is used for DNA profiles while
                     +should be called. The *inferAncestry()* function (or the identical
                     +*inferAncestryDNA()* function) is used for DNA profiles, while
                      the *inferAncestryGeneAware()* function is RNA specific.
                     -The *inferAncestry()* function requires a specific profile input format. The
                     -format is set by the *genoSource* parameter.
                     +The *inferAncestry()* function requires a specific input format for the individual's
                     +genotyping profile, as explained in the Introduction. The format is set by
                     +the *genoSource* parameter.
                     -One of those formats is in a VCF format (*genoSource=c("VCF")*).
                     -This format follows the VCF standard with at least those genotype
                     -fields: _GT_, _AD_ and _DP_.
                     -The SNVs  must be germline variants and should include the genotype of the
                     -wild-type homozygous at the selected positions in the reference. The VCF file
                     -must be gzipped.
                     +One of the allowed formats is VCF (*genoSource=c("VCF")*), with the following
                     +mandatory genotype fields: _GT_, _AD_ and _DP_.
                     +The VCF file must be gzipped.
                     -A generic SNP file can replace the VCF file (*genoSource=c("generic")*).
                     -The format is comma separated and the mandatory columns are:
                     +The "generic" file format (*genoSource=c("generic")*) is also allowed. It is
                     +a comma-separated table with the following mandatory columns:
                     -* _Chromosome_: The name of the chromosome
                     +* _Chromosome_: The chromosome name, formatted as either chr1 or 1
                      * _Position_: The position on the chromosome
                      * _Ref_: The reference nucleotide
                     -* _Alt_: The aternative nucleotide
                     -* _Count_: The total count
                     -* _File1R_: The count for the reference nucleotide
                     -* _File1A_: The count for the alternative nucleotide
                     -
                     -Beware that the starting position in the **population reference GDS file** is
                     -zero (like BED files). The generic SNP file should also start
                     -at position zero.
                     +* _Alt_: The alternative nucleotide
                     +* _Count_: The total read count
                     +* _File1R_: Read count for the reference nucleotide
                     +* _File1A_: Read count for the alternative nucleotide
                     +
                     +Note: the file must start with a header line containing exactly these column names.
                      In this example, the profile is from DNA source and requires the use of the
                      *inferAncestry()* function.
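Both input formats described above can be sanity-checked with base R before calling *inferAncestry()*. The sketch below writes a toy "generic" file with the mandatory header, writes a small gzipped VCF, and tests the header columns, the gzip signature and the _GT_, _AD_ and _DP_ FORMAT declarations. The helpers `checkGenericHeader()`, `isGzipped()` and `checkVcfFormat()`, the file contents and the sample name `ex1` are hypothetical; they are not part of **RAIDS** and do not replace the package's own input handling.

```r
# Toy profile in the "generic" format (illustrative values only)
generic <- data.frame(
    Chromosome = c("chr1", "1"),   # both chromosome styles are accepted
    Position   = c(1000L, 4200L),
    Ref        = c("A", "G"),
    Alt        = c("G", "T"),
    Count      = c(25L, 18L),
    File1R     = c(12L, 18L),
    File1A     = c(13L, 0L)
)
genericFile <- tempfile(fileext = ".csv")
# Comma-separated, with a header line and no quoting or row names
write.table(generic, genericFile, sep = ",", quote = FALSE, row.names = FALSE)

# Hypothetical helper: does the header contain every mandatory column?
checkGenericHeader <- function(path) {
    required <- c("Chromosome", "Position", "Ref", "Alt",
                  "Count", "File1R", "File1A")
    header <- strsplit(readLines(path, n = 1L), ",", fixed = TRUE)[[1]]
    all(required %in% header)
}

# Toy VCF declaring the GT, AD and DP genotype fields, written gzipped
vcfFile <- tempfile(fileext = ".vcf.gz")
con <- gzfile(vcfFile, open = "wt")
writeLines(c(
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">',
    paste("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "ex1", sep = "\t"),
    paste("chr1", "1000", ".", "A", "G", ".", "PASS", ".",
          "GT:AD:DP", "0/1:12,13:25", sep = "\t")
), con)
close(con)

# Hypothetical helper: a gzip file starts with the bytes 1f 8b
isGzipped <- function(path) {
    magic <- readBin(path, what = "raw", n = 2L)
    length(magic) == 2L && all(magic == as.raw(c(0x1f, 0x8b)))
}

# Hypothetical helper: are GT, AD and DP all declared as FORMAT fields?
checkVcfFormat <- function(path) {
    con <- gzfile(path, open = "rt")
    on.exit(close(con))
    meta <- grep("^##FORMAT=<ID=", readLines(con), value = TRUE)
    declared <- sub("^##FORMAT=<ID=([^,]+),.*$", "\\1", meta)
    all(c("GT", "AD", "DP") %in% declared)
}

c(generic = checkGenericHeader(genericFile),
  gzipped = isGzipped(vcfFile),
  vcfFields = checkVcfFormat(vcfFile))
```

The gzip check reads only the two-byte magic number, so it distinguishes a compressed VCF from a plain-text one without decompressing the whole file.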
@@ -364,11 +364,8 @@ if (requireNamespace("GenomeInfoDb", quietly=TRUE) &&
                      ```
                     -A profile GDS file is created in the *pathProfileGDS* directory while all the
                     -ancestry and optimal parameters information are integrated in the output
                     -object.
                     -At last, all temporary files created in this example should be deleted.
                     +The temporary files created in this example are deleted as follows.
                      ```{r removeTmp, echo=TRUE, eval=TRUE, collapse=TRUE, warning=FALSE, message=FALSE}
@@ -406,16 +403,17 @@ names(resOut)
                      ### 3.1 Inspect the inference and the optimal parameters
                     -For the global ancestry inference using PCA followed by nearest neighbor
                     -classification these parameters are *D* (the number of the top principal
                     -directions retained) and *k* (the number of nearest neighbors).
                     +Global ancestry is inferred using principal component analysis (PCA)
                     +followed by nearest-neighbor classification. Two parameters are optimized:
                     +*D*, the number of top principal directions retained, and *k*, the number of nearest
                     +neighbors.
                     -The information is stored in the *Ancestry* entry as a *data.frame* object.
                     -It is a contains those columns:
                     +The results of the inference are provided as the *Ancestry* item in the *resOut* list.
                     +It is a *data.frame* with the following columns:
                      * _sample.id_: The unique identifier of the sample
                     -* _D_: The optimal PCA dimension value used to infer the ancestry
                     -* _k_: The optimal number of neighbors value used to infer the ancestry
                     +* _D_: The optimal *D* inference parameter
                     +* _k_: The optimal *k* inference parameter
                      * _SuperPop_: The inferred ancestry
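The base R sketch below illustrates the general technique of PCA followed by nearest-neighbor classification on simulated genotypes, and returns a *data.frame* shaped like the *Ancestry* entry (_sample.id_, _D_, _k_, _SuperPop_). It is not the **RAIDS** implementation: the three reference populations, the simulated genotypes and the helper `knnClassify()` are hypothetical, and *D* and *k* are fixed by hand here instead of being optimized on synthetic profiles.

```r
set.seed(42)  # reproducible simulation

# Simulated reference: 3 hypothetical super-populations, 20 samples each,
# genotypes coded 0/1/2 (copies of the alternative allele) at 200 SNVs
pops    <- c("AFR", "EAS", "EUR")
nSnv    <- 200
nPerPop <- 20
freq <- sapply(pops, function(p) runif(nSnv, 0.05, 0.95))  # nSnv x 3
ref <- do.call(rbind, lapply(seq_along(pops), function(i)
    matrix(rbinom(nPerPop * nSnv, 2, rep(freq[, i], each = nPerPop)),
           nrow = nPerPop)))
refLabels <- rep(pops, each = nPerPop)

# Query profile simulated from the third population's allele frequencies
query <- matrix(rbinom(nSnv, 2, freq[, 3]), nrow = 1)

# PCA on the reference, then projection of the query onto the same axes
pca <- prcomp(ref, center = TRUE, scale. = FALSE)
queryPcs <- drop(sweep(query, 2, pca$center) %*% pca$rotation)

# Hypothetical helper: majority vote among the k nearest reference samples,
# using Euclidean distance on the top D principal components only
knnClassify <- function(refPcs, labels, queryPc, D, k) {
    dims  <- seq_len(D)
    diffs <- t(refPcs[, dims, drop = FALSE]) - queryPc[dims]
    dists <- sqrt(colSums(diffs^2))
    votes <- table(labels[order(dists)[seq_len(k)]])
    names(votes)[which.max(votes)]
}

D <- 5  # number of top principal directions retained (fixed here)
k <- 7  # number of nearest neighbors (fixed here)
res <- data.frame(sample.id = "query1", D = D, k = k,
                  SuperPop = knnClassify(pca$x, refLabels, queryPcs, D, k))
print(res)
```

Distances use only the first `D` principal components, and a tied vote goes to the alphabetically first label because `which.max()` returns the first maximum of the sorted `table()`.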
@@ -446,7 +444,7 @@ createAUROCGraph(dfAUROC=resOut$paraSample$dfAUROC, title="Example ex1")
                      ```
                     -In this specific example, the performances are lower than expected
                     +In this illustrative example, the performance estimates are lower than expected
                      with a realistic sequence profile and a complete reference population file.
                      <br>
@@ -471,7 +469,7 @@ the reference dataset, are required:
                      - The population reference SNV Retained VCF file (optional)
                     -The format of those files are described
                     +The formats of those files are described in
                      the [Population reference dataset GDS files](Create_Reference_GDS_File.html)
                      vignette.

Merge branch 'main' of https://github.com/belleau/RAIDS