Browse code

Merge branch 'main' of https://github.com/belleau/RAIDS

belleau authored on 14/04/2025 23:15:51
Showing 1 changed files

... ...
@@ -103,7 +103,7 @@ knitr::include_graphics("MainSteps_v05.png")
103 103
 
104 104
 The main steps are:
105 105
 
106
-**Step 1.** Set-up and provide population reference files
106
+**Step 1.** Set-up working directory and provide population reference files
107 107
 
108 108
 **Step 2** Sample the reference data for donors whose genotypes will be used for synthesis and optimize ancestry inference parameters using synthetic data
109 109
 
... ...
@@ -117,7 +117,7 @@ These steps are described in detail in the following.
117 117
 <br>
118 118
 
119 119
 
120
-## Step 1. Set-up and provide population reference files
120
+## Step 1. Set-up working directory and provide population reference files
121 121
 
122 122
 
123 123
 ### 1.1 Create a working directory structure
... ...
@@ -146,6 +146,7 @@ example will be run.
146 146
 
147 147
 #############################################################################
148 148
 ## Create a temporary working directory structure
149
+##    using the tempdir() function
149 150
 #############################################################################
150 151
 pathWorkingDirectory <- file.path(tempdir(), "workingDirectory")
151 152
 pathWorkingDirectoryData <- file.path(pathWorkingDirectory, "data")
... ...
@@ -166,7 +167,8 @@ if (!dir.exists(pathWorkingDirectory)) {
166 167
 
167 168
 The population reference files should be downloaded into the *data/refGDS* 
168 169
 sub-directory. This following code downloads the complete pre-processed files 
169
-for 1000 Genomes (1KG), for the hg38 build of the human genome, in the GDS format. The size of the 1KG GDS file is 15GB. 
170
+for 1000 Genomes (1KG), for the hg38 build of the human genome, in the GDS 
171
+format. The size of the 1KG GDS file is 15GB. 
170 172
 
171 173
 ```
172 174
 
... ...
@@ -202,9 +204,9 @@ library(RAIDS)
202 204
 
203 205
 #############################################################################
204 206
 ## The population reference GDS file and SNV Annotation GDS file
205
-## need to be located in the same sub-directory.
207
+##    need to be located in the same sub-directory.
206 208
 ## Note that the mini-reference GDS file used for this example is
207
-## NOT sufficient for reliable inference
209
+##    NOT sufficient for reliable inference.
208 210
 #############################################################################
209 211
 ## Path to the demo 1KG GDS file is located in this package
210 212
 dataDir <- system.file("extdata", package="RAIDS")
... ...
@@ -265,18 +267,21 @@ In the following code, only 2 individual profiles per
265 267
 sub-continental population are sampled from the 
266 268
 demo population GDS file:
267 269
 
268
-```{r sampling, echo=TRUE, eval=TRUE, collapse=TRUE, warning=FALSE, message=FALSE}
270
+```{r samplingProfiles, echo=TRUE, eval=TRUE, collapse=TRUE, warning=FALSE, message=FALSE}
269 271
 
270 272
 #############################################################################
271
-## Set up the following random number generator seed to reproduce the expected results
273
+## Set up the following random number generator seed to reproduce  
274
+##    the expected results
272 275
 #############################################################################
273 276
 set.seed(3043)
274 277
 
275 278
 #############################################################################
276 279
 ## Choose the profiles from the population reference GDS file for 
277
-## data synthesis.
278
-## Here we choose 2 profiles perm subcontinental population from the mini 1KG GDS file.
279
-## Normally, we would use 30 randomly chosen profiles per subcontinental population.
280
+##   data synthesis.
281
+## Here we choose 2 profiles per subcontinental population 
282
+##   from the mini 1KG GDS file.
283
+## Normally, we would use 30 randomly chosen profiles per 
284
+##   subcontinental population.
280 285
 #############################################################################
281 286
 dataRef <- select1KGPopForSynthetic(fileReferenceGDS=refGenotype,
282 287
                                         nbProfiles=2L)
... ...
@@ -284,13 +289,11 @@ dataRef <- select1KGPopForSynthetic(fileReferenceGDS=refGenotype,
284 289
 ```
285 290
 
286 291
 
287
-
288 292
 <br>
289 293
 
290 294
 ### 2.3 Infer ancestry
291 295
 
292
-Within a single function 
293
-call, data synthesis is performed, the synthetic 
296
+Within a single function call, data synthesis is performed, the synthetic 
294 297
 data are used to optimize the inference parameters and, with these, the 
295 298
 ancestry is inferred from the input sequence profile.
296 299
 
... ...
@@ -302,8 +305,8 @@ The *inferAncestry()* function requires a specific profile input format. The
302 305
 format is set by the *genoSource* parameter. 
303 306
 
304 307
 One of those formats is in a VCF format (*genoSource=c("VCF")*). 
305
-This format follows the VCF standard 
306
-with at least those genotype fields: _GT_, _AD_ and _DP_. 
308
+This format follows the VCF standard with at least those genotype 
309
+fields: _GT_, _AD_ and _DP_. 
307 310
 The SNVs  must be germline variants and should include the genotype of the 
308 311
 wild-type homozygous at the selected positions in the reference. The VCF file 
309 312
 must be gzipped.