git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/PureCN@125770 bc3139a8-67e5-0310-9ffc-ced21a209358
@@ -2,8 +2,8 @@ Package: PureCN
Type: Package
Title: Copy number calling and SNV classification using
targeted short read sequencing
-Version: 1.5.32
-Date: 2017-01-03
+Version: 1.5.33
+Date: 2017-01-06
Authors@R: c(person("Markus", "Riester", role=c("aut", "cre"),
email="[email protected]"),
person("Angad P.", "Singh", role="aut"))
@@ -13,7 +13,7 @@ Description: This package estimates tumor purity, copy number, loss of
with standard somatic variant detection pipelines, and has support for
tumor samples without matching normal samples.
Depends:
- R (>= 3.2),
+ R (>= 3.3),
DNAcopy,
VariantAnnotation (>= 1.14.1)
Imports:
@@ -32,6 +32,7 @@ Imports:
Biostrings,
rtracklayer,
ggplot2,
+ futile.logger,
edgeR,
limma
Suggests:
@@ -55,13 +55,22 @@ importFrom(S4Vectors,queryHits)
importFrom(S4Vectors,subjectHits)
importFrom(SummarizedExperiment,rowRanges)
importFrom(data.table,data.table)
+importFrom(futile.logger,appender.tee)
+importFrom(futile.logger,flog.appender)
+importFrom(futile.logger,flog.fatal)
+importFrom(futile.logger,flog.info)
+importFrom(futile.logger,flog.threshold)
+importFrom(futile.logger,flog.warn)
importFrom(ggplot2,aes)
importFrom(ggplot2,aes_string)
importFrom(ggplot2,element_text)
importFrom(ggplot2,facet_wrap)
+importFrom(ggplot2,geom_boxplot)
+importFrom(ggplot2,geom_hline)
importFrom(ggplot2,geom_line)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,ggplot)
+importFrom(ggplot2,labs)
importFrom(ggplot2,scale_alpha_continuous)
importFrom(ggplot2,stat_density2d)
importFrom(ggplot2,theme)
@@ -6,6 +6,7 @@ Changes in version 1.6.0
- Polished plots, added new GC-normalization and volcano plots
- Support for cell lines
- Improved somatic vs. germline status calling
+- Contamination rate estimation.
- Better mapping bias estimation and correction
- Better copy number normalization using multiple best normals
- New GC-normalization for smaller gene panels
@@ -16,6 +17,8 @@ Changes in version 1.6.0
- Faster post.optimize=TRUE by not optimizing poor fits.
- Tweaks to segmentationPSCBS
- Lots of improvements to command line scripts.
+- Code cleanups (switch from inlinedocs to roxygen, from message/warn to
+ futile.logger)


API CHANGES
@@ -40,6 +43,14 @@ API CHANGES
normalized so that w[1] is 1
- Removed ... from runAbsoluteCN
- added centromeres to segmentation function.
+ - contamination.cutoff now specifies fraction of SNPs necessary to
+ call sample contaminated. Does not remove anymore, since these
+ are obviously informative for contamination rate estimation in the
+ PureCN model.
+ - Removed verbose from most functions, since messages are now controlled
+ with futile.logger.
+ - Smoothing of log-ratios now optionally done by runAbsoluteCN, not
+ segmentation function.


PLANNED FEATURES
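To illustrate the new contamination.cutoff semantics described above, a minimal usage sketch under the new defaults (c(0.075, 0.02), taken from the filterVcfBasic hunk later in this changeset); the snippet is illustrative only and not part of the diff:

# Flag the sample when dbSNP variants with allelic fraction < 0.075 or
# > 1 - 0.075 are found on most chromosomes and make up at least 2% of
# all dbSNP variants; the SNPs themselves are kept, since they inform
# the contamination rate estimate in the PureCN model.
vcf.filtered <- filterVcfBasic(vcf, contamination.cutoff = c(0.075, 0.02))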
@@ -50,3 +61,13 @@ PLANNED FEATURES
MYC)
- Switch to S4 data structures (maybe)
- Whole dataset visualizations (maybe PureCN 1.8)
+
+
+ROADMAP
+
+- January 15: API freeze (no changes to existing API)
+- February 1: Defaults freeze (no tuning of default parameters anymore)
+- February 15: Feature freeze (only bugfixes, code cleanups and speedups
+ and documentation improvements)
+- March 15: Code freeze (only bugfixes and documentation improvements)
+- April: Release
@@ -64,7 +64,7 @@ c(test.num.copy, round(opt.C))[i], prior.K))
}

.calcSNVLLik <- function(vcf, tumor.id.in.vcf, ov, p, test.num.copy,
- C.posterior, C, opt.C, snv.model, prior.somatic, mapping.bias, snv.lr,
+ C.likelihood, C, opt.C, snv.model, prior.somatic, mapping.bias, snv.lr,
sampleid = NULL, cont.rate = 0.01, prior.K, max.coverage.vcf, non.clonal.M,
model.homozygous=FALSE, error=0.001, max.mapping.bias=0.8) {

@@ -84,9 +84,9 @@ c(test.num.copy, round(opt.C))[i], prior.K))
haploid.penalty <- 2
}

- subclonal <- apply(C.posterior[queryHits(ov), ], 1, which.max) == ncol(C.posterior)
+ subclonal <- apply(C.likelihood[queryHits(ov), ], 1, which.max) == ncol(C.likelihood)

- seg.idx <- which(seq_len(nrow(C.posterior)) %in% queryHits(ov))
+ seg.idx <- which(seq_len(nrow(C.likelihood)) %in% queryHits(ov))
sd.ar <- sd(unlist(geno(vcf)$FA[, tumor.id.in.vcf]))

# Fit variants in all segments
@@ -104,13 +104,13 @@ c(test.num.copy, round(opt.C))[i], prior.K))
shape1 <- ar_all * dp_all + 1
shape2 <- (1 - ar_all) * dp_all + 1

- lapply(seq(ncol(C.posterior)), function(k) {
+ lapply(seq(ncol(C.likelihood)), function(k) {
Ci <- c(test.num.copy, opt.C[i])[k]
priorM <- log(Ci + 1 + haploid.penalty)
#priorM <- log(abs(2-Ci) + 1)
priorHom <- ifelse(model.homozygous, -log(3), log(0))

- skip <- test.num.copy > Ci | C.posterior[i, k] <=0
+ skip <- test.num.copy > Ci | C.likelihood[i, k] <=0

p.ar <- lapply(c(0, 1), function(g) {
cns <- test.num.copy
@@ -166,8 +166,8 @@ c(test.num.copy, round(opt.C))[i], prior.K))

likelihoods <- do.call(rbind,
lapply(seq_along(xx), function(i) Reduce("+",
- lapply(seq(ncol(C.posterior)), function(j)
- exp(xx[[i]][[j]]) * C.posterior[seg.idx[i], j]))))
+ lapply(seq(ncol(C.likelihood)), function(j)
+ exp(xx[[i]][[j]]) * C.likelihood[seg.idx[i], j]))))

colnames(likelihoods) <- c(paste("SOMATIC.M", test.num.copy, sep = ""),
paste("GERMLINE.M", test.num.copy, sep = ""), "GERMLINE.CONTHIGH",
@@ -418,7 +418,7 @@ c(test.num.copy, round(opt.C))[i], prior.K))
.getGoF <- function(result) {
if (is.null(result$SNV.posterior$beta.model)) return(0)
r <- result$SNV.posterior$beta.model$posteriors
- e <- (r$ML.AR-r$AR)^2
+ e <- (r$ML.AR-r$AR.ADJUSTED)^2
maxDist <- 0.2^2
r2 <- max(1-mean(e,na.rm=TRUE)/maxDist,0)
return(r2)
@@ -583,7 +583,7 @@ c(test.num.copy, round(opt.C))[i], prior.K))


.optimizeGrid <- function(test.purity, min.ploidy, max.ploidy, test.num.copy = 0:7,
- exon.lrs, seg, sd.seg, li, max.exon.ratio, max.non.clonal, verbose = FALSE, debug = FALSE) {
+ exon.lrs, seg, sd.seg, li, max.exon.ratio, max.non.clonal) {
ploidy.grid <- seq(min.ploidy, max.ploidy, by = 0.2)
if (min.ploidy < 1.8 && max.ploidy > 2.2) {
ploidy.grid <- c(seq(min.ploidy, 1.8, by = 0.2), 1.9, 2, 2.1, seq(2.2, max.ploidy,
@@ -592,16 +592,12 @@ c(test.num.copy, round(opt.C))[i], prior.K))
mm <- lapply(test.purity, function(p) {
b <- 2 * (1 - p)
log.ratio.offset <- 0
- if (debug)
- message(paste(b, log.ratio.offset))
lapply(ploidy.grid, function(D) {
dt <- p/D
llik.all <- lapply(seq_along(exon.lrs), function(i) .calcLlikSegmentExonLrs(exon.lrs[[i]],
log.ratio.offset, max.exon.ratio, sd.seg, dt, b, D, test.num.copy))
subclonal <- vapply(llik.all, which.max, double(1)) == 1
subclonal.f <- length(unlist(exon.lrs[subclonal]))/length(unlist(exon.lrs))
- if (debug)
- message(paste(sum(subclonal), subclonal.f))
if (subclonal.f > max.non.clonal)
return(-Inf)
llik <- sum(vapply(llik.all, max, double(1)))
@@ -857,6 +853,7 @@ c(test.num.copy, round(opt.C))[i], prior.K))
msg <- paste0(msg, "\n\nThis is most likely a user error due to",
" invalid input data or parameters (PureCN ",
packageVersion("PureCN"), ").")
+ flog.fatal(paste(strwrap(msg),"\n"))
stop( paste(strwrap(msg),"\n"), call.= FALSE)
}
.stopRuntimeError <- function(...) {
@@ -864,25 +861,41 @@ c(test.num.copy, round(opt.C))[i], prior.K))
msg <- paste0(msg, "\n\nThis runtime error might be caused by",
" invalid input data or parameters. Please report bug (PureCN ",
packageVersion("PureCN"), ").")
+ flog.fatal(paste(strwrap(msg),"\n"))
stop( paste(strwrap(msg),"\n"), call.= FALSE)
}
-
+.logHeader <- function(l) {
+ flog.info(strrep("-", 60))
+ flog.info("PureCN %s", as.character(packageVersion("PureCN")))
+ flog.info(strrep("-", 60))
+ idxSupported <- sapply(l, function(x) class(eval(x))) %in%
+ c("character", "numeric", "NULL", "list", "logical") &
+ sapply(l, function(x) length(unlist(eval(x)))) < 12
+
+ l <- c(l[idxSupported],lapply(l[!idxSupported], function(x) "<data>"))
+ argsStrings <- paste(sapply(seq_along(l), function(i)
+ paste0("-", names(l)[i], " ", paste(eval(l[[i]]),collapse=","))),
+ collapse=" ")
+ flog.info("Arguments: %s", argsStrings)
+}
+.logFooter <- function() {
+ flog.info("Done.")
+ flog.info(strrep("-", 60))
+}
.calcGCmetric <- function(gc.data, coverage) {
gcbins <- split(coverage$average.coverage, gc.data$gc_bias < 0.5);
mean(gcbins[[1]], na.rm=TRUE)/mean(gcbins[[2]], na.rm=TRUE)
}
-.checkGCBias <- function(normal, tumor, gc.data, max.dropout, verbose) {
+.checkGCBias <- function(normal, tumor, gc.data, max.dropout) {
gcMetricNormal <- .calcGCmetric(gc.data, normal)
gcMetricTumor <- .calcGCmetric(gc.data, tumor)
- if (verbose) {
- message("AT/GC dropout: ", round(gcMetricTumor, digits=2),
- " (tumor), ", round(gcMetricNormal, digits=2), " (normal).")
- }
+ flog.info("AT/GC dropout: %.2f (tumor), %.2f (normal). ",
+ gcMetricTumor, gcMetricNormal)
if (gcMetricNormal < max.dropout[1] ||
gcMetricNormal > max.dropout[2] ||
gcMetricTumor < max.dropout[1] ||
gcMetricTumor > max.dropout[2]) {
- warning("High GC-bias in normal or tumor. Is data GC-normalized?")
+ flog.warn("High GC-bias in normal or tumor. Is data GC-normalized?")
return(TRUE)
}
return(FALSE)
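Because messages now go through futile.logger rather than message()/warning(), callers control verbosity and log destinations globally. A minimal sketch of such a configuration, using only the futile.logger calls imported in the NAMESPACE hunk above (the log file name is hypothetical):

library(futile.logger)
flog.threshold(INFO)                        # show INFO and above (WARN, ERROR, FATAL)
flog.appender(appender.tee("PureCN.log"))   # mirror console output to a file
# the package then emits sprintf-style records, for example:
flog.info("AT/GC dropout: %.2f (tumor), %.2f (normal).", 1.02, 0.98)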
@@ -10,7 +10,6 @@
#' function.
#' @param tumor Tumor coverage read in by the \code{\link{readCoverageGatk}}
#' function.
-#' @param verbose Verbose output.
#' @return \code{numeric(nrow(tumor))}, tumor vs. normal copy number log-ratios
#' for all targets.
#' @author Markus Riester
@@ -22,21 +21,17 @@
#' package="PureCN")
#' normal <- readCoverageGatk(normal.coverage.file)
#' tumor <- readCoverageGatk(tumor.coverage.file)
-#' log.ratio <- calculateLogRatio(normal, tumor, verbose=FALSE)
+#' log.ratio <- calculateLogRatio(normal, tumor)
#'
#' @export calculateLogRatio
-calculateLogRatio <- function(normal, tumor, verbose=TRUE) {
+calculateLogRatio <- function(normal, tumor) {
# make sure that normal and tumor align
if (!identical(as.character(normal[, 1]), as.character(tumor[, 1]))) {
.stopUserError("Interval files in normal and tumor different.")
}
- if (verbose) {
- message("Average coverage: ",
- round(mean(tumor$average.coverage, na.rm=TRUE), digits=0),
- "X (tumor), ",
- round(mean(normal$average.coverage, na.rm=TRUE), digits=0),
- "X (normal).")
- }
+ flog.info("Average coverage: %.0fX (tumor) %.0fX (normal).",
+ mean(tumor$average.coverage, na.rm=TRUE),
+ mean(normal$average.coverage, na.rm=TRUE))
total.cov.normal <- sum(as.numeric(normal$coverage), na.rm = TRUE)
total.cov.tumor <- sum(as.numeric(tumor$coverage), na.rm = TRUE)

@@ -57,10 +57,10 @@ log.ratio.cutoffs = c(-0.9, 0.9), failed = NULL, all.genes = FALSE) {

bm <- res$results[[id]]$SNV.posterior$beta.model
if (!is.null(bm)) {
- segids <- bm$posterior$seg.id
+ segids <- bm$posteriors$seg.id
calls$num.snps.segment <- sapply(calls$seg.id, function(i)
sum(segids==i,na.rm=TRUE))
- calls$loh <- bm$posterior$ML.M.SEGMENT[match(calls$seg.id, segids)] == 0
+ calls$loh <- bm$posteriors$ML.M.SEGMENT[match(calls$seg.id, segids)] == 0
}

calls <- calls[, !grepl("^\\.",colnames(calls))]
@@ -1,5 +1,5 @@
# Make CMD check happy
-globalVariables(names=c("gcIndex", "gcNum", "..level.."))
+globalVariables(names=c("..level.."))

#' Correct for GC bias
#'
@@ -110,7 +110,7 @@ plot.max.density = 50000) {
if (density == "Low") {
print(ggplot(gcPlot, aes_string(x="gc_bias", y="average.coverage")) +
geom_point(color='red', alpha=0.2) +
- geom_line(data = plotMed, aes(x = gcIndex, y = gcNum), color = 'blue') +
+ geom_line(data = plotMed, aes_string(x = 'gcIndex', y = 'gcNum'), color = 'blue') +
xlab("GC content") + ylab("Coverage") +
theme(axis.text = element_text(size= 6), axis.title = element_text(size=16)) +
facet_wrap(~ norm_status, nrow=1))
@@ -119,7 +119,7 @@ plot.max.density = 50000) {
geom_point(color="blue", alpha = 0.1) +
stat_density2d(aes(fill = ..level..), geom="polygon") +
scale_alpha_continuous(limits=c(0.1, 0), breaks=seq(0, 0.1, by = 0.025)) +
- geom_line(data = plotMed, aes(x = gcIndex,y = gcNum), color = 'red') +
+ geom_line(data = plotMed, aes_string(x = 'gcIndex',y = 'gcNum'), color = 'red') +
xlab("GC content") + ylab("Coverage") +
theme(axis.text = element_text(size = 16), axis.title = element_text(size = 16)) +
facet_wrap(~norm_status, nrow=1))
@@ -46,7 +46,7 @@ max.mean.coverage = 100, ... ) {
normals.m <- scale(normals.m, 1/z, center=FALSE)

normals.pca <- prcomp(t(normals.m[idx,]), ...)
- sex.determined <- sapply(normals,getSexFromCoverage, verbose=is.null(sex))
+ sex.determined <- sapply(normals,getSexFromCoverage)
if (is.null(sex)) {
sex <- sex.determined
} else {
@@ -59,9 +59,8 @@ max.mean.coverage = 100, ... ) {
for (i in seq_along(sex.determined)) {
if (!is.na(sex.determined[i]) && sex[i] != "diploid" &&
sex.determined[i] != sex[i]) {
- warning("Sex mismatch in ", normal.coverage.files[i],
- ". Sex provided is ", sex, ", but could be ",
- sex.determined[i])
+ flog.warn("Sex mismatch in %s. Sex provided is %s, but could be %s.",
+ normal.coverage.files[i], sex[i], sex.determined[i])
}
}
}
@@ -109,7 +108,6 @@ createExonWeightFile <- function() {
#' (>20) to estimate target log-ratio standard deviations. Should not overlap
#' with files in \code{tumor.coverage.files}.
#' @param target.weight.file Output filename.
-#' @param verbose Verbose output.
#' @return A \code{data.frame} with target weights.
#' @author Markus Riester
#' @examples
@@ -127,12 +125,12 @@ createExonWeightFile <- function() {
#'
#' @export createTargetWeights
createTargetWeights <- function(tumor.coverage.files, normal.coverage.files,
-target.weight.file, verbose = TRUE) {
- if (verbose) message("Loading coverage data...")
+target.weight.file) {
+ flog.info("Loading coverage data...")
tumor.coverage <- lapply(tumor.coverage.files, readCoverageGatk)
normal.coverage <- lapply(normal.coverage.files, readCoverageGatk)
lrs <- lapply(tumor.coverage, function(tc) sapply(normal.coverage,
- function(nc) calculateLogRatio(nc, tc, verbose=verbose)))
+ function(nc) calculateLogRatio(nc, tc)))

lrs <- do.call(cbind, lrs)

@@ -23,7 +23,6 @@
#' \code{\link{createNormalDatabase}}.
#' @param normalDB.min.coverage Exclude targets with coverage lower than 20
#' percent of the chromosome median in the pool of normals.
-#' @param verbose Verbose output.
#' @return \code{logical(length(log.ratio))} specifying which targets should be
#' used in segmentation.
#' @author Markus Riester
@@ -55,7 +54,7 @@
#' @export filterTargets
filterTargets <- function(log.ratio, tumor, gc.data, seg.file,
filter.lowhigh.gc = 0.001, min.targeted.base = 5, normalDB = NULL,
- normalDB.min.coverage = 0.2, verbose) {
+ normalDB.min.coverage = 0.2) {
# NA's in log.ratio confuse the CBS function
targetsUsed <- !is.na(log.ratio) & !is.infinite(log.ratio)
# With segmentation file, ignore all filters
@@ -64,12 +63,12 @@ filterTargets <- function(log.ratio, tumor, gc.data, seg.file,
if (!is.null(gc.data)) {
.checkFraction(filter.lowhigh.gc, "filter.lowhigh.gc")
targetsUsed <- .filterTargetsLowHighGC(targetsUsed, tumor,
- gc.data, filter.lowhigh.gc, verbose)
+ gc.data, filter.lowhigh.gc)
}
targetsUsed <- .filterTargetsNormalDB(targetsUsed, tumor, normalDB,
- normalDB.min.coverage, verbose)
+ normalDB.min.coverage)
targetsUsed <- .filterTargetsTargetedBase(targetsUsed, tumor,
- min.targeted.base, verbose)
+ min.targeted.base)
}

.checkNormalDB <- function(tumor, normalDB) {
@@ -90,7 +89,7 @@ filterTargets <- function(log.ratio, tumor, gc.data, seg.file,
}

.filterTargetsNormalDB <- function(targetsUsed, tumor, normalDB,
-normalDB.min.coverage, verbose) {
+normalDB.min.coverage) {
if (is.null(normalDB)) return(targetsUsed)
# make sure that normalDB matches tumor
if (!.checkNormalDB(tumor, normalDB)) {
@@ -113,40 +112,40 @@ normalDB.min.coverage, verbose) {

nAfter <- sum(targetsUsed)

- if (verbose && nAfter < nBefore) {
- message("Removing ", nBefore-nAfter, " targets with low coverage ",
- "in normalDB.")
+ if (nAfter < nBefore) {
+ flog.info("Removing %i targets with low coverage in normalDB.",
+ nBefore-nAfter)
}
targetsUsed
}

-.filterTargetsChrHash <- function(targetsUsed, tumor, chr.hash, verbose) {
+.filterTargetsChrHash <- function(targetsUsed, tumor, chr.hash) {
if (is.null(chr.hash)) return(targetsUsed)
nBefore <- sum(targetsUsed)
targetsUsed <- targetsUsed & tumor$chr %in% chr.hash$chr
nAfter <- sum(targetsUsed)

- if (verbose && nAfter < nBefore) {
- message("Removing ", nBefore-nAfter, " targets on chromosomes ",
- "outside chr.hash.")
+ if ( nAfter < nBefore) {
+ flog.info("Removing %i targets on chromosomes outside chr.hash.",
+ nBefore-nAfter)
}
targetsUsed
}

-.filterTargetsTargetedBase <- function(targetsUsed, tumor, min.targeted.base,
- verbose) {
+.filterTargetsTargetedBase <- function(targetsUsed, tumor, min.targeted.base) {
if (is.null(min.targeted.base)) return(targetsUsed)
nBefore <- sum(targetsUsed)
targetsUsed <- targetsUsed & !is.na(tumor$targeted.base) &
tumor$targeted.base >= min.targeted.base
nAfter <- sum(targetsUsed)
- if (verbose && nAfter < nBefore) message("Removing ", nBefore-nAfter,
- " small targets")
+ if (nAfter < nBefore) {
+ flog.info("Removing %i small targets.", nBefore-nAfter)
+ }
targetsUsed
}

.filterTargetsLowHighGC <- function(targetsUsed, tumor, gc.data,
- filter.lowhigh.gc, verbose) {
+ filter.lowhigh.gc) {
gc.data <- gc.data[match(as.character(tumor[,1]), gc.data[,1]),]
qq <- quantile(gc.data$gc_bias, p=c(filter.lowhigh.gc,
1-filter.lowhigh.gc), na.rm=TRUE)
@@ -156,8 +155,8 @@ normalDB.min.coverage, verbose) {
!(gc.data$gc_bias < qq[1] | gc.data$gc_bias > qq[2])
nAfter <- sum(targetsUsed)

- if (verbose && nAfter < nBefore) message("Removing ",
- nBefore-nAfter, " low/high GC targets")
-
+ if (nAfter < nBefore) {
+ flog.info("Removing %i low/high GC targets.", nBefore-nAfter)
+ }
targetsUsed
}
@@ -20,8 +20,9 @@
#' contamination. If a matched normal is available, this value is ignored,
#' because homozygosity can be confirmed in the normal.
#' @param contamination.cutoff Count SNPs in dbSNP with allelic fraction
-#' smaller than the first value, if found on most chromosomes, remove all with
-#' AF smaller than the second value.
+#' smaller than the first value or greater than 1-first value, if found on most chromosomes, mark sample
+#' as contaminated if the fraction of putative contamination SNPs exceeds
+#' the second value.
#' @param min.coverage Minimum coverage in tumor. Variants with lower coverage
#' are ignored.
#' @param min.base.quality Minimim base quality in tumor. Requires a \code{BQ}
@@ -40,7 +41,6 @@
#' SNPs. Ignored in case a matched normal is provided in the VCF.
#' @param interval.padding Include variants in the interval flanking regions of
#' the specified size in bp. Requires \code{target.granges}.
-#' @param verbose Verbose output.
#' @return A list with elements \item{vcf}{The filtered \code{CollapsedVCF}
#' object.} \item{flag}{A flag (\code{logical(1)}) if problems were
#' identified.} \item{flag_comment}{A comment describing the flagging.}
@@ -61,10 +61,10 @@
#' @importFrom stats pbeta
filterVcfBasic <- function(vcf, tumor.id.in.vcf = NULL,
use.somatic.status = TRUE, snp.blacklist = NULL, af.range = c(0.03, 0.97),
-contamination.cutoff = c(0.05,0.075), min.coverage = 15, min.base.quality = 25,
+contamination.cutoff = c(0.075,0.02), min.coverage = 15, min.base.quality = 25,
min.supporting.reads = NULL, error = 0.001, target.granges = NULL,
-remove.off.target.snvs = TRUE, model.homozygous = FALSE, interval.padding = 50,
-verbose=TRUE) {
+remove.off.target.snvs = TRUE, model.homozygous = FALSE,
+interval.padding = 50) {
flag <- NA
flag_comment <- NA

@@ -74,25 +74,73 @@ verbose=TRUE) {
if (use.somatic.status) {
n <- nrow(vcf)
vcf <- .testGermline(vcf, tumor.id.in.vcf)
- if (verbose) message(paste("Removing", n-nrow(vcf),
- "non heterozygous (in matched normal) germline SNPs."))
+ flog.info("Removing %i non heterozygous (in matched normal) germline SNPs.",
+ n-nrow(vcf))
} else {
info(vcf)$SOMATIC <- NULL
}

+ # find supporting read cutoffs based on coverage and sequencing error
+ #--------------------------------------------------------------------------
if (is.null(min.supporting.reads)) {
- min.supporting.reads <- calculatePowerDetectSomatic(
- mean(geno(vcf)$DP[,tumor.id.in.vcf], na.rm=TRUE),
- purity=1,ploidy=2, error=error, verbose=FALSE)$k
- }
+ depths <- geno(vcf)$DP[,tumor.id.in.vcf]
+ minDepth <- round(log2(mean(depths)))
+
+ depths <- sort(unique(round(log2(depths))))
+ depths <- depths[depths>=minDepth]
+ depths <- 2^depths

+ cutoffs <- sapply(depths, function(d)
+ calculatePowerDetectSomatic(d, ploidy=2, purity=1, error=error,
+ verbose=FALSE)$k)
+ depths <- c(0,depths)
+ } else {
+ depths <- 0
+ cutoffs <- min.supporting.reads
+ }
n <- nrow(vcf)

- # remove variants with insufficient reads. This includes reference alleles
- # to remove homozygous germline variants.
- vcf <- vcf[do.call(rbind,
- geno(vcf)$AD[,tumor.id.in.vcf])[,2] >= min.supporting.reads]
-
+ .sufficientReads <- function(vcf, ref, depths, cutoffs) {
+ idx <- rep(TRUE, nrow(vcf))
+
+ .filterVcfByAD <- function(vcf, min.supporting.reads, depth) {
+ # remove variants with insufficient reads. This includes reference alleles
+ # to remove homozygous germline variants.
+ do.call(rbind, geno(vcf)$AD[,tumor.id.in.vcf])[,ifelse(ref,1,2)] >=
+ min.supporting.reads |
+ geno(vcf)$DP[,tumor.id.in.vcf] < depth
+ }
+
+ for (i in seq_along(cutoffs)) {
+ idx <- idx & .filterVcfByAD(vcf, cutoffs[i], depths[i])
+ }
+ idx
+ }
+ idxNotReference <- .sufficientReads(vcf,ref=FALSE, depths, cutoffs)
+ vcf <- vcf[idxNotReference]
+ idxNotHomozygous <- .sufficientReads(vcf,ref=TRUE, depths, cutoffs)
+ #--------------------------------------------------------------------------
+
+ idx <- info(vcf)$DB & idxNotHomozygous &
+ ( unlist(geno(vcf)$FA[,tumor.id.in.vcf]) < contamination.cutoff[1] |
+ unlist(geno(vcf)$FA[,tumor.id.in.vcf]) > (1 - contamination.cutoff[1]))
+
+ fractionContaminated <- sum(idx)/sum(info(vcf)$DB)
+ if (fractionContaminated > 0) {
+ # cntContPerChr <- sapply(seqlevelsInUse(vcf), function(seq) nrow(keepSeqlevels(vcf[idx],seq)))
+ # cntAllPerChr <- sapply(seqlevelsInUse(vcf), function(seq) nrow(keepSeqlevels(vcf,seq)))
+ # flog.info("Contamination p-value across chromosomes: %.2f.",
+ # fisher.test(cntContPerChr, cntAllPerChr)$p.value)
+ }
+
+ # do we have many low allelic fraction calls that are in dbSNP on basically
+ # all chromosomes? then we found some contamination
+ if (sum(runLength(seqnames(rowRanges(vcf[idx])))>3) >= 20 &&
+ fractionContaminated >= contamination.cutoff[2]) {
+ flag <- TRUE
+ flag_comment <- "POTENTIAL SAMPLE CONTAMINATION"
+ }
+
# If we have a matched normal, we can distinguish LOH from homozygous
# in 100% pure samples. If not we need to see a sufficient number
# of alt alleles to believe a heterozygous normal genotype.
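The hunk above replaces the single mean-coverage read cutoff with one cutoff per log2-spaced depth bin. A rough sketch of how those bins and cutoffs come out, reusing the calculatePowerDetectSomatic call from the new code (the coverages below are made up):

depths <- c(80, 120, 200, 400, 900)         # hypothetical per-variant coverages (DP)
minDepth <- round(log2(mean(depths)))       # mean is 340, so minDepth is 8
bins <- sort(unique(round(log2(depths))))   # 6 7 8 9 10
bins <- 2^bins[bins >= minDepth]            # 256 512 1024
cutoffs <- sapply(bins, function(d)
    calculatePowerDetectSomatic(d, ploidy = 2, purity = 1,
        error = 0.001, verbose = FALSE)$k)  # minimum supporting reads per bin
# deeper bins then require more supporting reads, mirroring the
# .filterVcfByAD loop in the hunk above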
@@ -100,20 +148,15 @@ verbose=TRUE) {
af.range[2] <- 1
if (model.homozygous) af.range[2] <- Inf
} else {
- vcf <- vcf[do.call(rbind,
- geno(vcf)$AD[,tumor.id.in.vcf])[,1] >= min.supporting.reads]
+ vcf <- vcf[idxNotHomozygous]
}

vcf <- vcf[unlist(geno(vcf)$DP[,tumor.id.in.vcf]) >= min.coverage]
vcf <- vcf[unlist(geno(vcf)$FA[,tumor.id.in.vcf]) >= af.range[1]]
# remove homozygous germline
vcf <- vcf[!info(vcf)$DB | geno(vcf)$FA[,tumor.id.in.vcf] < af.range[2]]
- if (verbose) message("Removing ", n-nrow(vcf),
- " SNPs with AF < ", af.range[1],
- " or AF >= ", af.range[2],
- " or less than ", min.supporting.reads,
- " supporting reads or depth < ",
- min.coverage, ".")
+ flog.info("Removing %i SNPs with AF < %.3f or AF >= %.3f or less than %i supporting reads or depth < %i.",
+ n-nrow(vcf), af.range[1], af.range[2], cutoffs[1], min.coverage)

if (!is.null(snp.blacklist)) {
for (i in seq_along(snp.blacklist)) {
@@ -124,7 +167,7 @@ verbose=TRUE) {
}
n <- nrow(vcf)
if (sum( rownames(vcf) %in% snp.blacklist.data[,1]) > 1 ) {
- warning("Old SNP blacklists are deprecated. ",
+ flog.warn("Old SNP blacklists are deprecated. %s",
"Use either a BED file or a normal.panel.vcf.file.")
vcf <- vcf[!rownames(vcf) %in% snp.blacklist.data[,1],]
} else {
@@ -132,79 +175,37 @@ verbose=TRUE) {
ov <- suppressWarnings(overlapsAny(vcf, blackBed))
vcf <- vcf[!ov]
}
- if (verbose) message("Removing ", n-nrow(vcf),
- " blacklisted SNPs.")
- }
- }
- idx <- info(vcf)$DB & unlist(geno(vcf)$FA[,tumor.id.in.vcf]) <
- contamination.cutoff[1]
-
- # do we have many low allelic fraction calls that are in dbSNP on basically
- # all chromosomes? then we found some contamination
- if (sum(runLength(seqnames(rowRanges(vcf[idx])))>1) >= 20) {
- idx <- info(vcf)$DB & unlist(geno(vcf)$FA[,tumor.id.in.vcf]) <
- contamination.cutoff[2]
- vcf <- vcf[which(!idx)]
- if (verbose) message("Removing ", sum(idx, na.rm=TRUE),
- " contamination SNPs.")
- flag <- TRUE
- flag_comment <- "POTENTIAL SAMPLE CONTAMINATION"
- }
-
- # if we have a matched normal, we can filter homozygous germline SNPs
- # easily, if not, we still should remove most of them by removing everything
- # with AF >= 0.97. But sometimes samples are very noisy, for example due to
- # contamination, and homozygous germline SNPs are between 0.9 and 1. This
- # code finds samples which have a lot of dbSNPs between 0.95 and 0.97 on
- # almost all chromosomes. If found, then remove everything that's in dbSNP
- # above 0.9.
- if (!use.somatic.status && !model.homozygous) {
- idx <- info(vcf)$DB & unlist(geno(vcf)$FA[,tumor.id.in.vcf]) >
- (1 - contamination.cutoff[1])
-
- # do we have many high allelic fraction calls that are in dbSNP on
- # basically all chromosomes? then we found a very noisy sample
- if (sum(runLength(seqnames(rowRanges(vcf[idx])))>2) >= 20) {
- idx <- info(vcf)$DB & unlist(geno(vcf)$FA[,tumor.id.in.vcf]) >
- (1 - contamination.cutoff[2])
- vcf <- vcf[which(!idx)]
- if (verbose) message("Removing ", sum(idx, na.rm=TRUE),
- " noisy homozygous germline SNPs.")
- flag <- TRUE
- flag_comment <- .appendComment(flag_comment,
- "NOISY HOMOZYGOUS GERMLINE CALLS")
+ flog.info("Removing %i blacklisted SNPs.", n-nrow(vcf))
}
}

if (!is.null(geno(vcf)$BQ)) {
- n.vcf.before.filter <- nrow(vcf)
- vcf <- vcf[which(as.numeric(geno(vcf)$BQ[,tumor.id.in.vcf])>=min.base.quality)]
- if (verbose) message("Removing ", n.vcf.before.filter - nrow(vcf),
- " low quality variants with BQ<",min.base.quality,".")
+ n.vcf.before.filter <- nrow(vcf)
+ vcf <- vcf[which(as.numeric(geno(vcf)$BQ[,tumor.id.in.vcf])>=min.base.quality)]
+ flog.info("Removing %i low quality variants with BQ < %i.",
+ n.vcf.before.filter - nrow(vcf), min.base.quality)
} else if (!is.null(rowRanges(vcf)$QUAL)) {
- n.vcf.before.filter <- nrow(vcf)
- idx <- which(as.numeric(rowRanges(vcf)$QUAL)>=min.base.quality)
- if (length(idx)) vcf <- vcf[idx]
- if (verbose) message("Removing ", n.vcf.before.filter - nrow(vcf),
- " low quality variants with BQ<",min.base.quality,".")
+ n.vcf.before.filter <- nrow(vcf)
+ idx <- which(as.numeric(rowRanges(vcf)$QUAL)>=min.base.quality)
+ if (length(idx)) vcf <- vcf[idx]
+ flog.info("Removing %i low quality variants with BQ < %i.",
+ n.vcf.before.filter - nrow(vcf), min.base.quality)
}

if (!is.null(target.granges)) {
- vcf <- .annotateVcfTarget(vcf, target.granges, interval.padding, verbose)
+ vcf <- .annotateVcfTarget(vcf, target.granges, interval.padding)
if (remove.off.target.snvs) {
n.vcf.before.filter <- nrow(vcf)
# make sure all SNVs are in covered exons
vcf <- vcf[info(vcf)$OnTarget>0]
- if (verbose) message("Removing ", n.vcf.before.filter - nrow(vcf),
- " variants outside intervals.",
- " Set remove.off.target.snvs=FALSE to include.")
+ flog.info("Removing %i variants outside intervals.",
+ n.vcf.before.filter - nrow(vcf))
}
}
- ##value<< A list with elements
list(
- vcf=vcf, ##<< The filtered \code{CollapsedVCF} object.
- flag=flag, ##<< A flag (\code{logical(1)}) if problems were identified.
- flag_comment=flag_comment ##<< A comment describing the flagging.
+ vcf=vcf,
+ flag=flag,
+ flag_comment=flag_comment
)
}

@@ -222,7 +223,6 @@ verbose=TRUE, ... ){
#' @param tumor.id.in.vcf The tumor id in the VCF file, optional.
#' @param stats.file MuTect stats file.
#' @param ignore MuTect flags that mark variants for exclusion.
-#' @param verbose Verbose output.
#' @param \dots Additional arguments passed to \code{\link{filterVcfBasic}}.
#' @return A list with elements \code{vcf}, \code{flag} and
#' \code{flag_comment}. \code{vcf} contains the filtered \code{CollapsedVCF},
@@ -244,15 +244,15 @@ filterVcfMuTect <- function(vcf, tumor.id.in.vcf = NULL, stats.file = NULL,
ignore=c("clustered_read_position", "fstar_tumor_lod", "nearby_gap_events",
"poor_mapping_region_alternate_allele_mapq", "poor_mapping_region_mapq0",
"possible_contamination", "strand_artifact", "seen_in_panel_of_normals"),
-verbose=TRUE, ... ){
+... ){
if (is.null(stats.file)) return(
- filterVcfBasic(vcf, tumor.id.in.vcf, verbose=verbose, ...))
+ filterVcfBasic(vcf, tumor.id.in.vcf, ...))

stats <- read.delim(stats.file, as.is=TRUE, skip=1)

if (is.null(stats$contig) || is.null(stats$position)) {
warning("MuTect stats file lacks contig and position columns.")
- return(filterVcfBasic(vcf, tumor.id.in.vcf, verbose=verbose, ...))
+ return(filterVcfBasic(vcf, tumor.id.in.vcf, ...))
}

gr.stats <- GRanges(seqnames=stats$contig,
@@ -265,13 +265,13 @@ verbose=TRUE, ... ){
n <- nrow(vcf)
stats <- stats[subjectHits(ov),]
vcf <- vcf[queryHits(ov)]
- warning("MuTect stats file and VCF file do not align perfectly. ",
- "Will remove ", n-nrow(vcf), " unmatched variants.")
+ flog.warn("MuTect stats file and VCF file do not align perfectly. Will remove %i unmatched variants.",
+ n-nrow(vcf))
}
if (is.null(stats$failure_reasons)) {
- warning("MuTect stats file lacks failure_reasons column.",
+ flog.warn("MuTect stats file lacks failure_reasons column.",
" Keeping all variants listed in stats file.")
- return(filterVcfBasic(vcf, tumor.id.in.vcf, verbose=verbose, ...))
+ return(filterVcfBasic(vcf, tumor.id.in.vcf, ...))
}

n <- nrow(vcf)
@@ -279,9 +279,9 @@ verbose=TRUE, ... ){
ids <- sort(unique(unlist(sapply(ignore, grep, stats$failure_reasons))))
vcf <- vcf[-ids]

- if (verbose) message("Removing ", n-nrow(vcf),
- " MuTect calls due to blacklisted failure reasons.")
- filterVcfBasic(vcf, tumor.id.in.vcf, verbose=verbose, ...)
+ flog.info("Removing %i MuTect calls due to blacklisted failure reasons.",
+ n-nrow(vcf))
+ filterVcfBasic(vcf, tumor.id.in.vcf, ...)
}


@@ -305,7 +305,6 @@ verbose=TRUE, ... ){
#' value is for the case that variant is in both dbSNP and COSMIC > 2.
#' @param tumor.id.in.vcf Id of tumor in case multiple samples are stored in
#' VCF.
-#' @param verbose Verbose output.
#' @return A \code{numeric(nrow(vcf))} vector with the prior probability of
#' somatic status for each variant in the \code{CollapsedVCF}.
#' @author Markus Riester
@@ -320,7 +319,7 @@ verbose=TRUE, ... ){
#' @export setPriorVcf
setPriorVcf <- function(vcf,
prior.somatic=c(0.5, 0.0005, 0.999, 0.0001, 0.995, 0.01),
-tumor.id.in.vcf=NULL, verbose=TRUE) {
+tumor.id.in.vcf=NULL) {
if (is.null(tumor.id.in.vcf)) {
tumor.id.in.vcf <- .getTumorIdInVcf(vcf)
}
@@ -329,24 +328,24 @@ tumor.id.in.vcf=NULL, verbose=TRUE) {
prior.somatic <- ifelse(info(vcf)$SOMATIC,
prior.somatic[3],prior.somatic[4])

- if (verbose) message("Found SOMATIC annotation in VCF. ",
- "Setting somatic prior probabilities for somatic variants to ",
- tmp[3]," or to ", tmp[4], " otherwise.")
+ flog.info("Found SOMATIC annotation in VCF.")
+ flog.info("Setting somatic prior probabilities for somatic variants to %f or to %f otherwise.",
+ tmp[3], tmp[4])
} else {
tmp <- prior.somatic
prior.somatic <- ifelse(info(vcf)$DB,
prior.somatic[2], prior.somatic[1])
if (!is.null(info(vcf)$Cosmic.CNT)) {
- if (verbose) message("Found COSMIC annotation in VCF. ",
- "Setting somatic prior probabilities for hits to\n", tmp[5],
- " or to ", tmp[6], " if in both COSMIC and dbSNP.")
+ flog.info("Found COSMIC annotation in VCF.")
+ flog.info("Setting somatic prior probabilities for hits to %f or to %f if in both COSMIC and dbSNP.",
+ tmp[5], tmp[6])

prior.somatic[which(info(vcf)$Cosmic.CNT>2)] <- tmp[5]
prior.somatic[which(info(vcf)$Cosmic.CNT>2 &
info(vcf)$DB)] <- tmp[6]
} else {
- if (verbose) message("Setting somatic prior probabilities ",
- "for dbSNP hits to ", tmp[2]," or to ", tmp[1], " otherwise.")
+ flog.info("Setting somatic prior probabilities for dbSNP hits to %f or to %f otherwise.",
+ tmp[2], tmp[1])
}
}
prior.somatic
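As a quick reference for the six prior.somatic values consumed above, an annotated call with the defaults; the mapping restates what the code in this hunk does, and the call itself is only an illustration:

prior <- setPriorVcf(vcf, prior.somatic = c(
    0.5,     # [1] not in dbSNP, no SOMATIC annotation
    0.0005,  # [2] in dbSNP, no SOMATIC annotation
    0.999,   # [3] flagged SOMATIC (matched normal available)
    0.0001,  # [4] not flagged SOMATIC although the annotation exists
    0.995,   # [5] Cosmic.CNT > 2
    0.01))   # [6] Cosmic.CNT > 2 and also in dbSNP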
@@ -452,7 +451,7 @@ function(vcf, tumor.id.in.vcf, allowed=0.05) {
}
return(TRUE)
}
-.readAndCheckVcf <- function(vcf.file, genome, verbose) {
+.readAndCheckVcf <- function(vcf.file, genome) {
if (class(vcf.file) == "character") {
vcf <- readVcf(vcf.file, genome)
} else if (class(vcf.file) != "CollapsedVCF") {
@@ -465,7 +464,7 @@ function(vcf, tumor.id.in.vcf, allowed=0.05) {
if (sum(triAllelic)) {
n <- nrow(vcf)
vcf <- vcf[which(!triAllelic)]
- if (verbose) message("Removing ",n-nrow(vcf), " triallelic sites.")
+ flog.info("Removing %i triallelic sites.",n-nrow(vcf))
}
if (is.null(info(vcf)$DB)) {
# try to add an DB field based on rownames
@@ -525,9 +524,9 @@ function(vcf, tumor.id.in.vcf, allowed=0.05) {
vcf
}

-.addCosmicCNT <- function(vcf, cosmic.vcf.file, verbose=TRUE) {
+.addCosmicCNT <- function(vcf, cosmic.vcf.file) {
if (!is.null(info(vcf)$Cosmic.CNT)) {
- if (verbose) message("VCF already COSMIC annotated. Skipping.")
+ flog.info("VCF already COSMIC annotated. Skipping.")
return(vcf)
}
cosmicSeqStyle <- seqlevelsStyle(headerTabix(
@@ -537,7 +536,7 @@ function(vcf, tumor.id.in.vcf, allowed=0.05) {
if (!length(intersect(seqlevelsStyle(vcf), cosmicSeqStyle))) {
seqlevelsStyle(vcfRenamedSL) <- cosmicSeqStyle[1]
}
- if (verbose) message("Reading COSMIC VCF...")
+ flog.info("Reading COSMIC VCF...")
# look-up the variants in COSMIC
cosmic.vcf <- readVcf(cosmic.vcf.file, genome=genome(vcf)[1],
ScanVcfParam(which = rowRanges(vcfRenamedSL),
@@ -569,20 +568,18 @@ function(vcf, tumor.id.in.vcf, allowed=0.05) {
vcf
}

-.annotateVcfTarget <- function(vcf, target.granges, interval.padding, verbose) {
+.annotateVcfTarget <- function(vcf, target.granges, interval.padding) {
target.granges.padding <- target.granges
start(target.granges.padding) <- start(target.granges.padding)-interval.padding
end(target.granges.padding) <- end(target.granges.padding)+interval.padding
.calcTargetedGenome <- function(granges) {
tmp <- reduce(granges)
- round(sum(width(tmp))/(1000^2),digits=2)
- }
- if (verbose) {
- message("Total size of targeted genomic region: ",
- .calcTargetedGenome(target.granges), "Mb (",
- .calcTargetedGenome(target.granges.padding),
- "Mb with ", interval.padding, "bp padding)")
+ sum(width(tmp))/(1000^2)
}
+ flog.info("Total size of targeted genomic region: %.2fMb (%.2fMb with %ibp padding).",
+ .calcTargetedGenome(target.granges),
+ .calcTargetedGenome(target.granges.padding),
+ interval.padding)

idxTarget <- overlapsAny(vcf, target.granges)
idxPadding <- overlapsAny(vcf, target.granges.padding)
@@ -596,16 +593,17 @@ function(vcf, tumor.id.in.vcf, allowed=0.05) {
info(vcf)$OnTarget <- 0
info(vcf)$OnTarget[idxPadding] <- 2
info(vcf)$OnTarget[idxTarget] <- 1
- if (verbose) {
- targetsWithSNVs <- overlapsAny(target.granges.padding, vcf)
- percentTargetsWithSNVs <- round(sum(targetsWithSNVs,na.rm=TRUE)/
- length(targetsWithSNVs)*100, digits=1)
- tmp <- ""
- if (percentTargetsWithSNVs > 20) {
- tmp <- " segmentationPSCBS might produce better results."
- }
- message(percentTargetsWithSNVs,"% of targets contain heterozygous ",
- "SNVs.",tmp)
- }
+
+ # report stats in log file
+ targetsWithSNVs <- overlapsAny(target.granges.padding, vcf)
+ percentTargetsWithSNVs <- sum(targetsWithSNVs,na.rm=TRUE)/
+ length(targetsWithSNVs)*100
+ tmp <- ""
+ if (percentTargetsWithSNVs > 20) {
+ tmp <- " segmentationPSCBS might produce better results."
+ }
+ flog.info("%.1f%% of targets contain SNVs. %s",
+ percentTargetsWithSNVs, tmp)
+
vcf
}
@@ -22,7 +22,6 @@
#' @param pool.weights Either find good pooling weights by optimization or
#' weight all best normals equally.
#' @param plot.pool Allows the pooling function to create plots.
-#' @param verbose Verbose output.
#' @param \dots Additional arguments passed to \code{\link{poolCoverage}}.
#' @return Filename of the best matching normal.
#' @author Markus Riester
@@ -48,7 +47,7 @@
findBestNormal <- function(tumor.coverage.file, normalDB, pcs=1:3,
num.normals = 1, ignore.sex = FALSE, sex = NULL,
normal.coverage.files = NULL, pool = FALSE,
- pool.weights = c("voom", "equal"), plot.pool = FALSE, verbose = TRUE,
+ pool.weights = c("voom", "equal"), plot.pool = FALSE,
...) {
if (is.character(tumor.coverage.file)) {
tumor <- readCoverageGatk(tumor.coverage.file)
@@ -71,15 +70,15 @@ findBestNormal <- function(tumor.coverage.file, normalDB, pcs=1:3,
if (!ignore.sex && !is.null(normalDB$sex) &&
sum(!is.na(normalDB$sex))>0) {
if (is.null(sex)) {
- sex <- getSexFromCoverage(tumor, verbose=FALSE)
+ sex <- getSexFromCoverage(tumor)
}
- if (verbose) message("Sex of sample: ", sex)
+ flog.info("Sample sex: %s", sex)
if (!is.na(sex)) {
idx.normals <- which(normalDB$sex == sex)
}
if (length(idx.normals) < 2) {
- warning("Not enough samples of sex ", sex,
- " in database. Ignoring sex.")
+ flog.warn("Not enough samples of sex %s %s", sex,
+ "in database. Ignoring sex.")
idx.normals <- seq_along(normalDB$normal.coverage.files)
}
}
@@ -98,10 +97,8 @@ findBestNormal <- function(tumor.coverage.file, normalDB, pcs=1:3,
if (pool) {
normals <- lapply(normal.coverage.files, readCoverageGatk)
pool.weights <- match.arg(pool.weights)
- if (verbose) {
- message("Pooling ", paste(basename(normal.coverage.files),
- collapse=", "))
- }
+ flog.info("Pooling %s.", paste(basename(normal.coverage.files),
+ collapse=", "))
w <- NULL
if (pool.weights == "voom" && num.normals > 1) {
logRatio <- .voomLogRatio(tumor,
@@ -19,7 +19,6 @@
#' be considered when setting cutoffs.
#' @param remove.outliers Removes coverage outliers before calculating mean
#' chromosome coverages.
-#' @param verbose Verbose output.
#' @return Returns a \code{character(1)} with \code{M} for male, \code{F} for
#' female, or \code{NA} if unknown.
#' @author Markus Riester
@@ -32,7 +31,7 @@
#'
#' @export getSexFromCoverage
getSexFromCoverage <- function(coverage.file, min.ratio = 25, min.ratio.na = 20,
- remove.outliers = TRUE, verbose = TRUE) {
+ remove.outliers = TRUE) {
if (is.character(coverage.file)) {
x <- readCoverageGatk(coverage.file)
} else {
@@ -51,7 +50,7 @@ getSexFromCoverage <- function(coverage.file, min.ratio = 25, min.ratio.na = 20,
}

if (is.na(avg.coverage[sex.chr[1]]) || is.na(avg.coverage[sex.chr[2]]) ) {
- if (verbose) message(
+ flog.warn(
"Allosome coverage appears to be missing, cannot determine sex.")
return(NA)
}
@@ -60,18 +59,13 @@ getSexFromCoverage <- function(coverage.file, min.ratio = 25, min.ratio.na = 20,
names(avg.coverage))],na.rm=TRUE)
autosome.ratio <- avg.autosome.coverage/(avg.coverage[sex.chr[1]]+0.0001)
if (autosome.ratio > 5) {
- if (verbose) message(
- "Allosome coverage very low, cannot determine sex.")
+ flog.info("Allosome coverage very low, cannot determine sex.")
return(NA)
}
XY.ratio <- avg.coverage[sex.chr[1]]/ (avg.coverage[sex.chr[2]]+ 0.0001)
- if (verbose) {
- message("Mean coverages:",
- " chrX: ", round(avg.coverage[sex.chr[1]], digits=2),
- " chrY: ", round(avg.coverage[sex.chr[2]], digits=2),
- " chr1-22: ",round(avg.autosome.coverage, digits=2),"."
- )
- }
+ flog.info("Mean coverages: chrX: %.2f, chrY: %.2f, chr1-22: %.2f.",
+ avg.coverage[sex.chr[1]], avg.coverage[sex.chr[2]],
+ avg.autosome.coverage)
if (XY.ratio > min.ratio) return("F")
if (XY.ratio > min.ratio.na) return(NA)
return("M")
@@ -112,7 +106,6 @@ getSexFromCoverage <- function(coverage.file, min.ratio = 25, min.ratio.na = 20,
#' homozygous.
#' @param af.cutoff Remove all SNVs with allelic fraction lower than the
#' specified value.
-#' @param verbose Verbose output.
#' @return Returns a \code{character(1)} with \code{M} for male, \code{F} for
#' female, or \code{NA} if unknown.
#' @author Markus Riester
@@ -129,7 +122,7 @@ getSexFromCoverage <- function(coverage.file, min.ratio = 25, min.ratio.na = 20,
#' @importFrom stats fisher.test
getSexFromVcf <- function(vcf, tumor.id.in.vcf=NULL, min.or = 4,
min.or.na = 2.5, max.pv = 0.001, homozygous.cutoff = 0.95,
- af.cutoff = 0.2, verbose=TRUE) {
+ af.cutoff = 0.2) {
if (is.null(tumor.id.in.vcf)) {
tumor.id.in.vcf <- .getTumorIdInVcf(vcf)
}
@@ -144,41 +137,34 @@ getSexFromVcf <- function(vcf, tumor.id.in.vcf=NULL, min.or = 4,

homozygous <- geno(vcf)$FA[,tumor.id.in.vcf] > homozygous.cutoff
if ( sum(homozygous)/length(homozygous) < 0.001 ) {
- if (verbose) {
- message("No homozygous variants in VCF, provide unfiltered VCF.")
- }
+ flog.info("No homozygous variants in VCF, provide unfiltered VCF.")
return(NA)
}

if (!sum(chrX)) {
- if (verbose) {
- message("No variants on chrX in VCF.")
- }
+ flog.info("No variants on chrX in VCF.")
return(NA)
}
res <- fisher.test(homozygous, as.vector(chrX))
- if (verbose) message(sum( homozygous & as.vector(chrX)),
- " homozygous and ", sum( !homozygous & as.vector(chrX)),
- " heterozygous variants on chromosome X.")
+ flog.info("%i homozygous and %i heterozygous variants on chrX.",
+ sum( homozygous & as.vector(chrX)),
+ sum( !homozygous & as.vector(chrX)))
sex <- "F"
if (res$estimate >= min.or.na) sex <- NA
if (res$estimate >= min.or && res$p.value > max.pv) sex <- NA
if (res$p.value <= max.pv && res$estimate >= min.or) sex <- "M"
- if (verbose) {
- message("Sex from VCF: ", sex, " (Fisher's p-value: ",
- ifelse(res$p.value < 0.0001, "< 0.0001",
- round(res$p.value, digits=3)),
- " odds-ratio: ", round(res$estimate, digits=2), ")")
- }
+ flog.info("Sex from VCF: %s (Fisher's p-value: %s, odds-ratio: %.2f).",
+ sex, ifelse(res$p.value < 0.0001, "< 0.0001", round(res$p.value, digits=3)),
+ res$estimate)
return(sex)
}

.getSex <- function(sex, normal, tumor) {
if (sex != "?") return(sex)
- sex.tumor <- getSexFromCoverage(tumor, verbose=FALSE)
- sex.normal <- getSexFromCoverage(normal, verbose=FALSE)
+ sex.tumor <- getSexFromCoverage(tumor)
+ sex.normal <- getSexFromCoverage(normal)
if (!identical(sex.tumor, sex.normal)) {
- warning("Sex tumor/normal mismatch: tumor = ", sex.tumor,
+ flog.warn("Sex tumor/normal mismatch: tumor = ", sex.tumor,
" normal = ", sex.normal)
}
sex <- sex.tumor
@@ -57,6 +57,7 @@
#' @importFrom graphics abline axis boxplot contour hist image
#' legend lines par plot text mtext polygon points
#' rect strwidth symbols barplot
+#' @importFrom ggplot2 geom_boxplot geom_hline labs
plotAbs <- function(res, ids = NULL,
type = c("hist", "overview", "overview2", "BAF", "AF", "volcano", "all"),
chr = NULL, germline.only = TRUE, show.contour = FALSE, purity = NULL,
@@ -152,7 +153,7 @@ max.mapping.bias = 0.8, palette.name = "Paired", ... ) {
"Purity:", round(res$results[[i]]$purity[[1]], digits=2),
" Tumor ploidy:", round( res$results[[i]]$ploidy, digits=3),
" SNV log-likelihood:",
- round(res$results[[i]]$SNV.posterior$beta$llik, digits=2),
+ round(res$results[[i]]$SNV.posterior$beta.model$llik, digits=2),
" GoF:", r2,
" Mean coverage:",
paste(round(apply(geno(res$input$vcf)$DP,2,mean)),
@@ -315,7 +316,7 @@ max.mapping.bias = 0.8, palette.name = "Paired", ... ) {
r <- .getVariantPosteriors(res, i, max.mapping.bias)
if (is.null(r)) next

- vcf <- res$input$vcf[res$results[[i]]$SNV.posterior$beta$vcf.ids]
+ vcf <- res$input$vcf[res$results[[i]]$SNV.posterior$beta.model$vcf.ids]
# brwer.pal requires at least 3 levels

r <- .getAFPlotGroups(r, is.null(info(vcf)$SOMATIC))
@@ -418,7 +419,15 @@ max.mapping.bias = 0.8, palette.name = "Paired", ... ) {
legend("topright", legend=as.character(mycol.palette$group),
col=mycol.palette$color,
pch=mycol.palette$pch, cex=0.8)
-
+
+ estimatedContRate <-
+ res$results[[i]]$SNV.posterior$beta.model$posterior.contamination
+ if (!is.null(estimatedContRate) &&
+ estimatedContRate > min(r$AR[r$ML.SOMATIC])) {
+ abline(h=estimatedContRate, col="red")
+ text(x=mylogratio.xlim[1], y=estimatedContRate+0.02,
+ labels="Contamination", col="red", pos=4)
+ }
text(x=peak.ideal.means[
as.character(r$ML.C[r$ML.SOMATIC])][idx.labels],
y=r$ML.AR[r$ML.SOMATIC][idx.labels],
@@ -597,7 +606,7 @@ ss) {
}

.getVariantPosteriors <- function(res, i, max.mapping.bias=NULL) {
- r <- res$results[[i]]$SNV.posterior$beta.model$posterior
+ r <- res$results[[i]]$SNV.posterior$beta.model$posteriors
if (!is.null(r) && !is.null(max.mapping.bias)) {
r <- r[r$MAPPING.BIAS >= max.mapping.bias,]
}
... | ... |
@@ -607,22 +616,56 @@ ss) { |
607 | 616 |
.getAFPlotGroups <- function(r, single.mode) { |
608 | 617 |
if (single.mode) { |
609 | 618 |
groupLevels <- c("dbSNP/germline", "dbSNP/somatic", "novel/somatic", |
610 |
- "novel/germline", "COSMIC/germline", "COSMIC/somatic") |
|
619 |
+ "novel/germline", "COSMIC/germline", "COSMIC/somatic", "contamination") |
|
611 | 620 |
r$group <- groupLevels[1] |
612 | 621 |
r$group[r$prior.somatic < 0.1 & r$ML.SOMATIC] <- groupLevels[2] |
613 | 622 |
r$group[r$prior.somatic >= 0.1 & r$ML.SOMATIC] <- groupLevels[3] |
614 | 623 |
r$group[r$prior.somatic >= 0.1 & !r$ML.SOMATIC] <- groupLevels[4] |
615 | 624 |
r$group[r$prior.somatic >= 0.9 & !r$ML.SOMATIC] <- groupLevels[5] |
616 | 625 |
r$group[r$prior.somatic >= 0.9 & r$ML.SOMATIC] <- groupLevels[6] |
626 |
+ r$group[r$GERMLINE.CONTLOW > 0.5 | |
|
627 |
+ r$GERMLINE.CONTHIGH > 0.5] <- groupLevels[7] |
|
617 | 628 |
r$group <- factor(r$group, levels=groupLevels) |
618 | 629 |
return(r) |
619 | 630 |
} |
620 | 631 |
groupLevels <- c("germline", "germline/ML somatic", "somatic", |
621 |
- "somatic/ML germline") |
|
632 |
+ "somatic/ML germline", "contamination") |
|
622 | 633 |
r$group <- groupLevels[1] |
623 | 634 |
r$group[r$prior.somatic < 0.1 & r$ML.SOMATIC] <- groupLevels[2] |
624 | 635 |
r$group[r$prior.somatic >= 0.1 & r$ML.SOMATIC] <- groupLevels[3] |
625 | 636 |
r$group[r$prior.somatic >= 0.1 & !r$ML.SOMATIC] <- groupLevels[4] |
637 |
+ r$group[r$GERMLINE.CONTLOW > 0.5 | |
|
638 |
+ r$GERMLINE.CONTHIGH > 0.5] <- groupLevels[5] |
|
626 | 639 |
r$group <- factor(r$group, levels=groupLevels) |
627 | 640 |
r |
628 | 641 |
} |
642 |
+ |
|
643 |
+ |
|
644 |
+.plotContamination <- function(pp, max.mapping.bias=NULL, plot=TRUE) { |
|
645 |
+ if (is.null(max.mapping.bias)) max.mapping.bias <- 0
|
646 |
+ |
|
647 |
+ idx <- pp$GERMLINE.CONTHIGH+pp$GERMLINE.CONTLOW > 0.5 & |
|
648 |
+ pp$MAPPING.BIAS >= max.mapping.bias |
|
649 |
+ if (!length(which(idx))) return(0) |
|
650 |
+ df <- data.frame( |
|
651 |
+ chr=pp$chr[idx], |
|
652 |
+ AR=sapply(pp$AR.ADJUSTED[idx], function(x) ifelse(x>0.5, 1-x,x)), |
|
653 |
+ HIGHLOW=ifelse(pp$GERMLINE.CONTHIGH>pp$GERMLINE.CONTLOW, |
|
654 |
+ "HIGH", "LOW")[idx] |
|
655 |
+ ) |
|
656 |
+ df$chr <- factor(df$chr, levels=unique(df$chr)) |
|
657 |
+ # take the median within the HIGH and LOW groups and then average. the low count
|
658 |
+ # might be biased in case contamination rate is < AR cutoff |
|
659 |
+ estimatedRate <- weighted.mean( |
|
660 |
+ sapply(split(df$AR, df$HIGHLOW), median), |
|
661 |
+ sapply(split(df$AR, df$HIGHLOW), length) |
|
662 |
+ ) |
|
663 |
+ if (plot) { |
|
664 |
+ gp <- ggplot(df, aes_string(x="chr",y="AR",fill="HIGHLOW"))+geom_boxplot()+ |
|
665 |
+ geom_hline(yintercept=estimatedRate, color="grey", linetype="dashed")+ |
|
666 |
+ labs(fill="") |
|
667 |
+ print(gp) |
|
668 |
+ } |
|
669 |
+ estimatedRate |
|
670 |
+} |
|
671 |
+ |
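The estimate returned by .plotContamination() is the weighted mean of the
per-group medians of the folded allelic fractions (AR, with values above 0.5
mapped to 1 - AR) of SNPs flagged as likely contamination, weighted by the
number of SNPs in the HIGH and LOW groups. A minimal sketch of that calculation
on invented values:

    # toy folded allelic fractions of flagged SNPs
    df <- data.frame(AR=c(0.02, 0.03, 0.025, 0.04, 0.05),
        HIGHLOW=c("LOW", "LOW", "LOW", "HIGH", "HIGH"))
    estimatedRate <- weighted.mean(
        sapply(split(df$AR, df$HIGHLOW), median),   # median per group
        sapply(split(df$AR, df$HIGHLOW), length))   # weighted by group size
    estimatedRate  # 0.033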
... | ... |
@@ -19,8 +19,8 @@ |
19 | 19 |
#' mutations.} \item{k}{Minimum number of supporting reads.} \item{f}{Expected |
20 | 20 |
#' allelic fraction. } |
21 | 21 |
#' @author Markus Riester |
22 |
-#' @references Carter et al., Absolute quantification of somatic DNA |
|
23 |
-#' alterations in human cancer. Nature Biotechnology 2012. |
|
22 |
+#' @references Carter et al. (2012), Absolute quantification of somatic DNA |
|
23 |
+#' alterations in human cancer. Nature Biotechnology. |
|
24 | 24 |
#' @examples |
25 | 25 |
#' |
26 | 26 |
#' purity <- c(0.1,0.15,0.2,0.25,0.4,0.6,1) |
... | ... |
@@ -15,7 +15,6 @@ |
15 | 15 |
#' be used to automatically ignore unlikely solutions. |
16 | 16 |
#' @param max.ploidy Maximum ploidy to be considered. If \code{NULL}, all. Can |
17 | 17 |
#' be used to automatically ignore unlikely solutions. |
18 |
-#' @param verbose Verbose output. |
|
19 | 18 |
#' @return The return value of the corresponding \code{\link{runAbsoluteCN}} |
20 | 19 |
#' call, but with the results array manipulated according the curation CSV file |
21 | 20 |
#' and arguments of this function. |
... | ... |
@@ -35,8 +34,8 @@ |
35 | 34 |
readCurationFile <- function(file.rds, |
36 | 35 |
file.curation = gsub(".rds$", ".csv", file.rds), |
37 | 36 |
remove.failed = FALSE, report.best.only=FALSE, min.ploidy = NULL, |
38 |
-max.ploidy = NULL, verbose=FALSE) { |
|
39 |
- if (verbose) message("Reading ", file.rds, "...") |
|
37 |
+max.ploidy = NULL) { |
|
38 |
+ flog.info("Reading %s...", file.rds) |
|
40 | 39 |
res <- readRDS(file.rds) |
41 | 40 |
curation <- read.csv(file.curation, as.is=TRUE, nrows=1) |
42 | 41 |
.checkLogical <- function(field) { |
... | ... |
@@ -58,7 +58,7 @@ |
58 | 58 |
#' \code{\link{filterVcfBasic}}. |
59 | 59 |
#' @param args.filterVcf Arguments for variant filtering function. Arguments |
60 | 60 |
#' \code{vcf}, \code{tumor.id.in.vcf}, \code{min.coverage}, |
61 |
-#' \code{model.homozygous}, \code{error} and \code{verbose} are required in the |
|
61 |
+#' \code{model.homozygous} and \code{error} are required in the |
|
62 | 62 |
#' filter function and are automatically set. |
63 | 63 |
#' @param fun.setPriorVcf Function to set prior for somatic status for each |
64 | 64 |
#' variant in the VCF. Defaults to \code{\link{setPriorVcf}}. |
... | ... |
@@ -70,15 +70,15 @@ |
70 | 70 |
#' coverage files. Needs to return a \code{logical} vector whether an interval |
71 | 71 |
#' should be used for segmentation. Defaults to \code{\link{filterTargets}}. |
72 | 72 |
#' @param args.filterTargets Arguments for target filtering function. Arguments |
73 |
-#' \code{log.ratio}, \code{tumor}, \code{gc.data}, \code{seg.file}, |
|
74 |
-#' \code{normalDB} and \code{verbose} are required and automatically set |
|
73 |
+#' \code{log.ratio}, \code{tumor}, \code{gc.data}, \code{seg.file} and |
|
74 |
+#' \code{normalDB} are required and automatically set |
|
75 | 75 |
#' @param fun.segmentation Function for segmenting the copy number log-ratios. |
76 | 76 |
#' Expected return value is a \code{data.frame} representation of the |
77 | 77 |
#' segmentation. Defaults to \code{\link{segmentationCBS}}. |
78 | 78 |
#' @param args.segmentation Arguments for segmentation function. Arguments |
79 |
-#' \code{normal}, \code{tumor}, \code{log.ratio}, \code{plot.cnv}, |
|
79 |
+#' \code{normal}, \code{tumor}, \code{log.ratio}, \code{plot.cnv},
|
80 | 80 |
#' \code{min.coverage}, \code{sampleid}, \code{vcf}, \code{tumor.id.in.vcf}, |
81 |
-#' \code{centromeres} and \code{verbose} are required in the segmentation function |
|
81 |
+#' and \code{centromeres} are required in the segmentation function
|
82 | 82 |
#' and automatically set. |
83 | 83 |
#' @param fun.focal Function for identifying focal amplifications. Defaults to |
84 | 84 |
#' \code{\link{findFocal}}. |
... | ... |
@@ -133,6 +133,8 @@ |
133 | 133 |
#' \code{sd(log.ratio)*log.ratio.calibration}. |
134 | 134 |
#' @param remove.off.target.snvs Deprecated. Use the corresponding argument in |
135 | 135 |
#' \code{args.filterVcf}. |
136 |
+#' @param smooth.log.ratio Smooth \code{log.ratio} using the \code{DNAcopy} |
|
137 |
+#' package. |
|
136 | 138 |
#' @param model.homozygous Homozygous germline SNPs are uninformative and by |
137 | 139 |
#' default removed. In 100 percent pure samples such as cell lines, however, |
138 | 140 |
#' heterozygous germline SNPs appear homozygous in case of LOH. Setting this |
... | ... |
@@ -171,11 +173,19 @@ |
171 | 173 |
#' typically result in a slightly more accurate purity, especially for rather |
172 | 174 |
#' silent genomes or very low purities. Otherwise, it will just use the purity |
173 | 175 |
#' determined via the SCNA-fit. |
176 |
+#' @param log.file If not \code{NULL}, store verbose output to file. |
|
174 | 177 |
#' @param verbose Verbose output. |
175 | 178 |
#' @return A list with elements \item{candidates}{Results of the grid search.} |
176 | 179 |
#' \item{results}{All local optima, sorted by final rank.} \item{input}{The |
177 | 180 |
#' input data.} |
178 | 181 |
#' @author Markus Riester |
182 |
+#' @references Riester et al. (2016). PureCN: Copy number calling and SNV |
|
183 |
+#' classification using targeted short read sequencing. Source Code for Biology |
|
184 |
+#' and Medicine, 11, pp. 13. |
|
185 |
+#' |
|
186 |
+#' Carter et al. (2012), Absolute quantification of somatic DNA alterations in |
|
187 |
+#' human cancer. Nature Biotechnology. |
|
188 |
+#' |
|
179 | 189 |
#' @seealso \code{\link{correctCoverageBias}} \code{\link{segmentationCBS}} |
180 | 190 |
#' \code{\link{calculatePowerDetectSomatic}} |
181 | 191 |
#' @examples |
... | ... |
@@ -221,6 +231,8 @@ |
221 | 231 |
#' @importFrom utils data read.delim tail packageVersion |
222 | 232 |
#' @importFrom S4Vectors queryHits subjectHits DataFrame |
223 | 233 |
#' @importFrom data.table data.table |
234 |
+#' @importFrom futile.logger flog.info flog.warn flog.fatal |
|
235 |
+#' flog.threshold flog.appender appender.tee |
|
224 | 236 |
runAbsoluteCN <- function(normal.coverage.file = NULL, |
225 | 237 |
tumor.coverage.file = NULL, log.ratio = NULL, seg.file = NULL, |
226 | 238 |
seg.file.sdev = 0.4, vcf.file = NULL, normalDB = NULL, genome, |
... | ... |
@@ -237,29 +249,32 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
237 | 249 |
candidates = NULL, min.coverage = 15, max.coverage.vcf = 300, |
238 | 250 |
max.non.clonal = 0.2, max.homozygous.loss = c(0.05, 1e07) , non.clonal.M = 1/3, |
239 | 251 |
max.mapping.bias = 0.8, iterations = 30, log.ratio.calibration = 0.25, |
240 |
- remove.off.target.snvs = NULL, model.homozygous = FALSE, error = 0.001, |
|
241 |
- gc.gene.file = NULL, max.dropout = c(0.95, 1.1), max.logr.sdev = 0.75, |
|
252 |
+ smooth.log.ratio = TRUE, remove.off.target.snvs = NULL, |
|
253 |
+ model.homozygous = FALSE, error = 0.001, gc.gene.file = NULL, |
|
254 |
+ max.dropout = c(0.95, 1.1), max.logr.sdev = 0.75, |
|
242 | 255 |
max.segments = 300, min.gof = 0.8, plot.cnv = TRUE, cosmic.vcf.file = NULL, |
243 |
- post.optimize = FALSE, verbose = TRUE) { |
|
244 |
- debug <- FALSE |
|
245 |
- |
|
256 |
+ post.optimize = FALSE, log.file = NULL, verbose = TRUE) { |
|
257 |
+ |
|
258 |
+ if (!verbose) flog.threshold("WARN") |
|
259 |
+ if (!is.null(log.file)) flog.appender(appender.tee(log.file)) |
|
260 |
+ |
|
261 |
+ # log function arguments |
|
262 |
+ try(.logHeader(as.list(match.call())[-1]), silent=TRUE) |
|
263 |
+ |
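These logging calls use the futile.logger package: flog.threshold("WARN") raises
the minimum severity that is printed, so flog.info() messages are silenced when
verbose=FALSE, and flog.appender(appender.tee(log.file)) sends every message to
both the console and the given file. A small standalone sketch (the file name is
just an example):

    library(futile.logger)
    flog.threshold("WARN")                         # hide INFO messages
    flog.appender(appender.tee("purecn_run.log"))  # log to console and file
    flog.info("suppressed by the WARN threshold")
    flog.warn("printed and also written to purecn_run.log")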
|
246 | 264 |
# TODO: remove in PureCN 1.8 |
247 | 265 |
if (!is.null(remove.off.target.snvs)) { |
248 | 266 |
args.filterVcf$remove.off.target.snvs <- remove.off.target.snvs |
249 |
- message("remove.off.target.snvs is deprecated. ", |
|
250 |
- "Please use it in args.filterVcf instead.") |
|
267 |
+ flog.warn("remove.off.target.snvs is deprecated. Please use it in args.filterVcf instead.") |
|
251 | 268 |
} |
252 | 269 |
# TODO: remove in PureCN 1.8 |
253 | 270 |
if (length(max.homozygous.loss)==1){ |
254 | 271 |
max.homozygous.loss <- c(max.homozygous.loss, 1e07) |
255 |
- message("max.homozygous.loss now a double(2) vector. ", |
|
256 |
- "Please provide both values.") |
|
272 |
+ flog.warn("max.homozygous.loss now a double(2) vector. Please provide both values.") |
|
257 | 273 |
} |
258 | 274 |
# TODO: remove in PureCN 1.8 |
259 | 275 |
if (is.null(normalDB) && !is.null(args.filterTargets$normalDB)) { |
260 | 276 |
normalDB <- args.filterTargets$normalDB |
261 |
- message("normalDB now a runAbsoluteCN argument. ", |
|
262 |
- "Please provide it there, not in args.filterTargets.") |
|
277 |
+ flog.warn("normalDB now a runAbsoluteCN argument. Please provide it there, not in args.filterTargets.") |
|
263 | 278 |
} |
264 | 279 |
centromeres <- .getCentromerePositions(centromeres, genome) |
265 | 280 |
|
... | ... |
@@ -274,8 +289,7 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
274 | 289 |
|
275 | 290 |
test.num.copy <- sort(test.num.copy) |
276 | 291 |
|
277 |
- if (verbose) |
|
278 |
- message("Loading GATK coverage files...") |
|
292 |
+ flog.info("Loading GATK coverage files...") |
|
279 | 293 |
|
280 | 294 |
if (!is.null(normal.coverage.file)) { |
281 | 295 |
if (is.character(normal.coverage.file)) { |
... | ... |
@@ -298,9 +312,6 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
298 | 312 |
if (is.null(sampleid)) |
299 | 313 |
sampleid <- basename(tumor.coverage.file) |
300 | 314 |
} else { |
301 |
- if (verbose) |
|
302 |
- message("tumor.coverage.file does not appear to be a filename, assuming", |
|
303 |
- " it is valid GATK coverage data.") |
|
304 | 315 |
tumor <- tumor.coverage.file |
305 | 316 |
if (is.null(sampleid)) |
306 | 317 |
sampleid <- "Sample.1" |
... | ... |
@@ -326,14 +337,15 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
326 | 337 |
if (is.null(normal.coverage.file)) |
327 | 338 |
normal <- tumor |
328 | 339 |
log.ratio <- .createFakeLogRatios(tumor, seg.file, chr.hash) |
340 |
+ smooth.log.ratio <- FALSE |
|
329 | 341 |
if (is.null(sampleid)) |
330 |
- sampleid <- read.delim(seg.file)[1, 1] |
|
342 |
+ sampleid <- read.delim(seg.file, as.is=TRUE)[1, 1] |
|
331 | 343 |
} else { |
332 | 344 |
if (is.null(normal.coverage.file)) { |
333 | 345 |
.stopUserError("Need a normal coverage file if log.ratio and seg.file are not", |
334 | 346 |
" provided.") |
335 | 347 |
} |
336 |
- log.ratio <- calculateLogRatio(normal, tumor, verbose = verbose) |
|
348 |
+ log.ratio <- calculateLogRatio(normal, tumor) |
|
337 | 349 |
} |
338 | 350 |
} else { |
339 | 351 |
# the segmentation algorithm will remove targets with low coverage in both tumor |
... | ... |
@@ -362,14 +374,14 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
362 | 374 |
} |
363 | 375 |
|
364 | 376 |
args.filterTargets <- c(list(log.ratio = log.ratio, tumor = tumor, |
365 |
- gc.data = gc.data, seg.file = seg.file, normalDB = normalDB, |
|
366 |
- verbose = verbose), args.filterTargets) |
|
377 |
+ gc.data = gc.data, seg.file = seg.file, normalDB = normalDB), |
|
378 |
+ args.filterTargets) |
|
367 | 379 |
|
368 | 380 |
targetsUsed <- do.call(fun.filterTargets, |
369 | 381 |
.checkArgs(args.filterTargets, "filterTargets")) |
370 | 382 |
|
371 | 383 |
# chr.hash is an internal data structure, so we need to do this separately. |
372 |
- targetsUsed <- .filterTargetsChrHash(targetsUsed, tumor, chr.hash, verbose) |
|
384 |
+ targetsUsed <- .filterTargetsChrHash(targetsUsed, tumor, chr.hash) |
|
373 | 385 |
targetsUsed <- which(targetsUsed) |
374 | 386 |
if (nrow(tumor) != nrow(normal) || nrow(tumor) != length(log.ratio) || (!is.null(gc.gene.file) && |
375 | 387 |
nrow(tumor) != nrow(gc.data))) { |
... | ... |
@@ -381,28 +393,31 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
381 | 393 |
if (!is.null(gc.gene.file)) { |
382 | 394 |
gc.data <- gc.data[targetsUsed, ] |
383 | 395 |
} |
384 |
- if (verbose) |
|
385 |
- message("Using ", nrow(tumor), " targets.") |
|
396 |
+ flog.info("Using %i targets.", nrow(tumor)) |
|
397 |
+ |
|
398 |
+ if (smooth.log.ratio) { |
|
399 |
+ CNA.obj <- smooth.CNA(CNA(log.ratio, |
|
400 |
+ .strip.chr.name(normal$chr, chr.hash), |
|
401 |
+ floor((normal$probe_start + normal$probe_end)/2), |
|
402 |
+ data.type="logratio", sampleid="sample")) |
|
403 |
+ log.ratio <- CNA.obj$sample |
|
404 |
+ } |
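The new smooth.log.ratio step wraps the log-ratios in a DNAcopy CNA object and
runs smooth.CNA(), which shrinks single-target outliers before segmentation.
A self-contained sketch of the same idea on simulated data (all values are
invented):

    library(DNAcopy)
    set.seed(123)
    lr <- rnorm(100, sd=0.2)
    lr[50] <- 3    # single-target outlier
    cna <- CNA(lr, chrom=rep(1, 100),
        maploc=seq(10000, 1e6, length.out=100),
        data.type="logratio", sampleid="sample")
    lr.smoothed <- smooth.CNA(cna)$sample    # outlier pulled toward neighbors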
|
386 | 405 |
|
387 | 406 |
dropoutWarning <- FALSE |
388 | 407 |
# clean up noisy targets, but not if the segmentation was already provided. |
389 | 408 |
if (is.null(seg.file)) { |
390 | 409 |
if (!is.null(gc.gene.file)) { |
391 |
- dropoutWarning <- .checkGCBias(normal, tumor, gc.data, max.dropout, verbose) |
|
392 |
- } else if (verbose) { |
|
393 |
- message("No gc.gene.file provided. Cannot check if data was ", |
|
394 |
- "GC-normalized. Was it?") |
|
410 |
+ dropoutWarning <- .checkGCBias(normal, tumor, gc.data, max.dropout) |
|
411 |
+ } else { |
|
412 |
+ flog.info("No gc.gene.file provided. Cannot check if data was GC-normalized. Was it?") |
|
395 | 413 |
} |
396 | 414 |
} |
397 | 415 |
|
398 | 416 |
if (!is.null(gc.gene.file) && is.null(gc.data$Gene)) { |
399 |
- if (verbose) |
|
400 |
- message("No Gene column in gc.gene.file.", |
|
401 |
- " You won't get gene-level calls.") |
|
417 |
+ flog.info("No Gene column in gc.gene.file. You won't get gene-level calls.") |
|
402 | 418 |
gc.gene.file <- NULL |
403 | 419 |
} |
404 | 420 |
|
405 |
- |
|
406 | 421 |
exon.gr <- GRanges(seqnames=tumor$chr, IRanges(start=tumor$probe_start, |
407 | 422 |
end=tumor$probe_end)) |
408 | 423 |
|
... | ... |
@@ -416,9 +431,8 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
416 | 431 |
sex.vcf <- NULL |
417 | 432 |
|
418 | 433 |
if (!is.null(vcf.file)) { |
419 |
- if (verbose) |
|
420 |
- message("Loading VCF...") |
|
421 |
- vcf <- .readAndCheckVcf(vcf.file, genome=genome, verbose=verbose) |
|
434 |
+ flog.info("Loading VCF...") |
|
435 |
+ vcf <- .readAndCheckVcf(vcf.file, genome=genome) |
|
422 | 436 |
|
423 | 437 |
if (length(intersect(tumor$chr, seqlevels(vcf))) < 1) { |
424 | 438 |
.stopUserError("Different chromosome names in coverage and VCF.") |
... | ... |
@@ -429,7 +443,6 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
429 | 443 |
} |
430 | 444 |
|
431 | 445 |
if (sum(colSums(geno(vcf)$DP) > 0) == 1 && args.filterVcf$use.somatic.status) { |
432 |
- message("VCF file seems to have only one sample. ", "Using SNVs in single mode.") |
|
433 | 446 |
args.filterVcf$use.somatic.status <- FALSE |
434 | 447 |
} |
435 | 448 |
|
... | ... |
@@ -439,23 +452,21 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
439 | 452 |
normal.id.in.vcf <- .getNormalIdInVcf(vcf, tumor.id.in.vcf) |
440 | 453 |
} |
441 | 454 |
|
442 |
- if (verbose) |
|
443 |
- message("Assuming ", tumor.id.in.vcf, " is tumor in VCF file.") |
|
455 |
+ flog.info("%s is tumor in VCF file.", tumor.id.in.vcf) |
|
444 | 456 |
if (sex != "diploid") { |
445 |
- sex.vcf <- getSexFromVcf(vcf, tumor.id.in.vcf, verbose = verbose) |
|
457 |
+ sex.vcf <- getSexFromVcf(vcf, tumor.id.in.vcf) |
|
446 | 458 |
if (!is.na(sex.vcf) && sex %in% c("F", "M") && sex.vcf != sex) { |
447 |
- warning("Sex mismatch of coverage and VCF. ", |
|
459 |
+ flog.warn("Sex mismatch of coverage and VCF. %s%s", |
|
448 | 460 |
"Could be because of noisy data, contamination, ", |
449 | 461 |
"loss of chrY or a mis-alignment of coverage and VCF.") |
450 | 462 |
} |
451 | 463 |
} |
452 | 464 |
n.vcf.before.filter <- nrow(vcf) |
453 |
- if (verbose) |
|
454 |
- message("Found ", n.vcf.before.filter, " variants in VCF file.") |
|
465 |
+ flog.info("Found %i variants in VCF file.", n.vcf.before.filter) |
|
455 | 466 |
|
456 | 467 |
args.filterVcf <- c(list(vcf = vcf, tumor.id.in.vcf = tumor.id.in.vcf, |
457 | 468 |
model.homozygous = model.homozygous, error = error, |
458 |
- target.granges = exon.gr, verbose = verbose), args.filterVcf) |
|
469 |
+ target.granges = exon.gr), args.filterVcf) |
|
459 | 470 |
if (is.null(args.filterVcf$min.coverage)) { |
460 | 471 |
args.filterVcf$min.coverage <- min.coverage |
461 | 472 |
} |
... | ... |
@@ -465,34 +476,31 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
465 | 476 |
vcf <- vcf.filtering$vcf |
466 | 477 |
|
467 | 478 |
if (!is.null(cosmic.vcf.file)) { |
468 |
- vcf <- .addCosmicCNT(vcf, cosmic.vcf.file, verbose = verbose) |
|
479 |
+ vcf <- .addCosmicCNT(vcf, cosmic.vcf.file) |
|
469 | 480 |
} |
470 | 481 |
|
471 |
- args.setPriorVcf <- c(list(vcf = vcf, tumor.id.in.vcf = tumor.id.in.vcf, |
|
472 |
- verbose = verbose), args.setPriorVcf) |
|
482 |
+ args.setPriorVcf <- c(list(vcf = vcf, tumor.id.in.vcf = tumor.id.in.vcf), |
|
483 |
+ args.setPriorVcf) |
|
473 | 484 |
prior.somatic <- do.call(fun.setPriorVcf, |
474 | 485 |
.checkArgs(args.setPriorVcf, "setPriorVcf")) |
475 | 486 |
|
476 | 487 |
# get mapping bias |
477 | 488 |
args.setMappingBiasVcf$vcf <- vcf |
478 | 489 |
args.setMappingBiasVcf$tumor.id.in.vcf <- tumor.id.in.vcf |
479 |
- args.setMappingBiasVcf$verbose <- verbose |
|
480 | 490 |
mapping.bias <- do.call(fun.setMappingBiasVcf, |
481 | 491 |
.checkArgs(args.setMappingBiasVcf, "setMappingBiasVcf")) |
482 | 492 |
idxHqGermline <- prior.somatic < 0.5 & mapping.bias >= max.mapping.bias |
483 | 493 |
vcf.germline <- vcf[idxHqGermline] |
484 | 494 |
} |
485 | 495 |
|
486 |
- if (verbose) |
|
487 |
- message("Sex of sample: ", sex) |
|
488 |
- if (verbose) |
|
489 |
- message("Segmenting data...") |
|
496 |
+ flog.info("Sample sex: %s", sex) |
|
497 |
+ flog.info("Segmenting data...") |
|
490 | 498 |
|
491 | 499 |
args.segmentation <- c(list(normal = normal, tumor = tumor, log.ratio = log.ratio, |
492 | 500 |
seg = .loadSegFile(seg.file), plot.cnv = plot.cnv, min.coverage = ifelse(is.null(seg.file), |
493 | 501 |
min.coverage, -1), sampleid = sampleid, vcf = vcf.germline, tumor.id.in.vcf = tumor.id.in.vcf, |
494 | 502 |
normal.id.in.vcf = normal.id.in.vcf, max.segments = max.segments, chr.hash = chr.hash, |
495 |
- centromeres = centromeres, verbose = verbose), args.segmentation) |
|
503 |
+ centromeres = centromeres), args.segmentation) |
|
496 | 504 |
|
497 | 505 |
vcf.germline <- NULL |
498 | 506 |
seg <- do.call(fun.segmentation, |
... | ... |
@@ -516,20 +524,13 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
516 | 524 |
n.vcf.before.filter <- nrow(vcf) |
517 | 525 |
vcf <- vcf[!is.na(snv.lr)] |
518 | 526 |
mapping.bias <- mapping.bias[!is.na(snv.lr)] |
527 |
+ prior.somatic <- prior.somatic[!is.na(snv.lr)] |
|
519 | 528 |
|
520 | 529 |
# make sure all SNVs are in covered segments |
521 |
- if (verbose) |
|
522 |
- message("Removing ", n.vcf.before.filter - nrow(vcf), " variants outside segments.") |
|
530 |
+ flog.info("Removing %i variants outside segments.", n.vcf.before.filter - nrow(vcf)) |
|
523 | 531 |
} |
524 | 532 |
ov <- findOverlaps(seg.gr, vcf) |
525 |
- if (verbose) |
|
526 |
- message("Using ", nrow(vcf), " variants.") |
|
527 |
- |
|
528 |
- # get final somatic priors |
|
529 |
- args.setPriorVcf$vcf <- vcf |
|
530 |
- args.setPriorVcf$verbose <- FALSE |
|
531 |
- prior.somatic <- do.call(fun.setPriorVcf, |
|
532 |
- .checkArgs(args.setPriorVcf, "setPriorVcf")) |
|
533 |
+ flog.info("Using %i variants.", nrow(vcf)) |
|
533 | 534 |
} |
534 | 535 |
|
535 | 536 |
# get target log-ratios for all segments |
... | ... |
@@ -554,7 +555,7 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
554 | 555 |
if (!is.null(seg.file)) { |
555 | 556 |
sd.seg <- seg.file.sdev |
556 | 557 |
} else { |
557 |
- exon.lrs <- lapply(exon.lrs, .smoothOutliers) |
|
558 |
+ # exon.lrs <- lapply(exon.lrs, .smoothOutliers) |
|
558 | 559 |
} |
559 | 560 |
|
560 | 561 |
# renormalize, in case segmentation function changed means |
... | ... |
@@ -587,14 +588,10 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
587 | 588 |
|
588 | 589 |
if (sum(li < 0) > 0) |
589 | 590 |
.stopRuntimeError("Some segments have negative size.") |
590 |
- |
|
591 |
- if (verbose) { |
|
592 |
- message("Mean standard deviation of log-ratios: ", round(sd.seg, digits = 2)) |
|
593 |
- } |
|
591 |
+ flog.info("Mean standard deviation of log-ratios: %.2f", sd.seg) |
|
594 | 592 |
log.ratio.offset <- rep(0, nrow(seg)) |
595 | 593 |
|
596 |
- if (verbose) |
|
597 |
- message("Optimizing purity and ploidy. ", "Will take a minute or two...") |
|
594 |
+ flog.info("2D-grid search of purity and ploidy...") |
|
598 | 595 |
|
599 | 596 |
# find local maxima. use a coarser grid for purity, otherwise we will get far too |
600 | 597 |
# many solutions, which we will need to cluster later anyways. |
... | ... |
@@ -603,7 +600,7 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
603 | 600 |
} else { |
604 | 601 |
candidate.solutions <- .optimizeGrid(test.purity = seq(max(0.1, min(test.purity)), |
605 | 602 |
min(0.99, max(test.purity)), by = 1/30), min.ploidy, max.ploidy, test.num.copy = test.num.copy, |
606 |
- exon.lrs, seg, sd.seg, li, max.exon.ratio, max.non.clonal, verbose, debug) |
|
603 |
+ exon.lrs, seg, sd.seg, li, max.exon.ratio, max.non.clonal) |
|
607 | 604 |
|
608 | 605 |
# if we have > 20 somatic mutations, we can try estimating purity based on |
609 | 606 |
# allelic fractions and assuming diploid genomes. |
... | ... |
@@ -623,11 +620,9 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
623 | 620 |
candidate.solutions$candidates <- candidate.solutions$candidates[idx.keep, |
624 | 621 |
] |
625 | 622 |
} |
626 |
- |
|
627 |
- if (verbose) |
|
628 |
- message(paste(strwrap(paste("Local optima:", paste(round(candidate.solutions$candidates$purity, |
|
629 |
- digits = 2), round(candidate.solutions$candidates$ploidy, digits = 2), |
|
630 |
- sep = "/", collapse = ", "))), collapse = "\n")) |
|
623 |
+ flog.info(paste(strwrap(paste("Local optima:\n", paste(round(candidate.solutions$candidates$purity, |
|
624 |
+ digits = 2), round(candidate.solutions$candidates$ploidy, digits = 2), |
|
625 |
+ sep = "/", collapse = ", "))), collapse = "\n")) |
|
631 | 626 |
|
632 | 627 |
simulated.annealing <- TRUE |
633 | 628 |
|
... | ... |
@@ -640,16 +635,14 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
640 | 635 |
total.ploidy <- candidate.solutions$candidates$ploidy[cpi] |
641 | 636 |
p <- candidate.solutions$candidates$purity[cpi] |
642 | 637 |
|
643 |
- if (verbose) |
|
644 |
- message("Testing local optimum ", cpi, "/", nrow(candidate.solutions$candidates), |
|
645 |
- " at purity ", round(p, digits = 2), " and total ploidy ", round(total.ploidy, |
|
646 |
- digits = 2), "...") |
|
638 |
+ flog.info("Testing local optimum %i/%i at purity %.2f and total ploidy %.2f...", |
|
639 |
+ cpi, nrow(candidate.solutions$candidates), p, total.ploidy) |
|
647 | 640 |
|
648 | 641 |
subclonal <- rep(FALSE, nrow(seg)) |
649 | 642 |
old.llik <- -1 |
650 | 643 |
cnt.llik.equal <- 0 |
651 |
- C.posterior <- matrix(ncol = length(test.num.copy) + 1, nrow = nrow(seg)) |
|
652 |
- colnames(C.posterior) <- c(test.num.copy, "Subclonal") |
|
644 |
+ C.likelihood <- matrix(ncol = length(test.num.copy) + 1, nrow = nrow(seg)) |
|
645 |
+ colnames(C.likelihood) <- c(test.num.copy, "Subclonal") |
|
653 | 646 |
for (iter in seq_len(iterations)) { |
654 | 647 |
# test for convergence |
655 | 648 |
if (abs(old.llik - llik) < 0.0001) { |
... | ... |
@@ -710,12 +703,6 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
710 | 703 |
p, " and total ploidy ", total.ploidy, ".") |
711 | 704 |
} |
712 | 705 |
|
713 |
- if (debug) |
|
714 |
- message(paste("Iteration:", iter, " Log-likelihood: ", llik, " Purity:", |
|
715 |
- p, " Total Ploidy:", total.ploidy, " Tumor Ploidy:", sum(li * |
|
716 |
- (C))/sum(li), " Fraction sub-clonal:", subclonal.f, " Mean log-ratio offset", |
|
717 |
- mean(log.ratio.offset))) |
|
718 |
- |
|
719 | 706 |
for (i in seq_len(nrow(seg))) { |
720 | 707 |
# Gibbs sample copy number Step 1: calculate log-likelihoods of fits In the first |
721 | 708 |
# iteration, we do not have the integer copy numbers yet, so calculate ploidy |
... | ... |
@@ -743,8 +730,9 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
743 | 730 |
ifelse(C[-i] == 0, 1, 0)) + li[i] * ifelse(Ci == 0, 1, 0))/sum(li), |
744 | 731 |
double(1)) |
745 | 732 |
|
746 |
- # set maximimum homozygous loss size to 10mb. |
|
747 |
- if (li[i]>max.homozygous.loss[2] && test.num.copy[1]<1) frac.homozygous.loss[1] <- 1 |
|
733 |
+ if (li[i] > max.homozygous.loss[2] && test.num.copy[1]<1) { |
|
734 |
+ frac.homozygous.loss[1] <- 1 |
|
735 |
+ } |
|
748 | 736 |
log.prior.homozygous.loss <- log(ifelse(frac.homozygous.loss > |
749 | 737 |
max.homozygous.loss[1], 0, 1)) |
750 | 738 |
if (iter > 1) |
... | ... |
@@ -754,7 +742,7 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
754 | 742 |
p.rij <- c(p.rij, .calcLlikSegmentSubClonal(exon.lrs[[i]] + log.ratio.offset[i], |
755 | 743 |
max.exon.ratio)) |
756 | 744 |
|
757 |
- C.posterior[i, ] <- exp(p.rij - max(p.rij)) |
|
745 |
+ C.likelihood[i, ] <- exp(p.rij - max(p.rij)) |
|
758 | 746 |
|
759 | 747 |
if (simulated.annealing) |
760 | 748 |
p.rij <- p.rij * exp(iter/4) |
... | ... |
@@ -772,8 +760,6 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
772 | 760 |
old.C <- C[i] |
773 | 761 |
opt.C <- (2^(seg$seg.mean + log.ratio.offset) * total.ploidy)/p - |
774 | 762 |
((2 * (1 - p))/p) |
775 |
- # message(opt.C[i], ' seg: ', seg$seg.mean[i], ' offset: ', log.ratio.offset[i], |
|
776 |
- # ' purity: ', p) |
|
777 | 763 |
opt.C[opt.C < 0] <- 0 |
778 | 764 |
if (id > length(test.num.copy)) { |
779 | 765 |
# optimal non-integer copy number |
... | ... |
@@ -783,16 +769,14 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
783 | 769 |
C[i] <- test.num.copy[id] |
784 | 770 |
subclonal[i] <- FALSE |
785 | 771 |
} |
786 |
- if (old.C != C[i] && debug) |
|
787 |
- message("Old: ", old.C, " New: ", C[i], " LR: ", mean(exon.lrs[[i]])) |
|
788 | 772 |
} |
789 | 773 |
} |
790 | 774 |
if (subclonal.f < max.non.clonal && abs(total.ploidy - candidate.solutions$candidates$ploidy[cpi]) < |
791 | 775 |
1) |
792 | 776 |
break |
793 | 777 |
log.ratio.calibration <- log.ratio.calibration + 0.25 |
794 |
- if (verbose && attempt < max.attempts) { |
|
795 |
- message("Recalibrating log-ratios...") |
|
778 |
+ if (attempt < max.attempts) { |
|
779 |
+ flog.info("Recalibrating log-ratios...") |
|
796 | 780 |
} |
797 | 781 |
} |
798 | 782 |
seg.adjusted <- seg |
... | ... |
@@ -807,7 +791,7 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
807 | 791 |
SNV.posterior <- list(beta.model = list(llik = -Inf)) |
808 | 792 |
} |
809 | 793 |
return(list(log.likelihood = llik, purity = p, ploidy = weighted.mean(C, |
810 |
- li), total.ploidy = total.ploidy, seg = seg.adjusted, C.posterior = data.frame(C.posterior/rowSums(C.posterior), |
|
794 |
+ li), total.ploidy = total.ploidy, seg = seg.adjusted, C.likelihood = data.frame(C.likelihood/rowSums(C.likelihood), |
|
811 | 795 |
ML.C = C, ML.Subclonal = subclonal), SNV.posterior = SNV.posterior, |
812 | 796 |
fraction.subclonal = subclonal.f, fraction.homozygous.loss = sum(li[which(C < |
813 | 797 |
0.01)])/sum(li), gene.calls = NA, log.ratio.offset = log.ratio.offset, |
... | ... |
@@ -833,15 +817,15 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
833 | 817 |
tp <- p |
834 | 818 |
pp <- 1 |
835 | 819 |
} |
820 |
+ cont.rate <- prior.contamination |
|
836 | 821 |
.fitSNV <- function(tp, pp) { |
837 |
- .fitSNVp <- function(px) { |
|
838 |
- if (verbose) |
|
839 |
- message("Fitting SNVs for purity ", round(px, digits = 2), " and tumor ploidy ", |
|
840 |
- round(weighted.mean(C, li), digits = 2), ".") |
|
822 |
+ .fitSNVp <- function(px, cont.rate=prior.contamination) { |
|
823 |
+ flog.info("Fitting SNVs for purity %.2f, tumor ploidy %.2f and contamination %.2f.", |
|
824 |
+ px, weighted.mean(C, li), cont.rate) |
|
841 | 825 |
|
842 | 826 |
list(beta.model = .calcSNVLLik(vcf, tumor.id.in.vcf, ov, px, test.num.copy, |
843 |
- C.posterior, C, opt.C, snv.model = "beta", prior.somatic, mapping.bias, |
|
844 |
- snv.lr, sampleid, cont.rate = prior.contamination, prior.K = prior.K, |
|
827 |
+ C.likelihood, C, opt.C, snv.model = "beta", prior.somatic, mapping.bias, |
|
828 |
+ snv.lr, sampleid, cont.rate = cont.rate, prior.K = prior.K, |
|
845 | 829 |
max.coverage.vcf = max.coverage.vcf, non.clonal.M = non.clonal.M, |
846 | 830 |
model.homozygous = model.homozygous, error = error, max.mapping.bias = max.mapping.bias)) |
847 | 831 |
} |
... | ... |
@@ -852,9 +836,7 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
852 | 836 |
GoF <- .getGoF(list(SNV.posterior=res.snvllik[[1]])) |
853 | 837 |
idx <- 1 |
854 | 838 |
if (GoF < (min.gof-0.05)) { |
855 |
- if (verbose) message("Poor goodness-of-fit (", |
|
856 |
- round(GoF, digits=3), |
|
857 |
- "). Skipping post-optimization.") |
|
839 |
+ flog.info("Poor goodness-of-fit (%.3f). Skipping post-optimization.", GoF) |
|
858 | 840 |
} else { |
859 | 841 |
res.snvllik <- c(res.snvllik, lapply(tp[-1], .fitSNVp)) |
860 | 842 |
px.rij <- lapply(tp, function(px) vapply(which(!is.na(C)), function(i) .calcLlikSegment(subclonal = subclonal[i], |
... | ... |
@@ -870,8 +852,7 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
870 | 852 |
idx <- 1 |
871 | 853 |
} |
872 | 854 |
p <- tp[idx] |
873 |
- if (verbose) |
|
874 |
- message("Optimized purity: ", p) |
|
855 |
+ flog.info("Optimized purity: %.2f", p) |
|
875 | 856 |
SNV.posterior <- res.snvllik[[idx]] |
876 | 857 |
list(p = p, SNV.posterior = SNV.posterior, llik = px.rij.s[idx]) |
877 | 858 |
} |
... | ... |
@@ -879,21 +860,57 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
879 | 860 |
p <- fitRes$p |
880 | 861 |
SNV.posterior <- fitRes$SNV.posterior |
881 | 862 |
} |
882 |
- |
|
883 |
- list(log.likelihood = llik, purity = p, ploidy = weighted.mean(C, li), total.ploidy = total.ploidy, |
|
884 |
- seg = seg.adjusted, C.posterior = data.frame(C.posterior/rowSums(C.posterior), |
|
885 |
- ML.C = C, ML.Subclonal = subclonal), SNV.posterior = SNV.posterior, |
|
886 |
- fraction.subclonal = subclonal.f, fraction.homozygous.loss = sum(li[which(C < |
|
887 |
- 0.01)])/sum(li), gene.calls = gene.calls, log.ratio.offset = log.ratio.offset, |
|
888 |
- SA.iterations = iter, failed = FALSE) |
|
863 |
+ |
|
864 |
+ list(log.likelihood = llik, purity = p, ploidy = weighted.mean(C, li), |
|
865 |
+ total.ploidy = total.ploidy, seg = seg.adjusted, |
|
866 |
+ C.posterior = data.frame(C.likelihood/rowSums(C.likelihood), ML.C = |
|
867 |
+ C, Opt.C = opt.C, ML.Subclonal = subclonal), |
|
868 |
+ C.likelihood=C.likelihood, SNV.posterior = SNV.posterior, |
|
869 |
+ fraction.subclonal = subclonal.f, fraction.homozygous.loss = |
|
870 |
+ sum(li[which(C < 0.01)])/sum(li), gene.calls = gene.calls, |
|
871 |
+ log.ratio.offset = log.ratio.offset, SA.iterations = iter, failed = |
|
872 |
+ FALSE) |
|
889 | 873 |
} |
890 | 874 |
|
891 | 875 |
results <- lapply(seq_len(nrow(candidate.solutions$candidates)), .optimizeSolution) |
892 |
- if (verbose) |
|
893 |
- message("Remember, posterior probabilities assume a correct SCNA fit.") |
|
894 | 876 |
|
895 | 877 |
results <- .rankResults(results) |
896 | 878 |
results <- .filterDuplicatedResults(results) |
879 |
+ |
|
880 |
+ if (grepl("CONTAMINATION", vcf.filtering$flag_comment)) { |
|
881 |
+ cont.rate <- .plotContamination( |
|
882 |
+ results[[1]]$SNV.posterior$beta.model$posteriors, |
|
883 |
+ max.mapping.bias, plot=FALSE) |
|
884 |
+ if (cont.rate > prior.contamination) { |
|
885 |
+ flog.info("Initial guess of contamination rate: %.3f", cont.rate) |
|
886 |
+ } |
|
887 |
+ } |
|
888 |
+ ## optimize contamination. we just re-run the fitting |
|
889 |
+ if (grepl("CONTAMINATION", vcf.filtering$flag_comment) && |
|
890 |
+ cont.rate>prior.contamination) { |
|
891 |
+ flog.info("Optimizing contamination rate...") |
|
892 |
+ |
|
893 |
+ res.snvllik <- |
|
894 |
+ .calcSNVLLik(vcf, tumor.id.in.vcf, |
|
895 |
+ ov, results[[1]]$purity, test.num.copy, results[[1]]$C.likelihood, |
|
896 |
+ results[[1]]$C.posterior$ML.C, |
|
897 |
+ results[[1]]$C.posterior$Opt.C, |
|
898 |
+ snv.model = "beta", prior.somatic, mapping.bias, |
|
899 |
+ snv.lr, sampleid, cont.rate = cont.rate, prior.K = prior.K, |
|
900 |
+ max.coverage.vcf = max.coverage.vcf, non.clonal.M = non.clonal.M, |
|
901 |
+ model.homozygous = model.homozygous, error = error, |
|
902 |
+ max.mapping.bias = max.mapping.bias) |
|
903 |
+ results[[1]]$SNV.posterior$beta.model <- res.snvllik |
|
904 |
+ cont.rate <- .plotContamination( |
|
905 |
+ results[[1]]$SNV.posterior$beta.model$posteriors, |
|
906 |
+ max.mapping.bias, plot=FALSE) |
|
907 |
+ flog.info("Optimized contamination rate: %.3f", cont.rate) |
|
908 |
+ results[[1]]$SNV.posterior$beta.model$posterior.contamination <- cont.rate |
|
909 |
+ # add contamination rate to flag comment |
|
910 |
+ vcf.filtering$flag_comment <- gsub("POTENTIAL SAMPLE CONTAMINATION", |
|
911 |
+ paste0("POTENTIAL SAMPLE CONTAMINATION (", round(cont.rate*100, digits=1),"%)"), |
|
912 |
+ vcf.filtering$flag_comment) |
|
913 |
+ } |
|
897 | 914 |
results <- .flagResults(results, max.non.clonal = max.non.clonal, max.logr.sdev = max.logr.sdev, |
898 | 915 |
logr.sdev = sd.seg, max.segments = max.segments, min.gof = min.gof, flag = vcf.filtering$flag, |
899 | 916 |
flag_comment = vcf.filtering$flag_comment, dropout = dropoutWarning, use.somatic.status = args.filterVcf$use.somatic.status, |
... | ... |
@@ -905,10 +922,13 @@ runAbsoluteCN <- function(normal.coverage.file = NULL, |
905 | 922 |
} |
906 | 923 |
|
907 | 924 |
if (length(results) < 1) { |
908 |
- warning("Could not find valid purity and ploidy solution.") |
|
925 |
+ flog.warn("Could not find valid purity and ploidy solution.") |
|
909 | 926 |
} |
927 |
+ .logFooter() |
|
910 | 928 |
list(candidates = candidate.solutions, results = results, input = list(tumor = tumor.coverage.file, |
911 | 929 |
normal = normal.coverage.file, log.ratio = data.frame(probe = normal[, 1], |
912 | 930 |
log.ratio = log.ratio), log.ratio.sdev = sd.seg, vcf = vcf, sampleid = sampleid, |
913 | 931 |
sex = sex, sex.vcf = sex.vcf, chr.hash = chr.hash, centromeres = centromeres)) |
914 | 932 |
} |
933 |
+ |
|
934 |
+ |
... | ... |
@@ -37,9 +37,16 @@ |
37 | 37 |
#' properly ordered. |
38 | 38 |
#' @param centromeres A \code{data.frame} with centromere positions in first |
39 | 39 |
#' three columns. Currently not supported in this function. |
40 |
-#' @param verbose Verbose output. |
|
41 | 40 |
#' @return \code{data.frame} containing the segmentation. |
42 | 41 |
#' @author Markus Riester |
42 |
+#' @references Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. |
|
43 |
+#' (2004). Circular binary segmentation for the analysis of array-based DNA |
|
44 |
+#' copy number data. Biostatistics 5: 557-572. |
|
45 |
+#' |
|
46 |
+#' Venkatraman, E. S., Olshen, A. B. (2007). A faster circular binary |
|
47 |
+#' segmentation algorithm for the analysis of array CGH data. Bioinformatics |
|
48 |
+#' 23: 657-63. |
|
49 |
+#' |
|
43 | 50 |
#' @seealso \code{\link{runAbsoluteCN}} |
44 | 51 |
#' @examples |
45 | 52 |
#' |
... | ... |
@@ -67,7 +74,7 @@ segmentationCBS <- function(normal, tumor, log.ratio, seg, plot.cnv, |
67 | 74 |
min.coverage, sampleid, target.weight.file = NULL, alpha = 0.005, undo.SD = |
68 | 75 |
NULL, vcf = NULL, tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, |
69 | 76 |
max.segments = NULL, prune.hclust.h = NULL, prune.hclust.method = "ward.D", |
70 |
- chr.hash = NULL, centromeres = NULL, verbose = TRUE) { |
|
77 |
+ chr.hash = NULL, centromeres = NULL) { |
|
71 | 78 |
|
72 | 79 |
if (is.null(chr.hash)) chr.hash <- .getChrHash(tumor$chr) |
73 | 80 |
|
... | ... |
@@ -76,24 +83,21 @@ segmentationCBS <- function(normal, tumor, log.ratio, seg, plot.cnv, |
76 | 83 |
target.weights <- read.delim(target.weight.file, as.is=TRUE) |
77 | 84 |
target.weights <- target.weights[match(as.character(tumor[,1]), |
78 | 85 |
target.weights[,1]),2] |
79 |
- if (verbose) message("Target weights found, will use weighted CBS.") |
|
86 |
+ flog.info("Target weights found, will use weighted CBS.") |
|
80 | 87 |
} |
81 |
- x <- .CNV.analyze2(normal, tumor, logR=log.ratio, plot.cnv=plot.cnv, |
|
88 |
+ x <- .CNV.analyze2(normal, tumor, log.ratio=log.ratio, plot.cnv=plot.cnv, |
|
82 | 89 |
min.coverage=min.coverage, sampleid=sampleid, alpha=alpha, |
83 | 90 |
weights=target.weights, sdundo=undo.SD, max.segments=max.segments, |
84 |
- chr.hash=chr.hash, verbose=verbose) |
|
91 |
+ chr.hash=chr.hash) |
|
85 | 92 |
if (!is.null(vcf)) { |
86 | 93 |
x <- .pruneByVCF(x, vcf, tumor.id.in.vcf, chr.hash=chr.hash) |
87 | 94 |
x <- .findCNNLOH(x, vcf, tumor.id.in.vcf, alpha=alpha, |
88 | 95 |
chr.hash=chr.hash) |
89 | 96 |
x <- .pruneByHclust(x, vcf, tumor.id.in.vcf, h=prune.hclust.h, |
90 |
- method=prune.hclust.method, chr.hash=chr.hash, verbose=verbose) |
|
97 |
+ method=prune.hclust.method, chr.hash=chr.hash) |
|
91 | 98 |
} |
92 | 99 |
idx.enough.markers <- x$cna$output$num.mark > 1 |
93 | 100 |
rownames(x$cna$output) <- NULL |
94 |
- if (verbose) { |
|
95 |
- print(x$cna$output[idx.enough.markers,]) |
|
96 |
- } |
|
97 | 101 |
x$cna$output[idx.enough.markers,] |
98 | 102 |
} |
99 | 103 |
|
... | ... |
@@ -159,7 +163,7 @@ iterations=2, chr.hash ) { |
159 | 163 |
} |
160 | 164 |
|
161 | 165 |
.pruneByHclust <- function(x, vcf, tumor.id.in.vcf, h=NULL, method="ward.D", |
162 |
- min.variants=5, chr.hash, iterations=2, verbose=TRUE) { |
|
166 |
+ min.variants=5, chr.hash, iterations=2) { |
|
163 | 167 |
for (iter in seq_len(iterations)) { |
164 | 168 |
seg <- x$cna$output |
165 | 169 |
#message("HCLUST: ", iter, " Num segment LRs: ", length(table(x$cna$output$seg.mean))) |
... | ... |
@@ -180,7 +184,7 @@ iterations=2, chr.hash ) { |
180 | 184 |
|
181 | 185 |
if (is.null(h)) { |
182 | 186 |
h <- .getPruneH(seg) |
183 |
- if (verbose) message("Setting prune.hclust.h parameter to ", h) |
|
187 |
+ flog.info("Setting prune.hclust.h parameter to %f.", h) |
|
184 | 188 |
} |
185 | 189 |
|
186 | 190 |
numVariants <- sapply(seq_len(nrow(seg)), function(i) |
... | ... |
@@ -293,9 +297,9 @@ iterations=2, chr.hash ) { |
293 | 297 |
|
294 | 298 |
# ExomeCNV version without the x11() calls |
295 | 299 |
.CNV.analyze2 <- |
296 |
-function(normal, tumor, logR=NULL, min.coverage=15, weights=NULL, sdundo=NULL, |
|
297 |
-undo.splits="sdundo", smooth=TRUE, alpha=0.01, sampleid=NULL, plot.cnv=TRUE, |
|
298 |
-max.segments=NULL, chr.hash=chr.hash, verbose=TRUE) { |
|
300 |
+function(normal, tumor, log.ratio=NULL, min.coverage=15, weights=NULL, sdundo=NULL, |
|
301 |
+undo.splits="sdundo", alpha=0.01, sampleid=NULL, plot.cnv=TRUE, |
|
302 |
+max.segments=NULL, chr.hash=chr.hash) { |
|
299 | 303 |
|
300 | 304 |
# first, do it for exons with enough coverage. MR: added less stringent |
301 | 305 |
# cutoff in case normal looks great. these could be homozygous deletions |
... | ... |
@@ -303,51 +307,46 @@ max.segments=NULL, chr.hash=chr.hash, verbose=TRUE) { |
303 | 307 |
well.covered.exon.idx <- .getWellCoveredExons(normal, tumor, |
304 | 308 |
min.coverage) |
305 | 309 |
|
306 |
- if (verbose) message("Removing ", sum(!well.covered.exon.idx), |
|
307 |
- " low coverage exons.") |
|
308 |
- if (is.null(logR)) norm.log.ratio = calculateLogRatio(normal, tumor, verbose) |
|
309 |
- else norm.log.ratio = logR |
|
310 |
+ flog.info("Removing %i low coverage exons.", sum(!well.covered.exon.idx)) |
|
310 | 311 |
|
311 | 312 |
if (is.null(sdundo)) { |
312 |
- sdundo <- .getSDundo(norm.log.ratio[well.covered.exon.idx]) |
|
313 |
+ sdundo <- .getSDundo(log.ratio[well.covered.exon.idx]) |
|
313 | 314 |
} |
314 | 315 |
|
315 |
- CNA.obj <- CNA(norm.log.ratio[well.covered.exon.idx], |
|
316 |
+ CNA.obj <- CNA(log.ratio[well.covered.exon.idx], |
|
316 | 317 |
.strip.chr.name(normal$chr[well.covered.exon.idx], chr.hash), |
317 | 318 |
floor((normal$probe_start[well.covered.exon.idx] + |
318 | 319 |
normal$probe_end[well.covered.exon.idx])/2), data.type="logratio", |
319 | 320 |
sampleid=sampleid) |
320 | 321 |
|
321 |
- smoothed.CNA.obj = if (smooth) smooth.CNA(CNA.obj) else CNA.obj |
|
322 |
- |
|
323 | 322 |
try.again <- 0 |
324 | 323 |
|
325 | 324 |
while (try.again < 2) { |
326 |
- if (verbose) message("Setting undo.SD parameter to ", sdundo) |
|
325 |
+ flog.info("Setting undo.SD parameter to %f.", sdundo) |
|
327 | 326 |
if (!is.null(weights)) { |
328 | 327 |
weights <- weights[well.covered.exon.idx] |
329 | 328 |
# MR: this shouldn't happen. In doubt, count them as median. |
330 | 329 |
weights[is.na(weights)] <- median(weights, na.rm=TRUE) |
331 |
- segment.smoothed.CNA.obj <- segment(smoothed.CNA.obj, |
|
330 |
+ segment.CNA.obj <- segment(CNA.obj, |
|
332 | 331 |
undo.splits=undo.splits, undo.SD=sdundo, |
333 |
- verbose=ifelse(verbose, 1, 0), alpha=alpha,weights=weights) |
|
332 |
+ verbose=0, alpha=alpha,weights=weights) |
|
334 | 333 |
} else { |
335 |
- segment.smoothed.CNA.obj <- segment(smoothed.CNA.obj, |
|
334 |
+ segment.CNA.obj <- segment(CNA.obj, |
|
336 | 335 |
undo.splits=undo.splits, undo.SD=sdundo, |
337 |
- verbose=ifelse(verbose, 1, 0), alpha=alpha) |
|
336 |
+ verbose=0, alpha=alpha) |
|
338 | 337 |
} |
339 |
- if (is.null(max.segments) || nrow(segment.smoothed.CNA.obj$output) |
|
338 |
+ if (is.null(max.segments) || nrow(segment.CNA.obj$output) |
|
340 | 339 |
< max.segments) break |
341 | 340 |
sdundo <- sdundo * 1.5 |
342 | 341 |
try.again <- try.again + 1 |
343 | 342 |
} |
344 | 343 |
|
345 | 344 |
if (plot.cnv) { |
346 |
- plot(segment.smoothed.CNA.obj, plot.type="s") |
|
347 |
- plot(segment.smoothed.CNA.obj, plot.type="w") |
|
345 |
+ plot(segment.CNA.obj, plot.type="s") |
|
346 |
+ plot(segment.CNA.obj, plot.type="w") |
|
348 | 347 |
} |
349 | 348 |
|
350 |
- return(list(cna=segment.smoothed.CNA.obj, logR=norm.log.ratio)) |
|
349 |
+ return(list(cna=segment.CNA.obj, logR=log.ratio)) |
|
351 | 350 |
} |
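The central DNAcopy call in .CNV.analyze2() is segment() with
undo.splits="sdundo", which removes change-points whose adjoining segment means
differ by less than undo.SD standard deviations, optionally using per-target
weights. A minimal sketch on simulated data (parameter values are illustrative
only):

    library(DNAcopy)
    set.seed(1)
    lr <- c(rnorm(50, 0, 0.2), rnorm(50, 1, 0.2))   # one true change-point
    cna <- CNA(lr, chrom=rep(1, 100), maploc=1:100,
        data.type="logratio", sampleid="S1")
    fit <- segment(cna, undo.splits="sdundo", undo.SD=1.5,
        alpha=0.005, verbose=0)
    fit$output[, c("chrom", "loc.start", "loc.end", "num.mark", "seg.mean")]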
352 | 351 |
|
353 | 352 |
.getSegSizes <- function(seg) { |
... | ... |
@@ -22,8 +22,6 @@ |
22 | 22 |
#' function. |
23 | 23 |
#' @param undo.SD \code{undo.SD} for CBS, see documentation of the |
24 | 24 |
#' \code{segment} function. If \code{NULL}, try to find a sensible default. |
25 |
-#' @param drop.outliers If \code{TRUE}, calls the |
|
26 |
-#' \code{dropSegmentationOutliers} function from PSCBS before segmentation. |
|
27 | 25 |
#' @param flavor Flavor value for PSCBS. See \code{segmentByNonPairedPSCBS}. |
28 | 26 |
#' @param tauA tauA argument for PSCBS. See \code{segmentByNonPairedPSCBS}. |
29 | 27 |
#' @param vcf Optional VCF object with germline allelic ratios. |
... | ... |
@@ -43,11 +41,20 @@ |
43 | 41 |
#' properly ordered. |
44 | 42 |
#' @param centromeres A \code{data.frame} with centromere positions in first |
45 | 43 |
#' three columns. If not \code{NULL}, add breakpoints at centromeres. |
46 |
-#' @param verbose Verbose output. |
|
47 | 44 |
#' @param \dots Additional parameters passed to the |
48 | 45 |
#' \code{segmentByNonPairedPSCBS} function. |
49 | 46 |
#' @return \code{data.frame} containing the segmentation. |
50 | 47 |
#' @author Markus Riester |
48 |
+#' @references Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. |
|
49 |
+#' (2004). Circular binary segmentation for the analysis of array-based DNA |
|
50 |
+#' copy number data. Biostatistics 5: 557-572. |
|
51 |
+#' |
|
52 |
+#' Venkatraman, E. S., Olshen, A. B. (2007). A faster circular binary |
|
53 |
+#' segmentation algorithm for the analysis of array CGH data. Bioinformatics |
|
54 |
+#' 23: 657-63. |
|
55 |
+#' |
|
56 |
+#' Olshen et al. (2011). Parent-specific copy number in paired tumor-normal |
|
57 |
+#' studies using circular binary segmentation. Bioinformatics. |
|
51 | 58 |
#' @seealso \code{\link{runAbsoluteCN}} |
52 | 59 |
#' @examples |
53 | 60 |
#' |
... | ... |
@@ -72,10 +79,10 @@ |
72 | 79 |
#' @export segmentationPSCBS |
73 | 80 |
segmentationPSCBS <- function(normal, tumor, log.ratio, seg, plot.cnv, |
74 | 81 |
min.coverage, sampleid, target.weight.file = NULL, alpha = 0.005, undo.SD = |
75 |
- NULL, drop.outliers=TRUE, flavor = "tcn&dh", tauA = 0.03, vcf = NULL, |
|
82 |
+ NULL, flavor = "tcn&dh", tauA = 0.03, vcf = NULL, |
|
76 | 83 |
tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, max.segments = NULL, |
77 | 84 |
prune.hclust.h = NULL, prune.hclust.method = "ward.D", chr.hash = NULL, |
78 |
- centromeres = NULL, verbose = TRUE, ...) { |
|
85 |
+ centromeres = NULL, ...) { |
|
79 | 86 |
|
80 | 87 |
debug <- TRUE |
81 | 88 |
|
... | ... |
@@ -95,8 +102,7 @@ segmentationPSCBS <- function(normal, tumor, log.ratio, seg, plot.cnv, |
95 | 102 |
target.weights <- read.delim(target.weight.file, as.is=TRUE) |
96 | 103 |
target.weights <- target.weights[match(as.character(tumor[,1]), |
97 | 104 |
target.weights[,1]),2] |
98 |
- if (verbose) message( |
|
99 |
- "Target weights found, but currently not supported by PSCBS. ", |
|
105 |
+ flog.info("Target weights found, but currently not supported by PSCBS. %s", |
|
100 | 106 |
"Will simply exclude targets with low weight.") |
101 | 107 |
lowWeightTargets <- target.weights < 1/3 |
102 | 108 |
well.covered.exon.idx[which(lowWeightTargets)] <- FALSE |
... | ... |
@@ -134,7 +140,7 @@ segmentationPSCBS <- function(normal, tumor, log.ratio, seg, plot.cnv, |
134 | 140 |
#} else { |
135 | 141 |
if (is.null(undo.SD)) { |
136 | 142 |
undo.SD <- .getSDundo(log.ratio) |
137 |
- if (verbose) message("Setting undo.SD parameter to ", undo.SD) |
|
143 |
+ flog.info("Setting undo.SD parameter to %f.", undo.SD) |
|
138 | 144 |
} |
139 | 145 |
knownSegments <- NULL |
140 | 146 |
if (!is.null(centromeres)) { |
... | ... |
@@ -145,9 +151,6 @@ segmentationPSCBS <- function(normal, tumor, log.ratio, seg, plot.cnv, |
145 | 151 |
chr.hash) |
146 | 152 |
knownSegments <- PSCBS::gapsToSegments(knownSegments) |
147 | 153 |
} |
148 |
- if (drop.outliers) { |
|
149 |
- d.f <- PSCBS::dropSegmentationOutliers(d.f) |
|
150 |
- } |
|
151 | 154 |
seg <- PSCBS::segmentByNonPairedPSCBS(d.f, tauA=tauA, |
152 | 155 |
flavor=flavor, undoTCN=undo.SD, knownSegments=knownSegments, |
153 | 156 |
min.width=3,alphaTCN=alpha, ...) |
... | ... |
@@ -157,7 +160,7 @@ segmentationPSCBS <- function(normal, tumor, log.ratio, seg, plot.cnv, |
157 | 160 |
|
158 | 161 |
if (!is.null(vcf)) { |
159 | 162 |
x <- .pruneByHclust(x, vcf, tumor.id.in.vcf, h=prune.hclust.h, |
160 |
- method=prune.hclust.method, chr.hash=chr.hash, verbose=verbose) |
|
163 |
+ method=prune.hclust.method, chr.hash=chr.hash) |
|
161 | 164 |
} |
162 | 165 |
x$cna$output |
163 | 166 |
} |
... | ... |
@@ -19,7 +19,6 @@ |
19 | 19 |
#' @param smooth Impute mapping bias of variants not found in the panel by |
20 | 20 |
#' smoothing of neighboring SNPs. Requires \code{normal.panel.vcf.file}. |
21 | 21 |
#' @param smooth.n Number of neighboring variants used for smoothing. |
22 |
-#' @param verbose Verbose output. |
|
23 | 22 |
#' @return A \code{numeric(nrow(vcf))} vector with the mapping bias of for each |
24 | 23 |
#' variant in the \code{CollapsedVCF}. Mapping bias is expected as scaling |
25 | 24 |
#' factor. Adjusted allelic fraction is (observed allelic fraction)/(mapping |
... | ... |
@@ -35,8 +34,7 @@ |
35 | 34 |
#' |
36 | 35 |
#' @export setMappingBiasVcf |
37 | 36 |
setMappingBiasVcf <- function(vcf, tumor.id.in.vcf = NULL, |
38 |
-normal.panel.vcf.file = NULL, min.normals = 5, smooth = TRUE, smooth.n = 5, |
|
39 |
-verbose = TRUE) { |
|
37 |
+normal.panel.vcf.file = NULL, min.normals = 5, smooth = TRUE, smooth.n = 5) { |
|
40 | 38 |
|
41 | 39 |
if (is.null(tumor.id.in.vcf)) { |
42 | 40 |
tumor.id.in.vcf <- .getTumorIdInVcf(vcf) |
... | ... |
@@ -46,12 +44,12 @@ verbose = TRUE) { |
46 | 44 |
normal.id.in.vcf <- .getNormalIdInVcf(vcf, tumor.id.in.vcf) |
47 | 45 |
faAll <- as.numeric(geno(vcf)$FA[!info(vcf)$SOMATIC,normal.id.in.vcf]) |
48 | 46 |
mappingBias <- mean(faAll, na.rm=TRUE)*2 |
49 |
- if (verbose) message("Found SOMATIC annotation in VCF. ", |
|
50 |
- "Setting mapping bias to ", round(mappingBias, digits=3)) |
|
47 |
+ flog.info("Found SOMATIC annotation in VCF. Setting mapping bias to %.3f.", |
|
48 |
+ mappingBias) |
|
51 | 49 |
} |
52 | 50 |
if (is.null(info(vcf)$SOMATIC) && is.null(normal.panel.vcf.file)) { |
53 |
- message( |
|
54 |
- "VCF does not contain somatic status. For best results, consider\n", |
|
51 |
+ flog.info( |
|
52 |
+ "VCF does not contain somatic status. For best results, consider%s%s", |
|
55 | 53 |
"providing normal.panel.vcf.file when matched normals are not ", |
56 | 54 |
"available.") |
57 | 55 |
} |
... | ... |
@@ -64,10 +62,9 @@ verbose = TRUE) { |
64 | 62 |
if (is.null(normal.panel.vcf.file)) { |
65 | 63 |
return(tmp) |
66 | 64 |
} |
67 |
- nvcf <- .readNormalPanelVcfLarge(vcf, normal.panel.vcf.file, |
|
68 |
- verbose=verbose) |
|
65 |
+ nvcf <- .readNormalPanelVcfLarge(vcf, normal.panel.vcf.file) |
|
69 | 66 |
if (nrow(nvcf) < 1) { |
70 |
- warning("setMappingBiasVcf: no hits in ", normal.panel.vcf.file, ".") |
|
67 |
+ flog.warn("setMappingBiasVcf: no hits in %s.", normal.panel.vcf.file) |
|
71 | 68 |
return(tmp) |
72 | 69 |
} |
73 | 70 |
|
... | ... |
@@ -101,15 +98,15 @@ verbose = TRUE) { |
101 | 98 |
as.numeric(y) |
102 | 99 |
} |
103 | 100 |
|
104 |
-.readNormalPanelVcfLarge <- function(vcf, normal.panel.vcf.file, max.file.size=1, verbose) { |
|
101 |
+.readNormalPanelVcfLarge <- function(vcf, normal.panel.vcf.file, max.file.size=1) { |
|
105 | 102 |
genome <- genome(vcf)[1] |
106 | 103 |
if (file.size(normal.panel.vcf.file)/1000^3 > max.file.size || nrow(vcf)< 1000) { |
107 |
- if (verbose) message("Scanning ", normal.panel.vcf.file, "...") |
|
104 |
+ flog.info("Scanning %s...", normal.panel.vcf.file) |
|
108 | 105 |
nvcf <- readVcf(TabixFile(normal.panel.vcf.file), genome=genome, |
109 | 106 |
ScanVcfParam(which = rowRanges(vcf), info=NA, fixed=NA, |
110 | 107 |
geno="FA")) |
111 | 108 |
} else { |
112 |
- if (verbose) message("Loading ", normal.panel.vcf.file, "...") |
|
109 |
+ flog.info("Loading %s...", normal.panel.vcf.file) |
|
113 | 110 |
nvcf <- readVcf(normal.panel.vcf.file, genome=genome, |
114 | 111 |
ScanVcfParam(info=NA, fixed=NA, geno="FA")) |
115 | 112 |
nvcf <- subsetByOverlaps(nvcf, rowRanges(vcf)) |
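The two branches above choose between a tabix-indexed region query and reading
the whole normal panel VCF: when the file is large (or only few query variants
are needed), readVcf() is restricted to the positions of interest via
ScanVcfParam(which=...) and imports only the FA genotype field; otherwise the
file is read once and subset with subsetByOverlaps(). A hedged sketch of the
region-restricted variant (file name, genome and regions are placeholders):

    library(VariantAnnotation)
    library(Rsamtools)
    library(GenomicRanges)
    regions <- GRanges("chr1", IRanges(c(1e6, 2e6), width=1))
    param <- ScanVcfParam(which=regions, info=NA, fixed=NA, geno="FA")
    # requires a bgzip-compressed, tabix-indexed VCF
    nvcf <- readVcf(TabixFile("normals.vcf.gz"), genome="hg19", param)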
... | ... |
@@ -4,6 +4,7 @@ library('getopt') |
4 | 4 |
|
5 | 5 |
spec <- matrix(c( |
6 | 6 |
'help' , 'h', 0, "logical", |
7 |
+'version', 'v', 0, "logical", |
|
7 | 8 |
'force' , 'f', 0, "logical", |
8 | 9 |
'bam', 'b', 1, "character", |
9 | 10 |
'gatkcoverage', 'g', 1, "character", |
... | ... |
@@ -18,6 +19,11 @@ if ( !is.null(opt$help) ) { |
18 | 19 |
q(status=1) |
19 | 20 |
} |
20 | 21 |
|
22 |
+if (!is.null(opt$version)) { |
|
23 |
+ message(as.character(packageVersion("PureCN"))) |
|
24 |
+ q(status=1) |
|
25 |
+} |
|
26 |
+ |
|
21 | 27 |
force <- !is.null(opt$force) |
22 | 28 |
|
23 | 29 |
bam.file <- opt$bam |
... | ... |
@@ -4,10 +4,11 @@ library('getopt') |
4 | 4 |
|
5 | 5 |
spec <- matrix(c( |
6 | 6 |
'help' , 'h', 0, "logical", |
7 |
+'version', 'v', 0, "logical", |
|
7 | 8 |
'force' , 'f', 0, "logical", |
8 | 9 |
'gcgene', 'c', 1, "character", |
9 | 10 |
'method', 'm', 1, "character", |
10 |
-'coveragefiles', 'v', 1, "character", |
|
11 |
+'coveragefiles', 'b', 1, "character", |
|
11 | 12 |
'assay', 'a',1, "character", |
12 | 13 |
'outdir' , 'o', 1, "character" |
13 | 14 |
), byrow=TRUE, ncol=4) |
... | ... |
@@ -18,6 +19,12 @@ if ( !is.null(opt$help) ) { |
18 | 19 |
q(status=1) |
19 | 20 |
} |
20 | 21 |
|
22 |
+if (!is.null(opt$version)) { |
|
23 |
+ message(as.character(packageVersion("PureCN"))) |
|
24 |
+ q(status=1) |
|
25 |
+} |
|
26 |
+ |
|
27 |
+ |
|
21 | 28 |
.checkFileList <- function(file) { |
22 | 29 |
files <- read.delim(file, as.is=TRUE, header=FALSE)[,1] |
23 | 30 |
numExists <- sum(file.exists(files), na.rm=TRUE) |
... | ... |
@@ -4,10 +4,11 @@ library('getopt') |
4 | 4 |
|
5 | 5 |
spec <- matrix(c( |
6 | 6 |
'help', 'h', 0, "logical", |
7 |
+'version', 'v', 0, "logical", |
|
7 | 8 |
'force' , 'f', 0, "logical", |
8 | 9 |
'normal', 'n', 1, "character", |
9 | 10 |
'tumor', 't', 1, "character", |
10 |
-'vcf', 'v', 1, "character", |
|
11 |
+'vcf', 'b', 1, "character", |
|
11 | 12 |
'rds', 'r', 1, "character", |
12 | 13 |
'genome', 'g', 1, "character", |
13 | 14 |
'gcgene', 'c', 1, "character", |
... | ... |
@@ -31,6 +32,11 @@ if ( !is.null(opt$help) ) { |
31 | 32 |
q(status=1) |
32 | 33 |
} |
33 | 34 |
|
35 |
+if (!is.null(opt$version)) { |
|
36 |
+ message(as.character(packageVersion("PureCN"))) |
|
37 |
+ q(status=1) |
|
38 |
+} |
|
39 |
+ |
|
34 | 40 |
force <- !is.null(opt$force) |
35 | 41 |
post.optimize <- !is.null(opt$postoptimize) |
36 | 42 |
normal.coverage.file <- opt$normal |
... | ... |
@@ -54,7 +60,9 @@ file.rds <- opt$rds |
54 | 60 |
if (!is.null(file.rds) && file.exists(file.rds)) { |
55 | 61 |
if (is.null(outdir)) outdir <- dirname(file.rds) |
56 | 62 |
} else { |
57 |
- if (is.null(sampleid)) stop("Need sampleid.") |
|
63 |
+ if (is.null(sampleid)) stop("Need --sampleid.") |
|
64 |
+ if (is.null(genome)) stop("Need --genome") |
|
65 |
+ genome <- as.character(genome) |
|
58 | 66 |
file.rds <- file.path(outdir, paste0(sampleid, '_purecn.rds')) |
59 | 67 |
if (is.null(seg.file)) { |
60 | 68 |
tumor.coverage.file <- normalizePath(tumor.coverage.file, |
... | ... |
@@ -94,6 +102,7 @@ if (file.exists(file.rds) && !force) { |
94 | 102 |
} else if (is.null(normal.coverage.file) && is.null(seg.file)) { |
95 | 103 |
stop("Need either normalDB or normal.coverage.file") |
96 | 104 |
} |
105 |
+ file.log <- file.path(outdir, paste0(sampleid, '_purecn.log')) |
|
97 | 106 |
|
98 | 107 |
pdf(paste(outdir,"/", sampleid, '_purecn_segmentation.pdf', sep=''), |
99 | 108 |
width=10, height=11) |
... | ... |
@@ -107,7 +116,7 @@ if (file.exists(file.rds) && !force) { |
107 | 116 |
args.setMappingBiasVcf= |
108 | 117 |
list(normal.panel.vcf.file=normal.panel.vcf.file), |
109 | 118 |
normalDB=normalDB, model.homozygous=model.homozygous, |
110 |
- post.optimize=post.optimize) |
|
119 |
+ log.file=file.log, post.optimize=post.optimize) |
|
111 | 120 |
dev.off() |
112 | 121 |
saveRDS(ret, file=file.rds) |
113 | 122 |
} |
... | ... |
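The new log.file plumbing relies on the futile.logger framework that replaces message()/warning() throughout the package. A minimal sketch of how a tee appender sends the same records to the console and to a file; the file name and messages are examples only, not the package's actual log output:

library(futile.logger)

file.log <- file.path(tempdir(), "Sample1_purecn.log")

# Log INFO and above to both the console and the log file.
flog.threshold(INFO)
flog.appender(appender.tee(file.log))

flog.info("Loading %s...", "normal_panel.vcf.gz")
flog.warn("Only %d variants passed the filters.", 42L)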
@@ -1,9 +1,9 @@ |
1 | 1 |
test_getSexFromCoverage <- function() { |
2 | 2 |
tumor.coverage.file <- system.file("extdata", "example_tumor.txt", package="PureCN") |
3 | 3 |
coverage <- readCoverageGatk(tumor.coverage.file) |
4 |
- sex <- getSexFromCoverage(coverage, verbose=FALSE) |
|
4 |
+ sex <- getSexFromCoverage(coverage) |
|
5 | 5 |
checkTrue(is.na(sex)) |
6 |
- sex <- getSexFromCoverage(tumor.coverage.file, verbose=FALSE) |
|
6 |
+ sex <- getSexFromCoverage(tumor.coverage.file) |
|
7 | 7 |
checkTrue(is.na(sex)) |
8 | 8 |
|
9 | 9 |
chr22 <- coverage[which(coverage$chr=="chr22"),] |
... | ... |
@@ -240,7 +240,7 @@ test_runAbsoluteCN <- function() { |
240 | 240 |
|
241 | 241 |
# test with a log.ratio and no tumor file |
242 | 242 |
log.ratio <- calculateLogRatio(readCoverageGatk(normal.coverage.file), |
243 |
- readCoverageGatk(tumor.coverage.file), verbose=FALSE) |
|
243 |
+ readCoverageGatk(tumor.coverage.file)) |
|
244 | 244 |
|
245 | 245 |
ret <- runAbsoluteCN( log.ratio=log.ratio, |
246 | 246 |
gc.gene.file=gc.gene.file, |
... | ... |
@@ -4,7 +4,7 @@ |
4 | 4 |
\alias{calculateLogRatio} |
5 | 5 |
\title{Calculate coverage log-ratio of tumor vs. normal} |
6 | 6 |
\usage{ |
7 |
-calculateLogRatio(normal, tumor, verbose = TRUE) |
|
7 |
+calculateLogRatio(normal, tumor) |
|
8 | 8 |
} |
9 | 9 |
\arguments{ |
10 | 10 |
\item{normal}{Normal coverage read in by the \code{\link{readCoverageGatk}} |
... | ... |
@@ -12,8 +12,6 @@ function.} |
12 | 12 |
|
13 | 13 |
\item{tumor}{Tumor coverage read in by the \code{\link{readCoverageGatk}} |
14 | 14 |
function.} |
15 |
- |
|
16 |
-\item{verbose}{Verbose output.} |
|
17 | 15 |
} |
18 | 16 |
\value{ |
19 | 17 |
\code{numeric(nrow(tumor))}, tumor vs. normal copy number log-ratios |
... | ... |
@@ -33,7 +31,7 @@ tumor.coverage.file <- system.file("extdata", "example_tumor.txt", |
33 | 31 |
package="PureCN") |
34 | 32 |
normal <- readCoverageGatk(normal.coverage.file) |
35 | 33 |
tumor <- readCoverageGatk(tumor.coverage.file) |
36 |
-log.ratio <- calculateLogRatio(normal, tumor, verbose=FALSE) |
|
34 |
+log.ratio <- calculateLogRatio(normal, tumor) |
|
37 | 35 |
|
38 | 36 |
} |
39 | 37 |
\author{ |
... | ... |
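Conceptually, the value returned by calculateLogRatio is the per-target log2 ratio of tumor to normal coverage after the differing total sequencing depths of the two samples are taken into account. A naive standalone sketch of that idea, not the package's implementation (which operates on the objects returned by readCoverageGatk):

# Naive per-target log-ratio; library size cancels out because each sample is
# normalized to its total coverage before the ratio is taken.
naiveLogRatio <- function(tumor.avg, normal.avg) {
    tumor.rel  <- tumor.avg / sum(tumor.avg)
    normal.rel <- normal.avg / sum(normal.avg)
    log2(tumor.rel / normal.rel)
}

normal.avg <- c(100, 80, 120, 95)  # average coverage per target (normal)
tumor.avg  <- c(210, 75, 115, 90)  # first target amplified in the tumor
round(naiveLogRatio(tumor.avg, normal.avg), 2)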
@@ -70,7 +70,7 @@ legend("bottomright", legend=paste("Purity", purity), fill=seq_along(purity)) |
70 | 70 |
Markus Riester |
71 | 71 |
} |
72 | 72 |
\references{ |
73 |
-Carter et al., Absolute quantification of somatic DNA |
|
74 |
-alterations in human cancer. Nature Biotechnology 2012. |
|
73 |
+Carter et al. (2012), Absolute quantification of somatic DNA |
|
74 |
+alterations in human cancer. Nature Biotechnology. |
|
75 | 75 |
} |
76 | 76 |
|
... | ... |
@@ -5,7 +5,7 @@ |
5 | 5 |
\title{Calculate target weights} |
6 | 6 |
\usage{ |
7 | 7 |
createTargetWeights(tumor.coverage.files, normal.coverage.files, |
8 |
- target.weight.file, verbose = TRUE) |
|
8 |
+ target.weight.file) |
|
9 | 9 |
} |
10 | 10 |
\arguments{ |
11 | 11 |
\item{tumor.coverage.files}{A small number (1-3) of GATK tumor or normal |
... | ... |
@@ -16,8 +16,6 @@ coverage samples.} |
16 | 16 |
with files in \code{tumor.coverage.files}.} |
17 | 17 |
|
18 | 18 |
\item{target.weight.file}{Output filename.} |
19 |
- |
|
20 |
-\item{verbose}{Verbose output.} |
|
21 | 19 |
} |
22 | 20 |
\value{ |
23 | 21 |
A \code{data.frame} with target weights. |
... | ... |
@@ -5,8 +5,7 @@ |
5 | 5 |
\title{Remove low quality targets} |
6 | 6 |
\usage{ |
7 | 7 |
filterTargets(log.ratio, tumor, gc.data, seg.file, filter.lowhigh.gc = 0.001, |
8 |
- min.targeted.base = 5, normalDB = NULL, normalDB.min.coverage = 0.2, |
|
9 |
- verbose) |
|
8 |
+ min.targeted.base = 5, normalDB = NULL, normalDB.min.coverage = 0.2) |
|
10 | 9 |
} |
11 | 10 |
\arguments{ |
12 | 11 |
\item{log.ratio}{Copy number log-ratios, one for each target or interval in |
... | ... |
@@ -33,8 +32,6 @@ likely very different from the true GC content of the probes.} |
33 | 32 |
|
34 | 33 |
\item{normalDB.min.coverage}{Exclude targets with coverage lower than 20 |
35 | 34 |
percent of the chromosome median in the pool of normals.} |
36 |
- |
|
37 |
-\item{verbose}{Verbose output.} |
|
38 | 35 |
} |
39 | 36 |
\value{ |
40 | 37 |
\code{logical(length(log.ratio))} specifying which targets should be |
... | ... |
@@ -6,10 +6,10 @@ |
6 | 6 |
\usage{ |
7 | 7 |
filterVcfBasic(vcf, tumor.id.in.vcf = NULL, use.somatic.status = TRUE, |
8 | 8 |
snp.blacklist = NULL, af.range = c(0.03, 0.97), |
9 |
- contamination.cutoff = c(0.05, 0.075), min.coverage = 15, |
|
9 |
+ contamination.cutoff = c(0.075, 0.02), min.coverage = 15, |
|
10 | 10 |
min.base.quality = 25, min.supporting.reads = NULL, error = 0.001, |
11 | 11 |
target.granges = NULL, remove.off.target.snvs = TRUE, |
12 |
- model.homozygous = FALSE, interval.padding = 50, verbose = TRUE) |
|
12 |
+ model.homozygous = FALSE, interval.padding = 50) |
|
13 | 13 |
} |
14 | 14 |
\arguments{ |
15 | 15 |
\item{vcf}{\code{CollapsedVCF} object, read in with the \code{readVcf} |
... | ... |
@@ -34,8 +34,9 @@ contamination. If a matched normal is available, this value is ignored, |
34 | 34 |
because homozygosity can be confirmed in the normal.} |
35 | 35 |
|
36 | 36 |
\item{contamination.cutoff}{Count SNPs in dbSNP with allelic fraction |
37 |
-smaller than the first value, if found on most chromosomes, remove all with |
|
38 |
-AF smaller than the second value.} |
|
37 |
+smaller than the first value or greater than 1 minus the first value. If such SNPs are found on |
|
38 |
+most chromosomes, the sample is flagged as contaminated when the fraction of putative |
|
39 |
+contamination SNPs exceeds the second value.} |
|
39 | 40 |
|
40 | 41 |
\item{min.coverage}{Minimum coverage in tumor. Variants with lower coverage |
41 | 42 |
are ignored.} |
... | ... |
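To make the reworded contamination.cutoff description concrete, a small self-contained sketch of the idea follows. It is illustrative only; the actual filterVcfBasic additionally requires the suspicious SNPs to be spread over most chromosomes:

# Defaults mirror the documented c(0.075, 0.02).
isContaminated <- function(af, contamination.cutoff = c(0.075, 0.02)) {
    # af: allelic fractions of dbSNP positions in the tumor sample
    putative <- af < contamination.cutoff[1] | af > 1 - contamination.cutoff[1]
    mean(putative) > contamination.cutoff[2]
}

set.seed(123)
af.clean <- rbeta(500, 20, 20)                         # balanced heterozygous SNPs
af.cont  <- c(rbeta(470, 20, 20), runif(30, 0, 0.05))  # 6% skewed towards 0
isContaminated(af.clean)  # FALSE
isContaminated(af.cont)   # TRUE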
@@ -62,8 +63,6 @@ SNPs. Ignored in case a matched normal is provided in the VCF.} |
62 | 63 |
|
63 | 64 |
\item{interval.padding}{Include variants in the interval flanking regions of |
64 | 65 |
the specified size in bp. Requires \code{target.granges}.} |
65 |
- |
|
66 |
-\item{verbose}{Verbose output.} |
|
67 | 66 |
} |
68 | 67 |
\value{ |
69 | 68 |
A list with elements \item{vcf}{The filtered \code{CollapsedVCF} |
... | ... |
@@ -8,7 +8,7 @@ filterVcfMuTect(vcf, tumor.id.in.vcf = NULL, stats.file = NULL, |
8 | 8 |
ignore = c("clustered_read_position", "fstar_tumor_lod", |
9 | 9 |
"nearby_gap_events", "poor_mapping_region_alternate_allele_mapq", |
10 | 10 |
"poor_mapping_region_mapq0", "possible_contamination", "strand_artifact", |
11 |
- "seen_in_panel_of_normals"), verbose = TRUE, ...) |
|
11 |
+ "seen_in_panel_of_normals"), ...) |
|
12 | 12 |
} |
13 | 13 |
\arguments{ |
14 | 14 |
\item{vcf}{\code{CollapsedVCF} object, read in with the \code{readVcf} |
... | ... |
@@ -20,8 +20,6 @@ function from the VariantAnnotation package.} |
20 | 20 |
|
21 | 21 |
\item{ignore}{MuTect flags that mark variants for exclusion.} |
22 | 22 |
|
23 |
-\item{verbose}{Verbose output.} |
|
24 |
- |
|
25 | 23 |
\item{\dots}{Additional arguments passed to \code{\link{filterVcfBasic}}.} |
26 | 24 |
} |
27 | 25 |
\value{ |
... | ... |
@@ -6,8 +6,7 @@ |
6 | 6 |
\usage{ |
7 | 7 |
findBestNormal(tumor.coverage.file, normalDB, pcs = 1:3, num.normals = 1, |
8 | 8 |
ignore.sex = FALSE, sex = NULL, normal.coverage.files = NULL, |
9 |
- pool = FALSE, pool.weights = c("voom", "equal"), plot.pool = FALSE, |
|
10 |
- verbose = TRUE, ...) |
|
9 |
+ pool = FALSE, pool.weights = c("voom", "equal"), plot.pool = FALSE, ...) |
|
11 | 10 |
} |
12 | 11 |
\arguments{ |
13 | 12 |
\item{tumor.coverage.file}{GATK coverage file of a tumor sample.} |
... | ... |
@@ -39,8 +38,6 @@ weight all best normals equally.} |
39 | 38 |
|
40 | 39 |
\item{plot.pool}{Allows the pooling function to create plots.} |
41 | 40 |
|
42 |
-\item{verbose}{Verbose output.} |
|
43 |
- |
|
44 | 41 |
\item{\dots}{Additional arguments passed to \code{\link{poolCoverage}}.} |
45 | 42 |
} |
46 | 43 |
\value{ |
... | ... |
@@ -5,7 +5,7 @@ |
5 | 5 |
\title{Get sample sex from coverage} |
6 | 6 |
\usage{ |
7 | 7 |
getSexFromCoverage(coverage.file, min.ratio = 25, min.ratio.na = 20, |
8 |
- remove.outliers = TRUE, verbose = TRUE) |
|
8 |
+ remove.outliers = TRUE) |
|
9 | 9 |
} |
10 | 10 |
\arguments{ |
11 | 11 |
\item{coverage.file}{GATK coverage file or data read with |
... | ... |
@@ -23,8 +23,6 @@ be considered when setting cutoffs.} |
23 | 23 |
|
24 | 24 |
\item{remove.outliers}{Removes coverage outliers before calculating mean |
25 | 25 |
chromosome coverages.} |
26 |
- |
|
27 |
-\item{verbose}{Verbose output.} |
|
28 | 26 |
} |
29 | 27 |
\value{ |
30 | 28 |
Returns a \code{character(1)} with \code{M} for male, \code{F} for |
... | ... |
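The idea behind getSexFromCoverage is a chrX/chrY mean coverage ratio: female samples have essentially no Y coverage, so the ratio is large, whereas males carry one X and one Y and keep the ratio small. A conceptual sketch only, using the documented default cutoffs; the package's exact decision rule, outlier removal and handling of missing chromosomes may differ:

sexFromXYRatio <- function(mean.x, mean.y, min.ratio = 25, min.ratio.na = 20) {
    ratio <- mean.x / mean.y
    if (ratio > min.ratio) return("F")     # almost no Y coverage
    if (ratio < min.ratio.na) return("M")  # substantial Y coverage
    NA                                     # ambiguous, do not call
}

sexFromXYRatio(mean.x = 120, mean.y = 2)   # "F"
sexFromXYRatio(mean.x = 120, mean.y = 60)  # "M"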
@@ -5,8 +5,7 @@ |
5 | 5 |
\title{Get sample sex from a VCF file} |
6 | 6 |
\usage{ |
7 | 7 |
getSexFromVcf(vcf, tumor.id.in.vcf = NULL, min.or = 4, min.or.na = 2.5, |
8 |
- max.pv = 0.001, homozygous.cutoff = 0.95, af.cutoff = 0.2, |
|
9 |
- verbose = TRUE) |
|
8 |
+ max.pv = 0.001, homozygous.cutoff = 0.95, af.cutoff = 0.2) |
|
10 | 9 |
} |
11 | 10 |
\arguments{ |
12 | 11 |
\item{vcf}{CollapsedVCF object, read in with the \code{readVcf} function |
... | ... |
@@ -29,8 +28,6 @@ homozygous.} |
29 | 28 |
|
30 | 29 |
\item{af.cutoff}{Remove all SNVs with allelic fraction lower than the |
31 | 30 |
specified value.} |
32 |
- |
|
33 |
-\item{verbose}{Verbose output.} |
|
34 | 31 |
} |
35 | 32 |
\value{ |
36 | 33 |
Returns a \code{character(1)} with \code{M} for male, \code{F} for |
... | ... |
@@ -6,7 +6,7 @@ |
6 | 6 |
\usage{ |
7 | 7 |
readCurationFile(file.rds, file.curation = gsub(".rds$", ".csv", file.rds), |
8 | 8 |
remove.failed = FALSE, report.best.only = FALSE, min.ploidy = NULL, |
9 |
- max.ploidy = NULL, verbose = FALSE) |
|
9 |
+ max.ploidy = NULL) |
|
10 | 10 |
} |
11 | 11 |
\arguments{ |
12 | 12 |
\item{file.rds}{Output of the \code{\link{runAbsoluteCN}} function, |
... | ... |
@@ -25,8 +25,6 @@ be used to automatically ignore unlikely solutions.} |
25 | 25 |
|
26 | 26 |
\item{max.ploidy}{Maximum ploidy to be considered. If \code{NULL}, all. Can |
27 | 27 |
be used to automatically ignore unlikely solutions.} |
28 |
- |
|
29 |
-\item{verbose}{Verbose output.} |
|
30 | 28 |
} |
31 | 29 |
\value{ |
32 | 30 |
The return value of the corresponding \code{\link{runAbsoluteCN}} |
... | ... |
@@ -20,10 +20,11 @@ runAbsoluteCN(normal.coverage.file = NULL, tumor.coverage.file = NULL, |
20 | 20 |
max.coverage.vcf = 300, max.non.clonal = 0.2, |
21 | 21 |
max.homozygous.loss = c(0.05, 1e+07), non.clonal.M = 1/3, |
22 | 22 |
max.mapping.bias = 0.8, iterations = 30, log.ratio.calibration = 0.25, |
23 |
- remove.off.target.snvs = NULL, model.homozygous = FALSE, error = 0.001, |
|
24 |
- gc.gene.file = NULL, max.dropout = c(0.95, 1.1), max.logr.sdev = 0.75, |
|
25 |
- max.segments = 300, min.gof = 0.8, plot.cnv = TRUE, |
|
26 |
- cosmic.vcf.file = NULL, post.optimize = FALSE, verbose = TRUE) |
|
23 |
+ smooth.log.ratio = TRUE, remove.off.target.snvs = NULL, |
|
24 |
+ model.homozygous = FALSE, error = 0.001, gc.gene.file = NULL, |
|
25 |
+ max.dropout = c(0.95, 1.1), max.logr.sdev = 0.75, max.segments = 300, |
|
26 |
+ min.gof = 0.8, plot.cnv = TRUE, cosmic.vcf.file = NULL, |
|
27 |
+ post.optimize = FALSE, log.file = NULL, verbose = TRUE) |
|
27 | 28 |
} |
28 | 29 |
\arguments{ |
29 | 30 |
\item{normal.coverage.file}{GATK coverage file of normal control (optional |
... | ... |
@@ -85,7 +86,7 @@ to \code{\link{filterVcfMuTect}}, which in turn also calls |
85 | 86 |
|
86 | 87 |
\item{args.filterVcf}{Arguments for variant filtering function. Arguments |
87 | 88 |
\code{vcf}, \code{tumor.id.in.vcf}, \code{min.coverage}, |
88 |
-\code{model.homozygous}, \code{error} and \code{verbose} are required in the |
|
89 |
+\code{model.homozygous} and \code{error} are required in the |
|
89 | 90 |
filter function and are automatically set.} |
90 | 91 |
|
91 | 92 |
\item{fun.setPriorVcf}{Function to set prior for somatic status for each |
... | ... |
@@ -103,17 +104,17 @@ coverage files. Needs to return a \code{logical} vector whether an interval |
103 | 104 |
should be used for segmentation. Defaults to \code{\link{filterTargets}}.} |
104 | 105 |
|
105 | 106 |
\item{args.filterTargets}{Arguments for target filtering function. Arguments |
106 |
-\code{log.ratio}, \code{tumor}, \code{gc.data}, \code{seg.file}, |
|
107 |
-\code{normalDB} and \code{verbose} are required and automatically set} |
|
107 |
+\code{log.ratio}, \code{tumor}, \code{gc.data}, \code{seg.file} and |
|
108 |
+\code{normalDB} are required and automatically set.} |
|
108 | 109 |
|
109 | 110 |
\item{fun.segmentation}{Function for segmenting the copy number log-ratios. |
110 | 111 |
Expected return value is a \code{data.frame} representation of the |
111 | 112 |
segmentation. Defaults to \code{\link{segmentationCBS}}.} |
112 | 113 |
|
113 | 114 |
\item{args.segmentation}{Arguments for segmentation function. Arguments |
114 |
-\code{normal}, \code{tumor}, \code{log.ratio}, \code{plot.cnv}, |
|
115 |
+\code{normal}, \code{tumor}, \code{log.ratio}, \code{plot.cnv} and |
|
115 | 116 |
\code{min.coverage}, \code{sampleid}, \code{vcf}, \code{tumor.id.in.vcf}, |
116 |
-\code{centromeres} and \code{verbose} are required in the segmentation function |
|
117 |
+\code{centromeres} are required in the segmentation function |
|
117 | 118 |
and automatically set.} |
118 | 119 |
|
119 | 120 |
\item{fun.focal}{Function for identifying focal amplifications. Defaults to |
... | ... |
@@ -187,6 +188,9 @@ that should converge quickly. Allowed range is 10 to 250.} |
187 | 188 |
\item{log.ratio.calibration}{Re-calibrate log-ratios in the window |
188 | 189 |
\code{sd(log.ratio)*log.ratio.calibration}.} |
189 | 190 |
|
191 |
+\item{smooth.log.ratio}{Smooth \code{log.ratio} using the \code{DNAcopy} |
|
192 |
+package.} |
|
193 |
+ |
|
190 | 194 |
\item{remove.off.target.snvs}{Deprecated. Use the corresponding argument in |
191 | 195 |
\code{args.filterVcf}.} |
192 | 196 |
|
... | ... |
@@ -238,6 +242,8 @@ typically result in a slightly more accurate purity, especially for rather |
238 | 242 |
silent genomes or very low purities. Otherwise, it will just use the purity |
239 | 243 |
determined via the SCNA-fit.} |
240 | 244 |
|
245 |
+\item{log.file}{If not \code{NULL}, store verbose output to file.} |
|
246 |
+ |
|
241 | 247 |
\item{verbose}{Verbose output.} |
242 | 248 |
} |
243 | 249 |
\value{ |
... | ... |
@@ -292,6 +298,14 @@ res <- runAbsoluteCN(seg.file=seg.file, fun.segmentation=funSeg, max.ploidy = 4, |
292 | 298 |
\author{ |
293 | 299 |
Markus Riester |
294 | 300 |
} |
301 |
+\references{ |
|
302 |
+Riester et al. (2016). PureCN: Copy number calling and SNV |
|
303 |
+classification using targeted short read sequencing. Source Code for Biology |
|
304 |
+and Medicine, 11:13. |
|
305 |
+ |
|
306 |
+Carter et al. (2012), Absolute quantification of somatic DNA alterations in |
|
307 |
+human cancer. Nature Biotechnology. |
|
308 |
+} |
|
295 | 309 |
\seealso{ |
296 | 310 |
\code{\link{correctCoverageBias}} \code{\link{segmentationCBS}} |
297 | 311 |
\code{\link{calculatePowerDetectSomatic}} |
... | ... |
@@ -8,7 +8,7 @@ segmentationCBS(normal, tumor, log.ratio, seg, plot.cnv, min.coverage, sampleid, |
8 | 8 |
target.weight.file = NULL, alpha = 0.005, undo.SD = NULL, vcf = NULL, |
9 | 9 |
tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, max.segments = NULL, |
10 | 10 |
prune.hclust.h = NULL, prune.hclust.method = "ward.D", chr.hash = NULL, |
11 |
- centromeres = NULL, verbose = TRUE) |
|
11 |
+ centromeres = NULL) |
|
12 | 12 |
} |
13 | 13 |
\arguments{ |
14 | 14 |
\item{normal}{GATK coverage data for normal sample.} |
... | ... |
@@ -60,8 +60,6 @@ properly ordered.} |
60 | 60 |
|
61 | 61 |
\item{centromeres}{A \code{data.frame} with centromere positions in first |
62 | 62 |
three columns. Currently not supported in this function.} |
63 |
- |
|
64 |
-\item{verbose}{Verbose output.} |
|
65 | 63 |
} |
66 | 64 |
\value{ |
67 | 65 |
\code{data.frame} containing the segmentation. |
... | ... |
@@ -95,6 +93,15 @@ ret <-runAbsoluteCN(normal.coverage.file=normal.coverage.file, |
95 | 93 |
\author{ |
96 | 94 |
Markus Riester |
97 | 95 |
} |
96 |
+\references{ |
|
97 |
+Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. |
|
98 |
+(2004). Circular binary segmentation for the analysis of array-based DNA |
|
99 |
+copy number data. Biostatistics 5: 557-572. |
|
100 |
+ |
|
101 |
+Venkatraman, E. S., Olshen, A. B. (2007). A faster circular binary |
|
102 |
+segmentation algorithm for the analysis of array CGH data. Bioinformatics |
|
103 |
+23: 657-63. |
|
104 |
+} |
|
98 | 105 |
\seealso{ |
99 | 106 |
\code{\link{runAbsoluteCN}} |
100 | 107 |
} |
... | ... |
@@ -6,10 +6,10 @@ |
6 | 6 |
\usage{ |
7 | 7 |
segmentationPSCBS(normal, tumor, log.ratio, seg, plot.cnv, min.coverage, |
8 | 8 |
sampleid, target.weight.file = NULL, alpha = 0.005, undo.SD = NULL, |
9 |
- drop.outliers = TRUE, flavor = "tcn&dh", tauA = 0.03, vcf = NULL, |
|
10 |
- tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, max.segments = NULL, |
|
11 |
- prune.hclust.h = NULL, prune.hclust.method = "ward.D", chr.hash = NULL, |
|
12 |
- centromeres = NULL, verbose = TRUE, ...) |
|
9 |
+ flavor = "tcn&dh", tauA = 0.03, vcf = NULL, tumor.id.in.vcf = 1, |
|
10 |
+ normal.id.in.vcf = NULL, max.segments = NULL, prune.hclust.h = NULL, |
|
11 |
+ prune.hclust.method = "ward.D", chr.hash = NULL, centromeres = NULL, |
|
12 |
+ ...) |
|
13 | 13 |
} |
14 | 14 |
\arguments{ |
15 | 15 |
\item{normal}{GATK coverage data for normal sample.} |
... | ... |
@@ -38,9 +38,6 @@ function.} |
38 | 38 |
\item{undo.SD}{\code{undo.SD} for CBS, see documentation of the |
39 | 39 |
\code{segment} function. If \code{NULL}, try to find a sensible default.} |
40 | 40 |
|
41 |
-\item{drop.outliers}{If \code{TRUE}, calls the |
|
42 |
-\code{dropSegmentationOutliers} function from PSCBS before segmentation.} |
|
43 |
- |
|
44 | 41 |
\item{flavor}{Flavor value for PSCBS. See \code{segmentByNonPairedPSCBS}.} |
45 | 42 |
|
46 | 43 |
\item{tauA}{tauA argument for PSCBS. See \code{segmentByNonPairedPSCBS}.} |
... | ... |
@@ -70,8 +67,6 @@ properly ordered.} |
70 | 67 |
\item{centromeres}{A \code{data.frame} with centromere positions in first |
71 | 68 |
three columns. If not \code{NULL}, add breakpoints at centromeres.} |
72 | 69 |
|
73 |
-\item{verbose}{Verbose output.} |
|
74 |
- |
|
75 | 70 |
\item{\dots}{Additional parameters passed to the |
76 | 71 |
\code{segmentByNonPairedPSCBS} function.} |
77 | 72 |
} |
... | ... |
@@ -108,6 +103,18 @@ gc.gene.file <- system.file("extdata", "example_gc.gene.file.txt", |
108 | 103 |
\author{ |
109 | 104 |
Markus Riester |
110 | 105 |
} |
106 |
+\references{ |
|
107 |
+Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. |
|
108 |
+(2004). Circular binary segmentation for the analysis of array-based DNA |
|
109 |
+copy number data. Biostatistics 5: 557-572. |
|
110 |
+ |
|
111 |
+Venkatraman, E. S., Olshen, A. B. (2007). A faster circular binary |
|
112 |
+segmentation algorithm for the analysis of array CGH data. Bioinformatics |
|
113 |
+23: 657-63. |
|
114 |
+ |
|
115 |
+Olshen et al. (2011). Parent-specific copy number in paired tumor-normal |
|
116 |
+studies using circular binary segmentation. Bioinformatics. |
|
117 |
+} |
|
111 | 118 |
\seealso{ |
112 | 119 |
\code{\link{runAbsoluteCN}} |
113 | 120 |
} |
... | ... |
@@ -5,7 +5,7 @@ |
5 | 5 |
\title{Set Mapping Bias VCF} |
6 | 6 |
\usage{ |
7 | 7 |
setMappingBiasVcf(vcf, tumor.id.in.vcf = NULL, normal.panel.vcf.file = NULL, |
8 |
- min.normals = 5, smooth = TRUE, smooth.n = 5, verbose = TRUE) |
|
8 |
+ min.normals = 5, smooth = TRUE, smooth.n = 5) |
|
9 | 9 |
} |
10 | 10 |
\arguments{ |
11 | 11 |
\item{vcf}{\code{CollapsedVCF} object, read in with the \code{readVcf} |
... | ... |
@@ -26,8 +26,6 @@ calculating position-specific mapping bias. Requires |
26 | 26 |
smoothing of neighboring SNPs. Requires \code{normal.panel.vcf.file}.} |
27 | 27 |
|
28 | 28 |
\item{smooth.n}{Number of neighboring variants used for smoothing.} |
29 |
- |
|
30 |
-\item{verbose}{Verbose output.} |
|
31 | 29 |
} |
32 | 30 |
\value{ |
33 | 31 |
A \code{numeric(nrow(vcf))} vector with the mapping bias for each |
... | ... |
@@ -5,7 +5,7 @@ |
5 | 5 |
\title{Set Somatic Prior VCF} |
6 | 6 |
\usage{ |
7 | 7 |
setPriorVcf(vcf, prior.somatic = c(0.5, 5e-04, 0.999, 1e-04, 0.995, 0.01), |
8 |
- tumor.id.in.vcf = NULL, verbose = TRUE) |
|
8 |
+ tumor.id.in.vcf = NULL) |
|
9 | 9 |
} |
10 | 10 |
\arguments{ |
11 | 11 |
\item{vcf}{\code{CollapsedVCF} object, read in with the \code{readVcf} |
... | ... |
@@ -24,8 +24,6 @@ value is for the case that variant is in both dbSNP and COSMIC > 2.} |
24 | 24 |
|
25 | 25 |
\item{tumor.id.in.vcf}{Id of tumor in case multiple samples are stored in |
26 | 26 |
VCF.} |
27 |
- |
|
28 |
-\item{verbose}{Verbose output.} |
|
29 | 27 |
} |
30 | 28 |
\value{ |
31 | 29 |
A \code{numeric(nrow(vcf))} vector with the prior probability of |
... | ... |
@@ -326,10 +326,10 @@ present in 5 or more samples. |
326 | 326 |
We next use coverage data of normal samples to estimate the expected variance |
327 | 327 |
in coverage per target: |
328 | 328 |
|
329 |
-<<targetweightfile1>>= |
|
329 |
+<<targetweightfile1, message=FALSE>>= |
|
330 | 330 |
target.weight.file <- "target_weights.txt" |
331 | 331 |
createTargetWeights(tumor.coverage.file, normal.coverage.files, |
332 |
- target.weight.file, verbose=FALSE) |
|
332 |
+ target.weight.file) |
|
333 | 333 |
@ |
334 | 334 |
|
335 | 335 |
This function calculates target-level copy number log-ratios using all normal |
... | ... |
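One way to turn the per-target variance estimated from the normals into weights is to down-weight noisy targets, for example with weights proportional to the inverse of the per-target standard deviation of normal-vs-normal log-ratios. The sketch below only illustrates that idea; the exact scheme used by createTargetWeights may differ.

# lr.normals: matrix of log2 coverage ratios, targets in rows, normals in columns.
targetWeights <- function(lr.normals) {
    target.sd <- apply(lr.normals, 1, sd)
    w <- 1 / (target.sd + 0.1)   # noisy targets receive small weights
    w / mean(w)                  # scale so the average weight is 1
}

set.seed(42)
lr <- cbind(rnorm(5, sd=0.1), rnorm(5, sd=0.1), rnorm(5, sd=0.1))
lr[5, ] <- rnorm(3, sd=1)        # one deliberately noisy target
round(targetWeights(lr), 2)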
@@ -858,11 +858,11 @@ target-level copy number log-ratios, and \Biocpkg{PureCN} should be used for |
858 | 858 |
segmentation and purity/ploidy inference only, it is possible to provide these |
859 | 859 |
log-ratios: |
860 | 860 |
|
861 |
-<<customlogratio>>= |
|
861 |
+<<customlogratio, message=FALSE>>= |
|
862 | 862 |
# We still use the log-ratio exactly as normalized by PureCN for this |
863 | 863 |
# example |
864 | 864 |
log.ratio <- calculateLogRatio(readCoverageGatk(normal.coverage.file), |
865 |
- readCoverageGatk(tumor.coverage.file), verbose=FALSE) |
|
865 |
+ readCoverageGatk(tumor.coverage.file)) |
|
866 | 866 |
|
867 | 867 |
retLogRatio <- runAbsoluteCN(log.ratio=log.ratio, |
868 | 868 |
gc.gene.file=gc.gene.file, vcf.file=vcf.file, |
... | ... |
@@ -964,6 +964,7 @@ Rscript Coverage.R --outdir ~/tmp/ --gatkcoverage example_tumor.txt \ |
964 | 964 |
Argument name & Corresponding PureCN argument & PureCN function \\ |
965 | 965 |
\midrule |
966 | 966 |
--help -h & & \\ |
967 |
+--version -v & & \\ |
|
967 | 968 |
--force -f & & \\ |
968 | 969 |
--outdir -o & & \\ |
969 | 970 |
--bam -b & bam.file & \Rfunction{calculateBamCoverageByInterval} \\ |
... | ... |
@@ -1010,11 +1011,12 @@ Rscript PureCN.R --rds Sample1_purecn.rds |
1010 | 1011 |
Argument name & Corresponding PureCN argument & PureCN function \\ |
1011 | 1012 |
\midrule |
1012 | 1013 |
--help -h & & \\ |
1014 |
+--version -v & & \\ |
|
1013 | 1015 |
--force -f & & \\ |
1014 | 1016 |
--outdir -o & & \\ |
1015 | 1017 |
--normal -n & normal.coverage.file & \Rfunction{runAbsoluteCN} \\ |
1016 | 1018 |
--tumor -t & tumor.coverage.file & \Rfunction{runAbsoluteCN} \\ |
1019 |
+--vcf -b & vcf.file & \Rfunction{runAbsoluteCN} \\ |
|
1017 | 1020 |
--rds -r & file.rds & \Rfunction{readCurationFile} \\ |
1018 | 1021 |
--genome -g & genome & \Rfunction{runAbsoluteCN} \\ |
1019 | 1022 |
--gcgene -c & gc.gene.file & \Rfunction{runAbsoluteCN} \\ |