\name{assignReads}

\alias{assignReads}
\alias{removeReads}


\title{
  Assign reads to the edges of a SplicingGraphs object
}

\description{
  \code{assignReads} assigns reads to the exonic and intronic edges of a
  \link{SplicingGraphs} object.

  \code{removeReads} removes all the reads assigned to the exonic and
  intronic edges of a \link{SplicingGraphs} object.
}

\usage{
assignReads(sg, reads, sample.name=NA)

removeReads(sg)
}

\arguments{
  \item{sg}{
    A \link{SplicingGraphs} object.
  }
  \item{reads}{
    A \link[GenomicAlignments]{GAlignments},
    \link[GenomicAlignments]{GAlignmentPairs}, or
    \link[GenomicRanges]{GRangesList} object, containing the
    reads to assign to the exons and introns in \code{sg}.
    It must have unique names on it, typically the QNAME ("query name")
    field coming from the BAM file. More on this in the 'About the read
    names' section below.
  }
  \item{sample.name}{
    A single string containing the name of the sample where the reads
    are coming from.
  }
}

\details{
  TODO
}

\section{About read names}{
  The read names are typically imported from the BAM file by calling
  \code{\link[GenomicAlignments]{readGAlignments}} (or
  \code{\link[GenomicAlignments]{readGAlignmentPairs}}) with
  \code{use.names=TRUE}. This extracts the "query names" from the
  file (stored in the QNAME field), and makes them the names of the
  returned object.

  The \code{reads} object must have unique names on it. The presence of
  duplicated names generally indicates one (or both) of the following
  situations:

  \itemize{
    \item (a) \code{reads} contains paired-end reads that have not been
              paired;
    \item (b) some of the reads are \emph{secondary alignments}.
  }

  If (a): you can find out whether reads in a BAM file are single- or
  paired-end with the \code{\link[Rsamtools]{quickBamFlagSummary}} utility
  from the \pkg{Rsamtools} package. If they're paired-end, load them
  with \code{\link[GenomicAlignments]{readGAlignmentPairs}}
  instead of \code{\link[GenomicAlignments]{readGAlignments}}, and that
  will pair them.

  If (b): you can filter out secondary alignments by passing
  \code{'isSecondaryAlignment=FALSE'} to \code{\link[Rsamtools]{scanBamFlag}}
  when preparing the \link[Rsamtools]{ScanBamParam} object used to load
  the reads. For example:
  \preformatted{
    library(Rsamtools)
    flag0 <- scanBamFlag(isSecondaryAlignment=FALSE,
                         isNotPassingQualityControls=FALSE,
                         isDuplicate=FALSE)
    param0 <- ScanBamParam(flag=flag0)
    reads <- readGAlignments("path/to/BAM/file", use.names=TRUE,
                                  param=param0)
  }
  This will filter out records that have flag 0x100 (secondary alignment)
  set to 1. See \code{?\link[Rsamtools]{scanBamFlag}} in the \pkg{Rsamtools}
  package for more information.
  See the SAM Specs on the SAMtools project page at
  \url{http://samtools.sourceforge.net/} for a description of the
  SAM/BAM flags.
}

\value{
  For \code{assignReads}: the supplied \link{SplicingGraphs} object with
  the reads assigned to it.

  For \code{removeReads}: the supplied \link{SplicingGraphs} object with
  all reads previously assigned with \code{assignReads} removed from it.
}

\author{
  H. Pages
}

\seealso{
  This man page is part of the \pkg{SplicingGraphs} package.
  Please see \code{?`\link{SplicingGraphs-package}`} for an overview of the
  package and for an index of its man pages.

  Other topics related to this man page and documented in other packages:
  \itemize{
    \item The \link[GenomicRanges]{GRangesList} class defined in the
          \pkg{GenomicRanges} package.

    \item The \link[GenomicAlignments]{GAlignments} and
          \link[GenomicAlignments]{GAlignmentPairs} classes, as well as
          the \code{\link[GenomicAlignments]{readGAlignments}} and
          \code{\link[GenomicAlignments]{readGAlignmentPairs}} functions
          defined in the \pkg{GenomicAlignments} package.

    \item The \code{\link[Rsamtools]{quickBamFlagSummary}} and
          \code{\link[Rsamtools]{ScanBamParam}} functions defined in the
          \pkg{Rsamtools} package.
  }
}

\examples{
## ---------------------------------------------------------------------
## 1. Make SplicingGraphs object 'sg' from toy gene model (see
##    '?SplicingGraphs')
## ---------------------------------------------------------------------
example(SplicingGraphs) 
sg

## 'sg' has 1 element per gene and 'names(sg)' gives the gene ids.
names(sg)

## ---------------------------------------------------------------------
## 2. Load toy reads
## ---------------------------------------------------------------------
## Load toy reads (single-end) from a BAM file. We filter out secondary
## alignments, reads not passing quality controls, and PCR or optical
## duplicates (see ?scanBamFlag in the Rsamtools package for more
## information):
library(Rsamtools)
flag0 <- scanBamFlag(isSecondaryAlignment=FALSE,
                     isNotPassingQualityControls=FALSE,
                     isDuplicate=FALSE)
param0 <- ScanBamParam(flag=flag0)
gal <- readGAlignments(toy_reads_bam(), use.names=TRUE, param=param0)
gal

## ---------------------------------------------------------------------
## 3. Assign the reads to the exons and introns in 'sg'
## ---------------------------------------------------------------------
## The same read can be assigned to more than 1 exon or intron (e.g. a
## junction read with 1 gap can be assigned to 2 exons and 1 intron).
sg <- assignReads(sg, gal, sample.name="TOYREADS")

## See the assignments to the splicing graph edges.
edge_by_tx <- sgedgesByTranscript(sg, with.hits.mcols=TRUE)
edge_data <- mcols(unlist(edge_by_tx))
colnames(edge_data)
head(edge_data)
edge_data[ , c("sgedge_id", "TOYREADS.hits")]

edge_by_gene <- sgedgesByGene(sg, with.hits.mcols=TRUE)
mcols(unlist(edge_by_gene))

## See the assignments to the reduced splicing graph edges.
redge_by_gene <- rsgedgesByGene(sg, with.hits.mcols=TRUE)
mcols(unlist(redge_by_gene))

## ---------------------------------------------------------------------
## 4. Summarize the reads assigned to 'sg' and eventually remove them
## ---------------------------------------------------------------------
## See '?countReads'.
}