Bioconductor Code: motifTestR

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/testMotifPos.R
\name{testMotifPos}
\alias{testMotifPos}
\title{Test for a Uniform Distribution across a set of best matches}
\usage{
testMotifPos(
  x,
  stringset,
  binwidth = 10,
  abs = FALSE,
  rc = TRUE,
  min_score = "80\%",
  break_ties = "all",
  alt = c("greater", "less", "two.sided"),
  sort_by = c("p", "none"),
  mc.cores = 1,
  ...
)
}
\arguments{
\item{x}{A Position Weight Matrix, universalmotif object or list thereof.
Alternatively, this can be a single DataFrame or a list of DataFrames as
returned by \link{getPwmMatches} with \code{best_only = TRUE}.}

\item{stringset}{An XStringSet. Not required if best matches are supplied as \code{x}.}

\item{binwidth}{Width of the bins used to group match positions across the range}

\item{abs}{logical(1) Use absolute distances from zero (the centre of each sequence) to find symmetrical enrichment}

\item{rc}{logical(1) Also find matches using the reverse complement of each
PWM}

\item{min_score}{The minimum score to return a match}

\item{break_ties}{Choose how to resolve matches with tied scores}

\item{alt}{Alternative hypothesis for the binomial test}

\item{sort_by}{Sort results by p-value (\code{"p"}) or leave them unsorted (\code{"none"})}

\item{mc.cores}{Passed to \link[parallel]{mclapply}}

\item{...}{Passed to \link[Biostrings]{matchPWM}}
}
\value{
A data.frame with columns \code{start}, \code{end}, \code{centre}, \code{width}, \code{total_matches},
\code{matches_in_region}, \code{expected}, \code{enrichment}, \code{prop_total}, \code{p}
and \code{consensus_motif}.
The total matches represent the total number of matches within the set of
sequences, whilst the number of matches observed in the final region is also
given, along with the proportion of the total this represents.
Enrichment is the ratio of observed to expected matches in the region, with
the expected count derived under the null hypothesis.

The consensus motif across all matches is returned as a Position Frequency
Matrix (PFM) using \link[Biostrings]{consensusMatrix}.
}
\description{
Test for a Uniform Distribution across a set of best matches
}
\details{
This function tests for an even positional spread of motif matches across a
set of sequences, using the null hypothesis (\eqn{H_0}) that if there is no
positional bias, matches will be evenly distributed across all positions
within a set of sequences.
Conversely, if there is positional bias, typically (but not necessarily) near
the centre of each range, this function aims to detect this signal as a
rejection of the null hypothesis.

Input can be provided as the output from \link{getPwmMatches} with
\code{best_only = TRUE} if these matches have already been identified.
If this object is supplied as \code{x}, the arguments \code{stringset},
\code{rc}, \code{min_score} and \code{break_ties} are not required.
Otherwise, a Position Weight Matrix (PWM), or list of PWMs, and an
\code{XStringSet} are required, along with the relevant arguments, and the
best matches are identified within the function.
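
Pre-computing the best matches is convenient when the same matches are to be
tested under several settings, as the sequences are then only scanned once.
A short sketch using the example data from the Examples section; the choice of
bin widths compared here is illustrative only.

\preformatted{
library(motifTestR)
data("ex_pfm")
data("ar_er_seq")
esr1 <- ex_pfm$ESR1
## Scan the sequences once, keeping only the best match per sequence
bm <- getPwmMatches(esr1, ar_er_seq, best_only = TRUE)
## Re-use these matches; stringset, rc, min_score and break_ties are not needed
res_10 <- testMotifPos(bm, binwidth = 10)
res_20 <- testMotifPos(bm, binwidth = 20)
res_abs <- testMotifPos(bm, binwidth = 10, abs = TRUE)
}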

The set of best matches is then grouped into bins along the range, with the
central bin containing zero, and the matches in each bin are tallied.
Setting \code{abs = TRUE} converts all positions to \emph{absolute distances}
from the centre, so that counts are returned purely as bins of distance from
zero, with zero as the inclusive lower bound of the first bin.
Motif matches are assigned to bins based on the central position of each
match, as provided in the column \code{from_centre} when calling
\link{getPwmMatches}.
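
The binning step can be sketched in base R as below. \code{bin_positions()} is
a hypothetical helper written for illustration, not a motifTestR function, and
the exact bin boundaries used internally by \code{testMotifPos()} may differ.
Here the central bin is assumed to be centred on zero, and all bins are closed
on the left.

\preformatted{
## Hypothetical helper (not part of motifTestR): tally positions relative to
## the sequence centre into bins of width 'binwidth'
bin_positions <- function(pos, binwidth = 10, abs = FALSE) {
  pos <- if (abs) base::abs(pos) else pos
  ## Outermost break, at least one full bin beyond the most extreme position
  edge <- (ceiling(max(base::abs(pos)) / binwidth) + 1) * binwidth
  if (abs) {
    ## Zero is the inclusive lower bound of the first bin: [0, bw), [bw, 2bw), ...
    breaks <- seq(0, edge, by = binwidth)
  } else {
    ## The central bin [-bw/2, bw/2) contains zero
    breaks <- seq(-edge - binwidth / 2, edge + binwidth / 2, by = binwidth)
  }
  table(cut(pos, breaks = breaks, right = FALSE))
}

pos <- c(-12, -3, 0, 0, 4, 5, 18)
bin_positions(pos, binwidth = 10)              # central bin [-5,5) holds 4 matches
bin_positions(pos, binwidth = 10, abs = TRUE)  # counts 5, 2, 0 from zero outwards
}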

A binomial test (\link[stats]{binom.test}) is performed on each bin using the
alternative hypothesis given by \code{alt}, and the resulting p-values across
all bins are combined using the Harmonic Mean p-value (HMP) (see
\link[harmonicmeanp]{p.hmp}).
All bins with raw p-values below the HMP are identified, and the returned
values for start, end, centre, width, matches in region, expected and
enrichment are calculated across this set of bins.
The expectation is that where a positional bias is evident, this will be a
narrow range containing a non-trivial proportion of the total matches.
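
The per-bin testing and region selection can be sketched as below, using
hypothetical counts from nine equally sized bins. This is a simplified
stand-in, not the motifTestR implementation: it uses the plain (equally
weighted) harmonic mean of the p-values, whereas \code{harmonicmeanp::p.hmp()}
additionally returns an asymptotically exact p-value, and it assumes every bin
has the same expected share of matches.

\preformatted{
## Hypothetical match counts in nine equally sized bins; bin 5 is central
counts <- c(8, 10, 9, 12, 35, 11, 9, 10, 8)
n <- sum(counts)
p0 <- 1 / length(counts)  # expected share of matches per bin under H0
## One binomial test per bin, equivalent to alt = "greater"
pvals <- vapply(
  counts,
  function(k) binom.test(k, n, p = p0, alternative = "greater")$p.value,
  numeric(1)
)
## Plain harmonic mean of the p-values (stand-in for harmonicmeanp::p.hmp())
hmp <- length(pvals) / sum(1 / pvals)
## Bins with raw p-values below the HMP form the reported region
region <- which(pvals < hmp)
matches_in_region <- sum(counts[region])
expected <- n * p0 * length(region)
enrichment <- matches_in_region / expected
}

In this example only the central bin falls below the HMP, giving an enrichment
of 35 / (112 / 9) = 2.8125.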

It should also be noted that \code{binom.test()} can return p-values of exactly
zero when the true value lies beyond machine precision. In these instances,
zero p-values are excluded from calculation of the HMP. This introduces a very
slight conservative bias, and assumes that in such extreme cases neighbouring
bins are highly likely to also return extremely low p-values, so that no
significance will be lost.
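
The underflow to zero, and the reason zero p-values cannot simply be included
in a harmonic mean, can be checked directly in base R. As above, the plain
harmonic mean is used here as a simplified stand-in for
\code{harmonicmeanp::p.hmp()}, and the additional p-values are hypothetical.

\preformatted{
## The exact p-value here is 0.01^1000 = 1e-2000, far below the smallest
## positive double, so binom.test() reports exactly zero
p_zero <- binom.test(1000, 1000, p = 0.01, alternative = "greater")$p.value
p_zero == 0  # TRUE
## Hypothetical p-values from four bins, one of which is zero
pvals <- c(p_zero, 1e-300, 0.2, 0.6)
## Including the zero gives 1 / 0 = Inf, collapsing the harmonic mean to zero,
## so no raw p-value could fall strictly below it
length(pvals) / sum(1 / pvals)  # 0
## Excluding zeros keeps the harmonic mean informative
keep <- pvals > 0
hmp <- sum(keep) / sum(1 / pvals[keep])
which(pvals < hmp)  # bins 1 and 2: the zero-valued bin is still selected
}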
}
\examples{
## Load the example PWM
data("ex_pfm")
esr1 <- ex_pfm$ESR1

## Load the example sequences
data("ar_er_seq")

## Find the best match in each sequence, then test these matches
matches <- getPwmMatches(esr1, ar_er_seq, best_only = TRUE)
## Test for enrichment in any position
testMotifPos(matches)

## Provide a list of PWMs, testing for distance from zero
testMotifPos(ex_pfm, ar_er_seq, abs = TRUE, binwidth = 10)


}