R/909.CellIndexMaps.R
 #########################################################################/**
 # @RdocDocumentation "9. Advanced - Cell-index maps for reading and writing"
 #
 # \description{
 #   This part defines read and write maps that can be used to remap 
 #   cell indices before reading and writing data from and to file,
 #   respectively.
 #
 #   This package provides methods to create read and write (cell-index) 
 #   maps from Affymetrix CDF files.  These can be used to store the cell
 #   data in an optimal order so that when data is read it is read in
 #   contiguous blocks, which is faster.
 #
 #   In addition to this, read maps may also be used to read CEL files that
 #   have been "reshuffled" by other software.  For instance, the dChip 
 #   software (\url{http://www.dchip.org/}) rotates Affymetrix Exon,
 #   Tiling and Mapping 500K data.  See the example below for how to read
 #   such data "unrotated".
 #
 #   For more details on how cell indices are defined, see
 #   @see "2. Cell coordinates and cell indices".
 # }
 #
 # \section{Motivation}{
 #   When reading data from file, it is faster to read the data in
 #   the order that it is stored compared with, say, in a random order.
 #   The main reason for this is that the read arm of the hard drive
 #   has to move more if data is not read consecutively.  The same applies
 #   when writing data to file.  The read and write cache of the file
 #   system may compensate a bit for this, but not completely.
 #
 #   In Affymetrix CEL files, cell data is stored in order of cell indices.
 #   Moreover, (except for a few early chip types) Affymetrix randomizes
 #   the locations of the cells such that cells in the same unit (probeset)
 #   are scattered across the array.  
 #   Thus, when reading CEL data arranged by units using for instance 
 #   @see "readCelUnits", the order of the cells requested is both random
 #   and scattered.  
 #   
 #   Since CEL data is often queried unit by unit (except for some
 #   probe-level normalization methods), one can improve the speed of
 #   reading data by saving data such that cells in the same unit are
 #   stored together.  A \emph{write map} is used to remap cell indices
 #   to file indices.  When later reading that data back, a 
 #   \emph{read map} is used to remap file indices to cell indices.
 #   Read and write maps are described next.
 # }
 #
 # \section{Definition of read and write maps}{
 #   Consider cell indices \eqn{i=1, 2, ..., N*K} and file indices 
 #   \eqn{j=1, 2, ..., N*K}.
 #   A \emph{read map} is then a \emph{bijective} (one-to-one and onto) function
 #   \eqn{h()} such that
 #   \deqn{
 #     i = h(j),
 #   }
 #   and the corresponding \emph{write map} is the inverse function
 #   \eqn{h^{-1}()} such that
 #   \deqn{
 #     j = h^{-1}(i).
 #   }
 #   Since the mapping is required to be bijective, it holds that 
 #   \eqn{i = h(h^{-1}(i))} and that \eqn{j = h^{-1}(h(j))}. 
 #   For example, consider the "reversing" read map function
 #   \eqn{h(j)=N*K-j+1}.  The write map function is \eqn{h^{-1}(i)=N*K-i+1}.
 #   To verify the bijective property of this map, we see that
 #   \eqn{h(h^{-1}(i)) = h(N*K-i+1) = N*K-(N*K-i+1)+1 = i} as well as
 #   \eqn{h^{-1}(h(j)) = h^{-1}(N*K-j+1) = N*K-(N*K-j+1)+1 = j}.
 # }
 #
 # \section{Read and write maps in R}{
 #   In this package, read and write maps are represented as @integer 
 #   @vectors of length \eqn{N*K} with \emph{unique} elements in
 #   \eqn{\{1,2,...,N*K\}}.
 #   Consider cell and file indices as in the previous section.
 #
 #   For example, the "reversing" read map in the previous section can be
 #   represented as
 #   \preformatted{
 #     readMap <- (N*K):1
 #   }
 #   Given a @vector \code{j} of file indices, the cell indices are
 #   then obtained as \code{i = readMap[j]}.
 #   The corresponding write map is
 #   \preformatted{
 #     writeMap <- (N*K):1
 #   }
 #   and given a @vector \code{i} of cell indices, the file indices are
 #   then obtained as \code{j = writeMap[i]}.
 #
 #   Note also that the bijective property holds for this mapping, that is
 #   \code{i == readMap[writeMap[i]]} and \code{i == writeMap[readMap[i]]} 
 #   are both @TRUE.
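 #
 #   The reversing map is its own inverse, which hides the difference
 #   between the two maps.  As a minimal sketch, consider a hypothetical
 #   map for \eqn{N*K=5} that is not its own inverse (the values are made
 #   up for illustration only):
 #   \preformatted{
 #     readMap  <- c(3L, 1L, 4L, 5L, 2L)   # i = readMap[j]
 #     writeMap <- c(2L, 5L, 1L, 3L, 4L)   # j = writeMap[i]
 #
 #     i <- 1:5
 #     stopifnot(all(readMap[writeMap[i]] == i))
 #     stopifnot(all(writeMap[readMap[i]] == i))
 #   }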
 #
 #   Because the mapping is bijective, the write map can be calculated from
 #   the read map by:
 #   \preformatted{
 #     writeMap <- order(readMap)
 #   }
 #   and vice versa:
 #   \preformatted{
 #     readMap <- order(writeMap)
 #   }
 #   Note, the @see "invertMap" method is much faster than \code{order()}.
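 #
 #   For illustration only, the inverse of a map can also be computed in
 #   base R by assignment, which avoids the sorting done by
 #   \code{order()}.  This is a sketch of the idea and makes no claim
 #   about how \code{invertMap()} is implemented; the helper name
 #   \code{invertBySubassign()} is hypothetical:
 #   \preformatted{
 #     invertBySubassign <- function(map) {
 #       inv <- integer(length(map))
 #       inv[map] <- seq_along(map)   # position map[k] is sent back to k
 #       inv
 #     }
 #
 #     readMap <- c(3L, 1L, 4L, 5L, 2L)
 #     writeMap <- invertBySubassign(readMap)
 #     stopifnot(identical(writeMap, order(readMap)))
 #   }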
 #
 #   Since most algorithms for Affymetrix data are based on probeset (unit)
 #   models, it is natural to read data unit by unit.  Thus, to optimize the
 #   speed, cells should be stored in contiguous blocks of units.
 #   The method @see "readCdfUnitsWriteMap" can be used to generate a
 #   \emph{write map} from a CDF file such that if the units are read in
 #   order, @see "readCelUnits" will read the cell data in order.
 #   Example:
 #   \preformatted{
 #     # Find any CDF file
 #     cdfFile <- findCdf()
 #
 #     # Get the order of cell indices
 #     indices <- readCdfCellIndices(cdfFile)
 #     indices <- unlist(indices, use.names=FALSE)
 #
 #     # Get an optimal write map for the CDF file
 #     writeMap <- readCdfUnitsWriteMap(cdfFile)
 #
 #     # Get the read map
 #     readMap <- invertMap(writeMap)
 #
 #     # Validate correctness
 #     indices2 <- readMap[indices]    # == 1, 2, 3, ..., N*K
 #   }
 #
 #   \emph{Warning}: do not misunderstand this example.  It cannot be used
 #   to improve the reading speed of default CEL files.  For this, the data
 #   in the CEL files has to be rearranged (by the corresponding write map).
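 #
 #   As a minimal sketch of such a rearrangement, consider a hypothetical
 #   layout with six cells in three units.  All values are made up, the
 #   maps follow the definitions \code{i = readMap[j]} and
 #   \code{j = writeMap[i]} given above, and base R \code{order()} is
 #   used in place of \code{invertMap()}:
 #   \preformatted{
 #     # Cell indices of each unit, scattered across the array
 #     units <- list(A=c(5L, 2L), B=c(1L, 6L), C=c(4L, 3L))
 #
 #     # Cell indices in the order they should be stored on file
 #     readMap <- unlist(units, use.names=FALSE)
 #     writeMap <- order(readMap)
 #
 #     # Rearrange the cell data: cell i is stored at file index writeMap[i]
 #     cellData <- c(10, 20, 30, 40, 50, 60)
 #     fileData <- numeric(length(cellData))
 #     fileData[writeMap] <- cellData
 #
 #     # Each unit now occupies a contiguous block of file indices
 #     stopifnot(all(writeMap[units$B] == c(3L, 4L)))
 #     stopifnot(all(fileData[writeMap[units$B]] == cellData[units$B]))
 #   }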
 # }
 #
 # \section{Reading rotated CEL files}{
 #   A CEL file may have been rotated by other software.  For instance,
 #   the dChip software rotates Affymetrix Exon, Tiling and Mapping 500K
 #   arrays 90 degrees clockwise, and the data remain rotated when exported
 #   as CEL files.  To read such data in a non-rotated way, a read
 #   map can be used to "unrotate" the data.  The 90-degree clockwise
 #   rotation that dChip effectively uses to store such data is explained by:
 #   \preformatted{
 #     h <- readCdfHeader(cdfFile)
 #     # (x,y) chip layout rotated 90 degrees clockwise
 #     nrow <- h$cols
 #     ncol <- h$rows
 #     y <- (nrow-1):0
 #     x <- rep(1:ncol, each=nrow)
 #     writeMap <- as.vector(y*ncol + x)
 #   }
 #
 #   Thus, to read this data "unrotated", use the following read map:
 #   \preformatted{
 #     readMap <- invertMap(writeMap)
 #     data <- readCel(celFile, indices=1:10, readMap=readMap)
 #   }
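 #
 #   The construction of the write map above can be tried without a CDF
 #   file.  A minimal sketch, assuming a hypothetical chip with 3 columns
 #   and 2 rows, and using base R \code{order()} in place of
 #   \code{invertMap()}:
 #   \preformatted{
 #     nrow <- 3L   # stands in for h$cols of the hypothetical chip
 #     ncol <- 2L   # stands in for h$rows of the hypothetical chip
 #     y <- (nrow-1L):0L
 #     x <- rep(1:ncol, each=nrow)
 #     writeMap <- as.vector(y*ncol + x)   # 5, 3, 1, 6, 4, 2
 #     readMap <- order(writeMap)          # 3, 6, 2, 5, 1, 4
 #
 #     # The write map is a permutation and the read map inverts it
 #     stopifnot(all(sort(writeMap) == 1:6))
 #     stopifnot(all(writeMap[readMap] == 1:6))
 #   }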
 # }
 #
 # @author "HB"
 #
 # @keyword internal
 #*/#########################################################################