Bioconductor Code: affxparser

R/909.CellIndexMaps.R

096f7f6a	#########################################################################/**
f1d6fcf0	# @RdocDocumentation "9. Advanced - Cell-index maps for reading and writing"
096f7f6a	# # \description{ # This part defines read and write maps that can be used to remap # cell indices before reading and writing data from and to file, # respectively. # # This package provides methods to create read and write (cell-index) # maps from Affymetrix CDF files. These can be used to store the cell # data in an optimal order so that when data is read it is read in # contiguous blocks, which is faster.
63b4c964	# # In addition to this, read maps may also be used to read CEL files that # have been "reshuffled" by other software. For instance, the dChip # software (\url{http://www.dchip.org/}) rotates Affymetrix Exon, # Tiling and Mapping 500K data. See the example below for how to read # such data "unrotated".
096f7f6a	# # For more details on how cell indices are defined, see
f1d6fcf0	# @see "2. Cell coordinates and cell indices".
096f7f6a	# } # # \section{Motivation}{ # When reading data from file, it is faster to read the data in # the order in which it is stored than in, say, a random order.
7ba76859	# The main reason for this is that the read arm of the hard drive
096f7f6a	# has to move more if data is not read consecutively. The same applies # when writing data to file. The read and write cache of the file # system may partly compensate for this, but not completely. # # In Affymetrix CEL files, cell data is stored in order of cell indices. # Moreover, except for a few early chip types, Affymetrix randomizes # the locations of the cells such that cells in the same unit (probeset) # are scattered across the array. # Thus, when reading CEL data arranged by units using, for instance, # @see "readCelUnits", the order of the cells requested is both random # and scattered. # # Since CEL data is often queried unit by unit (except for some # probe-level normalization methods), one can improve the speed of # reading data by saving data such that cells in the same unit are # stored together. A \emph{write map} is used to remap cell indices # to file indices. When later reading that data back, a
d42f2ebf	# \emph{read map} is used to remap file indices to cell indices.
096f7f6a	# Read and write maps are described next. # } # # \section{Definition of read and write maps}{ # Consider cell indices \eqn{i=1, 2, ..., NK} and file indices # \eqn{j=1, 2, ..., NK}. # A \emph{read map} is then a \emph{bijective} (one-to-one) function # \eqn{h()} such that # \deqn{ # i = h(j), # } # and the corresponding \emph{write map} is the inverse function # \eqn{h^{-1}()} such that # \deqn{ # j = h^{-1}(i). # } # Since the mapping is required to be bijective, it holds that # \eqn{i = h(h^{-1}(i))} and that \eqn{j = h^{-1}(h(j))}. # For example, consider the "reversing" read map function # \eqn{h(j)=NK-j+1}. The write map function is \eqn{h^{-1}(i)=NK-i+1}. # To verify the bijective property of this map, we see that # \eqn{h(h^{-1}(i)) = h(NK-i+1) = NK-(NK-i+1)+1 = i} as well as # \eqn{h^{-1}(h(j)) = h^{-1}(NK-j+1) = NK-(NK-j+1)+1 = j}. # } # # \section{Read and write maps in R}{ # In this package, read and write maps are represented as @integer # @vectors of length \eqn{NK} with \emph{unique} elements in # \eqn{\{1,2,...,NK\}}. # Consider cell and file indices as in the previous section. # # For example, the "reversing" read map in the previous section can be # represented as # \preformatted{ # readMap <- (NK):1 # } # Given a @vector \code{j} of file indices, the cell indices are # then obtained as \code{i = readMap[j]}. # The corresponding write map is # \preformatted{ # writeMap <- (NK):1 # } # and given a @vector \code{i} of cell indices, the file indices are # then obtained as \code{j = writeMap[i]}. # # Note also that the bijective property holds for this mapping, that is, # \code{i == readMap[writeMap[i]]} and \code{i == writeMap[readMap[i]]} # are both @TRUE. # # Because the mapping is bijective, the write map can be calculated from # the read map by: # \preformatted{ # writeMap <- order(readMap) # } # and vice versa: # \preformatted{ # readMap <- order(writeMap) # }
886f672a	# Note that the @see "invertMap" method is much faster than \code{order()}.
096f7f6a	# # Since most algorithms for Affymetrix data are based on probeset (unit) # models, it is natural to read data unit by unit. Thus, to optimize the # speed, cells should be stored in contiguous blocks of units.
886f672a	# The method @see "readCdfUnitsWriteMap" can be used to generate a # \emph{write map} from a CDF file such that if the units are read in # order, @see "readCelUnits" will read the cell data in order.
096f7f6a	# Example: # \preformatted{ # # Find any CDF file # cdfFile <- findCdf() # # # Get the order of cell indices
886f672a	# indices <- readCdfCellIndices(cdfFile)
096f7f6a	# indices <- unlist(indices, use.names=FALSE) #
886f672a	# # Get an optimal write map for the CDF file # writeMap <- readCdfUnitsWriteMap(cdfFile)
096f7f6a	#
886f672a	# # Get the read map # readMap <- invertMap(writeMap) # # # Validate correctness
096f7f6a	# indices2 <- readMap[indices] # == 1, 2, 3, ..., N*K # } # # \emph{Warning}, do not misunderstand this example. It cannot be used
d42f2ebf	# to improve the reading speed of default CEL files. For this, the data in # the CEL files has to be rearranged (by the corresponding write map).
096f7f6a	# } #
63b4c964	# \section{Reading rotated CEL files}{ # It might be that a CEL file was rotated by other software, e.g. # the dChip software rotates Affymetrix Exon, Tiling and Mapping 500K # arrays 90 degrees clockwise, and the data remains rotated when exported # as CEL files. To read such data in a non-rotated way, a read # map can be used to "unrotate" the data. The 90-degree clockwise
7ba76859	# rotation that dChip effectively uses to store such data is explained by:
63b4c964	# \preformatted{ # h <- readCdfHeader(cdfFile) # # (x,y) chip layout rotated 90 degrees clockwise # nrow <- h$cols # ncol <- h$rows # y <- (nrow-1):0 # x <- rep(1:ncol, each=nrow) # writeMap <- as.vector(y*ncol + x) # } # # Thus, to read this data "unrotated", use the following read map: # \preformatted{ # readMap <- invertMap(writeMap) # data <- readCel(celFile, indices=1:10, readMap=readMap) # } # } #
76cf4b26	# @author "HB"
d223eaa3	# # @keyword internal
096f7f6a	#*/#########################################################################
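
The definitions in the comment block above can be checked with a small base-R sketch. It does not use affxparser or any CDF/CEL file: the size `NK`, the helper `isMap()`, and the toy vectors `cellData` and `fileData` are illustrative assumptions, and `order()` stands in for `invertMap()`. It only exercises the stated relations `i = readMap[j]` and `j = writeMap[i]`.

```r
# Toy example in base R only; NK is an arbitrary illustrative size.
NK <- 12L

# The "reversing" read map h(j) = NK - j + 1 and its write map.
readMap <- NK:1
writeMap <- NK:1

# Hypothetical helper: a map must be a permutation of 1..NK.
isMap <- function(map) identical(sort(map), seq_along(map))
stopifnot(isMap(readMap), isMap(writeMap))

# Bijective property: each map undoes the other.
i <- seq_len(NK)
stopifnot(all(readMap[writeMap[i]] == i), all(writeMap[readMap[i]] == i))

# An arbitrary (random) read map, inverted with order().
set.seed(1)
readMap2 <- sample(NK)
writeMap2 <- order(readMap2)
stopifnot(isMap(writeMap2), identical(order(writeMap2), readMap2))

# Store cell values in file order: cell i goes to file position writeMap2[i].
cellData <- seq(from = 100, by = 10, length.out = NK)
fileData <- numeric(NK)
fileData[writeMap2] <- cellData

# File position j then holds the value of cell readMap2[j].
stopifnot(identical(fileData, cellData[readMap2]))
```

The last check is the reason a write map is applied once when data is saved: afterwards, consecutive file positions correspond to the cell order chosen by the map, so a unit-by-unit read touches contiguous positions.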