Skip to content

More inner loops with more arguments than necessary #350

@sjakobi

Description

@sjakobi

In #273 and #347 we have optimized several functions by essentially performing static argument transformations by hand. The functions below look like good candidates to attempt similar optimizations.

  • unfoldrN: (Done in Hand SAT optimization for Data.ByteString.unfoldrN #356) I believe we could float out the p argument and use n as the offset to use with pokeByteOff.
    unfoldrN :: Int -> (a -> Maybe (Word8, a)) -> a -> (ByteString, Maybe a)
    unfoldrN i f x0
    | i < 0 = (empty, Just x0)
    | otherwise = unsafePerformIO $ createAndTrim' i $ \p -> go p x0 0
    where
    go !p !x !n
    | n == i = return (0, n, Just x)
    | otherwise = case f x of
    Nothing -> return (0, n, Nothing)
    Just (w,x') -> do poke p w
    go (p `plusPtr` 1) x' (n+1)
  • partition: In the sep and rev loops, p2 could be computed from p1. Alternatively, both memory locations could be tracked via a single offset.

    bytestring/Data/ByteString.hs

    Lines 1477 to 1507 in c470cf2

    partition :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString)
    partition f s = unsafeDupablePerformIO $
    do fp' <- mallocByteString len
    withForeignPtr fp' $ \p ->
    do let end = p `plusPtr` (len - 1)
    mid <- sep 0 p end
    rev mid end
    let i = mid `minusPtr` p
    return (BS fp' i,
    BS (plusForeignPtr fp' i) (len - i))
    where
    len = length s
    incr = (`plusPtr` 1)
    decr = (`plusPtr` (-1))
    sep !i !p1 !p2
    | i == len = return p1
    | f w = do poke p1 w
    sep (i + 1) (incr p1) p2
    | otherwise = do poke p2 w
    sep (i + 1) p1 (decr p2)
    where
    w = s `unsafeIndex` i
    rev !p1 !p2
    | p1 >= p2 = return ()
    | otherwise = do a <- peek p1
    b <- peek p2
    poke p1 b
    poke p2 a
    rev (incr p1) (decr p2)
  • hGetLine.findEOL: It might be worthwhile to float out the w and raw arguments.

    bytestring/Data/ByteString.hs

    Lines 1846 to 1852 in c470cf2

    findEOL r w raw
    | r == w = return w
    | otherwise = do
    c <- readWord8Buf raw r
    if c == fromIntegral (ord '\n')
    then return r -- NB. not r+1: don't include the '\n'
    else findEOL (r+1) w raw
  • Char8.readInt.digits: The p and b arguments are always incremented in sync, which indicates that one could be made static and floated out.
    digits !maxq !maxr !e !ptr = go ptr
    where
    go :: Ptr Word8 -> Int -> Word -> IO (Int, Word, Bool)
    go !p !b !a | p == e = return (b, a, True)
    go !p !b !a = do
    !w <- fromIntegral <$> peek p
    let !d = w - 0x30
    if d > 9 -- No more digits
    then return (b, a, True)
    else if a < maxq -- Look for more
    then go (p `plusPtr` 1) (b + 1) (a * 10 + d)
    else if a > maxq -- overflow
    then return (b, a, False)
    else if d <= maxr -- Ideally this will be the last digit
    then go (p `plusPtr` 1) (b + 1) (a * 10 + d)
    else return (b, a, False) -- overflow
  • Lazy.Char8.readInt.digits: Same situation as in the strict version
    digits !maxq !maxr !e !ptr = go ptr
    where
    go :: Ptr Word8 -> Int -> Word -> IO (Int, Word, Bool)
    go !p !b !a | p == e = return (b, a, True)
    go !p !b !a = do
    !w <- fromIntegral <$> peek p
    let !d = w - 0x30
    if d > 9 -- No more digits
    then return (b, a, True)
    else if a < maxq -- Look for more
    then go (p `plusPtr` 1) (b + 1) (a * 10 + d)
    else if a > maxq -- overflow
    then return (b, a, False)
    else if d <= maxr -- Ideally this will be the last digit
    then go (p `plusPtr` 1) (b + 1) (a * 10 + d)
    else return (b, a, False) -- overflow
  • packZipWith.zipWith_: The r argument could be floated out. @Boarders points out that this doesn't always improve performance though: Try optimizing loops in Data.ByteString.findIndex[End] in the style of #273 #338 (comment). Investigating this might be worthwhile though.

    bytestring/Data/ByteString.hs

    Lines 1667 to 1684 in c470cf2

    packZipWith :: (Word8 -> Word8 -> Word8) -> ByteString -> ByteString -> ByteString
    packZipWith f (BS fp l) (BS fq m) = unsafeDupablePerformIO $
    withForeignPtr fp $ \a ->
    withForeignPtr fq $ \b ->
    create len $ go a b
    where
    go p1 p2 = zipWith_ 0
    where
    zipWith_ :: Int -> Ptr Word8 -> IO ()
    zipWith_ !n !r
    | n >= len = return ()
    | otherwise = do
    x <- peekByteOff p1 n
    y <- peekByteOff p2 n
    pokeByteOff r n (f x y)
    zipWith_ (n+1) r
    len = min l m
  • Short.partition.go.go': bw2 = br - bw1

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions