-
Notifications
You must be signed in to change notification settings - Fork 144
Open
Labels
Description
In #273 and #347 we have optimized several functions by essentially performing static argument transformations by hand. The functions below look like good candidates to attempt similar optimizations.
-
unfoldrN
: (Done in Hand SAT optimization for Data.ByteString.unfoldrN #356) I believe we could float out thep
argument and usen
as the offset to use withpokeByteOff
.
Lines 868 to 878 in c470cf2
unfoldrN :: Int -> (a -> Maybe (Word8, a)) -> a -> (ByteString, Maybe a) unfoldrN i f x0 | i < 0 = (empty, Just x0) | otherwise = unsafePerformIO $ createAndTrim' i $ \p -> go p x0 0 where go !p !x !n | n == i = return (0, n, Just x) | otherwise = case f x of Nothing -> return (0, n, Nothing) Just (w,x') -> do poke p w go (p `plusPtr` 1) x' (n+1) -
partition
: In thesep
andrev
loops,p2
could be computed fromp1
. Alternatively, both memory locations could be tracked via a single offset.
Lines 1477 to 1507 in c470cf2
partition :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString) partition f s = unsafeDupablePerformIO $ do fp' <- mallocByteString len withForeignPtr fp' $ \p -> do let end = p `plusPtr` (len - 1) mid <- sep 0 p end rev mid end let i = mid `minusPtr` p return (BS fp' i, BS (plusForeignPtr fp' i) (len - i)) where len = length s incr = (`plusPtr` 1) decr = (`plusPtr` (-1)) sep !i !p1 !p2 | i == len = return p1 | f w = do poke p1 w sep (i + 1) (incr p1) p2 | otherwise = do poke p2 w sep (i + 1) p1 (decr p2) where w = s `unsafeIndex` i rev !p1 !p2 | p1 >= p2 = return () | otherwise = do a <- peek p1 b <- peek p2 poke p1 b poke p2 a rev (incr p1) (decr p2) -
hGetLine.findEOL
: It might be worthwhile to float out thew
andraw
arguments.
Lines 1846 to 1852 in c470cf2
findEOL r w raw | r == w = return w | otherwise = do c <- readWord8Buf raw r if c == fromIntegral (ord '\n') then return r -- NB. not r+1: don't include the '\n' else findEOL (r+1) w raw -
Char8.readInt.digits
: Thep
andb
arguments are always incremented in sync, which indicates that one could be made static and floated out.
bytestring/Data/ByteString/Char8.hs
Lines 1056 to 1071 in c470cf2
digits !maxq !maxr !e !ptr = go ptr where go :: Ptr Word8 -> Int -> Word -> IO (Int, Word, Bool) go !p !b !a | p == e = return (b, a, True) go !p !b !a = do !w <- fromIntegral <$> peek p let !d = w - 0x30 if d > 9 -- No more digits then return (b, a, True) else if a < maxq -- Look for more then go (p `plusPtr` 1) (b + 1) (a * 10 + d) else if a > maxq -- overflow then return (b, a, False) else if d <= maxr -- Ideally this will be the last digit then go (p `plusPtr` 1) (b + 1) (a * 10 + d) else return (b, a, False) -- overflow -
Lazy.Char8.readInt.digits
: Same situation as in the strict version
bytestring/Data/ByteString/Lazy/Char8.hs
Lines 877 to 892 in c470cf2
digits !maxq !maxr !e !ptr = go ptr where go :: Ptr Word8 -> Int -> Word -> IO (Int, Word, Bool) go !p !b !a | p == e = return (b, a, True) go !p !b !a = do !w <- fromIntegral <$> peek p let !d = w - 0x30 if d > 9 -- No more digits then return (b, a, True) else if a < maxq -- Look for more then go (p `plusPtr` 1) (b + 1) (a * 10 + d) else if a > maxq -- overflow then return (b, a, False) else if d <= maxr -- Ideally this will be the last digit then go (p `plusPtr` 1) (b + 1) (a * 10 + d) else return (b, a, False) -- overflow -
packZipWith.zipWith_
: Ther
argument could be floated out. @Boarders points out that this doesn't always improve performance though: Try optimizing loops inData.ByteString.findIndex[End]
in the style of #273 #338 (comment). Investigating this might be worthwhile though.
Lines 1667 to 1684 in c470cf2
packZipWith :: (Word8 -> Word8 -> Word8) -> ByteString -> ByteString -> ByteString packZipWith f (BS fp l) (BS fq m) = unsafeDupablePerformIO $ withForeignPtr fp $ \a -> withForeignPtr fq $ \b -> create len $ go a b where go p1 p2 = zipWith_ 0 where zipWith_ :: Int -> Ptr Word8 -> IO () zipWith_ !n !r | n >= len = return () | otherwise = do x <- peekByteOff p1 n y <- peekByteOff p2 n pokeByteOff r n (f x y) zipWith_ (n+1) r len = min l m -
Short.partition.go.go'
:bw2 = br - bw1