SIGMOD’08

   A Case for Flash Memory SSD in
   Enterprise Database Applications
   Sang-Won Lee (Sungkyunkwan University)
   Bongki Moon (University of Arizona)
   Chanik Park (Samsung Electronics Co., Ltd.)
   Jae-Myung Kim (Altibase Corp.)
   Sang-Woo Kim (Sungkyunkwan University)




   COMPUTER SCIENCE DEPARTMENT                          ACM SIGMOD, Vancouver Canada, June 2008 -1-
Magnetic Disk vs Flash SSD
• Champion for 50 years
      Seagate ST340016A: 40GB, 7200rpm
• New challengers!
      M-Tron Flash SSD: 32GB, 2.5 inch
      Samsung FlashSSD: 32GB, 1.8 inch

Trend in Market Today
• In mobile storage market
      NAND flash memory wins over hard disk in mobile storage market
        • PDA, MP3, mobile phone, digital camera, ...
      Due to advantages in size, weight, shock resistance, power
      consumption, noise …
• In personal computer market
      Compete with hard disk in personal computer market
        • 32GB Flash SSD: M-Tron, Samsung, SanDisk
      Vendors launched new lines of personal computers with NAND flash
      SSD replacing hard disk
        • Apple, Samsung, and others




Market Trend in Prospect
• Price drops quickly
      NAND flash is a lot cheaper than DRAM;
        • ASP/MB of NAND < 1/3 of ASP/MB of DRAM as of 2007.
      Still much more expensive than magnetic disk.
      Annual drop in ASP/MB was about 60% in 2006.
      Projected annual drop in ASP/MB is about 30-40% in next 5 years.
      [Eli Harari@SanDisk, August 2007]

• Emerging Enterprise Market
      NAND ASP was $10/GB in 2007. With 40% annual drop, it could be
      $800/TB in 2012.
      Not inconceivable to run a full database server on a computing
      platform with TB-scale Flash SSD as secondary storage.
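
The projection above is simple compounding. A minimal sketch in Python, assuming a constant 40% annual decline and 1 TB = 1000 GB (the function name is illustrative, not from the slides):

```python
def projected_price_per_gb(start_price: float, annual_drop: float, years: int) -> float:
    """Price after `years` of a constant fractional annual decline."""
    return start_price * (1.0 - annual_drop) ** years

# Slide figures: $10/GB in 2007, 40% annual drop, projected to 2012.
price_2012_per_gb = projected_price_per_gb(10.0, 0.40, 2012 - 2007)
price_2012_per_tb = price_2012_per_gb * 1000  # assumption: decimal terabyte

print(f"{price_2012_per_gb:.2f} $/GB, {price_2012_per_tb:.0f} $/TB")
```

This gives about $0.78/GB, or roughly $778/TB; with 1 TB = 1024 GB the figure is about $796/TB. Both are consistent with the rounded $800/TB quoted above.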



Technology Trend in Prospect
• NAND flash density increases faster than Moore’s law
      Predicted twofold annual increase of NAND flash density until 2012
      [Hwang, ProcIEEE’03]
      Toshiba hopes for 512GB SSD by the end of 2009
        • 30 nm chip-making process, Multi-level-cell (MLC)


• Bandwidth catches up
      Samsung MCAQE32G8APP-0XA [2006]
        • Sustained read 56 MB/sec, sustained write 32 MB/sec
      Samsung, Mtron [Feb. 2008]
        • Sustained read 100~120 MB/sec, sustained write 80~90 MB/sec
      Intel-Micron’s 4-plane architecture + higher clock speed [Feb. 2008]
        • Sustained read 200 MB/sec, sustained write 100 MB/sec
      Samsung MLC-based 256GB SSD with SATA-II [May 2008]
        • Sustained read 200 MB/sec, sustained write 160 MB/sec


Past Trend of Disk
• From 1983 to 2003 [Patterson, CACM 47(10) 2004]
     Capacity increased about 2500 times (0.03 GB → 73.4 GB)
     Bandwidth improved 143.3 times (0.6 MB/s → 86 MB/s)
     Latency improved 8.5 times (48.3 ms → 5.7 ms)

     Year                     1983       1990      1994      1998      2003
     Product                  CDC        Seagate   Seagate   Seagate   Seagate
                              94145-36   ST41600   ST15150   ST39102   ST373453
     Capacity                 0.03 GB    1.4 GB    4.3 GB    9.1 GB    73.4 GB
     RPM                      3600       5400      7200      10000     15000
     Bandwidth (MB/sec)       0.6        4         9         24        86
     Media diameter (inch)    5.25       5.25      3.5       3.0       2.5
     Latency (msec)           48.3       17.1      12.7      8.8       5.7


Latency of Disk Lags
• Trend
    In the time that bandwidth doubles, latency improves by
    no more than a factor of 1.2 to 1.4.
     • Latency improves by no more than square root of the
       improvement in bandwidth.
    The bandwidth-latency imbalance may be even more
    evident in the future.
• The trouble is
    Latency remains important for
     • Interactive applications, database logging (or whenever I/O must
       be done synchronously)
• What can NAND Flash Memory do for this?
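
The rule of thumb can be checked against the 1983 and 2003 disk figures quoted on the previous slide. A small sketch, using only those table values (variable names are illustrative):

```python
import math

# 1983 -> 2003 figures from the disk trend table.
bandwidth_gain = 86 / 0.6    # MB/sec
latency_gain = 48.3 / 5.7    # msec

# How many times did bandwidth double, and how much did latency
# improve per doubling?
doublings = math.log2(bandwidth_gain)
latency_gain_per_doubling = latency_gain ** (1 / doublings)

# Square-root bound on latency improvement.
sqrt_bound = math.sqrt(bandwidth_gain)

print(f"bandwidth doublings: {doublings:.1f}")
print(f"latency gain per doubling: {latency_gain_per_doubling:.2f}")
print(f"latency gain {latency_gain:.1f} vs sqrt bound {sqrt_bound:.1f}")
```

Under these numbers latency improved by roughly 1.35 times per bandwidth doubling, inside the 1.2 to 1.4 range, and the overall latency gain (about 8.5) stays below the square root of the bandwidth gain (about 12).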


Magnetic Disk vs NAND Flash
• Below is what the data sheets show
                         Sustained Transfer Rate      Average Latency
     Magnetic Disk               110 MB/sec               8.33 msec
     NAND Flash SSD            56 MB/sec (read)       0.2 msec (read)
                               32 MB/sec (write)      0.4 msec (write)


    Magnetic Disk : Seagate Barracuda 7200.10 ST3250310AS
    NAND Flash SSD : Samsung MCAQE32G8APP-0XA drive with
    K9WAG08U1A 16 Gbits SLC NAND chips

      • Newer SSD products report much higher bandwidth for read and write
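
To see why the lower-bandwidth SSD can still win, consider a deliberately simple model in which service time is average latency plus transfer time. The 8 KB page and 1 MB transfer sizes below are assumptions chosen for illustration, and the model ignores queuing, caching, and controller effects:

```python
def access_time_ms(latency_ms: float, bandwidth_mb_per_s: float, size_kb: float) -> float:
    """Assumed model: one access costs average latency plus transfer time."""
    transfer_ms = (size_kb / 1024) / bandwidth_mb_per_s * 1000
    return latency_ms + transfer_ms

# Data-sheet values from the table above.
HDD = {"latency_ms": 8.33, "bandwidth_mb_per_s": 110}
SSD_READ = {"latency_ms": 0.2, "bandwidth_mb_per_s": 56}

hdd_8k = access_time_ms(size_kb=8, **HDD)
ssd_8k = access_time_ms(size_kb=8, **SSD_READ)
hdd_1m = access_time_ms(size_kb=1024, **HDD)
ssd_1m = access_time_ms(size_kb=1024, **SSD_READ)

print(f"8 KB read: HDD {hdd_8k:.2f} ms, SSD {ssd_8k:.2f} ms")
print(f"1 MB read: HDD {hdd_1m:.2f} ms, SSD {ssd_1m:.2f} ms")
```

In this model a small random read costs about 8.4 ms on the disk against about 0.34 ms on the SSD, while a 1 MB transfer takes about 17.4 ms on the disk and about 18.1 ms on the SSD. Latency dominates small I/O; bandwidth dominates large I/O.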




Characteristics of NAND Flash
• No mechanical latency
      Flash memory is an electronic device without moving parts
      Provides uniform random access speed without seek/rotational
      latency
        • Very low latency, independently of physical location of data
• Asymmetric read & write speed
      Read is typically at least twice as fast as write
        • (E.g.) Samsung 16 Gbits SLC NAND chips: read 80 µsec vs write 200 µsec (2 KB)
• No in-place update
      A data item or page cannot be updated in place; its erase unit must be erased first.
        • An erase unit (typically 128 KB) is much larger than a page (2 KB).
        • (E.g.) Samsung 16 Gbits SLC NAND chips: erase takes 1.5 msec (128 KB)
      Write (and erase) optimization is critical
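
A back-of-the-envelope sketch of why write optimization matters, using the chip timings above. It assumes a naive worst case in which updating one page means copying out the whole erase unit, erasing it, and writing every page back. Real flash translation layers avoid this by remapping pages, so this is an upper-bound illustration, not a model of any specific device:

```python
# Timings quoted above for Samsung 16 Gbit SLC NAND chips.
PAGE_KB = 2
ERASE_UNIT_KB = 128
READ_US = 80       # read one 2 KB page
WRITE_US = 200     # write one 2 KB page
ERASE_US = 1500    # erase one 128 KB unit

PAGES_PER_UNIT = ERASE_UNIT_KB // PAGE_KB

def naive_in_place_update_us() -> int:
    """Assumed worst case: read all pages, erase the unit, write all pages back."""
    return PAGES_PER_UNIT * READ_US + ERASE_US + PAGES_PER_UNIT * WRITE_US

def append_write_us() -> int:
    """Write the new page version into an already-erased page instead."""
    return WRITE_US

naive_us = naive_in_place_update_us()
append_us = append_write_us()
print(f"naive update: {naive_us} usec, append: {append_us} usec")
```

With 64 pages per erase unit the naive update costs 19,420 µsec against 200 µsec for an append, which is why append-only write patterns suit flash so well.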


Flash SSD for Databases?
• Immediate benefit for some DB operations
    Reduce commit-time delay by fast logging
    Reduce read time for multi-versioned data


• Still, many concerns to be addressed
    Random scattered I/O is very common in OLTP
     • Can flash SSD, with its slow random writes, handle this?
    Flash-aware design of DBMS?
    Flash-friendly algorithms?
    Flash-friendly implementation?



Transactional Log
[Diagram: SQL queries; system buffer cache; storage components: database table space, transaction (redo) log, temporary table space, rollback segments]

Commit-time Delay by Logging
• Write Ahead Log (WAL)
      A committing transaction force-writes its log records
      Makes it hard to hide latency
      With a separate disk for logging
       • No seek delay, but …
       • Half a revolution of spindle on average
       • 4.2 msec (7200RPM), 2.0 msec (15k RPM)
      With a Flash SSD: about 0.4 msec
[Diagram: transactions T1, T2, …, Tn issue SQL; a buffer holding pages (pi) sits above the DB, and a log buffer sits above the LOG]



• Commit-time delay remains a significant overhead
      Group-commit helps but the delay doesn’t go away altogether.
• How much commit-time delay?
      On average, 8.1 msec (HDD) vs 1.3 msec (SSD) : 6-fold reduction
       • TPC-B benchmark with 20 concurrent users.
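
The rotational delays quoted above follow directly from spindle speed: on average the head waits half a revolution. A short sketch of that arithmetic, plus the ratio of the measured commit delays (the function name is illustrative):

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational delay: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2

for rpm in (7200, 15000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.1f} msec")

# Measured average commit-time delays from the slide (TPC-B, 20 users).
commit_speedup = 8.1 / 1.3
print(f"commit-time reduction: {commit_speedup:.1f}x")
```

The measured delays are larger than the raw device latencies because they include queuing behind other committing transactions.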

HDD vs SSD for Logging
• With SSD for log
      CPU better utilized
       • By shortening commit-
         time, and serving more
         active transactions.
      Leads to higher TPS

• Exaggerated by caching entire
  DB in memory
• TPC-B to stress-test logging
      Transaction commit rate
      higher than TPC-C


Temporary Table Space
[Diagram: SQL queries; system buffer cache; storage components: database table space, transaction (redo) log, temporary table space, rollback segments]

Temp Data and Query Time
• Query processing often generates temp data
    Sorts, joins, index creation, etc.
    Typically bulky, performed in foreground;
    Direct impact on query processing time
• Typically stored in separate storage devices

• Ask the same question
    What happens if SSD replaces HDD for
    temporary table spaces?


External Sort: I/O Pattern
• External Sort algorithm runs in two phases
     Sorted run generation
      • Partitioned into chunks, sorted separately, and saved in sorted runs
      • Read sequentially from table space, written sequentially into temp space
     Merging sorted runs
      • Read randomly from temp space, written sequentially into table space


• Dominant I/O patterns are sequential write followed by
  random read
     No-in-place-update limitation is avoided.
     These are flash-friendly I/O patterns!!
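
The two phases can be sketched as follows. This is a minimal illustration using temporary files and the standard library's `heapq.merge`, not the implementation used in the experiments; the function names and the one-integer-per-line run format are assumptions made for the example:

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List

def write_run(sorted_chunk: List[int], directory: str) -> str:
    """Phase 1: write one sorted run sequentially to temp space."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for value in sorted_chunk:
            f.write(f"{value}\n")
    return path

def read_run(path: str) -> Iterator[int]:
    """Stream one run back; the merge interleaves reads across all runs."""
    with open(path) as f:
        for line in f:
            yield int(line)

def external_sort(values: Iterable[int], chunk_size: int) -> List[int]:
    with tempfile.TemporaryDirectory() as tmp:
        runs: List[str] = []
        chunk: List[int] = []
        for value in values:
            chunk.append(value)
            if len(chunk) == chunk_size:
                runs.append(write_run(sorted(chunk), tmp))
                chunk = []
        if chunk:
            runs.append(write_run(sorted(chunk), tmp))
        # Phase 2: k-way merge; reads hop between run files.
        return list(heapq.merge(*(read_run(path) for path in runs)))

print(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3))
```

Each run is written once, front to back, and never updated; during the merge the reader jumps between runs, which is the random-read pattern described above.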




External Sort: Performance
• HDD vs SSD as a medium for a temp table space
     Sort a table of 2 M tuples (200 MB), with 2 MB buffer cache
• SSD is good at sequential write + random read
     Almost an order of magnitude reduction in merge times




One Less Tuning Knob?
• Cluster sizes for Sorting?
• With a larger cluster
     Disk bandwidth improves (by
     hiding latency)
     The amount of I/O may also
     increase due to reduced fan-in
     for merging sorted runs
• With flash SSD
     Low latency makes performance less
     sensitive to the cluster size
     2KB page was the best, with the
     max fan-in
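
The trade-off can be made concrete with a rough calculation. The sketch below assumes the 2 MB buffer from the sort experiment, 100 initial runs (200 MB of data divided by 2 MB per run), and one cluster of the buffer reserved for output; these modelling choices are assumptions, not details from the slides:

```python
def fan_in(buffer_kb: int, cluster_kb: int) -> int:
    """Runs that can be merged at once, keeping one cluster for output."""
    return buffer_kb // cluster_kb - 1

def merge_passes(num_runs: int, fan: int) -> int:
    """Number of merge passes needed to reduce num_runs to a single run."""
    if fan < 2:
        raise ValueError("fan-in must be at least 2")
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // fan)  # ceiling division
        passes += 1
    return passes

BUFFER_KB = 2 * 1024
NUM_RUNS = 100  # assumption: 200 MB table / 2 MB buffer

for cluster_kb in (2, 64, 256):
    f = fan_in(BUFFER_KB, cluster_kb)
    print(f"cluster {cluster_kb} KB: fan-in {f}, passes {merge_passes(NUM_RUNS, f)}")
```

Under these assumptions a 2 KB cluster merges all runs in one pass, while 64 KB and 256 KB clusters need two and three passes. Each extra pass rereads and rewrites the data, which only pays off when larger clusters hide enough latency, as on disk.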



Hash-Sort Duality a Myth?

• The I/O pattern of hashing is said to be
     random write (for writing hash buckets) + sequential read
     (for probing hash buckets)
     As opposed to sort (sequential write + random read)


• If that is the case, hashing is not flash-friendly.
     Re-implement hashing to make it flash-friendly?
     This appears to have been done already by some vendors.
      • The observed I/O pattern was quite similar to that of sort
        (sequential write + random read)
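
For reference, the textbook partitioned (Grace-style) hash join described in the first bullet can be sketched as below. This is an in-memory illustration only: Python lists stand in for on-disk hash buckets, and it does not attempt to reproduce any vendor's flash-friendly implementation, which the slides do not describe:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]

def partition(rows: List[Row], key: str, num_buckets: int) -> Dict[int, List[Row]]:
    """Scatter rows into buckets by hash of the join key.
    On disk, this scatter is the 'random write' step."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % num_buckets].append(row)
    return buckets

def grace_hash_join(left: List[Row], right: List[Row], key: str,
                    num_buckets: int = 4) -> List[Tuple[Row, Row]]:
    left_parts = partition(left, key, num_buckets)
    right_parts = partition(right, key, num_buckets)
    result: List[Tuple[Row, Row]] = []
    for bucket_id, left_rows in left_parts.items():
        # Build a hash table on one bucket, then scan the matching bucket.
        # On disk, reading a bucket is the 'sequential read' step.
        table: Dict[object, List[Row]] = defaultdict(list)
        for l in left_rows:
            table[l[key]].append(l)
        for r in right_parts.get(bucket_id, []):
            for l in table.get(r[key], []):
                result.append((l, r))
    return result

left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 2, "b": "r"}]
print(grace_hash_join(left, right, "id"))
```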



Hash Join: Performance
• HDD vs SSD as a medium for a temp table space
     Hash-join two tables of 2 M tuples (200 MB) each, with 2 MB buffer
     cache
     About 3-fold reduction in join time




Rollback Segments
[Diagram: SQL queries; system buffer cache; storage components: database table space, transaction (redo) log, temporary table space, rollback segments]

MVCC Rollback Segments
• Multi-version Concurrency Control (MVCC)
    Alternative to traditional Lock-based CC
    Support read consistency and snapshot isolation
    Oracle, PostgreSQL, Sybase, SQL Server 2005, MySQL
• Rollback Segments
    When updating an object, its current value is recorded in
    the rollback segment
    To fetch the correct version of an object, check whether
    it has been updated by other transactions
    Each transaction is assigned to a rollback segment; old
    images of data are written to the rollback segment
    sequentially (in append-only fashion).
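
A toy sketch of the mechanism, assuming a single append-only rollback segment, integer timestamps, and hypothetical class and method names. It is meant only to show the append-only writes and the version-chain traversal on reads, not any particular DBMS's design:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class UndoRecord:
    key: str
    old_value: int
    old_ts: int           # timestamp of the version that was overwritten
    prev: Optional[int]   # index of the next-older undo record, if any

class VersionedStore:
    def __init__(self) -> None:
        # key -> (current value, timestamp, index of newest undo record)
        self.current: Dict[str, Tuple[int, int, Optional[int]]] = {}
        self.rollback: List[UndoRecord] = []  # append-only rollback segment

    def write(self, key: str, value: int, ts: int) -> None:
        undo_index = None
        if key in self.current:
            old_value, old_ts, old_undo = self.current[key]
            # The old image is appended, never updated in place.
            self.rollback.append(UndoRecord(key, old_value, old_ts, old_undo))
            undo_index = len(self.rollback) - 1
        self.current[key] = (value, ts, undo_index)

    def read(self, key: str, snapshot_ts: int) -> Tuple[int, int]:
        """Return (value visible at snapshot_ts, number of old versions visited)."""
        value, ts, undo = self.current[key]
        hops = 0
        while ts > snapshot_ts:
            if undo is None:
                raise KeyError(f"no version of {key!r} visible at {snapshot_ts}")
            record = self.rollback[undo]
            value, ts, undo = record.old_value, record.old_ts, record.prev
            hops += 1
        return value, hops

store = VersionedStore()
store.write("A", 50, ts=1)
store.write("A", 100, ts=2)
store.write("A", 200, ts=3)
print(store.read("A", snapshot_ts=3), store.read("A", snapshot_ts=1))
```

A reader with an old snapshot has to follow one undo record per intervening update, so frequently updated objects lead to longer chains and more scattered reads.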



MVCC Write Pattern
• Write requests from TPC-C workload
      Concurrent transactions generate multiple streams of append-only
      traffic in parallel (apart by approximately 1 MB)
      HDD moves disk arm very frequently
      SSD suffers no negative effect from the no-in-place-update limitation
[Figure: logical sector address (x1000, 0 to 800) versus time (seconds, 0 to 600)]

MVCC Read Performance
• To support MV read consistency, I/O activities will increase
      A long chain of old versions may have to be traversed for each
      access to a frequently updated object
• Read requests are scattered randomly
      Old versions of an object may be stored in several rollback segments
      With SSD, 10-fold read time reduction was not surprising
[Diagram: data blocks A, B, C and transactions T0, T1, T2; rollback segment (1) records A: 200 -> 100 and rollback segment (2) records A: 100 -> 50]

Database Table Space
[Diagram: SQL queries; system buffer cache; storage components: database table space, transaction (redo) log, temporary table space, rollback segments]

Workload in Table Space
• TPC-C workload
    Exhibit little locality and sequentiality
      • Mix of small/medium/large read-write, read-only (join)
    Highly skewed
      • ~80% of accesses to 20% of tuples
• Write caching not as effective as read caching
    Physical read/write ratio is much lower than logical
    read/write ratio


• All bad news for flash memory SSD
    Due to the No-in-place-update limitation
    In-Page Logging (IPL) approach [SIGMOD’07]

Concluding Remarks
• Clear and present evidence that flash memory SSD can co-exist with or
  even replace magnetic disk
      Even now for logging, rollback segments and temp table spaces
      Write optimization needed for database table spaces
• Flash-Aware DBMS Design is a must!
      Flash-friendly algorithms, flash-friendly implementations
      Need fresh new look at almost everything: Buffer management, B-
      trees, Sorting and Hashing, Self-Tuning, File Systems, etc.





Sigmod08ssd slides

  • 1. SIGMOD’08 SIGMOD’08 A Case for Flash Memory SSD in Enterprise Database Applications Sang-Won Lee Bongki Moon Sungkyunkwan University University of Arizona Chanik Park Jae-Myung Kim Sang-Woo Kim Samsung Electronics Co., Ldt. Ldt. Altibase Corp. Sungkyunkwan University COMPUTER SCIENCE DEPARTMENT ACM SIGMOD, Vancouver Canada, June 2008 -1-
  • 2. Magnetic Disk vs Flash SSD Champion M-Tron Flash SSD for 50 years 32GB 2.5 inch New challengers! Seagate ST340016A 40GB,7200rpm Samsung FlashSSD 32GB 1.8 inch ACM SIGMOD, Vancouver Canada, June 2008 -2-
  • 3. Trend in Market Today • In mobile storage market NAND flash memory wins over hard disk in mobile storage market • PDA, MP3, mobile phone, digital camera, ... Due to advantages in size, weight, shock resistance, power consumption, noise … • In personal computer market Compete with hard disk in personal computer market • 32GB Flash SSD: M-Tron, Samsung, SanDisk Vendors launched new lines of personal computers with NAND flash SSD replacing hard disk • Apple, Samsung, and others COMPUTER SCIENCE DEPARTMENT ACM SIGMOD, Vancouver Canada, June 2008 -3-
  • 4. Market Trend in Prospect • Price drops quickly NAND flash is a lot cheaper than DRAM; • ASP/MB of NAND < 1/3 of ASP/MB of DRAM as of 2007. Still much more expensive than magnetic disk. Annual drop in ASP/MB was about 60% in 2006. Projected annual drop in ASP/MB is about 30-40% in next 5 years. [Eli Harari@SanDisk, August 2007] • Emerging Enterprise Market NAND ASP was $10/GB in 2007. With 40% annual drop, it could be $800/TB in 2012. Not inconceivable to run a full database server on a computing platform with TB-scale Flash SSD as secondary storage. COMPUTER SCIENCE DEPARTMENT ACM SIGMOD, Vancouver Canada, June 2008 -4-
  • 5. Technology Trend in Prospect • NAND flash density increases faster than Moore’s law Predicted twofold annual increase of NAND flash density until 2012 [Hwang, ProcIEEE’03] Toshiba hopes for 512GB SSD by the end of 2009 • 30 nm chip-making process, Multi-level-cell (MLC) • Bandwidth catches up Samsung MCAQE32G8APP-0XA [2006] • Sustained read 56 MB/sec, sustained write 32 MB/sec Samsung, Mtron [Feb. 2008] • Sustained read 100~120 MB/sec, sustained write 80~90 MB/sec Intel-Micron’s 4-plane architecture + higher clock speed [Feb. 2008] • Sustained read 200 MB/sec, sustained write 100 MB/sec Samsung MLC-based 256GB SSD with SATA-II [May 2008] • Sustained read 200 MB/sec, sustained write 160 MB/sec COMPUTER SCIENCE DEPARTMENT ACM SIGMOD, Vancouver Canada, June 2008 -5-
  • 6. Past Trend of Disk • From 1983 to 2003 [Patterson, CACM 47(10) 2004] Capacity increased about 2500 times (0.03 GB 73.4 GB) Bandwidth improved 143.3 times (0.6 MB/s 86 MB/s) Latency improved 8.5 times (48.3 ms 5.7 ms) Year 1983 1990 1994 1998 2003 Product CDC Seagate Seagate Seagate Seagate 94145-36 ST41600 ST15150 ST39102 ST373453 Capacity 0.03 GB 1.4 GB 4.3 GB 9.1 GB 73.4 GB RPM 3600 5400 7200 10000 15000 Bandwidth 0.6 4 9 24 86 (MB/sec) Media 5.25 5.25 3.5 3.0 2.5 diameter Latency 48.3 17.1 12.7 8.8 5.7 (msec) ACM SIGMOD, Vancouver Canada, June 2008 -6-
  • 7. Latency of Disk Lags • Trend In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4. • Latency improves by no more than square root of the improvement in bandwidth. The bandwidth-latency imbalance may be even more evident in the future. • The trouble is Latency remains important for • Interactive applications, database logging (or whenever I/O must be done synchronously) • What can NAND Flash Memory do for this? ACM SIGMOD, Vancouver Canada, June 2008 -7-
  • 8. Magnetic Disk vs NAND Flash • Below is what the data sheets show Sustained Transfer Rate Average Latency Magnetic Disk 110 MB/sec 8.33 msec NAND Flash SSD 56 MB/sec (read) 0.2 msec (read) 32 MB/sec (write) 0.4 msec (write) Magnetic Disk : Seagate Barracuda 7200.10 ST3250310AS NAND Flash SSD : Samsung MCAQE32G8APP-0XA drive with K9WAG08U1A 16 Gbits SLC NAND chips • Newer SSD products report much higher bandwidth for read and write COMPUTER SCIENCE DEPARTMENT ACM SIGMOD, Vancouver Canada, June 2008 -8-
  • 9. Characteristics of NAND Flash • No mechanical latency Flash memory is an electronic device without moving parts Provides uniform random access speed without seek/rotational latency • Very low latency, independently of physical location of data • Asymmetric read & write speed Read speed is typically at least twice faster than write speed • (E.g.) Samsung 16 Gbits SLC NAND chips: 80 µsec vs 200 µsec (2 KB) • No in-place update No data item or page can be updated in place before erasing it first. • An erase unit (typically 128 KB) is much larger than a page (2 KB). • (E.g.) Samsung 16 Gbits SLC NAND chips: 1.5 msec (128 KB) Write (and erase) optimization is critical COMPUTER SCIENCE DEPARTMENT ACM SIGMOD, Vancouver Canada, June 2008 -9-
  • 10. Flash SSD for Databases? • Immediate benefit for some DB operations Reduce commit-time delay by fast logging Reduce read time for multi-versioned data • Still, many concerns to be addressed Random scattered I/O is very common in OLTP • Slow random writes by flash SSD can handle this? Flash-aware design of DBMS? Flash-friendly algorithms? Flash-friendly implementation? ACM SIGMOD, Vancouver Canada, June 2008 -10-
  • 11. Transactional Log SQL Queries System Buffer Cache Database Transaction Temporary Rollback Table space (Redo) Log Table Space Segments ACM SIGMOD, Vancouver Canada, June 2008 -11-
  • 12. Commit-time Delay by Logging • Write Ahead Log (WAL) T1 T2 … Tn A committing transaction force-writes its SQL log records Buffer Log Buffer Makes it hard to hide latency With a separate disk for logging pi • No seek delay, but … • Half a revolution of spindle on average • 4.2 msec (7200RPM), 2.0 msec (15k RPM) DB With a Flash SSD: about 0.4 msec LOG • Commit-time delay remains to be a significant overhead Group-commit helps but the delay doesn’t go away altogether. • How much commit-time delay? On average, 8.1 msec (HDD) vs 1.3 msec (SDD) : 6-fold reduction • TPC-B benchmark with 20 concurrent users. ACM SIGMOD, Vancouver Canada, June 2008 -12-
  • 13. HDD vs SSD for Logging • With SSD for the log, the CPU is better utilized, by shortening commit time and serving more active transactions. This leads to higher TPS. • The effect is exaggerated by caching the entire DB in memory. • TPC-B is used to stress-test logging: its transaction commit rate is higher than that of TPC-C.
  • 14. Temporary Table Space [Figure: SQL queries, system buffer cache, and four storage areas: database table space, transaction (redo) log, temporary table space, rollback segments.]
  • 15. Temp Data and Query Time • Query processing often generates temp data: sorts, joins, index creation, etc. It is typically bulky and processed in the foreground, with direct impact on query processing time. • Typically stored in separate storage devices. • Ask the same question: what happens if SSD replaces HDD for temporary table spaces?
  • 16. External Sort: I/O Pattern • The external sort algorithm runs in two phases. • Sorted run generation: the input is partitioned into chunks, sorted separately, and saved as sorted runs (read sequentially from table space, written sequentially into temp space). • Merging sorted runs: read randomly from temp space, written sequentially into table space. • Dominant I/O patterns are sequential write followed by random read, so the no-in-place-update limitation is avoided. These are flash-friendly I/O patterns!
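The two phases above can be sketched as a small program. This is an illustrative toy, not the DBMS implementation measured in the talk: an in-memory list of integer keys stands in for the table space, each sorted run is a text file in a temporary directory, and all function names are hypothetical. Phase 1 writes each run sequentially; phase 2 performs a k-way merge, so reads hop between run files, which is the random-read pattern the slide describes.

```python
import heapq
import os
import tempfile


def generate_runs(records, run_size, temp_dir):
    """Phase 1: sort fixed-size chunks in memory, write each as a sorted run."""
    run_paths = []
    for start in range(0, len(records), run_size):
        chunk = sorted(records[start:start + run_size])
        path = os.path.join(temp_dir, f"run_{len(run_paths)}.txt")
        with open(path, "w") as f:            # sequential write into temp space
            f.writelines(f"{key}\n" for key in chunk)
        run_paths.append(path)
    return run_paths


def merge_runs(run_paths):
    """Phase 2: k-way merge; reads alternate between runs as keys are consumed."""
    files = [open(path) for path in run_paths]
    try:
        streams = [map(int, f) for f in files]  # one lazy key stream per run
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()


def external_sort(records, run_size):
    """Sort `records` using temp files no larger than `run_size` keys each."""
    with tempfile.TemporaryDirectory() as temp_dir:
        return merge_runs(generate_runs(records, run_size, temp_dir))


print(external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3))
```

With `run_size=3` the seven keys are split into three runs and merged back into `[1, 2, 3, 5, 7, 8, 9]`. Neither phase ever updates a temp page in place, which is why the pattern suits flash.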
  • 17. External Sort: Performance • HDD vs SSD as a medium for a temp table space: sort a table of 2 M tuples (200 MB) with a 2 MB buffer cache. • SSD is good at sequential write + random read: almost an order of magnitude reduction in merge time.
  • 18. One Less Tuning Knob? • Cluster sizes for sorting? • With a larger cluster, disk bandwidth improves (by hiding latency), but the amount of I/O may also increase due to reduced fan-in for merging sorted runs. • With its low latency, flash SSD is not as sensitive to the cluster size: a 2 KB page was the best, with the maximum fan-in.
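The trade-off between cluster size and fan-in can be made concrete with a simplified model. The assumptions here are not from the slides: each sorted run is as large as the 2 MB buffer (so a 200 MB table yields 100 runs), one cluster of the buffer is reserved for output, and the remaining clusters each buffer one input run. Both function names are hypothetical.

```python
def fan_in_for(buffer_kb: int, cluster_kb: int) -> int:
    """Runs mergeable at once: one input cluster per run, one kept for output."""
    return buffer_kb // cluster_kb - 1


def merge_passes(num_runs: int, fan_in: int) -> int:
    """Number of merge passes needed to reduce num_runs to a single run."""
    if fan_in < 2:
        raise ValueError("fan-in must be at least 2 to make progress")
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // fan_in)   # ceiling division
        passes += 1
    return passes


BUFFER_KB = 2 * 1024   # 2 MB buffer cache, as in the experiments
NUM_RUNS = 100         # assumed: 200 MB table / 2 MB per run

for cluster_kb in (2, 64, 256):
    fan_in = fan_in_for(BUFFER_KB, cluster_kb)
    passes = merge_passes(NUM_RUNS, fan_in)
    print(f"cluster {cluster_kb:>3} KB: fan-in {fan_in:>4}, {passes} merge pass(es)")
```

In this model a 2 KB cluster gives a fan-in of 1023 and a single merge pass, while 64 KB and 256 KB clusters need two and three passes. Every extra pass rereads and rewrites the whole data set, which is the added I/O the slide refers to.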
  • 19. Hash-Sort Duality a Myth? • The I/O pattern of hashing is said to be random write (for writing hash buckets) + sequential read (for probing hash buckets), as opposed to sort (sequential write + random read). • If that is the case, hashing is not flash-friendly. Re-implement hashing to make it flash-friendly? It appears to have been done already by some vendors. • The observed I/O pattern was quite similar to that of sort (sequential write + random read).
  • 20. Hash Join: Performance • HDD vs SSD as a medium for a temp table space: hash-join two tables of 2 M tuples (200 MB) each, with a 2 MB buffer cache. • About 3-fold reduction in join time.
  • 21. Rollback Segments [Figure: SQL queries, system buffer cache, and four storage areas: database table space, transaction (redo) log, temporary table space, rollback segments.]
  • 22. MVCC Rollback Segments • Multi-Version Concurrency Control (MVCC): an alternative to traditional lock-based concurrency control that supports read consistency and snapshot isolation (Oracle, PostgreSQL, Sybase, SQL Server 2005, MySQL). • Rollback segments: when an object is updated, its current value is recorded in the rollback segment. To fetch the correct version of an object, check whether it has been updated by other transactions. Each transaction is assigned to a rollback segment; old images of data are written to the rollback segment sequentially (in append-only fashion).
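The rollback-segment mechanism described above can be illustrated with a toy model. It is a sketch under stated assumptions, not how any of the listed systems implement MVCC: versions carry integer timestamps, a rollback segment is a Python list that is only ever appended to, and a reader walks the chain of old versions until it finds one written at or before its snapshot. All class and function names, and the values 50, 100, and 200, are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OldVersion:
    value: int
    written_at: int                 # when this old value was first written
    prev: Optional["OldVersion"]    # next-older version, if any


@dataclass
class Record:
    value: int
    written_at: int
    undo: Optional[OldVersion] = None   # head of the old-version chain


def update(record: Record, new_value: int, now: int,
           segment: List[OldVersion]) -> None:
    """Save the current value to the rollback segment, then overwrite it."""
    old = OldVersion(record.value, record.written_at, record.undo)
    segment.append(old)             # append-only write to the rollback segment
    record.value, record.written_at, record.undo = new_value, now, old


def read_as_of(record: Record, snapshot: int) -> int:
    """Return the value visible to a reader whose snapshot time is `snapshot`."""
    if record.written_at <= snapshot:
        return record.value
    version = record.undo
    while version is not None:      # traverse the chain of old versions
        if version.written_at <= snapshot:
            return version.value
        version = version.prev
    raise LookupError("no version visible at this snapshot")


segment_1: List[OldVersion] = []    # rollback segment of one transaction
segment_2: List[OldVersion] = []    # rollback segment of another transaction
a = Record(value=50, written_at=0)
update(a, 100, now=1, segment=segment_2)
update(a, 200, now=2, segment=segment_1)
print(read_as_of(a, 0), read_as_of(a, 1), read_as_of(a, 2))
```

A reader with snapshot time 0 must follow two links, one into each segment, before it finds the value 50. This is the scattered read pattern that makes old-version lookups expensive on a disk.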
  • 23. MVCC Write Pattern • Write requests from a TPC-C workload: concurrent transactions generate multiple streams of append-only traffic in parallel (approximately 1 MB apart). • HDD moves the disk arm very frequently. • SSD suffers no negative effect from the no-in-place-update limitation. [Figure: logical sector address (x1000) vs. time (seconds).]
  • 24. MVCC Read Performance • To support multi-version read consistency, I/O activities will increase: a long chain of old versions may have to be traversed for each access to a frequently updated object. • Read requests are scattered randomly: old versions of an object may be stored in several rollback segments. • With SSD, a 10-fold read time reduction was not surprising. [Figure: transactions T0, T1, T2 and objects A, B, C; successive updates of A (50 -> 100, then 100 -> 200) are recorded in rollback segments (2) and (1), respectively.]
  • 25. Database Table Space [Figure: SQL queries, system buffer cache, and four storage areas: database table space, transaction (redo) log, temporary table space, rollback segments.]
  • 26. Workload in Table Space • TPC-C workload exhibits little locality and sequentiality: mix of small/medium/large read-write, read-only (join). • Highly skewed: ~80% of accesses go to 20% of tuples. • Write caching is not as effective as read caching: the physical read/write ratio is much lower than the logical read/write ratio. • All bad news for flash memory SSD, due to the no-in-place-update limitation. • In-Page Logging (IPL) approach [SIGMOD'07].
  • 27. Concluding Remarks • Clear and present evidence that flash memory SSD can coexist with or even replace magnetic disk: even now for logging, rollback segments, and temp table spaces. Write optimization is needed for database table spaces. • Flash-aware DBMS design is a must! Flash-friendly algorithms, flash-friendly implementations. Need a fresh new look at almost everything: buffer management, B-trees, sorting and hashing, self-tuning, file systems, etc.