MySQL Innovation Works -- InnoSQL

                 David Jiang
         jiangchengyao@gmail.com
           weibo.com/insidemysql
About Me
 7+ years working with different databases
   SQL Server
   MySQL
   Oracle
 Now working at the NetEase Development and Research Center Lab
   MySQL kernel development
 Author
   <<Inside MySQL: InnoDB Storage Engine>>
   <<Inside MySQL: SQL Programming>> (coming soon, 2012.3)
What is InnoSQL
 A new MySQL branch
   Open source
   High performance (flash cache)
   Ease of use
   Fully compatible with the original MySQL
   Collects creative ideas for MySQL and makes them happen
 MySQL Innovation Works
   http://www.innomysql.org
InnoSQL Features
 Flash Cache for InnoDB
   Higher performance than using the SSD as durable storage
 Shared memory (SHM) for the InnoDB Buffer Pool
   Warms up the InnoDB buffer pool quickly
   Less than 1 sec!
 InnoDB IO Statistics
   Reports the physical and logical reads of each SQL statement
 Page Cleaner Thread
   Removes flush stalls from user query threads
InnoSQL Flash Cache
 InnoSQL Flash Cache
   Uses the SSD as a cache
 Other flash cache solutions
   Facebook flash cache
   Oracle flash cache
   Secondary Buffer Pool for InnoDB (InnoSQL 5.5.8)
Facebook Flash Cache
 A general solution
 Open source
   https://github.com/facebook/flashcache
 Integrates with file systems
   Built using the Linux Device Mapper
 Not optimized for databases
 Good for read-intensive workloads
 Worse for write-intensive workloads
 Needs time to warm up
Oracle Flash Cache
 Works with Oracle 11g
 Pages are written to the flash cache slowly
   Not very aggressive
 Needs warm-up
Secondary Buffer Pool
 Supported in InnoSQL 5.5.8
 Good for read-intensive workloads
 Also not good for write-intensive workloads
   e.g. TPC-C
 Can warm up the database at startup
   Makes every start slow
 The cache is not persistent storage
Why is warm-up needed?
 Capacity:
   SSD >> Memory
 Speed:
   SSD << Memory
Flash Cache in InnoSQL 5.5.13
 Caches both read and write operations
 Sequential writes on the SSD
   No random writes
 Merge write
 The cache is persistent
Why not use SSD as durable storage
 SSD is good for random reads
   7000+ IOPS for SSD
   100~150 IOPS for disk
 SSD lifespan
 SSD write performance
   Write: per page
   Erase: per extent (128~256 pages)
 The database is not fully optimized for SSD
   Read-ahead algorithm
   512-byte-aligned writes to the log file
   Random writes
Why use SSD as Cache
 Cache is everywhere
   Register (volatile)
   L1 cache (volatile)
   L2 cache (volatile)
   L3 cache (volatile)
   Memory (volatile)
   SSD (sits between memory and disk)
   Disk (non-volatile)
   Tape (non-volatile)
Question
 Do you use your SSD as volatile or non-volatile storage?
Analysis
 If the SSD is used as durable storage
     Non-volatile
     But the database is not yet fully optimized for it
 If Secondary Buffer Pool or Oracle Flash Cache is used
     Volatile
     Performance degrades
        Needs to write twice (flash cache & durable storage)
 If Facebook flash cache is used
     Volatile or non-volatile
        Depends on the cache mode
            Writethrough
            Writearound
            Writeback
     Performance degrades
       Still needs to write twice, though with some optimizations
     Not fully optimized for databases
Cache in MySQL InnoDB
 InnoDB Buffer Pool
   Caches pages
   Page operations are asynchronous
     Pages are read in the buffer pool first
     Pages are modified in the buffer pool first
     Then written to disk by a fuzzy or sharp checkpoint
     Needs the log manager for recovery
   Larger buffer pool, better performance
     Because of the speed gap between disk and memory
     However, there is rarely enough memory to cache the whole database
Cache in MySQL InnoDB
 Insert Buffer
    The insert buffer is a B+ tree
       MySQL < 4.1.x: one insert buffer tree per table
          (page_no, fields_type_info, actual record)
       MySQL >= 4.1: only one insert buffer tree
          (space_id, one-byte-marker, page_no, fields_type_info, actual record)
          Indexed by (space_id, page_no)
    Works for non-unique secondary indexes
       Writes go to the insert buffer if the page is not in the buffer pool
       Insert buffer bitmap pages track the free space of each page
          2 bits per page
    Merges write operations
       Merges writes
       Delays page writes
       Raises write performance
       However, increases read operations
    MySQL 5.5: Change Buffer
       insert, purge, delete mark
InnoDB Insert Buffer
mysql> show engine innodb status\G
*************************** 1. row ***************************
Status:
=====================================
090922 11:52:51 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
……
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 2249, free list len 3346, seg size 5596,
374650 inserts, 51897 merged recs, 14300 merges
Hash table size 4980499, node heap has 1246 buffer(s)
1640.60 hash searches/s, 3709.46 non-hash searches/s
 size = used pages, free list len = free pages, seg size = size + free list len + 1
 merged recs : merges = insert buffer efficiency
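The two annotations on this output can be checked with a little arithmetic. The following is a minimal sketch that uses only the numbers shown above; the function and variable names are illustrative and not part of MySQL.

```python
import re

# Two lines copied from the SHOW ENGINE INNODB STATUS output above.
IBUF_LINES = (
    "Ibuf: size 2249, free list len 3346, seg size 5596,\n"
    "374650 inserts, 51897 merged recs, 14300 merges"
)

def parse_ibuf(text):
    """Extract the insert buffer counters into a dict of ints."""
    patterns = {
        "size": r"Ibuf: size (\d+)",
        "free_list_len": r"free list len (\d+)",
        "seg_size": r"seg size (\d+)",
        "inserts": r"(\d+) inserts",
        "merged_recs": r"(\d+) merged recs",
        "merges": r"(\d+) merges",
    }
    return {name: int(re.search(p, text).group(1)) for name, p in patterns.items()}

stats = parse_ibuf(IBUF_LINES)
# Annotation 1: seg size = size + free list len + 1
seg_size_check = stats["size"] + stats["free_list_len"] + 1
# Annotation 2: merged recs : merges, i.e. records applied per merge operation
recs_per_merge = stats["merged_recs"] / stats["merges"]
print(seg_size_check == stats["seg_size"], round(recs_per_merge, 2))
```

A higher ratio means each merge applies more buffered records, which is the sense in which the slide calls it insert buffer efficiency.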
Cache in MySQL InnoDB
 A cache can increase performance
 It delays write operations
   Bridges the gap between disk and cache
 However, there is another cache in InnoDB
   Doublewrite
What is Doublewrite?
 Doublewrite
   Avoids the partial page write problem
     A 512-byte write is always OK
     But a 16K write is not
   Doublewrite buffer
     2M
   Doublewrite file
     2M
     In the shared tablespace: ibdata1
Doublewrite Architecture
 Stores all data twice: first to the doublewrite buffer, then to the actual data files
 Can be disabled with --skip-innodb_doublewrite

mysql> show global status like 'innodb_dbl%'\G
************** 1. row ************************
Variable_name: Innodb_dblwr_pages_written
     Value: 152362
************** 2. row ************************
Variable_name: Innodb_dblwr_writes
     Value: 1465
2 rows in set (0.00 sec)
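As a rough reading of these two counters, the sketch below divides pages written by the number of doublewrite operations and compares the result with how many 16K pages fit in a 2M doublewrite buffer. It uses only figures from the slides; the variable names are illustrative.

```python
PAGE_SIZE = 16 * 1024            # 16K InnoDB page, as on the earlier slide
DOUBLEWRITE_SIZE = 2 * 1024**2   # 2M doublewrite buffer

pages_written = 152362  # Innodb_dblwr_pages_written
writes = 1465           # Innodb_dblwr_writes

# How many pages the doublewrite buffer can hold at once.
capacity_pages = DOUBLEWRITE_SIZE // PAGE_SIZE
# Average number of pages carried by each doublewrite operation.
pages_per_write = pages_written / writes
print(capacity_pages, round(pages_per_write, 1))
```

In this sample each doublewrite operation carries about 104 pages on average, against a capacity of 128 pages.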
Doublewrite Features
 Size: 2M
 Every page is written here first
 Sequential write
 Caches writes

 So what about a 100G or 300G doublewrite?
 That is the idea behind the flash cache
Flash Cache in InnoSQL 5.5.13
 Replaces the original doublewrite
 Users can now have a large doublewrite
 Page writes are sequential
   Suits SSD write characteristics
 The doublewrite area can now be read
   Exploits fast SSD random reads
 Caches both read and write operations
 Persistent cache
 Merge write
   60~70% in workloads like TPC-C
 Supports AIO reads on the flash cache
   Not supported in Secondary Buffer Pool
Flash Cache Architecture
Flash Cache Data Structure
/** Flash cache block struct */
struct trx_flashcache_block_struct{
   unsigned      space:32;       /*!< tablespace id */
   unsigned      offset:32;      /*!< page number */
   unsigned      fil_offset:32; /*!< flash cache page number */
   unsigned      state:2;        /*!< flash cache state*/
   trx_flashcache_block_t* hash; /*!< hash chain */
};
 Four states:
   BLOCK_NOT_USED
   BLOCK_READY_FOR_FLUSH
   BLOCK_READ_CACHE
   BLOCK_FLUSHED
Flash Cache Data Structure
struct trx_flashcache_struct{
   mutex_t         fc_mutex;/*!< mutex protecting flash cache */
   hash_table_t* fc_hash; /*!< hash table of flash cache pages */
   ulint           fc_size; /*!< flash cache size */
   ulint           write_off; /*!< write to flash cache offset */
   ulint           flush_off; /*!< flush to disk this offset */
   ulint           write_round; /* write round */
   ulint           flush_round; /* flush round */
   trx_flashcache_block_t* block; /* flash cache block */
   byte*           read_buf_unalign; /* unalign read buf */
   byte*           read_buf;          /* read buf */
};
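To make the roles of the hash table, write_off and write_round concrete, here is a small Python model of the two structures. It is a sketch under stated assumptions, not the InnoSQL implementation: the class and method names are invented for illustration, flushing is left out, and the guard against reusing an unflushed slot is a simplification.

```python
from dataclasses import dataclass, field
from enum import Enum

class BlockState(Enum):
    NOT_USED = 0
    READY_FOR_FLUSH = 1
    READ_CACHE = 2
    FLUSHED = 3

@dataclass
class Block:
    space: int = 0       # tablespace id
    offset: int = 0      # page number
    state: BlockState = BlockState.NOT_USED

@dataclass
class FlashCache:
    size: int                                   # number of blocks (fc_size)
    write_off: int = 0                          # next slot to write
    write_round: int = 0                        # times write_off has wrapped
    blocks: list = field(default_factory=list)
    index: dict = field(default_factory=dict)   # (space, offset) -> slot

    def __post_init__(self):
        self.blocks = [Block() for _ in range(self.size)]

    def write_page(self, space, offset):
        """Append a page at write_off; the hash index keeps the newest copy."""
        slot = self.write_off
        old = self.blocks[slot]
        old_key = (old.space, old.offset)
        if old.state is not BlockState.NOT_USED and self.index.get(old_key) == slot:
            if old.state is BlockState.READY_FOR_FLUSH:
                # Assumption: the newest unflushed copy must not be overwritten.
                raise RuntimeError("slot still holds an unflushed page")
            del self.index[old_key]
        self.blocks[slot] = Block(space, offset, BlockState.READY_FOR_FLUSH)
        self.index[(space, offset)] = slot
        self.write_off += 1
        if self.write_off == self.size:         # wrap around: sequential ring
            self.write_off = 0
            self.write_round += 1
        return slot

    def lookup(self, space, offset):
        """Return the slot holding the newest copy of a page, or None."""
        return self.index.get((space, offset))

fc = FlashCache(size=4)
slots = [fc.write_page(2, 6), fc.write_page(3, 7), fc.write_page(2, 6)]
```

Writing page (2,6) twice leaves two copies in the file, but the lookup returns only the newer slot, so writes stay sequential while reads still find the latest version.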
From the Developer's Perspective
 Flash cache file: a sequence of flash cache blocks; writes go in at write_offset and flushing to disk proceeds from flush_offset
 Flash cache hash table (in memory): used to look up flash cache blocks
 Flash cache log file: stores write_offset, flush_offset, write_round, flush_round
Flash Cache Flush Algorithms
   Flushes pages from the flash cache to disk
   Takes over the flushing done in the master thread
   Flushing runs in a flash cache background thread
   Algorithm
     Below innodb_flash_cache_write_cache_pct (default 10)
        No flush
     Below innodb_flash_cache_do_full_io_pct (default 90)
        Flush at 10% of innodb_io_capacity
     Otherwise
        Flush at 100% of innodb_io_capacity
     If idle
        Flush at 100% of innodb_io_capacity
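The thresholds above can be written as a small decision function. This is a sketch only: the slide does not say which quantity is compared with the two thresholds, so the code assumes it is the percentage of the flash cache holding pages not yet flushed, and it assumes the idle rule takes priority. The function name and arguments are illustrative.

```python
def flash_cache_flush_rate(used_pct, io_capacity, idle=False,
                           write_cache_pct=10, do_full_io_pct=90):
    """Return how many pages per second to flush from flash cache to disk.

    used_pct is assumed to be the share of the flash cache that still
    holds unflushed pages; the defaults mirror the slide (10 and 90).
    """
    if idle:
        return io_capacity            # idle server: flush at full io_capacity
    if used_pct < write_cache_pct:
        return 0                      # plenty of room: no flush
    if used_pct < do_full_io_pct:
        return io_capacity // 10      # 10% of innodb_io_capacity
    return io_capacity                # nearly full: 100% of innodb_io_capacity

rates = [flash_cache_flush_rate(pct, io_capacity=200) for pct in (5, 50, 95)]
```

With innodb_io_capacity set to 200, the three sample fill levels give 0, 20 and 200 pages per second.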
Merge Write in Flash Cache
 Pages in the flash cache between flush_offset and write_offset:
  (7,7)     (2,6)        (0,6)   (3,7)   ……   (3,7)   (2,6)   (4,8)
 Pages (2,6) and (3,7) can be merged
 This is much like the insert buffer
 Delays the write operation
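The merging idea can be sketched by counting how many disk writes a run of cached pages really needs. The slide does not define the merge write ratio, so the definition used here (writes avoided divided by pages written into the cache) is an assumption, and the elided pages in the example are simply left out.

```python
def merge_write_stats(pending_pages):
    """Count disk writes needed when only the newest copy of a page is flushed."""
    latest = {}
    for position, page in enumerate(pending_pages):
        latest[page] = position       # a later copy supersedes an earlier one
    writes_needed = len(latest)
    merged = len(pending_pages) - writes_needed
    ratio = merged / len(pending_pages) if pending_pages else 0.0
    return writes_needed, merged, ratio

# The pages shown on the slide, without the elided middle part.
pending = [(7, 7), (2, 6), (0, 6), (3, 7), (3, 7), (2, 6), (4, 8)]
writes_needed, merged, ratio = merge_write_stats(pending)
```

For these seven cached pages only five disk writes are needed, because the second copies of (2,6) and (3,7) replace the first ones.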
Flash Cache Benchmark
 Sysbench OLTP
   Read-intensive
 TPC-C
   Write-intensive
 Blogbench
   Models a blog-like application
   Developed by Netease
Sysbench OLTP




InnoDB Buffer Pool: 6G
DB Size: 19G
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
TPC-C
InnoDB Buffer Pool: 12G
DB Size: 39G
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
Flash Cache: 100G
SSD: 3607.183 Tpm
Flash Cache: 7230.05 Tpm
Merge Write Ratio: 65.47%
Blogbench




InnoDB Buffer Pool: 4G
DB Size: 21G
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
Merge write ratio: 60%
Conclusion
 Flash Cache works for both read and write workloads
 Works better than using SSD as durable storage
 Optimized for SSD inside the database kernel
 No additional writes with the flash cache (it replaces doublewrite)
 Merge write support
SHM for InnoDB Buffer Pool
 Allocates the InnoDB buffer pool in shared memory
 Why use shared memory?
   Speeds up warm-up
 How fast is a normal warm-up?
   Random read: 10~20M/sec
   A 30G buffer pool needs 30~60 minutes
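The warm-up estimate follows directly from the read rate. A minimal sketch of the arithmetic, assuming 1G = 1024M and that the whole buffer pool is filled by random reads at a constant rate:

```python
def warmup_minutes(buffer_pool_gb, read_mb_per_sec):
    """Minutes needed to fill the buffer pool at a constant read rate."""
    return buffer_pool_gb * 1024 / read_mb_per_sec / 60

slow = warmup_minutes(30, 10)   # at 10M/sec
fast = warmup_minutes(30, 20)   # at 20M/sec
print(round(fast, 1), round(slow, 1))
```

This gives roughly 26 to 51 minutes, in line with the 30~60 minutes quoted on the slide.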
Warm-up Methods
 Use SQL to warm up
   SELECT COUNT(*) FROM table FORCE INDEX (PRIMARY)
   Warm-up turns into sequential reads
   But cannot restore the buffer pool state of the previous workload
 Dump the buffer pool to a file
   Supported in MySQL 5.6+
   Warm-up turns into sequential reads
   Restores the buffer pool state of the previous workload
   The dump file is big
   What if the database crashes?
Warm-up Methods
 Percona Server
   Exports the (space_id, page_no) pairs in the LRU list to a file
   Loads this file ordered by (space_id, page_no) so that reads are sequential when MySQL starts up
   Restores the buffer pool state of the previous workload
   Still needs a long time to warm up
     if you have a big buffer pool: 128G, 256G
Warm-up in InnoSQL
 Uses shared memory
   --innodb_use_shm_preload=1
 Shared memory is configured as for Oracle
   /proc/sys/kernel/shmmax
   /proc/sys/kernel/shmall
 Warm-up takes less than 1 sec
   All pages are already in memory
SHM for InnoDB Buffer Pool
# list shared memory info
innosql@db-62:~$ ipcs -a
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0008c231 4653056    innosql    600        549715968  0
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

# remove shared memory
innosql@db-62:~$ ipcrm -m 4653056
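Finding the right shmid to pass to ipcrm can be scripted. The sketch below parses a copy of the shared memory section of the output above; it assumes the column order shown there (key, shmid, owner, perms, bytes, nattch), and the function name is illustrative.

```python
IPCS_OUTPUT = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0008c231 4653056    innosql    600        549715968  0
"""

def shared_segments(text, owner):
    """Return shmid and size in MiB for each segment owned by `owner`."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x0008c231.
        if len(fields) >= 6 and fields[0].startswith("0x") and fields[2] == owner:
            segments.append({"shmid": int(fields[1]),
                             "mib": int(fields[4]) / 2**20})
    return segments

segments = shared_segments(IPCS_OUTPUT, "innosql")
print(segments)
```

For the sample output this reports one segment of 524.25 MiB with shmid 4653056, the id used in the ipcrm command.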
InnoDB IO Statistics
 Reports read IO statistics
   Like SQL Server: SET STATISTICS IO ON
 InnoSQL implements it in the slow query log
   Both file and table output
 Helps SQL developers
   Even 10 reads may be too many in an OLTP application
 Helps DBAs
   Shows the real IO statistics of a SQL statement
   Not only the time it consumes
 Still in development
   Available as a preview
InnoDB IO Statistics
# Time: 111103 13:29:06
# User@Host: root[root] @ localhost [::1]
# Query_time: 119.293823 Lock_time: 119.274822 Rows_sent: 1 Rows_examined: 1 Logical_reads: 198 Physical_reads: 3
use tpcc;
SET timestamp=1320298146;
select * from warehouse where w_id=1;
# Time: 111103 13:31:28
# User@Host: root[root] @ localhost [::1]
# Query_time: 0.335019 Lock_time: 0.333019 Rows_sent: 1 Rows_examined: 1 Logical_reads: 164 Physical_reads: 50
SET timestamp=1320298288;
select * from history;
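These extra fields are easy to pull out of the log with a script. The following sketch parses a copy of the two entries above and pairs each statement with its read counts; the field names come from the sample, while the function name and the rule for skipping `use` and `SET timestamp` lines are assumptions made for illustration.

```python
import re

SLOW_LOG = """\
# Time: 111103 13:29:06
# User@Host: root[root] @ localhost [::1]
# Query_time: 119.293823 Lock_time: 119.274822 Rows_sent: 1 Rows_examined: 1 Logical_reads: 198 Physical_reads: 3
use tpcc;
SET timestamp=1320298146;
select * from warehouse where w_id=1;
# Time: 111103 13:31:28
# User@Host: root[root] @ localhost [::1]
# Query_time: 0.335019 Lock_time: 0.333019 Rows_sent: 1 Rows_examined: 1 Logical_reads: 164 Physical_reads: 50
SET timestamp=1320298288;
select * from history;
"""

STATS = re.compile(r"Logical_reads: (\d+) Physical_reads: (\d+)")
SKIPPED_PREFIXES = ("set timestamp", "use ")

def io_stats(log_text):
    """Return (statement, logical_reads, physical_reads, miss_ratio) tuples."""
    results = []
    pending = None
    for line in log_text.splitlines():
        match = STATS.search(line)
        if match:
            pending = (int(match.group(1)), int(match.group(2)))
        elif (pending and not line.startswith("#")
              and not line.lower().startswith(SKIPPED_PREFIXES)):
            logical, physical = pending
            miss_ratio = physical / logical if logical else 0.0
            results.append((line, logical, physical, miss_ratio))
            pending = None
    return results

entries = io_stats(SLOW_LOG)
```

The ratio of physical to logical reads shows how much of a statement's work missed the buffer pool: about 1.5% for the first query and about 30% for the second.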
Configuration
 long_query_time
 io_slow_query
 slow_query_type
   0: long_query_time
   1: io_slow_query
   2: both
Page Cleaner Thread
 Pages are flushed in the master thread
   Adaptive flush
   IO capacity
 Problem
   The master thread has a lot to handle
   Async flush can block a user query thread
 Page cleaner thread
   Supported in MySQL 5.6
   InnoSQL supports it on MySQL 5.5
   Can also help with flushing in FLUSH_LRU_LIST
Flush Algorithms in InnoDB
 checkpoint_age: current_lsn - checkpoint_lsn
 async_water_mark: ~78% * Log_Group_Size
 sync_water_mark: ~90% * Log_Group_Size
 For example:
   Log file size 1G, 2 log files
   async_water_mark = ~1.5G
   sync_water_mark = ~1.8G
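The example can be reproduced with a few lines of arithmetic. A minimal sketch, assuming the log group size is simply file size times number of files and taking the approximate 78% and 90% ratios from the slide:

```python
def water_marks(log_file_size_gb, log_files, async_ratio=0.78, sync_ratio=0.90):
    """Return (async_water_mark, sync_water_mark) in GB."""
    log_group_size = log_file_size_gb * log_files
    return log_group_size * async_ratio, log_group_size * sync_ratio

async_wm, sync_wm = water_marks(log_file_size_gb=1, log_files=2)
print(round(async_wm, 2), round(sync_wm, 2))
```

Two 1G log files give water marks of about 1.56G and 1.8G, which the slide rounds to ~1.5G and ~1.8G.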
Flush Algorithms in InnoDB
 checkpoint_age < async_water_mark
   Adaptive flushing
   5% of innodb_io_capacity
 async_water_mark < checkpoint_age < sync_water_mark
   Blocks one user query thread
   Async flush
 checkpoint_age > sync_water_mark
   Blocks all user query threads
   Sync flush
 Dirty page percentage > innodb_max_dirty_pages_pct
   Flush at innodb_io_capacity
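The checkpoint-age rules can be summarized as a small classifier. This is a sketch of the decision as described on the slide, not InnoDB source code; how the exact boundary values are treated is an assumption, and the separate dirty-page rule is left out.

```python
def flush_action(checkpoint_age, async_water_mark, sync_water_mark):
    """Classify the flush behaviour for a given checkpoint age (same units)."""
    if checkpoint_age >= sync_water_mark:
        return "sync flush: blocks all user query threads"
    if checkpoint_age >= async_water_mark:
        return "async flush: blocks one user query thread"
    return "adaptive flushing"

# Water marks in GB for two 1G log files (about 78% and 90% of 2G).
actions = [flush_action(age, 1.56, 1.8) for age in (0.5, 1.6, 1.9)]
```

With these water marks, a checkpoint age of 0.5G stays in adaptive flushing, 1.6G triggers an async flush and 1.9G a sync flush.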
Page Cleaner Thread
 Reduces the master thread's burden
 Async flush moves to this background thread
   User query threads are no longer blocked
However
 Flushing does not happen only in the master thread
 FLUSH_LRU_LIST
   Checks whether at least 64 pages are available
   In this case the flush happens almost entirely in the user query thread
   Adaptive flushing and innodb_io_capacity do not help here
 InnoSQL also moves this flush to the page cleaner thread
   Not supported in MySQL 5.6
   Still needs more optimization
Q & A

My sql innovation work -innosql

  • 1. MySQL Innovation Works -- InnoSQL David Jiang [email protected] weibo.com/insidemysql
  • 2. About Me  7+ years work on different databases  SQL Server  MySQL  Oracle  Now work for Netease Development and Research Center Lab  MySQL kernel development  Author  <<Inside MySQL: InnoDB Storage Engine>>  <<Inside MySQL: SQL Programming >> (coming soon 2012.3)
  • 3. What is InnoSQL  A new MySQL branch  Open source  High performance (flash cache)  Ease of use  Fully compatible with original MySQL  Collect creative idea for MySQL and make it happen  MySQL Innovation Works  http://www.innomysql.org
  • 4. InnoSQL Feature  Flash Cache for InnoDB  Provide high performance than just use SSD as durable storage  Share memory(SHM) for InnoDB Buffer Pool  Quick warm-up InnoDB buffer pool  Less than 1 sec !!!  InnoDB IO Statistic  Get each SQL’s physical and logic read  Page Clean Thread  Remove block in user query thread
  • 5. InnoSQL Flash Cache  InnoSQL Flash Cache  Using SSD as Cache  Other flash cache solution  Facebook flash cache  Oracle flash cache  Secondary Buffer Pool for InnoDB ( InnoSQL 5.5.8 )
  • 6. Facebook Flash Cache  A general solution  Open source  https://github.com/facebook/flashcache  Integration with file systems  built using the Linux Device Mapper  Not optimize for database  Good in read intensive workload  Worse in write intensive workload  Need time to warm up
  • 7. Oracle Flash Cache  Work for Oracle 11g  Page write to flash cache is slow  Not so aggressive  Need warm up
  • 8. Secondary Buffer Pool  Support in InnoSQL 5.5.8  Good in read intensive workload  Also not good for write intensive workload  TPC-C  Can warm up database when start up  Slow for each start  Cache is not a persistent storage
  • 9. Why need warm up ?  Capacity:  SSD >> Memory  Speed  SSD << Memory
  • 10. Flash Cache in InnoSQL 5.5.13  Can cache both read & write operation  Sequential write on SSD  No random write  Merge write  Cache is persistent
  • 11. Why not use SSD as durable storage  SSD is good for random read  7000+ IOPS  100 ~ 150 IOPS for disk  SSD life cycle  SSD write performance  Write: page  Wipe: extent ( 128~256 page)  Database is not fully optimized for SSD  Read ahead algorithm  512 bytes alignment write for log file  Random write
  • 12. Why use SSD as Cache  Cache is everywhere  Register  L1 cache  L2 cache volatile  L3 cache  Memory SSD  Disk non-volatile  Tape
  • 13. Question  Using your SSD as volatile or non-volatile ?
  • 14. Analyze  If use SSD as durable storage  Non-volatile  But now the database not fully optimize it  If use Secondary Buffer Pool or Oracle Flash Cache  Volatile  Performance degrade  Need to write twice ( flash cache & durable storage )  If use Facebook flash cache  Volatile or Non-volatile  Base on cache modes  Writethrough  Writearound  writeback  Performance degrade  Still need to write twice, but use some optimization  Not fully optimize for database
  • 15. Cache in MySQL InnoDB  InnoDB Buffer Pool  Cache page  Asynchronous operation for page  Read page in buffer pool first  Modify page in buffer pool first  Then make fuzzy or sharp checkpoint to disk  Need log manager for recovery  More buffer pool, better performance  Because speed gap between disk and memory  However, we can not get enough memory to cache all the database
  • 16. Cache in MySQL InnoDB  Insert Buffer  Insert buffer is a B+ Tree,  MySQL version < 4.1.x, one table on insert buffer tree.  (page_no, fields_type_info, actual record)  >=4.1, only on insert buffer tree.  (space_id, one-byte-marker, page_no,fields_type_info, actual record)  index by (space_id, page_no)  Work for non-unique secondary index  Write to insert buffer , if page is not in the buffer pool  Insert buffer bitmap page to track the free space of page  2 bit per page  Merge write operation  Merge write  Delay page write  raise write performance  However, increase read operation  MySQL 5.5 Change Buffer  insert、purge、delete mark
  • 17. InnoDB Insert Buffer mysql> show engine innodb statusG; *************************** 1. row *************************** Status: ===================================== 090922 11:52:51 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 15 seconds …… ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX Used Page Free Page Seg size=size+free list len+1 ------------------------------------- Ibuf: size 2249, free list len 3346, seg size 5596, 374650 inserts, 51897 merged recs, 14300 merges Hash table size 4980499, node heap has 1246 buffer(s) 1640.60 hash searches/s, 3709.46 non-hash searches/s merged recs: merges = insert buffer efficiency
  • 18. Cache in MySQL InnoDB  Cache can increase performance  Delay write operation  Gap between disk and cache  However, there is another cache in InnoDB  Doublewrite
  • 19. What is Doublewrite ?  Doublewrite  Avoid partial write problem  512 byte write is always OK  But 16K write is not  Doublewrite buffer  2M  Doublewrite file  2M  Share tablespace: ibdata1
  • 20. Doublewrite Architecture  Stores all data twice, first to the doublewrite buffer, and then to the actual data files  --skip-innodb_doublewrite mysql> show global status like 'innodb_dbl%'G; ************** 1. row ************************ Variable_name: Innodb_dblwr_pages_written Value: 152362 ************** 2. row ************************ Variable_name: Innodb_dblwr_writes Value: 1465 2 rows in set (0.00 sec)
  • 21. Doublewrite Feature  Size: 2M  All the page should first write here  Sequential write  Cache write Hence, what about have a 100G or 300G doublewrite ? This makes flash cache happen
  • 22. Flash Cache in InnoSQL 5.5.13  Replace original doublewrite work  Now user can have a large doublewrite  Page write is sequential  SSD write feature  Doublewrite can read now  SSD random read feature  Cache both read and write operation  Persistent cache  Merge write  60 ~ 70% in workload like TPC-C  Support AIO read on flash cache  Not supported in Secondary Buffer Pool
  • 24. Flash Cache Data Structure /** Flash cache block struct */ struct trx_flashcache_block_struct{ unsigned space:32; /*!< tablespace id */ unsigned offset:32; /*!< page number */ unsigned fil_offset:32; /*!< flash cache page number */ unsigned state:2; /*!< flash cache state*/ trx_flashcache_block_t* hash; /*!< hash chain */ }; Four State: BLOCK_NOT_USED BLOCK_READY_FOR_FLUSH BLOCK_READ_CACHE BLOCK_FLUSHED
  • 25. Flash Cache Data Structure struct trx_flashcache_struct{ mutex_t fc_mutex;/*!< mutex protecting flash cache */ hash_table_t* fc_hash; /*!< hash table of flash cache pages */ ulint fc_size; /*!< flash cache size */ ulint write_off; /*!< write to flash cache offset */ ulint flush_off; /*!< flush to disk this offset */ ulint write_round; /* write round */ ulint flush_round; /* flush round */ trx_flashcache_block_t* block; /* flash cache block */ byte* read_buf_unalign; /* unalign read buf */ byte* read_buf; /* read buf */ }
  • 26. From Developer Perspective View Write Flash Cache File flush_offset write_offset Block Block Block Block Block Block Block Block Flash Cache Block Flash Cache Hash Table Lookup Flash Cache Log File write_offset (In Memory) flush _offset write_round flush_round
  • 27. Flash Cache Flush Algorithms  Flush page in flash cache to disk  Take over the flush in master thread  Flush in flash cache background thread  Algorithms  Less than innodb_flash_cache_write_cache_pct  No flush  Default 10  Less than innodb_flash_cache_do_full_io_pct  Flush 10% innodb_io_capacity  Default 90  Else  Flush 100% innodb_io_capacity  If idle  Flush 100% innodb_io_capacity
  • 28. Merge Write in Flash Cache flush_offset write_offset (7,7) (2,6) (0,6) (3,7) …… (3,7) (2,6) (4,8) Page (2,6)、(3,7) can be merged This much like insert buffer Delay write operation
  • 29. Flash Cache Benchmark  Sysbench OLTP  Read intensive  TPC-C  Write intensive  Blogbench  Blog like application oriented  Developed by Netease
  • 30. Sysbench OLTP
      InnoDB Buffer Pool: 6G
      DB Size: 19G
      innodb_flush_method = O_DIRECT
      innodb_flush_log_at_trx_commit = 1
  • 31. TPC-C
      InnoDB Buffer Pool: 12G
      DB Size: 39G
      Flash Cache: 100G
      innodb_flush_method = O_DIRECT
      innodb_flush_log_at_trx_commit = 1
      Results
        SSD: 3607.183 Tpm
        Flash Cache: 7230.05 Tpm
        Merge Write Ratio: 65.47%
  • 32. Blogbench
      InnoDB Buffer Pool: 4G
      DB Size: 21G
      innodb_flush_method = O_DIRECT
      innodb_flush_log_at_trx_commit = 1
      Merge write ratio: 60%
  • 33. Conclusion
      Flash Cache works for both read and write workloads
      Works better than using SSD as durable storage
      Optimized for SSD inside the database kernel
        No more writes in the flash cache
        Merge write support
  • 34. SHM for InnoDB Buffer Pool
      Uses shared memory to allocate the InnoDB buffer pool
      Why use shared memory?
        Speeds up warm-up
      Warm-up speed?
        Random reads run at 10~20 MB/sec
        A 30G buffer pool needs 30~60 minutes
  • 35. Warm-up Methods
      Use SQL to warm up
        SELECT COUNT(*) FROM table FORCE INDEX (PRIMARY)
        Warm-up reads become sequential
        But cannot restore the database to its previous workload state
      Dump the buffer pool to a file
        Supported in MySQL 5.6+
        Warm-up reads become sequential
        Restores the database to its previous workload state
        The dump file is big
        What if the database crashes?
  • 36. Warm-up Methods
      Percona Server
        Exports (space_id, page_no) from the LRU list to a file
        Loads this file ordered by (space_id, page_no) to make reads sequential when MySQL starts up
        Restores the database to its previous workload state
      Still needs a long time to warm up
        If you have a big buffer pool: 128G, 256G
  • 37. Warm-up in InnoSQL
      Uses shared memory
        --innodb_use_shm_preload=1
      Shared memory is configured as for Oracle
        /proc/sys/kernel/shmmax
        /proc/sys/kernel/shmall
      Warm-up takes less than 1 sec
        All pages are already in memory
  • 38. SHM for InnoDB Buffer Pool
      # list shared memory info
      innosql@db-62:~$ ipcs -a
      ------ Shared Memory Segments --------
      key        shmid      owner      perms      bytes      nattch     status
      0x0008c231 4653056    innosql    600        549715968  0
      ------ Semaphore Arrays --------
      key        semid      owner      perms      nsems
      ------ Message Queues --------
      key        msqid      owner      perms      used-bytes   messages
      # remove shared memory
      innosql@db-62:~$ ipcrm -m 4653056
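The listing above shows a segment with nattch 0, that is, one that outlives the process that created it. The following sketch uses the standard System V calls (shmget, shmat, shmdt, shmctl) to show why that helps warm-up: data written before a detach is still there on the next attach. The `pool_*` helpers are hypothetical and are not InnoSQL's code; a real server would presumably use a fixed key, like the 0x0008c231 shown by ipcs, so that a restarted process can find the segment, whereas this sketch uses IPC_PRIVATE and keeps the shmid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Creates a shared memory segment of `size` bytes readable and writable
   by the owner only (perms 600, as in the ipcs listing).
   Returns the shmid, or -1 on failure. */
int pool_create(size_t size)
{
    return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
}

/* Maps the segment into this process. Returns NULL on failure. */
void *pool_attach(int shmid)
{
    void *p = shmat(shmid, NULL, 0);
    return p == (void *)-1 ? NULL : p;
}

/* Unmaps the segment; its contents stay in kernel memory. */
int pool_detach(const void *p)
{
    return shmdt(p);
}

/* Marks the segment for removal, like `ipcrm -m <shmid>`. */
int pool_remove(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

A typical sequence is create, attach, fill, detach (standing in for a shutdown), then attach again and find the same bytes in place. The segment size is bounded by the shmmax and shmall settings mentioned on the previous slide.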
  • 39. InnoDB IO Statistics
      Get read IO statistics
        Like SQL Server: SET STATISTICS IO ON
      InnoSQL implements it in the slow query log
        Both file and table output
      Helps SQL developers
        10 reads may not be good in an OLTP application
      Helps DBAs
        Know the real IO statistics of each SQL statement
        Not only the time it consumes
      Still in development
        You can preview this feature
  • 40. InnoDB IO Statistics
      # Time: 111103 13:29:06
      # User@Host: root[root] @ localhost [::1]
      # Query_time: 119.293823 Lock_time: 119.274822 Rows_sent: 1 Rows_examined: 1 Logical_reads: 198 Physical_reads: 3
      use tpcc;
      SET timestamp=1320298146;
      select * from warehouse where w_id=1;
      # Time: 111103 13:31:28
      # User@Host: root[root] @ localhost [::1]
      # Query_time: 0.335019 Lock_time: 0.333019 Rows_sent: 1 Rows_examined: 1 Logical_reads: 164 Physical_reads: 50
      SET timestamp=1320298288;
      select * from history;
  • 41. Configuration
      long_query_time
      io_slow_query
      slow_query_type
        0: long_query_time
        1: io_slow_query
        2: both
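How the three settings above might combine can be sketched as a single predicate. The slide does not spell out the semantics, so several things here are assumptions: that type 2 logs a statement when either threshold is exceeded, that io_slow_query is compared against one read count, and that both comparisons are strict. `should_log_slow` is a hypothetical name.

```c
#include <assert.h>

/* Values of slow_query_type as listed on the slide. */
enum {
    SLOW_BY_TIME = 0, /* use long_query_time only */
    SLOW_BY_IO   = 1, /* use io_slow_query only */
    SLOW_BY_BOTH = 2  /* use both criteria */
};

/* Returns 1 if the statement should go to the slow query log.
   Assumption: with SLOW_BY_BOTH, exceeding either threshold is enough. */
int should_log_slow(int slow_query_type,
                    double query_time, double long_query_time,
                    unsigned long reads, unsigned long io_slow_query)
{
    int too_slow  = query_time > long_query_time;
    int too_heavy = reads > io_slow_query;

    switch (slow_query_type) {
    case SLOW_BY_TIME: return too_slow;
    case SLOW_BY_IO:   return too_heavy;
    case SLOW_BY_BOTH: return too_slow || too_heavy;
    default:           return 0; /* unknown type: log nothing */
    }
}
```

Under these assumptions a fast statement that performs many reads is invisible with type 0 but is logged with type 1 or 2, which is the case the IO statistics feature is meant to expose.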
  • 42. Page Cleaner Thread
      Pages are flushed in the master thread
        Adaptive flush
        IO capacity
      Problem
        The master thread has a lot to cope with
        Async flush can block user query threads
      Page cleaner thread
        Supported in MySQL 5.6
        InnoSQL supports it on MySQL 5.5
        Can also help with flushing in FLUSH_LRU_LIST
  • 43. Flush Algorithms in InnoDB
      checkpoint_age = current_lsn - checkpoint_lsn
      async_water_mark: ~78% * Log_Group_Size
      sync_water_mark: ~90% * Log_Group_Size
      For example:
        Log file size 1G, 2 log files
        async_water_mark = ~1.5G
        sync_water_mark = ~1.8G
  • 44. Flush Algorithms in InnoDB
      checkpoint_age < async_water_mark
        Adaptive flushing
        5% of innodb_io_capacity
      async_water_mark < checkpoint_age < sync_water_mark
        Blocks one user query thread
        Async flush
      checkpoint_age > sync_water_mark
        Blocks all user query threads
        Sync flush
      Dirty page percentage > innodb_max_dirty_pages_pct
        Flush innodb_io_capacity pages
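The water marks and the checkpoint-age rules from the two slides above can be combined into a short sketch. It uses the approximate 78% and 90% factors given on the slides rather than InnoDB's exact formula; the type and function names are hypothetical, and the treatment of values exactly on a water mark is an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FLUSH_ADAPTIVE, FLUSH_ASYNC, FLUSH_SYNC } flush_mode;

typedef struct {
    uint64_t async_mark; /* ~78% of the log group size */
    uint64_t sync_mark;  /* ~90% of the log group size */
} water_marks;

/* Derives both water marks from the redo log configuration,
   using the approximate percentages from the slide. */
water_marks compute_water_marks(uint64_t log_file_size, unsigned n_files)
{
    uint64_t group_size = log_file_size * n_files;
    water_marks m = { group_size / 100 * 78, group_size / 100 * 90 };
    return m;
}

/* Picks the flush mode from checkpoint_age = current_lsn - checkpoint_lsn.
   Ages exactly on a water mark are treated as the stricter mode. */
flush_mode choose_flush_mode(uint64_t current_lsn, uint64_t checkpoint_lsn,
                             water_marks m)
{
    uint64_t checkpoint_age;

    assert(current_lsn >= checkpoint_lsn);
    checkpoint_age = current_lsn - checkpoint_lsn;

    if (checkpoint_age < m.async_mark) {
        return FLUSH_ADAPTIVE; /* no user thread is blocked */
    }
    if (checkpoint_age < m.sync_mark) {
        return FLUSH_ASYNC;    /* blocks one user query thread */
    }
    return FLUSH_SYNC;         /* blocks all user query threads */
}
```

With two 1G log files, a checkpoint age of 1G stays in adaptive flushing, about 1.7G triggers async flush, and about 1.9G triggers sync flush.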
  • 45. Page Cleaner Thread
      Reduces the master thread's burden
      Async flush moves to this background thread
      No blocking occurs in user query threads
  • 46. However
      Flushing does not happen only in the master thread
      FLUSH_LRU_LIST
        Checks whether at least 64 pages are available
        In this situation, flushing happens almost entirely in user query threads
      Adaptive flush and innodb_io_capacity do not help here
        This flush happens in user query threads
      InnoSQL also moves this flush to the page cleaner thread
        Not supported in MySQL 5.6
        Still needs more optimization
  • 47. Q&A