MDG Cache
Statistics
BVQ
Analysis
Michael Pirker
Michael.Pirker@SVA.de
+49 151 180 25 26 00
Page 2
• This document shows examples of BVQ MDisk Group Cache Partition analysis.
• I have added some text from this IBM RedPaper.
Thanks to Barry Whyte.
Each page with text taken from this RedPaper is marked with the IBM RedPaper logo.
Page 3
Terminology
• Cache: a combination of the actual physical cache medium and the algorithms that manage the data within the cache.
• Demote-Ready: a track in the demote-ready list is either old read cache data, or write cache data that has been successfully destaged to disk.
• Destage: the act of removing modified data from the cache and writing it to disk.
• Modified Data: write data in cache that is not yet destaged to disk.
• Non-owner Node: the non-preferred node for a given VDisk.
• Owner Node: the preferred node for a given VDisk.
• Partner Node: the other node in an SVC IO group.
• Page: the unit of data held in the cache. In the SVC, this is 4 KB. The data in the cache is managed at the page level. A page belongs to one track.
• Track: the unit of locking and destage granularity in the cache. In the SVC, a track is 32 KB in size (eight pages). A track might only be partially populated with valid pages.
Page 4
Node Performance / SVC Write Cache Statistics
• Write cache usage is at a very high level. Because this chart shows a mean value across all MDisk groups, it is very likely that one or several cache partitions are overloaded. This overload will cause performance issues!
Chart annotations: CPU %, Write Cache Full %, Read Cache Full %
Write cache up to 80% full, peaks up to 90%: too high
R/W cache max 77.5%: OK
CPU 20% to 35%: OK
Page 5
MDG Cache Partition Examples Overview
Overview of all MDisk Group cache partitions of the customer. Chart annotations:
• XIV Gen1: 400 ms/op, 80%-100% full; 1700 ms/op
• DS3512: 1500 ms/op, 80%-100% full
• DS3512: 2400 ms/op, 80%-100% full
• XIV Gen2: 600 ms/op, 80%-100% full; 1700 ms/op, 1750 ms/op
Page 6
MDG Group Cache Partition 3512
• This cache partition is very heavily used. It sometimes reaches 100% maximum fullness, but the write destage rate stays in normal ranges when this happens.
Chart annotations: Cache Partition Fullness (min, avg, max), Track Access, Track Lock, Write Destage
Page 7
MDG Group Cache Partition 3512-02
• This cache partition looks overloaded: long periods of 100% full, and even more than 100%. Write destage rates have phases of extremely high activity.
Chart annotations: high destage rates (panic destage!), long periods of 100% max write cache full
Page 8
MDG Group Cache Partition XIV01
• It looks like this XIV generally still has some performance reserve, but there are peaks up to 90%. We should try to figure out where they come from: most likely some monster volumes that write large amounts of data into the SVC. These volumes should not all start at the same moment.
Page 9
MDG Group Cache Partition XIV02
This peak is the reference!
• It looks like the XIV is working very hard but is not yet overloaded. What happens when we add more load to this system? Write Cache Full will start to rise, and problems will begin when we reach the 100% limit more often. There is one peak up to 90%: this peak, not the “flat line” at 83%, is the upper level.
Page 10
Performant System – no Cache Issue
• This is the system of another customer with only one MDisk group in the SVC. This one MDisk group can use all the available node cache. We also have very performant storage systems in the backend.
Page 11
Write Life Cycle to SVC
Excerpt from the IBM RedPaper
• When a write is issued to the SVC, it passes through the upper layers in the software stack and is added to the cache. Before the cache returns completion to the host, the write must be mirrored to the partner node. After the I/O is written to the partner node and acknowledged back to the owner node, the I/O is completed back to the host.
• The LRU algorithm places a pointer to this data at the top of the LRU list. As subsequent I/Os are requested, they are placed at the top of the LRU list. Over time, our initial write moves down the list and eventually reaches the bottom.
• When certain conditions are met, the cache decides that it needs to free a certain
amount of data. This data is taken from the bottom of the LRU list, and in the
case of write data, is committed to disk. This destage operation is only performed
by the owner node.
• When the owner node receives confirmation that the write to disk is successful,
the control blocks associated with the track are modified to mark that the track
now contains read cache data, and is added to the demote-ready list.
Subsequent reads of recently written data are returned from cache. The owner
node notifies the partner node that the write data is complete, and the partner
node discards the data. The data is discarded on the non-owner (partner) node,
because reads are not expected to occur to non-owner nodes.
• Write
put in cache as tracks
copy to partner node
acknowledge back
• Cache LRU algorithm
new data at the top of the list; tracks move down the list when new data arrives in cache
• Cache becomes full: destage
space needs to be freed
destage LRU tracks
only the owner node destages data!
• Destage operation
write track to disk system
receive ACK
track is changed to read cache data
track is added to the demote-ready list
the partner node is informed and discards its copy, because the mirror is not needed for read cache data
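The write life cycle above can be sketched as a toy model. The class, capacity, and method names below are illustrative only and are not taken from the SVC code:

```python
from collections import OrderedDict

class WriteCacheSketch:
    """Toy model of the SVC write-cache life cycle described above.
    Capacity and names are invented for illustration."""

    def __init__(self, capacity_tracks):
        self.capacity = capacity_tracks
        self.lru = OrderedDict()      # track_id -> "modified" | "read"; end = top
        self.demote_ready = []        # destaged writes, now read cache data
        self.partner_mirror = set()   # tracks mirrored to the partner node

    def host_write(self, track_id):
        # Mirror to the partner node before completing back to the host.
        self.partner_mirror.add(track_id)
        self.lru[track_id] = "modified"
        self.lru.move_to_end(track_id)     # new data goes to the top of the LRU
        if len(self.lru) > self.capacity:
            self.destage_oldest()

    def destage_oldest(self):
        # Only the owner node destages; data is taken from the bottom of the LRU.
        track_id, state = next(iter(self.lru.items()))
        del self.lru[track_id]
        if state == "modified":
            self.write_to_disk(track_id)           # commit to the backend
            self.demote_ready.append(track_id)     # track is now read cache data
            self.partner_mirror.discard(track_id)  # partner discards its copy
        # read data would simply be discarded

    def write_to_disk(self, track_id):
        pass  # placeholder for the backend write
```

With capacity 2, writing tracks 1, 2, 3 destages track 1 to the demote-ready list and drops it from the partner mirror, matching the life cycle in the text.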
Page 12
Read Life Cycle
Excerpt from the IBM RedPaper
• When a read is issued to the SVC, it passes through the upper layers in the
software stack and the cache checks to see if the required data resides in cache.
• If a read is made to a track already in the cache (and that track is populated with
enough data to satisfy the read), the read is completed instantly, and the track is
moved to the top of the demote-ready list.
• If a read is satisfied with data held in the demote-ready list, the control blocks associated with the track are modified to denote that the data is at the top of the LRU list. Any reference to the demote-ready list is removed.
• There is a distinction made between actual host read I/O requests, and the
speculative nature of old writes that are turned into read cache data. It is for this
reason that the demote-ready list is emptied first (before the LRU list) when the
cache algorithms decide they need more free space.
• Read
check whether the required data is already in cache
cache hit: the read completes instantly
• Demote-ready hit
the track is promoted back to the top of the LRU list
any reference in the demote-ready list is removed
• Freeing space
the demote-ready list is emptied first, before the LRU list,
because old writes turned into read cache data are only
speculative read data
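A minimal sketch of the read path and the demote-ready-first freeing rule described above; all names are hypothetical:

```python
from collections import OrderedDict

def read_track(lru, demote_ready, track_id):
    """Toy read path per the excerpt: a demote-ready hit is promoted back
    to the top of the LRU and its demote-ready reference is removed."""
    if track_id in lru:
        return "cache-hit"             # completed instantly from cache
    if track_id in demote_ready:
        demote_ready.remove(track_id)  # reference to demote-ready list removed
        lru[track_id] = "read"         # data now denoted at the top of the LRU
        lru.move_to_end(track_id)
        return "demote-ready-hit"
    return "miss"                      # would be staged from the backend

def free_space(lru, demote_ready, tracks_needed):
    """The demote-ready list is emptied first, before the LRU list."""
    freed = 0
    while demote_ready and freed < tracks_needed:
        demote_ready.pop(0)            # discard: no disk latency involved
        freed += 1
    while lru and freed < tracks_needed:
        lru.popitem(last=False)        # drop from the bottom of the LRU
        freed += 1
    return freed
```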
Page 13
Cache algorithm life cycle
Excerpt from the IBM RedPaper
• The cache algorithms attempt to maintain a steady state of optimal population that is not
too full, and not too empty. To achieve this, the cache maintains a count of how much
data it contains, and how this relates to the available capacity.
• As the cache reaches a predefined high capacity threshold level it starts to free space at
a rate known as trickle. Data is removed from the bottom of the LRU at a slow rate. If
the data is a write, it is destaged. If the data is a read, it is discarded.
• Destage operations are therefore the limiting factor in how quickly the cache is emptied,
because writes are at the mercy of the latency of the actual disk writes. In the case of the
SVC this is not as bad as it sounds, as the disks in this case are controller LUNs. Almost
every controller supported by the SVC has some form of internal cache. When the I/O
rate being submitted by the SVC to a controller is within acceptable limits for that
controller, you expect writes to complete within a few milliseconds.
• However, problems can arise because the SVC can generally sustain much greater data
rates than most storage controllers can sustain. This includes large enterprise controllers
with very large caches.
• SVC Version 4.2.0 added additional monitoring of the response time measured for destage operations. This response time is used to ramp up, or down, the number of concurrent destage operations the SVC node submits. This allows the SVC to dynamically match the characteristics of the environment in which it is deployed. Up to 1024 destage operations can be submitted in each batch, and it is this batch that is monitored and dynamically adjusted.
• Cache
maintain a steady state of optimal
population
• High Capacity Threshold
Free space (rate is Trickle)
From bottom of LRU
Destage for write data
Discard for read data
• Backend performance is key
• SVC measures the response time of the storage system and dynamically adjusts the number of destage operations per batch
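The ramp-up/ramp-down idea can be sketched as follows. The target latency and the doubling/halving steps are invented for illustration; only the 1024-per-batch cap comes from the text:

```python
def adjust_destage_batch(current_batch, response_ms,
                         target_ms=5.0, max_batch=1024):
    """Toy version of the destage-concurrency ramp: the measured destage
    response time drives how many concurrent destage operations go into
    the next batch. target_ms and the step sizes are assumptions."""
    if response_ms <= target_ms:
        # Backend keeps up: ramp up, capped at 1024 operations per batch.
        return min(current_batch * 2, max_batch)
    # Backend is lagging: back off, but keep at least one operation.
    return max(current_batch // 2, 1)
```

For example, a fast backend lets the batch grow from 512 to the 1024 cap, while a slow one halves it.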
Page 14
Cache algorithm life cycle
Excerpt from the IBM RedPaper
• If the SVC incoming I/O rate continues, and the trickle of data from the LRU does not
reduce the cache below the high capacity threshold, trickle continues.
• If the trickle rate is not keeping the cache usage in equilibrium, and the cache usage
continues to grow, a second high capacity threshold is reached.
• This will result in two simultaneous operations:
Any data in the demote-ready list is discarded. This is done in batches of 1024 tracks of
data. However, because this is a discard operation, it does not suffer from any latency
issues and a large amount of data is discarded quickly. This might drop the cache usage
below both high capacity thresholds.
The LRU list begins to drop entries off the bottom at a rate much faster than trickle.
• The combination of these two operations usually results in the cache usage reaching an equilibrium, and the cache maintains itself between the first and second high usage thresholds. If the incoming I/O rate continues, the cache reaches the third and final threshold, and the destage rate increases to its maximum.
• Note: The destage rate, and number of concurrent destaged tracks are two
different attributes. The rate determines how long to wait between each batch. The
number of concurrent tracks determines how many elements to build into a batch.
• Note: If the back-end disk controllers cannot cope with the amount of data being
sent from the SVC, the cache might reach 100% full. This results in a one-in, one-
out situation where the host I/O is only serviced as quickly as the back-end
controllers can complete the I/O, essentially negating the benefits of the SVC
cache. Too much I/O is being driven from the host for the environment in which
the SVC is deployed.
• Cache
what to do when the cache cannot be reduced quickly enough
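The escalating thresholds can be sketched like this. The text does not give the actual SVC threshold percentages, so the 70/80/90 values below are placeholders:

```python
def free_cache(usage_pct, demote_ready, lru):
    """Sketch of the escalating behaviour described above: trickle at the
    first threshold, demote-ready discard plus faster LRU draining at the
    second, maximum destage rate at the third. Threshold values are
    invented placeholders."""
    T1, T2, T3 = 70.0, 80.0, 90.0   # assumed first/second/third thresholds
    if usage_pct < T1:
        return "steady"             # optimal population, nothing to do
    if usage_pct < T2:
        return "trickle"            # slow destage/discard from the LRU bottom
    if usage_pct < T3:
        demote_ready.clear()        # discard in batches; no disk latency
        return "discard+fast-lru"   # LRU also drops entries faster than trickle
    return "max-destage"            # destage rate at its maximum
```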
Page 15
Cache Partitioning
Excerpt from the IBM RedPaper
• SVC Version 4.2.1 first introduced cache partitioning to the SVC code base. This
decision was made to provide flexible partitioning, rather than hard coding a
specific number of partitions. This flexibility is provided on a Managed Disk Group
(MDG) boundary. That is, the cache automatically partitions the available
resources on a MDG basis.
• Most users create a single MDG from the Logical Unit Numbers (LUNs) provided by a single disk controller, or a subset of a controller/collection of the same controllers, based on the characteristics of the LUNs themselves. For example, RAID-5 compared to RAID-10, or 10K RPM compared to 15K RPM.
• The overall strategy is provided to protect the individual controller from
overloading or faults.
• If many controllers (or in this case, MDGs) are overloaded then the overall cache
can still suffer.
• Table 1 shows the upper limit of write cache data that any one partition, or
MDG, can occupy.
Note: Due to the relationship between partitions and MDGs, you must be
careful when creating large numbers of MDGs from a single controller. This
is especially true when the controller is a low or mid-range controller.
Enterprise controllers are likely to have some form of internal cache
partitioning and are unlikely to suffer from overload in the same manner as
entry or mid-range.
• Cache Partitioning
cache is divided equally into partitions
each MDG receives one equal part of the SVC cache
protects the controller from overload
isolates performance problems
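Assuming the commonly quoted figures from the RedPaper's Table 1 (the 30% limit for four MDGs matches the example on the following pages), the per-partition write cache limit could be looked up like this. Verify the values against Table 1 for your code level:

```python
def partition_write_limit_pct(num_mdisk_groups):
    """Upper limit of write cache any one partition (MDG) can occupy.
    The figures are the commonly quoted Table 1 values and are an
    assumption here, not taken from this document."""
    limits = {1: 100, 2: 66, 3: 40, 4: 30}
    return limits.get(num_mdisk_groups, 25)  # five or more MDGs: 25% each
```

Note the limits are deliberately larger than an even split, so a single busy MDG can still use a good share of the cache while being prevented from starving the others.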
Page 16
Example
Excerpt from the IBM RedPaper
• Partition 1 is performing very little I/O, and is only
20% full (20% of its 30% limit - so a very small
percentage of the overall cache resource).
• Partition 2 is being written to heavily. When it
reaches the defined high capacity threshold within
its 30% limit, write data begins to destage. The
cache itself is below any of its overall thresholds.
However, we are destaging data for the second
partition.
• We see that the controller servicing partition 2 is
struggling, cannot cope with the write data that is
being destaged, and the partition is 100% full, and
it is occupying 30% of the available cache
resource. In this case, incoming write data is
slowed down to the same rate as the controller
itself is completing writes.
• Partition 3 begins to perform heavy I/O, goes
above its high capacity threshold limit, and starts
destaging. This controller, however, is capable of
handling the I/O being sent to it, and therefore, the
partition stays around its threshold level. The
overall cache is still under threshold, so only
partitions 2 and 3 are destaging, partition 2 is
being limited, partition 3 is destaging well within its
capabilities.
Page 17
Example
Excerpt from the IBM RedPaper
• Partition 4 begins to perform heavy I/O, when it reaches just over a third of its partition limit,
and the overall cache is now over its first threshold limit. Destage begins for all partitions that
have write data - in this case, all four partitions. When the cache returns under the first
threshold, only the partitions that are over their individual threshold allocation limits continue
to destage.
• Cache Partitioning
Partition 1: no problem
Partition 2: controller is not fast enough to destage incoming data; cache partition 100% full; performance degradation
Partition 3: controller is fast enough to handle the destaged data; no performance degradation
• When the overall SVC cache reaches the first threshold limit, destage starts for all partitions
Page 18
German Websites
• BVQ website and BVQ wiki
http://www.bvq-software.de/
http://bvqwiki.sva.de
• BVQ videos on the SVA YouTube channel
http://www.youtube.com/user/SVAGmbH
• BVQ website of SVA GmbH
http://www.sva.de/sva_prod_bvq.php
International Websites
• developerWorks BVQ community blog
https://www.ibm.com/developerworks/...
http://tinyurl.com/bvqblog
• developerWorks documents and presentations
https://www.ibm.com/developerworks/...
http://tinyurl.com/BVQ-Documents
Page 19
You can find more information about BVQ on the website
www.bvq-software.de
If you are interested in BVQ, please contact the following e-mail address:
mailto:bvq@sva.de
BVQ is a product of
SVA System Vertrieb Alexander GmbH

More Related Content

What's hot (20)

PDF
The have no fear guide to virtualizing databases
SolarWinds
 
PDF
Distributed Caching Essential Lessons (Ts 1402)
Yury Kaliaha
 
PPTX
Scott Schnoll - Exchange server 2013 high availability and site resilience
Nordic Infrastructure Conference
 
PDF
5 Steps to PostgreSQL Performance
Command Prompt., Inc
 
PDF
VMworld 2013: vSphere Data Protection (VDP) Technical Deep Dive and Troublesh...
VMworld
 
PDF
Scaling Out Tier Based Applications
Yury Kaliaha
 
PPTX
Planning & Best Practice for Microsoft Virtualization
Lai Yoong Seng
 
PPTX
Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...
xKinAnx
 
PPTX
Hardware planning & sizing for sql server
Davide Mauri
 
PDF
VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld
 
PDF
MariaDB High Availability Webinar
MariaDB plc
 
PPTX
Metro Cluster High Availability or SRM Disaster Recovery?
David Pasek
 
PPTX
Exchange Server 2013 High Availability - Site Resilience
Microsoft TechNet - Belgium and Luxembourg
 
PPTX
VMware virtual SAN 6 overview
solarisyougood
 
PPT
Deploying Maximum HA Architecture With PostgreSQL
Denish Patel
 
PDF
Why new hardware may not make SQL Server faster
SolarWinds
 
PPTX
Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...
xKinAnx
 
PPTX
EDBT2015: Transactional Replication in Hybrid Data Store Architectures
tatemura
 
PPTX
New lessons in connection management
Toon Koppelaars
 
PPTX
Demystifying Benchmarks: How to Use Them To Better Evaluate Databases
Clustrix
 
The have no fear guide to virtualizing databases
SolarWinds
 
Distributed Caching Essential Lessons (Ts 1402)
Yury Kaliaha
 
Scott Schnoll - Exchange server 2013 high availability and site resilience
Nordic Infrastructure Conference
 
5 Steps to PostgreSQL Performance
Command Prompt., Inc
 
VMworld 2013: vSphere Data Protection (VDP) Technical Deep Dive and Troublesh...
VMworld
 
Scaling Out Tier Based Applications
Yury Kaliaha
 
Planning & Best Practice for Microsoft Virtualization
Lai Yoong Seng
 
Ibm spectrum scale fundamentals workshop for americas part 4 Replication, Str...
xKinAnx
 
Hardware planning & sizing for sql server
Davide Mauri
 
VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld
 
MariaDB High Availability Webinar
MariaDB plc
 
Metro Cluster High Availability or SRM Disaster Recovery?
David Pasek
 
Exchange Server 2013 High Availability - Site Resilience
Microsoft TechNet - Belgium and Luxembourg
 
VMware virtual SAN 6 overview
solarisyougood
 
Deploying Maximum HA Architecture With PostgreSQL
Denish Patel
 
Why new hardware may not make SQL Server faster
SolarWinds
 
Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...
xKinAnx
 
EDBT2015: Transactional Replication in Hybrid Data Store Architectures
tatemura
 
New lessons in connection management
Toon Koppelaars
 
Demystifying Benchmarks: How to Use Them To Better Evaluate Databases
Clustrix
 

Viewers also liked (9)

PPT
Masters stretched svc-cluster-2012-04-13 v2
solarisyougood
 
PPT
IBM SAN Volume Controller Performance Analysis
brettallison
 
PPTX
La transformación digital del sector financiero
Alex Rayón Jerez
 
PDF
La transformación digital
Jose Luis Calvo Salanova
 
PPT
Ds8000 Practical Performance Analysis P04 20060718
brettallison
 
PDF
Los 10 Mandamientos de la Transformación Digital
Momik Studio
 
PDF
La Transformación digital y cultural del BBVA
Silvia Dvorak
 
PDF
Transformacion Digital en la Banca
Manuel Serrano Ortega
 
PDF
Estrategia de Transformación Digital
Momik Studio
 
Masters stretched svc-cluster-2012-04-13 v2
solarisyougood
 
IBM SAN Volume Controller Performance Analysis
brettallison
 
La transformación digital del sector financiero
Alex Rayón Jerez
 
La transformación digital
Jose Luis Calvo Salanova
 
Ds8000 Practical Performance Analysis P04 20060718
brettallison
 
Los 10 Mandamientos de la Transformación Digital
Momik Studio
 
La Transformación digital y cultural del BBVA
Silvia Dvorak
 
Transformacion Digital en la Banca
Manuel Serrano Ortega
 
Estrategia de Transformación Digital
Momik Studio
 
Ad

Similar to SVC / Storwize: cache partition analysis (BVQ howto) (20)

PPTX
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
DataStax
 
PDF
SVC / Storwize: cost effective storage planning (BVQ use case)
Michael Pirker
 
PDF
Operating System File Management disk_management.pdf
SuryaBasnet3
 
PDF
Josh Krischer - How to get more for less (4 november 2010 Storage Expo)
VNU Exhibitions Europe
 
PPT
Unit 4 DBMS.ppt
HARRSHITHAASCSE
 
PPTX
Computer Memory Hierarchy Computer Architecture
Haris456
 
PPTX
UNIT II INTELLIGENT STORAGE SYSTEMS AND RAID.pptx
Dss
 
PDF
SVC / Storwize analysis cost effective storage planning (use case)
Michael Pirker
 
PPT
lecture-17.ppt
AshokRachapalli1
 
PPTX
Chapter 3
Er. Nawaraj Bhandari
 
PDF
Can You Afford Cheap Storage?
David Shafer
 
PPTX
Memory Organization
Dilum Bandara
 
PPTX
In-memory Caching in HDFS: Lower Latency, Same Great Taste
DataWorks Summit
 
PDF
White Paper: EMC FAST Cache — A Detailed Review
EMC
 
PPTX
9_Storage_Devices.pptx
AbdulhseynAayev1
 
PPTX
Deploying ssd in the data center 2014
Howard Marks
 
PPT
12-6810-12.ppt
Lonewolf379705
 
PPTX
Computer System Architecture Lecture Note 8.1 primary Memory
Budditha Hettige
 
PPTX
What every data programmer needs to know about disks
iammutex
 
PPTX
2015 deploying flash in the data center
Howard Marks
 
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
DataStax
 
SVC / Storwize: cost effective storage planning (BVQ use case)
Michael Pirker
 
Operating System File Management disk_management.pdf
SuryaBasnet3
 
Josh Krischer - How to get more for less (4 november 2010 Storage Expo)
VNU Exhibitions Europe
 
Unit 4 DBMS.ppt
HARRSHITHAASCSE
 
Computer Memory Hierarchy Computer Architecture
Haris456
 
UNIT II INTELLIGENT STORAGE SYSTEMS AND RAID.pptx
Dss
 
SVC / Storwize analysis cost effective storage planning (use case)
Michael Pirker
 
lecture-17.ppt
AshokRachapalli1
 
Can You Afford Cheap Storage?
David Shafer
 
Memory Organization
Dilum Bandara
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
DataWorks Summit
 
White Paper: EMC FAST Cache — A Detailed Review
EMC
 
9_Storage_Devices.pptx
AbdulhseynAayev1
 
Deploying ssd in the data center 2014
Howard Marks
 
12-6810-12.ppt
Lonewolf379705
 
Computer System Architecture Lecture Note 8.1 primary Memory
Budditha Hettige
 
What every data programmer needs to know about disks
iammutex
 
2015 deploying flash in the data center
Howard Marks
 
Ad

More from Michael Pirker (9)

PDF
IBM SVC Storwize analysis and monitoring: BVQ storage in Balance (Deutsch)
Michael Pirker
 
PDF
IBM SVC Storwize analysis and monitoring: BVQ storage in Balance (English)
Michael Pirker
 
PDF
IBM SVC / Storwize: Reduction of storage cost made easy
Michael Pirker
 
PDF
IBM SVC / Storwize: Unlock cost savings potentials with BVQ
Michael Pirker
 
PDF
Bvq storage in_balance_ru
Michael Pirker
 
PDF
20140415 bvq storage in_balance 日本語
Michael Pirker
 
PDF
BVQ walkthrough
Michael Pirker
 
PDF
Bvq use case storage tier analysis with bvq (ba).pptx
Michael Pirker
 
PDF
SVC / Storwize: Exchange analysis unwanted thin provisoning effect
Michael Pirker
 
IBM SVC Storwize analysis and monitoring: BVQ storage in Balance (Deutsch)
Michael Pirker
 
IBM SVC Storwize analysis and monitoring: BVQ storage in Balance (English)
Michael Pirker
 
IBM SVC / Storwize: Reduction of storage cost made easy
Michael Pirker
 
IBM SVC / Storwize: Unlock cost savings potentials with BVQ
Michael Pirker
 
Bvq storage in_balance_ru
Michael Pirker
 
20140415 bvq storage in_balance 日本語
Michael Pirker
 
BVQ walkthrough
Michael Pirker
 
Bvq use case storage tier analysis with bvq (ba).pptx
Michael Pirker
 
SVC / Storwize: Exchange analysis unwanted thin provisoning effect
Michael Pirker
 

Recently uploaded (20)

PPTX
Drive Operational Excellence with Proven Continuous Improvement Strategies
Group50 Consulting
 
PPTX
Revolutionizing Shopping: Voice Commerce in Retail and eCommerce
RUPAL AGARWAL
 
PDF
Dr. Enrique Segura Ense Group - A Philanthropist And Entrepreneur
Dr. Enrique Segura Ense Group
 
PDF
NewBase 07 July 2025 Energy News issue - 1800 by Khaled Al Awadi_compressed.pdf
Khaled Al Awadi
 
PDF
Securiport - A Global Leader
Securiport
 
PDF
Concept Topology in Architectural Build Addendum.pdf
Brij Consulting, LLC
 
PDF
Blind Spots in Business: Unearthing Hidden Challenges in Today's Organizations
Crimson Business Consulting
 
PPTX
Customer screenshots from Quark Publishing Platform
Gareth Oakes
 
PPTX
Why-Your-BPO-Startup-Must-Track-Attrition-from-Day-One.pptx.pptx
Orage technologies
 
PDF
David Badaro Explains 5 Steps to Solving Complex Business Issues
David Badaro
 
PDF
Explore Unique Wash Basin Designs: Black, Standing & Colored Options
Mozio
 
PDF
Top 10 Common Mistakes Entrepreneurs Make When Applying for Business Subsidie...
shailjapariharoffici
 
PDF
Keppel Investor Day 2025 Presentation Slides GCAT.pdf
KeppelCorporation
 
PDF
LeadershipHQ Overview Flyer 2025-2026 Global
Sonia McDonald
 
PPTX
How do we fix the Messed Up Corporation’s System diagram?
YukoSoma
 
PDF
Chembond Chemicals Limited Presentation 2025
Chembond Chemicals Limited
 
PDF
How to Make Your Pre Seed Startup Grant Fundable
ideatoipo
 
PPTX
2025 July - ABM for B2B in Hubspot - Demand Gen HUG.pptx
mjenkins13
 
PDF
Rostyslav Chayka: Управління командою за допомогою AI (UA)
Lviv Startup Club
 
PPTX
IP Leaks Can Derail Years Of Innovation In Seconds
Home
 
Drive Operational Excellence with Proven Continuous Improvement Strategies
Group50 Consulting
 
Revolutionizing Shopping: Voice Commerce in Retail and eCommerce
RUPAL AGARWAL
 
Dr. Enrique Segura Ense Group - A Philanthropist And Entrepreneur
Dr. Enrique Segura Ense Group
 
NewBase 07 July 2025 Energy News issue - 1800 by Khaled Al Awadi_compressed.pdf
Khaled Al Awadi
 
Securiport - A Global Leader
Securiport
 
Concept Topology in Architectural Build Addendum.pdf
Brij Consulting, LLC
 
Blind Spots in Business: Unearthing Hidden Challenges in Today's Organizations
Crimson Business Consulting
 
Customer screenshots from Quark Publishing Platform
Gareth Oakes
 
Why-Your-BPO-Startup-Must-Track-Attrition-from-Day-One.pptx.pptx
Orage technologies
 
David Badaro Explains 5 Steps to Solving Complex Business Issues
David Badaro
 
Explore Unique Wash Basin Designs: Black, Standing & Colored Options
Mozio
 
Top 10 Common Mistakes Entrepreneurs Make When Applying for Business Subsidie...
shailjapariharoffici
 
Keppel Investor Day 2025 Presentation Slides GCAT.pdf
KeppelCorporation
 
LeadershipHQ Overview Flyer 2025-2026 Global
Sonia McDonald
 
How do we fix the Messed Up Corporation’s System diagram?
YukoSoma
 
Chembond Chemicals Limited Presentation 2025
Chembond Chemicals Limited
 
How to Make Your Pre Seed Startup Grant Fundable
ideatoipo
 
2025 July - ABM for B2B in Hubspot - Demand Gen HUG.pptx
mjenkins13
 
Rostyslav Chayka: Управління командою за допомогою AI (UA)
Lviv Startup Club
 
IP Leaks Can Derail Years Of Innovation In Seconds
Home
 

SVC / Storwize: cache partition analysis (BVQ howto)

  • 2. Page 2 • This document shows Examples of BVQ MDisk Group Cache Partition Analysis. • I have added some text from this IBM RedPaper Thanks to Barry White Each page with text taken from this RedPaper is marked with the IBM RedPaper logo
  • 3. Page 3 Terminology • Cache Cache is a combination of the actual physical cache medium and the algorithms that are managing the data within the cache. • Demote-Ready A track that is in the demote-ready list is either old read cache data, or write cache data that is successfully destaged to disk. • Destage The act of removing modified data from the cache and writing it to disk • Modified Data Write data in cache that is not destaged to disk • Non-owner Node The non-preferred node for a given VDisk • Owner Node The preferred node for a given VDisk • Partner Node The other node in a SVC IO Group • Page The unit of data held in the cache. In the SVC, this is 4 KB. The data in the cache is managed at the page level. A page belongs to one track. • Track The unit of locking and destage granularity in the cache. In the SVC, a track is 32 KB in size (eight pages). (A track might only be partially populated with valid pages.)
  • 4. Page 4 Node Performance / SVC Write Cache Statistics Write Cache is at a very high level – due to the fact that we see a mean value of all mdisk groups here it is very likely that one or several cache partitions are overloaded. This overlaod will causes performance issues! CPU %CPU % Write Cache Full % Write Cache Full % Read Cache Full % Read Cache Full % Write Cache up to 80% full Peaks up to 90% too high R/W Cache Max 77.5% OK CPU 20% bis 35% - OK
  • 5. Page 5 XIV Gen1 444400ms/00ms/00ms/00ms/opopopop 88880%0%0%0%----100%100%100%100% 1700ms/1700ms/1700ms/1700ms/opopopop MDG Cache Partition Examples Overview DS3512 1500ms/1500ms/1500ms/1500ms/opopopop 88880%0%0%0%----100%100%100%100% DS3512 2400ms/2400ms/2400ms/2400ms/opopopop 88880%0%0%0%----100%100%100%100% XIV Gen2 600ms/600ms/600ms/600ms/opopopop 88880%0%0%0%----100%100%100%100% 1700ms/1700ms/1700ms/1700ms/opopopop 1750ms/1750ms/1750ms/1750ms/opopopop Overview of all MDisk Group Cache Partitions of the Customer
  • 6. Page 6 MDG Group Cache Partition 3512 • This Cache Partition is very heavyly used – it sometimes reaches 100% max Fullness but the write destage rate stays in normal ranges when this happens. Cache Partition Fullness min avg max Cache Partition Fullness min avg max Track AccessTrack Access Track lockTrack lock Write DestageWrite Destage
  • 7. Page 7 MDG Group Cache Partition 3512-02 • This Cache Partition looks overloaded – long periods of 100% full and even more then 100% - Write destage rates have phases of extreme high activity. High Destage Rates Panik Destage! High Destage Rates Panik Destage! Long Periods of 100% Max Write Cache Full Long Periods of 100% Max Write Cache Full
  • 8. Page 8 MDG Group Cache Partition XIV01 • Looks like this XIV has generally got still some performance reserve. But there are peaks up to 90% - we should try to figure out, where they are coming from. Most likely some Monster Volumes that write big amounts of data into the SVC. These Volumes should not start in the moment.
  • 9. Page 9 MDG Group Cache Partition XIV02 This Peak is Reference!This Peak is Reference! • Looks like the XIV is very hard working but not yet overloaded. What happens when we add more load to this system? Write Cache Full will start to raise and problems will start when we reach the 100% limit more often. There is one peak up to 90% - this is the upper level not the “flat line” at 83%
  • 10. Page 10 Performant System – no Cache issue • This is the system of another customer with only onbe Mdisk Group in the SVC. This one MDISK Group can use all the existing Node Cache. We also have very perfromant storage systems in the backend.
  • 11. Page 11 Write Life Cycle to SVC Excerpt from Read Piece • When a write is issued to the SVC, it passes through the upper layers in the software stack and add into the cache. Before the cache returns completion to the host, the write must mirror the partner node. After the I/O is written to the partner node, and acknowledged back to the owner node, the I/O is completed back to the host1. • The LRU algorithm places a pointer to this data at the top of the LRU. As subsequent I/O is requested, they are placed at the top of the LRU list. Over time, our initial write moves down the list and eventually reaches the bottom. • When certain conditions are met, the cache decides that it needs to free a certain amount of data. This data is taken from the bottom of the LRU list, and in the case of write data, is committed to disk. This destage operation is only performed by the owner node. • When the owner node receives confirmation that the write to disk is successful, the control blocks associated with the track are modified to mark that the track now contains read cache data, and is added to the demote-ready list. Subsequent reads of recently written data are returned from cache. The owner node notifies the partner node that the write data is complete, and the partner node discards the data. The data is discarded on the non-owner (partner) node, because reads are not expected to occur to non-owner nodes. • Write put in Cache as Tracks copy to partner node accnowledge back • Cache LTU Algorithm New Data Top of list Tracks move down the list when new data arrives in cache • Cache becomes full Destage needs to be freed. Destage LRU Tracks Only Owner node destages data! • Destage Operation write Track to disk system receive ACK track modified to read cache Track added to demote ready list. Partner Node will be informed. Partner Node discards data because mirror not needed for read cache data
  • 12. Page 12 Read Life Cycle Excerpt from RedPaper • When a read is issued to the SVC, it passes through the upper layers in the software stack, and the cache checks whether the required data resides in cache. • If a read is made to a track already in the cache (and that track is populated with enough data to satisfy the read), the read is completed instantly, and the track is moved to the top of the demote-ready list. • If a read is satisfied with data held in the demote-ready list, the control blocks associated with the track are modified to denote that the data is at the top of the LRU list. Any reference to the demote-ready list is removed. • A distinction is made between actual host read I/O requests and the speculative nature of old writes that have been turned into read cache data. For this reason, the demote-ready list is emptied first (before the LRU list) when the cache algorithms decide they need more free space. • Read hit in cache: served instantly, track moved to the top of the demote-ready list • Read hit in demote-ready list: track promoted back to the top of the LRU list, reference in the demote-ready list removed • When space is needed: the demote-ready list is emptied before the LRU list
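The read path and the "demote-ready first" freeing rule can likewise be sketched. Again the names (`ReadCacheNode`, `host_read`, `free_space`) are hypothetical; the sketch only keeps the promotion of demote-ready hits back into the LRU list and the eviction ordering, and it omits the reordering of hits that are already in the LRU.

```python
from collections import OrderedDict

class ReadCacheNode:
    """Toy model of the SVC read life cycle (hypothetical names)."""
    def __init__(self):
        self.lru = OrderedDict()           # live write / hot read tracks
        self.demote_ready = OrderedDict()  # old reads and already-destaged writes

    def host_read(self, track_id, disk):
        if track_id in self.lru:
            return self.lru[track_id]      # hit: completed instantly
        if track_id in self.demote_ready:
            # Hit on speculative data: promote back to the top of the LRU list
            # and remove every reference to the demote-ready list.
            data = self.demote_ready.pop(track_id)
            self.lru[track_id] = data
            return data
        # Miss: stage the track from the backend into the cache.
        data = disk[track_id]
        self.lru[track_id] = data
        return data

    def free_space(self):
        # The demote-ready list is emptied before the LRU list.
        if self.demote_ready:
            self.demote_ready.popitem(last=False)
        elif self.lru:
            self.lru.popitem(last=False)
```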
  • 13. Page 13 Cache algorithm life cycle Excerpt from RedPaper • The cache algorithms attempt to maintain a steady state of optimal population that is not too full and not too empty. To achieve this, the cache maintains a count of how much data it contains and how this relates to the available capacity. • As the cache reaches a predefined high capacity threshold level, it starts to free space at a rate known as trickle. Data is removed from the bottom of the LRU list at a slow rate. If the data is a write, it is destaged. If the data is a read, it is discarded. • Destage operations are therefore the limiting factor in how quickly the cache is emptied, because writes are at the mercy of the latency of the actual disk writes. In the case of the SVC this is not as bad as it sounds, as the disks in this case are controller LUNs. Almost every controller supported by the SVC has some form of internal cache. When the I/O rate being submitted by the SVC to a controller is within acceptable limits for that controller, you can expect writes to complete within a few milliseconds. • However, problems can arise because the SVC can generally sustain much greater data rates than most storage controllers, including large enterprise controllers with very large caches. • SVC Version 4.2.0 added monitoring of the response time measured for destage operations. This response time is used to ramp the number of concurrent destage operations a SVC node submits up or down. This allows the SVC to dynamically match the characteristics of the environment in which it is deployed. Up to 1024 destage operations can be submitted in each batch, and it is this batch size that is monitored and dynamically adjusted.
• Cache maintains a steady state of optimal population • High capacity threshold: free space at trickle rate from the bottom of the LRU list, destage for write data, discard for read data • Backend performance is key • SVC measures the response time of the storage system and dynamically adjusts the number of destage operations per batch
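The dynamic adjustment described above can be illustrated with a simple feedback rule. The function name, the target latency, and the halve/double policy are all assumptions for illustration; the slide only states that the batch size is ramped up or down based on measured destage response time, with a ceiling of 1024 operations per batch.

```python
def adjust_destage_batch(current_batch, destage_latency_ms,
                         target_ms=5.0, max_batch=1024):
    """Hypothetical sketch: ramp the number of concurrent destage
    operations based on the measured backend response time."""
    if destage_latency_ms > target_ms:
        # Backend is slow: back off to avoid overloading the controller.
        return max(1, current_batch // 2)
    # Backend is keeping up: ramp toward the 1024-operation ceiling.
    return min(max_batch, current_batch * 2)
```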
  • 14. Page 14 Cache algorithm life cycle Excerpt from RedPaper • If the SVC incoming I/O rate continues, and the trickle of data from the LRU does not reduce the cache below the high capacity threshold, trickle continues. • If the trickle rate is not keeping the cache usage in equilibrium, and the cache usage continues to grow, a second high capacity threshold is reached. • This results in two simultaneous operations: Any data in the demote-ready list is discarded. This is done in batches of 1024 tracks of data. However, because this is a discard operation, it does not suffer from any latency issues, and a large amount of data is discarded quickly. This might drop the cache usage below both high capacity thresholds. The LRU list begins to drop entries off the bottom at a rate much faster than trickle. • The combination of these two operations usually results in the cache usage reaching an equilibrium, and the cache maintains itself between the first and second high usage thresholds. If the incoming I/O rate continues until the cache reaches the third and final threshold, the destage rate increases to reach its maximum. • Note: The destage rate and the number of concurrently destaged tracks are two different attributes. The rate determines how long to wait between batches. The number of concurrent tracks determines how many elements to build into a batch. • Note: If the back-end disk controllers cannot cope with the amount of data being sent from the SVC, the cache might reach 100% full. This results in a one-in, one-out situation where host I/O is serviced only as quickly as the back-end controllers can complete it, essentially negating the benefits of the SVC cache. Too much I/O is being driven from the host for the environment in which the SVC is deployed. • What the cache does when it cannot be freed quickly enough
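The three-threshold escalation above can be summarized as a small decision function. The numeric threshold values here are placeholders chosen for illustration; the actual SVC threshold levels are not stated in this document.

```python
def cache_action(fill_pct, t1=70.0, t2=85.0, t3=95.0):
    """Hypothetical sketch of the escalating freeing behaviour.
    t1/t2/t3 are illustrative placeholders, not real SVC values."""
    if fill_pct >= t3:
        # Third and final threshold: destage rate ramps to its maximum.
        return "maximum destage rate"
    if fill_pct >= t2:
        # Second threshold: discard demote-ready batches and speed up LRU destage.
        return "discard demote-ready batches + fast LRU destage"
    if fill_pct >= t1:
        # First threshold: slow trickle from the bottom of the LRU list.
        return "trickle destage/discard from bottom of LRU"
    return "steady state"
```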
  • 15. Page 15 Cache Partitioning Excerpt from RedPaper • SVC Version 4.2.1 first introduced cache partitioning to the SVC code base. The decision was made to provide flexible partitioning rather than hard coding a specific number of partitions. This flexibility is provided on a Managed Disk Group (MDG) boundary; that is, the cache automatically partitions the available resources on a per-MDG basis. • Most users create a single MDG from the logical unit numbers (LUNs) provided by a single disk controller, or a subset of a controller or a collection of the same controllers, based on the characteristics of the LUNs themselves, for example RAID-5 compared to RAID-10, or 10K RPM compared to 15K RPM. • The overall strategy is to protect the individual controller from overloading or faults. • If many controllers (or in this case, MDGs) are overloaded, the overall cache can still suffer. • Table 1 shows the upper limit of write cache data that any one partition, or MDG, can occupy. Note: Because of the relationship between partitions and MDGs, you must be careful when creating large numbers of MDGs from a single controller. This is especially true when the controller is a low-end or mid-range controller. Enterprise controllers are likely to have some form of internal cache partitioning and are unlikely to suffer from overload in the same manner as entry or mid-range controllers. • Cache partitioning: cache is divided equally into partitions, each MDG receives one equal part of the SVC cache, protects controllers from overload, isolates performance problems
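A lookup in the style of the RedPaper's Table 1 might look as follows. Only the 30% figure for a four-MDG configuration is confirmed by the example on the next slides; the other entries are the per-partition write-cache limits commonly quoted for SVC cache partitioning and should be verified against Table 1 of the RedPaper before being relied on.

```python
# Upper limit of write cache (percent) any one partition/MDG can occupy,
# keyed by the number of MDGs. Only the 4-MDG value (30%) is confirmed by
# this document; treat the remaining entries as assumptions.
PARTITION_LIMITS = {1: 100, 2: 66, 3: 40, 4: 30}
DEFAULT_LIMIT = 25  # assumed value for five or more MDGs

def write_cache_limit_pct(num_mdgs):
    """Per-partition write-cache ceiling for a given MDG count."""
    return PARTITION_LIMITS.get(num_mdgs, DEFAULT_LIMIT)
```

Note how the per-partition ceiling shrinks as MDGs are added, which is why the slide warns against carving many MDGs out of a single low-end or mid-range controller.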
  • 16. Page 16 Example Excerpt from RedPaper • Partition 1 is performing very little I/O and is only 20% full (20% of its 30% limit, so a very small percentage of the overall cache resource). • Partition 2 is being written to heavily. When it reaches the defined high capacity threshold within its 30% limit, write data begins to destage. The cache itself is below all of its overall thresholds; however, we are destaging data for the second partition. • We see that the controller servicing partition 2 is struggling and cannot cope with the write data being destaged: the partition is 100% full and occupies 30% of the available cache resource. In this case, incoming write data is slowed down to the rate at which the controller itself completes writes. • Partition 3 begins to perform heavy I/O, goes above its high capacity threshold limit, and starts destaging. This controller, however, is capable of handling the I/O being sent to it, and therefore the partition stays around its threshold level. The overall cache is still under threshold, so only partitions 2 and 3 are destaging; partition 2 is being limited, while partition 3 is destaging well within its capabilities.
  • 17. Page 17 Example Excerpt from RedPaper • Partition 4 begins to perform heavy I/O. When it reaches just over a third of its partition limit, the overall cache goes over its first threshold limit. Destage begins for all partitions that have write data, in this case all four partitions. When the cache returns under the first threshold, only the partitions that are over their individual threshold allocation limits continue to destage. • Cache partitioning: Partition 1 – no problem. Partition 2 – controller is not fast enough to destage incoming data, cache partition 100% full, performance degradation. Partition 3 – controller is fast enough to handle the destaged data, no performance degradation. • When the overall SVC cache reaches its first threshold limit, destage starts for all partitions
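The destage decision in this four-partition example can be sketched as one function. The function name, the tuple layout, and the threshold numbers in the test are illustrative assumptions; the rule itself comes straight from the slide: over the overall threshold, every partition holding write data destages, otherwise only partitions over their own limit do.

```python
def partitions_to_destage(partitions, overall_fill, overall_threshold):
    """Hypothetical sketch of the per-partition destage decision.
    `partitions` maps a partition name to a tuple of
    (fill_pct, partition_threshold_pct, has_write_data)."""
    if overall_fill >= overall_threshold:
        # Overall cache over its first threshold: all partitions
        # holding write data destage.
        return sorted(p for p, (_, _, w) in partitions.items() if w)
    # Below the overall threshold: only partitions over their
    # individual threshold allocation limits continue to destage.
    return sorted(p for p, (f, t, w) in partitions.items() if w and f >= t)
```

With the slide's scenario (partition 2 at 100% of its limit, partition 3 around its threshold, partitions 1 and 4 well below), this reproduces the described behaviour on either side of the overall threshold.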
  • 18. Page 18 German websites • BVQ website and BVQ wiki http://www.bvq-software.de/ http://bvqwiki.sva.de • BVQ videos on the SVA YouTube channel http://www.youtube.com/user/SVAGmbH • BVQ website of SVA GmbH http://www.sva.de/sva_prod_bvq.php International websites • developerWorks BVQ community blog https://www.ibm.com/developerworks/... http://tinyurl.com/bvqblog • developerWorks documents and presentations https://www.ibm.com/developerworks/... http://tinyurl.com/BVQ-Documents
  • 19. Page 19 Further information about BVQ can be found on the website www.bvq-software.de If you are interested in BVQ, please contact the following e-mail address: mailto:[email protected] BVQ is a product of SVA System Vertrieb Alexander GmbH