Database Engineering and Operations
By Ashwin Nellore
Yahoo
2
• Advertising Products
• Publisher Products
• Platforms
• Internal Products
Engineering
3
• Database as a Service
• Continuous Delivery
• Code Reviews
• Performance Analyzer (Open Sourced)
• Performance Analytics
Database as a Service
4
• Self Service on Private Cloud
• Multitenant and Dedicated Solutions
• Data Store Guidance
• User Management
• Backups
• Migrations
• Interleaved with dependent systems
Continuous Delivery
5
• Custom Configuration Management
• GitHub
• Jenkins Pipeline
• Database version control
• Automated tests for syntax errors
• Code Reviews
• Developer Notifications
Performance Analyzer
6
• Lightweight and agentless Java web application
• Self-contained and easy to deploy anywhere
• Rich user interface
• Gathers and stores performance metrics
• Detects anomalies and raises alerts
• Real-time access to performance data
• New metrics and alerts can be defined and deployed at runtime
• Highly agile and extensible software development
• No license required
Dashboard: Alerts From Past 24 Hours
7
• After login, the dashboard displays alerts from the past 24 hours and current database health metrics for all database servers under management.
• Active alerts are colored red.
• Built-in alerts are summarized and displayed in the list.
• The list is sortable on all columns.
• The list can be further restricted to a single server group.
• Forensic data gathered when an alert was detected can be viewed or downloaded from the same page.
Dashboard: Alerts From Past 24 Hours
8
Dashboard: Current Health Status
9
• Displays the most recent performance metrics for all managed servers on a single screen.
• Results can be limited to a single server group.
• The list is sortable and color-coded to prioritize action and response.
• Metrics included:
› QPS
› CPU, load average, and IO waits
› Free memory
› Slow query count
› Active and total threads
› Connection rates and failures
› Replication lag
› Deadlocks
› Time taken by the last round of metrics scanning
Dashboard: Current Health Status
10
Real Time Top
11
• Inspired by MyTop/InnoTop
• Displays selected OS metrics and MySQL metrics in real time.
• Displays the MySQL process list in real time.
• OS metrics:
› Uptime, Load Average, CPU, Memory, Swap, TCP Connections
• MySQL metrics:
› General: Uptime, QPS, Commands, Replication
› Network/Threads: Connections, Threads, Network IO
› InnoDB: Row operations, IO, Buffer Pool
Real Time Top
12
Real Time - Details
13
• User-friendly and safe tool for accessing various performance-related information schema tables and SHOW commands.
• For metrics and status information, changes can be calculated and displayed automatically or triggered manually.
• Context help and a context menu help to interpret the information or navigate to other screens for further research.
• Features supported:
› Process list
› Global status and its changes, filterable by partial keyword
› Configuration variables, their history, and comparison with other MySQL servers
› Replication status
› Parsed InnoDB engine status
› InnoDB status
› User statistics when available, with their changes, to identify hot users, tables, etc.
› Explain plan, including JSON format, either triggered from the process list or entered manually
Real Time Details And Process List
14
• Tabs to access data from various information schema tables and SHOW commands.
• Context menu to run EXPLAIN on any SELECT query.
• Thread-level detail from the Performance Schema screen.
Explain Plan and JSON Output
15
• The plan is parsed and displayed as a tree so that the rich information is easier to understand.
• Bonus: comparing the two plan formats can give a better understanding of the old format.
Global Status
16
• Keyword filtering to view only the status variables of interest.
• Auto refresh or manual refresh to see changes and change rates.
• Context help to assist in understanding the status variables.
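The change and change-rate calculation described above can be sketched as a diff between two global status snapshots. This is a minimal illustration, not the analyzer's actual code: the function names and the sample values are hypothetical, and the snapshots are assumed to have already been read into dictionaries.

```python
from typing import Dict


def filter_status(status: Dict[str, int], keyword: str) -> Dict[str, int]:
    """Keep only variables whose name contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return {name: value for name, value in status.items() if needle in name.lower()}


def status_changes(before: Dict[str, int], after: Dict[str, int],
                   seconds: float) -> Dict[str, dict]:
    """Return the delta and per-second rate for variables present in both snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    changes = {}
    for name in before.keys() & after.keys():
        delta = after[name] - before[name]
        changes[name] = {"delta": delta, "rate": delta / seconds}
    return changes


# Hypothetical snapshots taken 5 seconds apart (values are made up).
snapshot_1 = {"Questions": 1000, "Com_select": 700, "Threads_running": 4}
snapshot_2 = {"Questions": 1500, "Com_select": 1050, "Threads_running": 6}

changes = status_changes(snapshot_1, snapshot_2, seconds=5.0)
com_only = filter_status(snapshot_2, "com_")
```

Counters such as Questions are cumulative, so the rate is the more useful figure; for gauges such as Threads_running the delta is usually what matters.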
Configuration Management
17
• Configuration consistency checks and variance reports for use when analyzing performance issues
• Lookup by partial keyword, with links to the MySQL reference documentation
• Change history tracking
• Parameter comparison between database servers
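Comparing parameters between two servers amounts to a dictionary diff. The sketch below assumes each server's configuration variables have already been loaded into a dictionary; the function name, the `<missing>` placeholder, and the sample values are illustrative assumptions.

```python
from typing import Dict, Tuple


def compare_config(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {variable: (value_on_a, value_on_b)} for every variable that differs.

    A variable that is absent on one side is reported as '<missing>'.
    """
    diff = {}
    for name in sorted(a.keys() | b.keys()):
        left = a.get(name, "<missing>")
        right = b.get(name, "<missing>")
        if left != right:
            diff[name] = (left, right)
    return diff


# Hypothetical configuration values for two servers.
server_a = {"max_connections": "500", "sync_binlog": "1",
            "innodb_buffer_pool_size": "8589934592"}
server_b = {"max_connections": "500", "sync_binlog": "0"}

variances = compare_config(server_a, server_b)
```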
InnoDB Statistics
18
• Analyze performance issues such as locks and mutexes.
• Mutex statistics help to understand contention.
User Statistics
19
• When available, user statistics provide very useful timing metrics, especially at the per-user level, to identify hot users.
• Table statistics can also help to identify hot tables.
Metrics Gathering And Display
20
• Metrics are gathered from all managed servers at a configurable interval.
• Metrics are stored either in an embedded Java Derby database for very small deployments or in a MySQL database for more formal deployments. Related metrics are grouped, and the metrics from a single group are stored in a single table.
• Metrics sources:
› information_schema, especially global status, for MySQL
› SNMP for OS-level data, when available
› User defined
• Predefined metrics:
› MySQL common status, command, InnoDB, replication status
› InnoDB mutex (optional)
› SNMP: system, disk, network, storage
› Additional metrics can be defined and associated with an individual server group or server, using global status variables or customized SQL statements.
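The "one table per metric group" layout can be sketched as follows. This is only an illustration under stated assumptions: SQLite (from the Python standard library) stands in for the Derby or MySQL store, and the group names, column layout, and sample values are hypothetical rather than the analyzer's real schema.

```python
import sqlite3
from typing import Dict, List

# Hypothetical groups: each group becomes one table with one column per metric.
METRIC_GROUPS: Dict[str, List[str]] = {
    "mysql_status": ["Questions", "Threads_running", "Slow_queries"],
    "mysql_innodb": ["Innodb_rows_read", "Innodb_buffer_pool_reads"],
}


def create_tables(conn: sqlite3.Connection) -> None:
    """Create one table per metric group."""
    for group, metrics in METRIC_GROUPS.items():
        cols = ", ".join(f'"{m}" INTEGER' for m in metrics)
        conn.execute(f'CREATE TABLE "{group}" (host TEXT, ts INTEGER, {cols})')


def store_sample(conn: sqlite3.Connection, host: str, ts: int,
                 status: Dict[str, int]) -> None:
    """Write one row per group for a single polling round."""
    for group, metrics in METRIC_GROUPS.items():
        placeholders = ", ".join("?" for _ in range(len(metrics) + 2))
        values = [host, ts] + [status.get(m) for m in metrics]
        conn.execute(f'INSERT INTO "{group}" VALUES ({placeholders})', values)


conn = sqlite3.connect(":memory:")
create_tables(conn)
store_sample(conn, "db1", 1700000000, {
    "Questions": 1500, "Threads_running": 6, "Slow_queries": 2,
    "Innodb_rows_read": 90000, "Innodb_buffer_pool_reads": 120,
})
rows = conn.execute(
    'SELECT host, "Questions", "Slow_queries" FROM mysql_status').fetchall()
```

Keeping a group in one table means a chart that overlays related metrics needs only a single query per host and time range.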
Metrics Charts – Common Global Status
21
• Global status, InnoDB mutex, and user-defined metrics are polled periodically.
• Metrics are stored in the built-in embedded Java DB for a small deployment or in a MySQL database for a large deployment.
Metrics Charts – OS using SNMP
22
• OS-level metrics are polled via SNMP.
• Metrics include CPU, load average, context switches, interrupts, IO waits, disk, memory usage, network and storage usage, etc.
Metrics Charts: Single Chart Or Comparison
23
• Display a chart for any available metric.
• Compare two metrics from the same server over the same period to identify correlations, which frequently helps to identify the root cause during troubleshooting.
• Auto-play option to display the second metrics sequentially.
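The slide describes spotting correlations visually from overlaid charts. As a numeric counterpart, the sketch below computes a Pearson correlation coefficient for two series sampled at the same timestamps. This is an assumption about how one might quantify the comparison, not something the tool is stated to do, and the sample series are made up.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long series (1.0 = move together)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples for one server over the same four polling intervals.
qps = [100, 200, 300, 400]
cpu_pct = [10, 20, 30, 40]
free_memory_mb = [40, 30, 20, 10]

r_cpu = pearson(qps, cpu_pct)
r_mem = pearson(qps, free_memory_mb)
```

A coefficient near 1 or -1 only suggests that two metrics move together; it does not by itself establish which one is the cause.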
Metrics Comparison between A Group Of Servers
24
• Metrics can be viewed and compared across a pair of servers or multiple servers in the same group.
• This feature can be used to understand how load is balanced or how capacity differs between two servers.
• The sample on this slide is a master/slave comparison: replication cannot keep up with the very high update rate on the master.
User Defined Metrics (UDM)
25
• Customized metrics can be added using either global status variables that are not included in the built-in metrics or a customized SQL statement.
• Manual setup is required to associate the relevant servers or server groups with a UDM.
• The current implementation stores all metrics defined within one UDM in a single table.
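A UDM as described above has a source (status variables or a SQL statement) and an explicit list of servers or groups it applies to. The data structure below is a hypothetical sketch of that idea; the class, field names, and the sample UDM are assumptions for illustration and do not reflect the analyzer's real configuration format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class UserDefinedMetric:
    name: str                                  # also the storage table name in this sketch
    status_variables: List[str] = field(default_factory=list)
    sql: Optional[str] = None
    server_groups: Set[str] = field(default_factory=set)   # manual association
    hosts: Set[str] = field(default_factory=set)           # manual association

    def __post_init__(self) -> None:
        # Exactly one source must be given: status variables or a SQL statement.
        if bool(self.status_variables) == bool(self.sql):
            raise ValueError("define exactly one of status_variables or sql")

    def applies_to(self, host: str, group: str) -> bool:
        """A UDM is gathered only for hosts or groups it was explicitly attached to."""
        return host in self.hosts or group in self.server_groups


# Hypothetical UDM built from two global status variables, attached to one group.
binlog_cache = UserDefinedMetric(
    name="binlog_cache",
    status_variables=["Binlog_cache_use", "Binlog_cache_disk_use"],
    server_groups={"ads"},
)
```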
Anomaly Detections and Alerts
26
• A set of predefined metrics is checked for anomalies against thresholds. Thresholds can be adjusted at the server group level or the host level.
• When anomalies are detected, forensic data such as process lists, InnoDB engine status, and InnoDB locks is gathered and logged.
• Alert detail reports can be viewed and downloaded from the dashboard and the Alert page.
• Alerts are logged, and notifications can be sent by email and web notification.
• Predefined alerts:
› CPU, Load Average, IO Waits, Running Threads, Replication Status and Lag, Slow Query Count, Connection Failure, Deadlocks, and Disk Usage
• Additional customized alerts can be defined and attached to the relevant database servers, using a SQL statement, an already defined metric, or any global status variable.
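Threshold checks with group-level and host-level adjustments can be sketched as a lookup with fallbacks. The precedence used here (host override, then group override, then default) is an assumption, as are the metric names and all threshold values.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds; an anomaly is a value above its threshold.
DEFAULTS: Dict[str, float] = {"cpu_pct": 90.0, "replication_lag_s": 60.0}
GROUP_OVERRIDES: Dict[str, Dict[str, float]] = {"reporting": {"replication_lag_s": 600.0}}
HOST_OVERRIDES: Dict[str, Dict[str, float]] = {"db7": {"cpu_pct": 95.0}}


def threshold_for(metric: str, host: str, group: str) -> float:
    """Resolve a threshold: host override first, then group override, then default."""
    if metric in HOST_OVERRIDES.get(host, {}):
        return HOST_OVERRIDES[host][metric]
    if metric in GROUP_OVERRIDES.get(group, {}):
        return GROUP_OVERRIDES[group][metric]
    return DEFAULTS[metric]


def find_anomalies(sample: Dict[str, float], host: str,
                   group: str) -> List[Tuple[str, float, float]]:
    """Return (metric, value, threshold) for every metric above its threshold."""
    anomalies = []
    for metric, value in sample.items():
        if metric not in DEFAULTS:
            continue  # no predefined threshold for this metric
        limit = threshold_for(metric, host, group)
        if value > limit:
            anomalies.append((metric, value, limit))
    return anomalies


sample = {"cpu_pct": 93.0, "replication_lag_s": 120.0, "qps": 5000.0}
default_host_anomalies = find_anomalies(sample, host="db1", group="ads")
tuned_host_anomalies = find_anomalies(sample, host="db7", group="reporting")
```

In the tool described here, a non-empty result would be the point at which forensic data is gathered and an alert is logged.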
Alerts and Settings
27
• All alerts from the past 24 hours are displayed on the dashboard after login.
• Alerts for all servers, an individual server group, or a single host can be accessed from the Alert page.
• Thresholds can be configured at the server group or host level.
Alert Notifications
28
• Alert notifications are sent by email, if configured, with minimal information.
• Web notifications are also supported on modern browsers while the application is open.
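An email notification carrying minimal information might look like the sketch below, built with Python's standard `email.message.EmailMessage`. The subject format, the fields included, and the addresses are assumptions; the analyzer itself is a Java application and its actual message format is not given in the slides. Sending the message (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_alert_email(host: str, alert_type: str, value: str,
                      sender: str, recipient: str) -> EmailMessage:
    """Build a short alert email: which host, which alert, and the observed value."""
    msg = EmailMessage()
    msg["Subject"] = f"[DB ALERT] {alert_type} on {host}"
    msg["From"] = sender
    msg["To"] = recipient
    # Keep the body minimal; the full forensic report stays in the tool.
    msg.set_content(f"Host: {host}\nAlert: {alert_type}\nValue: {value}\n")
    return msg


# Hypothetical alert and addresses.
alert_mail = build_alert_email(
    host="db1", alert_type="CPU", value="93%",
    sender="analyzer@example.com", recipient="dba@example.com",
)
```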
Alert Reports
29
• For most alerts, an alert report is generated with some forensic information.
• The information includes aggregated and original data from the process list, InnoDB engine status, etc.
Deadlock Detection
30
• Deadlocks are detected by comparing successive values of the INNODB_DEADLOCKS status variable (available in Percona Server).
• When a deadlock is detected, an alert is raised and logged. Details can be found in the InnoDB engine status or in the associated alert report.
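Detecting deadlocks from a cumulative counter means comparing each poll with the previous one. The sketch below assumes the counter value has already been read for each host; the class name and the handling of a counter that goes backwards (treated as a server restart) are illustrative assumptions.

```python
from typing import Dict


class DeadlockDetector:
    """Tracks the last seen deadlock counter per host and reports new deadlocks."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def check(self, host: str, innodb_deadlocks: int) -> int:
        """Return the number of deadlocks since the previous poll of this host."""
        previous = self._last.get(host)
        self._last[host] = innodb_deadlocks
        if previous is None or innodb_deadlocks < previous:
            # First poll, or the counter was reset (assumed restart): set a baseline only.
            return 0
        return innodb_deadlocks - previous


detector = DeadlockDetector()
first_poll = detector.check("db1", 3)    # baseline
second_poll = detector.check("db1", 5)   # two new deadlocks since the baseline
```

A non-zero return value is where an alert would be raised and the InnoDB engine status captured for the report.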
User Defined Alerts
31
• Customized alerts can be defined using SQL statements, global status variables, or metrics gathered by the analyzer.
• Customized alerts are not applied to all servers automatically; manual setup is required to associate them with the relevant servers or server groups.
Profiling and Tuning
32
• A simple and safe interface to run explain plans, run MySQL profiling, and execute MySQL SELECT statements.
Performance Schema – Top Queries
33
• Top queries by various criteria
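Ranking by a selectable criterion can be sketched over rows shaped like MySQL's `performance_schema.events_statements_summary_by_digest` table. The column names below exist in that table and its timer columns are in picoseconds; the sample rows, the function names, and the choice of criteria are assumptions, and the slide does not say which criteria the tool offers.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical digest rows (values are made up).
digest_rows: List[Row] = [
    {"DIGEST_TEXT": "SELECT * FROM orders WHERE id = ?",
     "COUNT_STAR": 1000, "SUM_TIMER_WAIT": 2_000_000_000_000, "SUM_ROWS_EXAMINED": 1000},
    {"DIGEST_TEXT": "SELECT * FROM events WHERE ts > ?",
     "COUNT_STAR": 10, "SUM_TIMER_WAIT": 5_000_000_000_000, "SUM_ROWS_EXAMINED": 4_000_000},
    {"DIGEST_TEXT": "UPDATE users SET last_seen = ? WHERE id = ?",
     "COUNT_STAR": 5000, "SUM_TIMER_WAIT": 1_000_000_000_000, "SUM_ROWS_EXAMINED": 5000},
]


def top_queries(rows: List[Row], by: str = "SUM_TIMER_WAIT", limit: int = 10) -> List[Row]:
    """Return the rows with the largest value in the chosen column."""
    return sorted(rows, key=lambda r: r[by], reverse=True)[:limit]


def avg_latency_ms(row: Row) -> float:
    """Average latency per execution; timer columns are in picoseconds."""
    return row["SUM_TIMER_WAIT"] / row["COUNT_STAR"] / 1e9


by_total_time = top_queries(digest_rows, by="SUM_TIMER_WAIT", limit=2)
by_executions = top_queries(digest_rows, by="COUNT_STAR", limit=1)
```

Different criteria surface different problems: total time highlights overall load, while rows examined tends to point at missing indexes.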
Performance Schema – Hot Tables
34
• Table performance metrics are powerful tools for identifying IO bottlenecks, lock contention, and SQL inefficiency.
Internal Analytics
35
• Metrics logged over time into Cassandra
• Capex Planning
• Proactive Performance Diagnosis


Editor's Notes

  • #2: ----- Meeting Notes (4/2/14 15:00) ----- Talk about Yahoo ----- Meeting Notes (4/2/14 23:08) ----- Good afternoon everyone! We're here to talk about MySQL Performance Monitoring and Tuning at Yahoo. Without further ado, let's proceed.
  • #3: ----- Meeting Notes (4/3/14 11:46) ----- Built with Spring MVC, YUI, and the D3.js library. We chose a single-page application (SPA) because we don't want to jump between pages. It can be opened in a different tab or browser window, but context help will not be available. Setup: we just create a user with the PROCESS and REPLICATION CLIENT privileges.
  • #8: ----- Meeting Notes (4/3/14 10:03) ----- The metrics cover the current system and database state, such as CPU, OS waits, slow query logs, replication lag, load average, free memory, threads, and connections per second. ----- Meeting Notes (4/3/14 11:46) ----- Number of slow queries per minute. The last alert column shows the most recent alert and what it was related to: CPU, memory, replication lag, etc.
  • #15: ----- Meeting Notes (4/3/14 10:03) ----- We have a Realtime tab, under which there are a host of options to choose from. SHOW PROCESSLIST is usually the first thing we run on our systems from the command line. Here we look at the db state, and we can limit the view to just the active processes. ----- Meeting Notes (4/3/14 11:46) ----- Context help is available to run EXPLAIN on a statement, and the data can also be exported to CSV format.
  • #16: ----- Meeting Notes (4/3/14 11:46) ----- Easy to read
  • #17: ----- Meeting Notes (4/3/14 10:03) ----- The beauty of this is that we can refresh the page automatically every 5 seconds, or manually, and watch the values of the variables that are changing continuously to help troubleshoot issues. The restart tab can be used to exit the comparison phase and go back to the initial view.
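The comparison described here amounts to diffing two snapshots of status variables taken a few seconds apart. A minimal sketch, assuming each snapshot is a plain dictionary of variable name to value (the function name and snapshot format are illustrative, not taken from the tool):

```python
def changed_variables(before, after):
    """Return {name: (old, new)} for every variable whose value differs.

    Variables that appear only in the later snapshot are reported with
    an old value of None.
    """
    changes = {}
    for name, new_value in after.items():
        old_value = before.get(name)
        if old_value != new_value:
            changes[name] = (old_value, new_value)
    return changes


# Two hypothetical snapshots taken 5 seconds apart.
first = {"Threads_running": 4, "Questions": 1000, "Max_used_connections": 50}
second = {"Threads_running": 9, "Questions": 1450, "Max_used_connections": 50}
moving = changed_variables(first, second)
```

Only the variables that moved between refreshes remain in `moving`, which is what makes the continuously changing values stand out.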
  • #19: ----- Meeting Notes (4/3/14 10:03) ----- Under InnoDB statistics, we can look at 'Transactions', 'Mutexes', 'Locks' and 'Buffer Pool Statistics'. We pull this info from ----- Meeting Notes (4/3/14 11:46) ----- several tables: innodb_trx, innodb_locks and other innodb_ tables. Contention could be around tables and indexes.
  • #20: ----- Meeting Notes (4/3/14 10:03) ----- User statistics have to be enabled on the server. We pull this data and show it in the tool so we can look at user statistics, client statistics, table and index statistics (to identify hot objects or rarely used indexes) and connection statistics. This applies up to MySQL 5.5 and to Percona/MariaDB. From 5.6 onwards we can use Performance Schema.
  • #21: ----- Meeting Notes (4/3/14 12:08) ----- While gathering metrics, the tool also checks for abnormalities based on predefined thresholds. Once an abnormality is detected, the tool dumps the processlist, InnoDB engine status and locks, with some aggregations and analysis.
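The threshold check described in this note can be sketched as below. The metric names, limits and function names are hypothetical; only the two SQL statements are standard MySQL commands, and the tool's real alert definitions may look quite different.

```python
# Statements a collector might run to capture forensic data on an alert.
FORENSIC_STATEMENTS = ("SHOW FULL PROCESSLIST", "SHOW ENGINE INNODB STATUS")


def find_breaches(metrics, thresholds):
    """Return (name, value, limit) for each metric above its threshold."""
    breaches = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            breaches.append((name, value, limit))
    return breaches


def statements_to_capture(breaches):
    """Forensic statements to run; empty when nothing is abnormal."""
    return list(FORENSIC_STATEMENTS) if breaches else []


# Hypothetical sample: one metric over its limit, one under.
sample = {"threads_running": 120, "slow_queries_per_min": 3}
limits = {"threads_running": 100, "slow_queries_per_min": 10}
breaches = find_breaches(sample, limits)
```

Keeping the detection step separate from the capture step means new thresholds can be added without touching the code that gathers forensic data.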
  • #22: ----- Meeting Notes (4/3/14 10:03) ----- We pull slow queries, threads, connections, temp tables, buffer disk reads, buffer flushes, InnoDB r/w, app data, log, selects, InnoDB rows read, FTS, joins and sorts. ----- Meeting Notes (4/3/14 11:46) ----- It gives a visual overview of what is happening on the system.
  • #23: ----- Meeting Notes (4/3/14 10:03) ----- User CPU, Sys CPU, Load Avg, IO Wait, Disk Reads, Disk writes, IOPS and memory usage. We can choose a timeline to trend the data and analyze any anomalies during a particular period. We can also revisit this to perform RCAs for any db incidents that we've resolved in the past.
  • #24: ----- Meeting Notes (4/3/14 11:46) ----- Metrics comparison is used to identify the correlation between the metrics we gather. For example, we can start from one metric that shows a performance issue, such as slow query count, CPU usage or thread_running, then compare the other metrics against it one by one to check visually for similar patterns.
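The note describes a visual comparison. As a numerical complement, a Pearson correlation coefficient between two metric series sampled at the same timestamps gives a rough measure of how closely they move together. This is a sketch of that idea, not something the tool is stated to compute.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat series carries no pattern to correlate with.
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two metrics rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 suggests no linear relationship.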
  • #25-#26: ----- Meeting Notes (4/3/14 12:08) ----- Since the buffer pool was only partially warmed up when we turned on the traffic, which is pretty heavy, we saw a high accumulation of queries with slower response times. What we learned from this case is to ensure that the buffer pool is fully warmed up before turning on live traffic.
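One rough way to act on this lesson is to check how full the buffer pool is before enabling traffic. The sketch below uses the MySQL status variables Innodb_buffer_pool_pages_data and Innodb_buffer_pool_pages_total. The 0.9 cutoff and the function names are assumptions, and a full pool is only a proxy for a pool that holds the right pages.

```python
def buffer_pool_fill_ratio(status):
    """Fraction of buffer pool pages that currently hold data.

    `status` is assumed to be a dict built from SHOW GLOBAL STATUS output.
    """
    total = int(status["Innodb_buffer_pool_pages_total"])
    data = int(status["Innodb_buffer_pool_pages_data"])
    if total <= 0:
        raise ValueError("Innodb_buffer_pool_pages_total must be positive")
    return data / total


def is_warm(status, min_ratio=0.9):
    """True when the fill ratio reaches the (assumed) minimum ratio."""
    return buffer_pool_fill_ratio(status) >= min_ratio


# Hypothetical readings; values arrive as strings from the server.
cold = {"Innodb_buffer_pool_pages_total": "1000", "Innodb_buffer_pool_pages_data": "400"}
warm = {"Innodb_buffer_pool_pages_total": "1000", "Innodb_buffer_pool_pages_data": "950"}
```

Here `is_warm(cold)` is false and `is_warm(warm)` is true, so a rollout script could hold back live traffic until the check passes.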
  • #27: ----- Meeting Notes (4/3/14 12:08) ----- While gathering metrics, the tool also checks for abnormalities based on predefined thresholds. Once an abnormality is detected, the tool dumps the processlist, InnoDB engine status and locks, with some aggregations and analysis.
  • #28-#32: ----- Meeting Notes (4/3/14 12:08) ----- Since the buffer pool was only partially warmed up when we turned on the traffic, which is pretty heavy, we saw a high accumulation of queries with slower response times. What we learned from this case is to ensure that the buffer pool is fully warmed up before turning on live traffic.
  • #36: ----- Meeting Notes (4/3/14 11:46) ----- Spring MVC + YUI + D3.js library. We chose SPA because we don't want to jump between different pages. We can open it in a different tab or a browser window, but context help will not be available. Setup: we just create a user with the PROCESS and REPLICATION CLIENT privileges.