Building and Deploying Large Scale Real Time News System with MySQL and Distributed Cache

Presented to the MySQL Conference, Apr. 13, 2011
Who am I?

  Tao Cheng <tao.cheng@teamaol.com>, AOL Real Time News (RTN).
  Worked on Mail and Browser clients in the ’90s, then moved to web backend servers.
  Not an expert, but happy to share my experience and brainstorm solutions.




Presentation for
[CLIENT]
Agenda

  AOL Real Time News (RTN): what is it?
  Requirements
  Technical solutions, with a focus on MySQL
  Deployment Topology
  Operational Monitoring
  Metrics Collection
Agenda

  Tips for query tuning and optimization
  Heuristic Query Optimization Algorithm
  Lessons learned
  Q & A
Real Time News: background

AOL deployed its large scale Real Time News (RTN) system in 2007. The system ingests and processes news from 30,000 sources every second, around the clock. Today its data store, MySQL, has accumulated several billion rows and terabytes of data. Yet news is delivered to end users in near real time. This presentation shares how it is done and the lessons learned.


Presentation for
AOLU Un-University
Brief Intro: sample features

  Data presentation: return the most recent news in
     flat view – most recent news about an entity. An entity could be a person, a company, a sports team, etc.
     topic clusters – most recent news grouped by topic. A topic is a group of news about an event, headline news, etc.
  News filtering by
     source type, such as news, blogs, press releases, regional, etc.
     relevancy level (high, medium, low, etc.) to the entities.
  Data delivery: push (to subscribers) and pull
  Search by entity, category (National, Sports, Finance, etc.), topic, document ID, etc.
Requirements for Phase I (2006)

  Commodity hardware: 4 CPUs, 16 GB MEM, 600 GB disk space.
  Data ingestion rate: 250K docs/day; average document size: 5 KB.
  Data retention period: 7 days to forever.
  Est. data set size: (1.25 GB/day, or 456 GB/year) + space for indexes, schema changes, and optimization.
  Response time: < 30 milliseconds/query
  Throughput: > 400 queries/sec/server
  Uptime: 99.999%
Solutions: MySQL + Bucky

  MySQL
     Serves raw/distinct queries
     Back fill
  Bucky (AOL’s distributed cache & computing framework)
     Write-ahead cache: pre-compute query results and push them into the cache.
     Messaging (optional): push data directly to subscribers
          Updates are pushed to data consumers or browsers via AIM Complex.
  Updates go to both the database and the cache.
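The write-ahead cache idea above can be sketched as follows. Bucky's real API is not public, so every name here is illustrative: on ingest, a document is written to the database and the per-entity query result is re-computed and pushed into the cache, so readers never wait on the DB.

```python
# Minimal sketch of the write-ahead cache pattern (hypothetical names;
# a dict stands in for the distributed cache, a list for MySQL).
class WriteAheadCache:
    def __init__(self):
        self.store = {}                  # stands in for the distributed cache

    def push(self, key, rows):
        self.store[key] = rows           # pre-computed result, keyed by query

    def get(self, key):
        return self.store.get(key)

def ingest(db_rows, cache, doc):
    db_rows.append(doc)                  # 1. write to the database (simulated)
    entity = doc["entity_id"]
    # 2. pre-compute the "latest news for this entity" query result
    latest = sorted((d for d in db_rows if d["entity_id"] == entity),
                    key=lambda d: d["ts"], reverse=True)[:10]
    cache.push(("latest", entity), latest)   # 3. push the result into the cache

db, cache = [], WriteAheadCache()
ingest(db, cache, {"id": 1, "entity_id": 42, "ts": 100})
ingest(db, cache, {"id": 2, "entity_id": 42, "ts": 101})
# Readers hit the cache first and fall back to the database on a miss.
```

The design point is that the update path does the query work once, instead of every reader doing it on demand.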

Architecture Diagram (over-simplified)

[Diagram: the Relegence feed flows into the Ingestor, which writes to the Asset DB and pushes into the Distributed Cache; Gateways serve pull requests from the WWW, and AIM pushes updates out to the WWW.]
Data Model: SOR vs. Query DB

  Separate query from storage to keep tables small and queries fast.
  System of Record (SOR): holds all raw data
      The authoritative data store; designed for data storage.
      Normalized schema: simple key look-ups; no table joins.
  Query DB – de-normalized for query speed
     Avoid JOINs, reduce the number of trips to the DB, increase throughput.
  Read/write a small chunk of data at a time so the database can turn requests around quickly and process more of them.
  Use replication to achieve linear scalability for reads.

Design Strategies: partitioning (Why)

  Dataset too big to fit on one host.
  Performance consideration: divide and conquer
     Write: more masters (Nx) to take writes.
     Read: smaller tables + more (NxM) slaves to handle reads.
  Fault tolerance – distribute the risk and reduce the impact of system failure.
  Easier maintenance – size does matter
      Faster nightly backups, disaster recovery, schema changes, etc.
      Faster optimization – optimization is needed to reclaim disk space after deletions and to rebuild indexes for query speed.


Design Strategies: partitioning (How)

  Partition on the most used keys (look at query patterns):
     Document table – on document ID
     Entity table – on entity ID
  Simple hash on IDs – no partition map, hence no contention for read/write locks on yet another table.
  Managing growth: add another partition set
      New documents are written into both the old and new partition sets for a few weeks; then writes into the old partitions stop.
      Queries go to the new partitions first, then to the old ones if insufficient results are found.
  Works great in our case but might not for everyone.
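The hash-plus-dual-write scheme above can be sketched in a few lines. The partition-set sizes and function names here are illustrative, not the production values: the partition comes from the ID itself (a modulo), so no partition-map table is needed, and during growth writes go to both sets while reads try the new set first.

```python
# Simple hash partitioning on the ID, with an old and a new partition set.
N_OLD, N_NEW = 4, 8          # illustrative partition-set sizes

def partition(doc_id, n_partitions):
    return doc_id % n_partitions          # simple hash on the numeric ID

def route_write(doc_id):
    # During a growth migration, new documents go to BOTH partition sets.
    return [("old", partition(doc_id, N_OLD)),
            ("new", partition(doc_id, N_NEW))]

def route_read(doc_id):
    # Queries try the new set first, falling back to the old set
    # when insufficient results are found.
    return [("new", partition(doc_id, N_NEW)),
            ("old", partition(doc_id, N_OLD))]
```

Because the mapping is pure arithmetic, no extra table (and no extra lock) sits on the query path.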
Schema design: De-normalization

  Make query tables small:
     Put only essential attributes in the de-normalized tables.
     Store long text attributes in separate tables.
  De-normalization: how to store and match attributes
     Single-value attributes (1:1): document ID, short string, datetime, etc. – one column, one row.
     Multi-value attributes (1:many): tricky but feasible
          Multiple rows with a composite index/key: (c1, c2, etc.)
          One row, one column: CSV string, e.g., “id1, id2, id3” – SQL: “val LIKE ‘%id2%’”
          One row, multiple columns, e.g., group1, group2, etc. – SQL: group1=val1 OR group2=val2 ...
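The trade-off between the first two multi-value options can be shown concretely. This sketch uses sqlite3 standing in for MySQL (table and column names are made up): the multi-row form matches exactly through the composite index, while the CSV form needs `LIKE '%...%'`, which cannot use an index and can false-match.

```python
# Multi-value attribute storage: multi-row composite key vs. CSV column.
import sqlite3

con = sqlite3.connect(":memory:")
# Option 1: multiple rows with a composite key (entity_id, doc_id).
con.execute("CREATE TABLE doc_entity (entity_id INTEGER, doc_id INTEGER)")
con.execute("CREATE INDEX idx_ent ON doc_entity (entity_id, doc_id)")
con.executemany("INSERT INTO doc_entity VALUES (?, ?)",
                [(7, 1), (8, 1), (7, 2)])
exact = con.execute(
    "SELECT doc_id FROM doc_entity WHERE entity_id = 7 ORDER BY doc_id").fetchall()

# Option 2: one row, one CSV column.
con.execute("CREATE TABLE doc_csv (doc_id INTEGER, entities TEXT)")
con.executemany("INSERT INTO doc_csv VALUES (?, ?)",
                [(1, "7,8"), (2, "7"), (3, "70")])
fuzzy = con.execute(
    "SELECT doc_id FROM doc_csv WHERE entities LIKE '%7%' ORDER BY doc_id").fetchall()
# 'fuzzy' also picks up doc 3 ("70") -- the false-match risk of the CSV form.
```

With careful delimiters the CSV form is workable, but the multi-row form is the only one the index can serve directly.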

Tips for indexing

  Simple key – for metadata retrieval.
  Composite key – for finding matching documents.
     Start with low-cardinality, most-used columns.
     Order matters: (c1, c2, c3) != (c2, c3, c1)
  InnoDB – all secondary indexes contain the primary key.
     Keep the primary key short to keep index sizes small.
     Queries using a secondary index reference the primary key too.
  Integer vs. string – comparing numeric values is faster => index hash values of long strings instead.
  Index length – title:varchar(255) => idx_title(32)
  Enforce referential integrity on the application side.
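The "index hash values of long strings" tip can be sketched like this, with sqlite3 standing in for MySQL and CRC32 as one possible hash (the deck does not say which hash RTN used). An integer hash column is stored next to the long string and indexed; because hashes can collide, the exact string is re-checked in the WHERE clause.

```python
# Index an integer hash of a long string instead of the string itself.
import sqlite3
import zlib

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER, title TEXT, title_crc INTEGER)")
con.execute("CREATE INDEX idx_crc ON docs (title_crc)")

def insert_doc(doc_id, title):
    crc = zlib.crc32(title.encode("utf-8"))
    con.execute("INSERT INTO docs VALUES (?, ?, ?)", (doc_id, title, crc))

def find_by_title(title):
    crc = zlib.crc32(title.encode("utf-8"))
    # The integer index narrows the candidates; the title check breaks ties.
    return con.execute(
        "SELECT id FROM docs WHERE title_crc = ? AND title = ?",
        (crc, title)).fetchall()

insert_doc(1, "A very long headline " * 10)
insert_doc(2, "Another long headline " * 10)
```

Integer comparisons keep the index small and fast, while the trailing string comparison preserves correctness on collisions.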
MySQL configuration

  Storage engine: InnoDB – row-level locking.
  Table space – one file per table
     Easier to maintain (schema changes, optimization, etc.)
  Character set: ‘UTF-8’
     Disable persistent connections (5.0.x).
     skip-character-set-client-handshake
  Enable the slow query log to identify bad queries.
  System variables for memory buffer sizes
     innodb_buffer_pool_size: data and indexes
     sort_buffer_size, max_heap_table_size, tmp_table_size
     query_cache_size=0; tables are updated constantly.
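As a concrete illustration, the settings above land in my.cnf roughly as follows. The values are placeholders (the deck does not give the production numbers), and option names vary slightly across MySQL versions:

```ini
[mysqld]
innodb_file_per_table   = 1      # one tablespace file per table
character-set-server    = utf8
skip-character-set-client-handshake
slow_query_log          = 1      # 5.1+; 5.0.x uses log-slow-queries
long_query_time         = 1
innodb_buffer_pool_size = 8G     # data and indexes
sort_buffer_size        = 2M
max_heap_table_size     = 64M
tmp_table_size          = 64M
query_cache_size        = 0      # tables are updated constantly
```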
Runtime statistics (per server)

  Average write rate:
     Daily: < 40 tps
     Max 400 tps during recovery
     Performs best when the write rate is < 100 tps
  Query rate: 20~80 qps
  Query response time – shorter when indexes and data are in memory
      75th percentile: ~3 ms when qps < 15; ~2 ms when qps ~= 60
      95th percentile: 6~8 ms when qps < 15; 3~4 ms when qps ~= 60
      CPU idle %: > 99%



Deployment Topology Consideration

•  Minimum configuration: host/DC redundancy
   •  DC1: host 1 (master), host 3 (slave)
   •  DC2: host 2 (failover master), host 4 (slave)
•  Data locality: significant when network latency is a concern (100 Mbps)
   •  3,000 qps when the DB is on a remote host.
   •  15,000 qps when the DB is on the local host.
•  Linking dependent servers across data centers
   •  Push the cross link up as far as possible (Topology 3): link to dependent servers in the same data center.


Deployment Topology 1: minimum config

[Diagram: Data Center 1 holds the master and a slave; Data Center 2 holds the failover master and a slave. A Data Consumer reads from the DB hosts in both data centers and serves the WWW.]


Topology 2: link across DCs (bad)

[Diagram: in each data center, DB hosts sit behind a VIP and Data Consumers behind another VIP, with a GSLB in front of the consumers serving the WWW; Data Consumers link to DB VIPs across data centers.]
Topology 3: link to same DC (better)

[Diagram: same layout as Topology 2, but each Data Consumer links only to the DB VIP in its own data center; the cross-DC link is pushed up to the GSLB.]
Topology 4: use local UNIX socket

[Diagram: each Data Consumer is co-located with a DB host and connects over the local UNIX socket; VIPs and the GSLB route WWW traffic to the consumers.]

Production Monitoring

  Operational monitoring: logcheck, Scout/NOC alerts, etc.
  DB monitoring for replication failures, latency, read/write rates, and performance metrics.




Metrics Collection

  Graphing collected metrics: visualize and collate operational metrics.
      Helps analyze and fine-tune server performance.
      Helps trace production issues and identify points of failure.
  What metrics are important?
     Host: CPU, MEM, disk I/O, network I/O, # of processes, swap/paging
     Server: throughput, response time
  Comparison: line up charts (throughput, response time, CPU, disk I/O) in the same time window.

Tuning and Optimizing Queries

  EXPLAIN: mysql> EXPLAIN SELECT ... FROM ...
  Watch out for tmp table usage, table scans, etc.
  SQL_NO_CACHE
  MySQL query profiler
     mysql> SET profiling=1;
  Linux OS cache: leave enough memory free on the host.
  USE INDEX hint to choose an index explicitly
     Use wisely: most of the time, MySQL chooses the right index for you. But as table size grows, index cardinality can change.
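The EXPLAIN habit above can be demonstrated without a MySQL server. Here sqlite3's EXPLAIN QUERY PLAN is used as a stand-in for MySQL's EXPLAIN (the schema is made up): both tools show whether a query is answered through an index or needs a full table scan.

```python
# Inspect query plans: indexed lookup vs. full table scan.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER, entity_id INTEGER, ts INTEGER)")
con.execute("CREATE INDEX idx_entity ON docs (entity_id, ts)")

# entity_id is the leading column of the composite index -> index is usable.
plan_indexed = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM docs WHERE entity_id = 7").fetchall()
uses_index = any("idx_entity" in row[-1] for row in plan_indexed)

# ts alone cannot use an index that starts with entity_id -> table scan.
plan_scan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM docs WHERE ts = 7").fetchall()
full_scan = any("SCAN" in row[-1] for row in plan_scan)
```

The same check against MySQL's EXPLAIN output (the `type` and `key` columns) is what catches the table scans and tmp tables the slide warns about.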

Important MySQL statistics

  SHOW GLOBAL STATUS ...
     Qcache_free_blocks
     Qcache_free_memory
     Qcache_hits
     Qcache_inserts
     Qcache_lowmem_prunes
     Qcache_not_cached
     Qcache_queries_in_cache
     Select_scan
     Sort_scan




Important MySQL statistics (cont.)

     Table_locks_waited
     Innodb_row_lock_current_waits
     Innodb_row_lock_time
     Innodb_row_lock_time_avg
     Innodb_row_lock_time_max
     Innodb_row_lock_waits
     Select_scan
     Slave_open_temp_tables




Heuristic Query Optimization Algorithm

  Primarily for complex cluster queries: find the latest N topics and related stories.
  Strategy: reduce the number of records the database needs to load from disk to perform a query.
      Pick a default query range. If insufficient docs are returned, expand the query range proportionally.
      If none return => sparse data => drop the range and retry.
      Save the query range for future reference.
  Result: cut the number of rows to process from millions to hundreds => query time drops from minutes to less than 10 ms.
Query range look-up flow

1. A cluster query arrives; NumOfTripToDB = 0.
2. Look up a saved query range for the query; if none exists, use the default range.
3. Bound the query with the range, send it to the DB, and increment NumOfTripToDB.
4. If the query engine returns sufficient results: compute the docs-to-range ratio, save it back to the look-up table for future use, and return the results to the clients.
5. If numOfResults == 0, or NumOfTripToDB >= 2: send the original (unbounded) query to the DB.
6. Otherwise: compute the docs-to-range ratio, prorate it to a range that would return a sufficient amount of docs, and go back to step 3.
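The flow above can be sketched as one function. All names here are illustrative, not RTN's actual code: `run_query(rng)` stands in for the range-bounded DB query, and `run_query(None)` for the original unbounded one.

```python
# Sketch of the heuristic range-bounded query flow.
def range_bounded_query(run_query, range_table, key, want, default_range=3600):
    rng = range_table.get(key, default_range)   # look up a saved range
    trips = 0
    while True:
        rows = run_query(rng)
        trips += 1
        if len(rows) >= want:
            range_table[key] = rng              # save the working range
            return rows[:want]
        if len(rows) == 0 or trips >= 2:
            return run_query(None)[:want]       # drop the range; original query
        # Prorate the range by the docs-to-range ratio to one that should
        # return a sufficient number of docs.
        rng = rng * want // len(rows) + 1

# Toy data source: roughly one doc per 1800-second slice, 20 docs total.
def fake_db(rng):
    total = 20
    return list(range(total if rng is None else min(total, rng // 1800)))

saved = {}
docs = range_bounded_query(fake_db, saved, "entity:42", want=10)
```

In this toy run the default one-hour range returns 2 docs, the range is prorated up, and the second trip returns the 10 docs wanted, so the database only ever touches a bounded slice of the table.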
Lessons Learned

  Always load test well ahead of launch (2 weeks) to avoid a fire drill.
  Don’t rely solely on the cache. The database needs to be able to serve a reasonable amount of queries on its own.
  Separate the cache from the applications to avoid cold starts.
  Keep transactions/queries simple and fast to return.
  Avoid table joins; limit them to 2 if really needed.
  Avoid stored procedures: results are not cached, and a DBA is needed to alter the implementation.

Lessons Learned (cont.)

  Avoid using ‘offset’ in the LIMIT clause; use application-based pagination instead.
  Avoid ‘SQL_CALC_FOUND_ROWS’ in SELECT.
  If possible, exclude text/blob columns from query results to avoid disk I/O.
  Store text/blob in a separate table to speed up backups, optimization, and schema changes.
  Separate real time vs. archive data for better performance and easier maintenance.
  Keep table size under control (< 100 GB); optimize periodically.
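One common form of the "application-based pagination" recommended above is keyset pagination, sketched here with sqlite3 standing in for MySQL (the deck does not spell out RTN's exact scheme, and this assumes rows are ordered by an auto-increment id): the client remembers the last id it saw and filters on it, so the server never scans and discards offset rows.

```python
# Keyset pagination: WHERE id < last_id instead of LIMIT size OFFSET n.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT)")
con.executemany("INSERT INTO news (title) VALUES (?)",
                [("story %d" % i,) for i in range(100)])

def page_after(last_id, size=10):
    # The primary key bounds the scan; no rows are skipped and discarded.
    return con.execute(
        "SELECT id FROM news WHERE id < ? ORDER BY id DESC LIMIT ?",
        (last_id, size)).fetchall()

first = page_after(last_id=10**9)        # first page: any id above the data
second = page_after(first[-1][0])        # next page continues from the last id
```

With OFFSET, page N costs O(N); with a key bound, every page costs the same.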
Lessons Learned (cont.)

  Put SQL statements (templates) in resource files so you can tune them without a binary change.
  Set up replication in dev & QA to catch replication issues earlier.
      Statement-based (MySQL 5.0.x) vs. row/mixed (5.1 or above)
      Auto-increment + (INSERT .. ON DUPLICATE KEY UPDATE ...)
      Datetime columns: default to NOW()
      Oversized data: increase max_allowed_packet
      Replication lag: transactions that involve index updates/deletions often take longer to complete.
  Host and data center redundancy is important – don’t put all your eggs in one basket.
RTN 3 Redesign

  Free text search with SOLR
     Real time vs. archive shards.
     1 minute latency w/o RAM disk.
  Asset DB partitioned – 5 rows/doc -> 25 rows/doc
  Avoid (system) virtual machines; instead, stack high-end hosts with processes that use different system resources (CPU, MEM, disk space, etc.)
      Better network and system resource utilization – cost effective.
      Data locality
  More processors (< 12) help when under load.

Q&A


  Questions or comments?






  THANK YOU !!





More Related Content

What's hot (17)

PPTX
Oracle 11g data warehouse introdution
Aditya Trivedi
 
PPTX
The IBM Netezza datawarehouse appliance
IBM Danmark
 
PPT
An Introduction to Netezza
Vijaya Chandrika
 
PDF
From Raw Data to Analytics with No ETL
Cloudera, Inc.
 
PDF
NewSQL Database Overview
Steve Min
 
PPTX
Experience SQL Server 2017: The Modern Data Platform
Bob Ward
 
PDF
Whats New Sql Server 2008 R2
Eduardo Castro
 
PDF
Bigtable and Dynamo
Iraklis Psaroudakis
 
PPTX
IBM Pure Data System for Analytics (Netezza)
Girish Srivastava
 
PDF
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
EDB
 
PPTX
Bigdata netezza-ppt-apr2013-bhawani nandan prasad
Bhawani N Prasad
 
PDF
Netezza vs teradata
Asis Mohanty
 
PPT
An overview of snowflake
Sivakumar Ramar
 
PPTX
Hadoop & Greenplum: Why Do Such a Thing?
Ed Kohlwey
 
PDF
Architecture of exadata database machine – Part II
Paresh Nayak,OCP®,Prince2®
 
PPTX
Polyglot Database - Linuxcon North America 2016
Dave Stokes
 
PPTX
What Your Database Query is Really Doing
Dave Stokes
 
Oracle 11g data warehouse introdution
Aditya Trivedi
 
The IBM Netezza datawarehouse appliance
IBM Danmark
 
An Introduction to Netezza
Vijaya Chandrika
 
From Raw Data to Analytics with No ETL
Cloudera, Inc.
 
NewSQL Database Overview
Steve Min
 
Experience SQL Server 2017: The Modern Data Platform
Bob Ward
 
Whats New Sql Server 2008 R2
Eduardo Castro
 
Bigtable and Dynamo
Iraklis Psaroudakis
 
IBM Pure Data System for Analytics (Netezza)
Girish Srivastava
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
EDB
 
Bigdata netezza-ppt-apr2013-bhawani nandan prasad
Bhawani N Prasad
 
Netezza vs teradata
Asis Mohanty
 
An overview of snowflake
Sivakumar Ramar
 
Hadoop & Greenplum: Why Do Such a Thing?
Ed Kohlwey
 
Architecture of exadata database machine – Part II
Paresh Nayak,OCP®,Prince2®
 
Polyglot Database - Linuxcon North America 2016
Dave Stokes
 
What Your Database Query is Really Doing
Dave Stokes
 

Viewers also liked (17)

PPTX
Data Science and Data Visualization (All about Data Analysis) by Pooja Ajmera
Pooja Ajmera
 
PPTX
Choosing a Data Visualization Tool for Data Scientists_Final
Heather Choi
 
PPTX
Microsoft NERD Talk - R and Tableau - 2-4-2013
Tanya Cashorali
 
PPTX
Using Salesforce, ERP, Tableau & R in Sales Forecasting
Senturus
 
PDF
Performance data visualization with r and tableau
Enkitec
 
PDF
R Markdown Tutorial For Beginners
Rsquared Academy
 
PDF
RMySQL Tutorial For Beginners
Rsquared Academy
 
PDF
Open Source Software for Data Scientists -- BigConf 2014
Charlie Greenbacker
 
PPTX
Big Data: The 4 Layers Everyone Must Know
Bernard Marr
 
PPTX
Big Data Analytics
Global Business Solutions SME
 
PPTX
Big Data Analytics
Ghulam Imaduddin
 
PDF
Big Data Visualization
Raffael Marty
 
PPTX
Tableau Software - Business Analytics and Data Visualization
lesterathayde
 
PDF
Big Data Architecture
Guido Schmutz
 
PDF
Lista 2 (1)
Tuane Paixão
 
PPTX
An Interactive Introduction To R (Programming Language For Statistics)
Dataspora
 
PPT
Big Data Analytics 2014
Stratebi
 
Data Science and Data Visualization (All about Data Analysis) by Pooja Ajmera
Pooja Ajmera
 
Choosing a Data Visualization Tool for Data Scientists_Final
Heather Choi
 
Microsoft NERD Talk - R and Tableau - 2-4-2013
Tanya Cashorali
 
Using Salesforce, ERP, Tableau & R in Sales Forecasting
Senturus
 
Performance data visualization with r and tableau
Enkitec
 
R Markdown Tutorial For Beginners
Rsquared Academy
 
RMySQL Tutorial For Beginners
Rsquared Academy
 
Open Source Software for Data Scientists -- BigConf 2014
Charlie Greenbacker
 
Big Data: The 4 Layers Everyone Must Know
Bernard Marr
 
Big Data Analytics
Global Business Solutions SME
 
Big Data Analytics
Ghulam Imaduddin
 
Big Data Visualization
Raffael Marty
 
Tableau Software - Business Analytics and Data Visualization
lesterathayde
 
Big Data Architecture
Guido Schmutz
 
Lista 2 (1)
Tuane Paixão
 
An Interactive Introduction To R (Programming Language For Statistics)
Dataspora
 
Big Data Analytics 2014
Stratebi
 
Ad

Similar to Building and deploying large scale real time news system with my sql and distributed cache mysql_conf (20)

PDF
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
PDF
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Lviv Startup Club
 
PDF
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
ScyllaDB
 
PPTX
MinneBar 2013 - Scaling with Cassandra
Jeff Smoley
 
PDF
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
DataStax Academy
 
PDF
Building a High Performance Analytics Platform
Santanu Dey
 
PPTX
Using SAS GRID v 9 with Isilon F810
Boni Bruno
 
PPTX
Compare Clustering Methods for MS SQL Server
AlexDepo
 
PDF
Cassandra's Odyssey @ Netflix
Roopa Tangirala
 
PDF
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Alluxio, Inc.
 
PDF
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Trivadis
 
PDF
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
PPTX
NewSQL - Deliverance from BASE and back to SQL and ACID
Tony Rogerson
 
PDF
Amazon Elastic Map Reduce - Ian Meyers
huguk
 
PDF
Software Developer Portfolio: Backend Architecture & Performance Optimization
kiwoong (daniel) kim
 
PPT
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
 
PPT
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 
PPTX
Software architecture for data applications
Ding Li
 
PDF
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
DataStax Academy
 
PDF
Cassandra Summit 2015 - A Change of Seasons
Eiti Kimura
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Lviv Startup Club
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
ScyllaDB
 
MinneBar 2013 - Scaling with Cassandra
Jeff Smoley
 
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
DataStax Academy
 
Building a High Performance Analytics Platform
Santanu Dey
 
Using SAS GRID v 9 with Isilon F810
Boni Bruno
 
Compare Clustering Methods for MS SQL Server
AlexDepo
 
Cassandra's Odyssey @ Netflix
Roopa Tangirala
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Alluxio, Inc.
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Trivadis
 
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
NewSQL - Deliverance from BASE and back to SQL and ACID
Tony Rogerson
 
Amazon Elastic Map Reduce - Ian Meyers
huguk
 
Software Developer Portfolio: Backend Architecture & Performance Optimization
kiwoong (daniel) kim
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 
Software architecture for data applications
Ding Li
 
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
DataStax Academy
 
Cassandra Summit 2015 - A Change of Seasons
Eiti Kimura
 
Ad

Recently uploaded (20)

PDF
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
PPT
Ericsson LTE presentation SEMINAR 2010.ppt
npat3
 
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
PDF
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PDF
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
PPTX
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
PDF
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
PPTX
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
PDF
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
PDF
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
PDF
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 
PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
PDF
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
PDF
NASA A Researcher’s Guide to International Space Station : Physical Sciences ...
Dr. PANKAJ DHUSSA
 
PPTX
Agentforce World Tour Toronto '25 - MCP with MuleSoft
Alexandra N. Martinez
 
PPTX
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
Ericsson LTE presentation SEMINAR 2010.ppt
npat3
 
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
NASA A Researcher’s Guide to International Space Station : Physical Sciences ...
Dr. PANKAJ DHUSSA
 
Agentforce World Tour Toronto '25 - MCP with MuleSoft
Alexandra N. Martinez
 
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 

Building and deploying large scale real time news system with my sql and distributed cache mysql_conf

  • 1. Building and Deploying Large Scale Real Time News System with MySQL and Distributed Cache Presented  to  MySQL  Conference   Apr.  13,  2011  
  • 2. Who am I? Pag e2   Tao Cheng <[email protected]>, AOL Real Time News (RTN).   Worked on Mail and Browser clients in the ‘90 and then moved to web backend servers since.   Not an expert but am happy to share my experience and brainstorm solutions. Presentation for [CLIENT]
  • 3. Agenda   AOL Real Time News (RTN): what it is?   Requirements   Technical solutions with focus on MySQL   Deployment Topology   Operational Monitoring   Metrics Collection
  • 4. Agenda   Tips for query tuning and optimization   Heuristic Query Optimization Algorithm   Lessons learned   Q & A
  • 5. Real Time News : background Pag e5 AOL deployed its large scale Real Time News (RTN) system in 2007. This system ingests and processes news from 30,000 sources on every second around the clock. Today, its data store, MySQL, has accumulated over several billions of rows and terabytes of data. However, news are delivered to end users in close to real time fashion. This presentation shares how it is done and the lessons learned. Presentation for AOLU Un-University
  • 6. Brief Intro: sample features Pag e6   Data presentation: return most recent news in   flat view – most recent news about an entity. An entity could be a person, a company, a sports team, etc.   topic clusters – most recent news grouped by topics. A topic is a group of news about an event, headline news, etc.   News filtering by   source types such as news, blogs, press releases, regional, etc.   relevancy level (high, medium, low, etc) to the entities .   Data Delivery: push (to subscribers) and pull   Search by entities, categories (National, Sports, Finance, etc), topics, document ID, etc. Presentation for [CLIENT]
  • 7. Requirements for Phase I (2006). Commodity hardware: 4 CPUs, 16 GB RAM, 600 GB disk space. Data ingestion rate: 250K docs/day; average document size: 5 KB. Data retention period: 7 days to forever. Estimated data set size: (1.25 GB/day, or 456 GB/year) + space for indexes, schema changes, and optimization. Response time: &lt; 30 milliseconds/query. Throughput: &gt; 400 queries/sec/server. Uptime: 99.999%.
  • 8. Solutions: MySQL + Bucky. MySQL: serves raw/distinct queries; back fill. Bucky (AOL’s distributed cache &amp; computing framework): write-ahead cache – pre-compute query results and push them into cache; messaging (optional) – push data directly to subscribers. Updates are pushed to data consumers or browsers via AIM Complex. Updates go to both the database and the cache.
  • 9. Architecture Diagram (over-simplified). [Diagram: the Relegence ingestor feeds the Asset DB and the distributed cache; gateways serve WWW pull requests from the cache; push delivery to WWW goes out via AIM.]
  • 10. Data Model: SOR vs. Query DB. Separate query from storage to keep tables small and queries fast. System of Record (SOR): has all raw data; the authoritative data store, designed for data storage; normalized schema for simple key look-ups, no table joins. Query DB: de-normalized for query speed; avoids JOINs, reduces the number of trips to the DB, increases throughput. Read/write small chunks of data at a time so the database can get requests out quickly and process more. Use replication to achieve linear scalability for reads.
  • 11. Design Strategies: partitioning (why). Dataset too big to fit on one host. Performance: divide and conquer – write: more masters (Nx) to take writes; read: smaller tables + more (NxM) slaves to handle reads. Fault tolerance – distribute the risk and reduce the impact of system failure. Easier maintenance – size does matter: faster nightly backups, disaster recovery, schema changes, etc. Faster optimization – optimization is needed to reclaim disk space after deletion and to rebuild indexes to improve query speed.
  • 12. Design Strategies: partitioning (how). Partition on the most-used keys (look at query patterns): document table on document ID; entity table on entity ID. Simple hash on IDs – no partition map, thus no competition for read/write locks on yet another table. Managing growth: add another partition set. New documents are written into both the old and new partition sets for a few weeks; then stop writing into the old partitions. Queries go to the new partitions first, and then to the old ones if insufficient results are found. Works great in our case but might not for everyone.
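The "simple hash on IDs" routing above can be sketched as follows; the table-name prefix and partition count are illustrative assumptions, not the actual RTN layout.

```python
# Hypothetical sketch of hash-based partition routing.
NUM_PARTITIONS = 8  # assumed partition count

def partition_for(doc_id: int, num_partitions: int = NUM_PARTITIONS) -> str:
    """Route a document ID to its partition table.

    A plain modulo hash needs no partition-map table, so there is no
    read/write lock contention on extra metadata.
    """
    return f"document_{doc_id % num_partitions:02d}"
```

Because the mapping is deterministic, the ingestor and the query path compute the same target table without any coordination.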
  • 13. Schema design: de-normalization. Make query tables small: put only essential attributes in the de-normalized tables; store long text attributes in separate tables. De-normalization – how to store and match attributes. Single-value attributes (1:1): document ID, short string, date-time, etc. – one column, one row. Multi-value attributes (1:many) are tricky but feasible: (a) multiple rows with a composite index/key (c1, c2, etc.); (b) one row, one column holding a CSV string, e.g., “id1,id2,id3” – SQL: val LIKE '%id2%'; (c) one row with multiple columns, e.g., group1, group2, etc. – SQL: group1=val1 OR group2=val2 ...
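A caveat on the CSV-string option above: a bare LIKE '%id2%' also matches values that merely contain the ID as a substring (e.g. "id22"). A hedged sketch of delimiter-padded matching, written application-side in Python for illustration:

```python
def csv_contains(csv_value: str, wanted_id: str) -> bool:
    """Match one ID inside a CSV column value such as "id1,id2,id3".

    Padding both sides with the delimiter avoids false positives: a raw
    substring test for "id2" would also match "id22".
    """
    return f",{wanted_id}," in f",{csv_value},"
```

The SQL equivalent is CONCAT(',', col, ',') LIKE '%,id2,%'; either way a leading-wildcard LIKE cannot use an index, which is why the multi-row composite-key layout is usually the safer choice.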
  • 14. Tips for indexing. Simple key – for metadata retrieval. Composite key – to find matching documents; start with low-cardinality and most-used columns; order matters: (c1, c2, c3) != (c2, c3, c1). InnoDB – all secondary indexes contain the primary key: make the primary key short to keep index size small; queries using a secondary index reference the primary key too. Integer vs. string – comparison of numeric values is faster =&gt; index hash values of long strings instead. Index length – title:varchar(255) =&gt; idx_title(32). Enforce referential integrity on the application side.
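One way to realize the "index hash values of long strings" tip above: store a CRC32 of the string in an indexed integer column and re-check the full value afterward. The table and column names below are hypothetical.

```python
from zlib import crc32

def title_hash(title: str) -> int:
    """Integer surrogate for a long string; comparing integers in an
    index is cheaper than comparing long strings."""
    return crc32(title.encode("utf-8"))

# Hypothetical lookup: seek on the indexed hash column, then re-check
# the full title to rule out rare CRC32 collisions.
SQL = "SELECT doc_id FROM document WHERE title_crc = %s AND title = %s"

def lookup_params(title: str):
    return (title_hash(title), title)
```

The re-check on the full string is what makes collisions harmless; the hash only narrows the index seek.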
  • 15. MySQL configuration. Storage engine: InnoDB – row-level locking. Table space – one file per table: easier to maintain (schema changes, optimization, etc.). Character set: UTF-8. Disable persistent connections (5.0.x). skip-character-set-client-handshake. Enable the slow query log to identify bad queries. System variables for memory buffer sizes: innodb_buffer_pool_size (data and indexes); sort_buffer_size, max_heap_table_size, tmp_table_size. query_cache_size = 0 – tables are updated constantly.
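A minimal my.cnf sketch of the settings above; the sizes are placeholders rather than the production values, and option names vary slightly across MySQL versions (for example, the slow-query option was log_slow_queries in 5.0.x and slow_query_log in 5.1+).

```ini
[mysqld]
default-storage-engine = InnoDB      # row-level locking
innodb_file_per_table  = 1           # one tablespace file per table
character-set-server   = utf8
skip-character-set-client-handshake
log_slow_queries                     # 5.0.x name; slow_query_log in 5.1+
innodb_buffer_pool_size = 8G         # data and indexes (placeholder size)
sort_buffer_size        = 2M
max_heap_table_size     = 64M
tmp_table_size          = 64M
query_cache_size        = 0          # tables are updated constantly
```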
  • 16. Runtime statistics (per server). Average write rate: daily &lt; 40 tps; max 400 tps during recovery; performs best when the write rate is &lt; 100 tps. Query rate: 20~80 qps. Query response time – shorter when indexes and data are in memory: 75th percentile ~3 ms when qps &lt; 15, ~2 ms when qps ~= 60; 95th percentile 6~8 ms when qps &lt; 15, 3~4 ms when qps ~= 60. CPU idle: &gt; 99%.
  • 17. (chart slide)
  • 18. Deployment Topology Considerations. Minimum configuration: host/DC redundancy – DC1: host 1 (master), host 3 (slave); DC2: host 2 (failover master), host 4 (slave). Data locality: significant when network latency is a concern (100 Mbps) – 3,000 qps when the DB is on a remote host; 15,000 qps when the DB is on the local host. Linking dependent servers across data centers: push the cross link up as far as possible (Topology 3) – link to dependent servers in the same data center.
  • 19. Deployment Topology 1: minimum config. [Diagram: WWW traffic and data consumers served by DB pairs in Data Center 1 and Data Center 2.]
  • 20. Topology 2: link across DCs (bad). [Diagram: GSLB directs WWW traffic through VIPs to data-consumer/DB pairs, with consumers linked to DBs in the other data center.]
  • 21. Topology 3: link to same DC (better). [Diagram: same layout, but each data consumer links to a DB in its own data center.]
  • 22. Topology 4: use local UNIX socket. [Diagram: each data consumer talks to a DB on the same host through a local UNIX socket.]
  • 23. Production Monitoring. Operational monitoring: logcheck, Scout/NOC alerts, etc. DB monitoring of replication failure, latency, read/write rates, and performance metrics.
  • 24. Metrics Collection. Graphing collected metrics: visualize and collate operational metrics. Helps analyze and fine-tune server performance. Helps trace production issues and identify the point of failure. What metrics are important? Host: CPU, MEM, disk I/O, network I/O, number of processes, CPU swap/paging. Server: throughput, response time. Comparison: line up charts (throughput, response time, CPU, disk I/O) in the same time window.
  • 25. (chart slide)
  • 26. (chart slide)
  • 27. (chart slide)
  • 28. Tuning and Optimizing Queries. EXPLAIN: mysql&gt; EXPLAIN SELECT ... FROM ... – watch out for temp-table usage, table scans, etc. SQL_NO_CACHE. MySQL query profiler: mysql&gt; SET profiling=1;. Linux OS cache: leave enough memory on the host. USE INDEX hint to choose an index explicitly – use wisely: most of the time MySQL chooses the right index for you, but when table size grows, index cardinality might change.
  • 29. Important MySQL statistics. SHOW GLOBAL STATUS: Qcache_free_blocks, Qcache_free_memory, Qcache_hits, Qcache_inserts, Qcache_lowmem_prunes, Qcache_not_cached, Qcache_queries_in_cache, Select_scan, Sort_scan.
  • 30. Important MySQL statistics (cont.): Table_locks_waited, Innodb_row_lock_current_waits, Innodb_row_lock_time, Innodb_row_lock_time_avg, Innodb_row_lock_time_max, Innodb_row_lock_waits, Select_scan, Slave_open_temp_tables.
  • 31. Heuristic Query Optimization Algorithm. Used primarily for complex cluster queries: find the latest N topics and related stories. Strategy: reduce the number of records the database needs to load from disk to perform a query. Pick a default query range; if insufficient docs are returned, expand the query range proportionally. If none return =&gt; sparse data =&gt; drop the range and retry. Save the query range for future reference. Result: the number of rows to process drops from millions to hundreds =&gt; query time cut from minutes to less than 10 ms.
  • 32. [Flowchart of the heuristic: a cluster query looks up its saved query range (NumOfTripToDB = 0). If no saved range exists, use the default range; otherwise compute the docs-to-range ratio and prorate it to a range expected to return a sufficient number of docs. Bound the query with that range and send it to the DB. If results are sufficient, compute the docs-to-range ratio, save it back to the lookup table for future use, and return the results to clients. If numOfResults == 0 or NumOfTripToDB &gt;= 2, send the original query to the query engine; otherwise increment NumOfTripToDB and retry with an expanded range.]
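The flowchart logic can be sketched roughly as below. Here run_query stands in for the real query engine, and the constants (default range, "sufficient" count, trip limit) are illustrative assumptions, not RTN's actual values.

```python
DEFAULT_RANGE = 1000   # assumed starting range (e.g., a time window)
MAX_TRIPS = 2          # bounded retries before falling back
WANTED = 20            # assumed "sufficient" number of docs

def bounded_query(run_query, saved_ratio=None, wanted=WANTED):
    """Range-expansion heuristic: bound the query so the DB loads only
    hundreds of rows instead of millions.

    Returns (rows, docs_to_range_ratio); the caller saves the ratio for
    future queries, or gets None when the data was too sparse."""
    if saved_ratio:
        # Prorate the saved docs-to-range ratio to a range expected to
        # return a sufficient number of docs.
        query_range = max(1, int(wanted / saved_ratio))
    else:
        query_range = DEFAULT_RANGE
    for _trip in range(MAX_TRIPS):
        rows = run_query(limit_range=query_range)
        if len(rows) >= wanted:
            return rows[:wanted], len(rows) / query_range
        if not rows:
            break  # sparse data: drop the range entirely
        # Expand the range in proportion to the shortfall and retry.
        query_range = int(query_range * wanted / len(rows)) + 1
    # Fall back to the original, unbounded query.
    rows = run_query(limit_range=None)
    return rows[:wanted], None
```

The saved ratio is what makes later queries cheap: once the document density for an entity is known, the first bounded attempt usually returns enough rows.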
  • 33. Lessons Learned. Always load test well ahead of launch (2 weeks) to avoid a fire drill. Don’t rely solely on the cache; the database needs to be able to serve a reasonable amount of queries on its own. Separate the cache from applications to avoid cold starts. Keep transactions/queries simple and return fast. Avoid table joins; limit to 2 tables if really needed. Avoid stored procedures: results are not cached, and a DBA is needed when altering the implementation.
  • 34. Lessons Learned (cont.). Avoid using ‘offset’ in the LIMIT clause; use application-based pagination instead. Avoid SQL_CALC_FOUND_ROWS in SELECT. If possible, exclude text/blob columns from query results to avoid disk I/O. Store text/blob in a separate table to speed up backup, optimization, and schema changes. Separate real-time vs. archive data for better performance and easier maintenance. Keep table size under control (&lt; 100 GB); optimize periodically.
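The "application-based pagination" advice above usually means keyset (seek) pagination: remember the last row returned and seek past it, instead of LIMIT offset, N, which forces MySQL to scan and discard offset rows. The table and column names here are hypothetical.

```python
def page_sql(last_seen_id=None, page_size=25):
    """Build a keyset-pagination query (newest first).

    The first page has no predicate; later pages seek past the last
    doc_id the client saw, so the index range scan starts right at the
    next row instead of re-scanning everything before it."""
    where = "WHERE doc_id < %s " if last_seen_id is not None else ""
    return (f"SELECT doc_id, title FROM document {where}"
            f"ORDER BY doc_id DESC LIMIT {page_size}")

first_page = page_sql()                   # newest page, no predicate
next_page = page_sql(last_seen_id=4711)   # continue after doc 4711
```

The client passes the smallest doc_id of each page back as last_seen_id, so deep pages cost the same as the first one.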
  • 35. Lessons Learned (cont.). Put SQL statement templates in resource files so you can tune them without a binary change. Set up replication in dev &amp; QA to catch replication issues earlier. Replication format: statement-based (MySQL 5.0.x) vs. row/mixed (5.1 or above). Auto-increment + (INSERT ... ON DUPLICATE KEY UPDATE ...). Date-time column: default to NOW(). Oversized data: increase max_allowed_packet. Replication lag: transactions that involve index updates/deletions often take longer to complete. Host and data center redundancy is important – don’t put all your eggs in one basket.
  • 36. RTN 3 Redesign. Free-text search with SOLR: real-time vs. archive shards; 1-minute latency without a RAM disk. Asset DB partitioned – 5 rows/doc -&gt; 25 rows/doc. Avoid (system) virtual machines; instead, stack high-end hosts with processes that use different system resources (CPU, MEM, disk space, etc.) – better network and system resource utilization, cost effective. Data locality. More processors (&lt; 12) help when under load.
  • 37. Q&amp;A. Questions or comments?
  • 38. THANK YOU!!