MongoDB at
 Qihoo 360
                王超


    2013.3.23
• Background
• Development history
 a) First steps - tens of millions
 b) Challenge - hundreds of millions
 c) Trial - tens of billions
• Looking ahead
So Exciting!

 • High Performance
 • Scalability
 • Schema free
First steps


2012.01
 • MongoDB 2.0.2
 • 2 * mongos + 3 * mongod + 3 * config
 • mongod(1 primary + 1 secondary + 1 arbiter)


 • 3 Servers(Xen)
 • 32G RAM
 • SAS 15K – RAID 5
Data scale: tens of millions
         - Keeping data in RAM

•   QPS < 500
•   R:W ~ 4:1
•   Opcounters < 20 Million
•   Document < 50 Million
Problem
• More than 10,000 timeouts (3s) per day


Investigation
• Profiling Levels(1), slowms
• mongostat
• iostat
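Symptoms
• iostat: w/s and wkB/s spike at regular intervals
• The spikes line up with slow-log entries, which arrive in batches
• While the I/O is happening, mongostat shows:
 – insert/query/update drop for several seconds
 – lock > 80%
 – flushes = 1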
--syncdelay
      mmap, flush memory data into disk
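     • Default: 60 seconds
     • Setting it too long is not recommended: the data has to reach
       disk sooner or later, and the debt always comes due!
     • RWLOCK, global lock (ver 2.0.x)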

or
db.runCommand( { setParameter: 1, syncdelay: N} )
• Shortened syncdelay to 5 seconds
• Timeouts dropped by 60%



Continuing to observe…
Slow queries always coincide with moveChunk
• Adjust the balancer's active window so it does not run during peak hours
// Run against the config database through a mongos
db.settings.update(
         { "_id" : "balancer" },
         { $set : { "activeWindow" : { start : "00:00", stop : "8:00" } } },
         true   // upsert
)
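
To make the window semantics concrete, here is a minimal sketch of how a start/stop pair like the one above can be evaluated. The helper names are hypothetical and are not part of MongoDB; treating the stop time as exclusive is an assumption made for illustration only.

```javascript
// Hypothetical helpers (not MongoDB code): decide whether a clock time
// falls inside a balancer-style activeWindow given as "HH:MM" strings.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

function inActiveWindow(now, start, stop) {
  const t = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  // A window such as 23:00-06:00 wraps past midnight.
  // Assumption: the stop time itself is treated as outside the window.
  return s <= e ? (t >= s && t < e) : (t >= s || t < e);
}

console.log(inActiveWindow("03:30", "00:00", "8:00")); // true
console.log(inActiveWindow("14:00", "00:00", "8:00")); // false
```

With the 00:00 to 8:00 window, a chunk migration may start at 03:30 but not at 14:00, which keeps balancing out of daytime peak traffic.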




• Mongos Connection Pool /
  VersionManager bug: occasional timeouts
Timeout issues: summary

• Syncdelay
• moveChunk, activeWindow
• BUG - Connection Pool / VersionManager
Challenge


Data scale: hundreds of millions             - 2012.04

•   6 Servers, 64G RAM
•   SAS 15K – RAID 5
•   Opcounters > 50 Million
•   Document > 100 Million
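Problem
• Timeout (3s) again!
  – Average latency rose (milliseconds -> hundreds of milliseconds)
  – Average lock > 50%
  – Very heavy page faulting


• The 0:00-8:00 window can no longer balance the data produced during the day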
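Root cause

• The data no longer fits in memory
• Purely random reads and writes
How do we get the data back into memory?

• Use less space
• Add more memory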
Application scenario
Old structure:
       – _id:     BSON string, hash(160 bit)
       – cnum:    Array
       ……
{
    _id: "d0be2dc421be4fcd0172e5afceea3970e2f3d940",
    cnum: [0, 1, 2],
    ……
}
Compressed structure:
  • _id:        BSON Binary, hash(160 bit)
      40 bytes -> 20 bytes
  • cnum:       Int32
      Array -> bitwise operations
  ……


Space usage cut in half
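
The two compressions above can be sketched in plain JavaScript (Node.js). This is an illustration under assumptions, not the production code: the slide does not say how cnum maps to bits, so the sketch assumes it holds distinct small integers (0 to 30) whose order does not matter, and the helper names are hypothetical.

```javascript
// Sketch only: field names follow the slide; helper names are hypothetical.

// A 160-bit hash written as hex takes 40 characters; the raw value is 20 bytes.
function hexIdToBytes(hexId) {
  if (!/^[0-9a-f]{40}$/i.test(hexId)) {
    throw new Error("expected a 40-character hex string");
  }
  return Buffer.from(hexId, "hex");
}

// Assumption: cnum holds distinct integers in 0..30, so value k can be
// stored as bit k of a single signed 32-bit integer.
function packFlags(values) {
  let mask = 0;
  for (const v of values) {
    if (!Number.isInteger(v) || v < 0 || v > 30) {
      throw new RangeError("value out of range for a 32-bit mask: " + v);
    }
    mask |= 1 << v;
  }
  return mask;
}

// Recover the sorted list of values from the bitmask.
function unpackFlags(mask) {
  const values = [];
  for (let bit = 0; bit <= 30; bit++) {
    if (mask & (1 << bit)) values.push(bit);
  }
  return values;
}

const id = hexIdToBytes("d0be2dc421be4fcd0172e5afceea3970e2f3d940");
console.log(id.length);            // 20
console.log(packFlags([0, 1, 2])); // 7
console.log(unpackFlags(7));       // [ 0, 1, 2 ]
```

Membership tests then become a single bitwise AND (for example `mask & (1 << 2)`) instead of an array scan.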
Other benefits…
TIPS:
Mind the effect of document length on QPS
 – 60 million documents
 – Random reads and writes, data smaller than memory


Test results:
 – 3K: r/s > 6000, w/s > 500
 – 1K: r/s > 11000, w/s > 1500
Warming up data
When to warm up?
 –   After a machine restart
 –   When adding a secondary
 –   When adding a shard



Warm-up tools
 – dd / cat do not work well
 – vmtouch: http://hoytech.com/vmtouch/
    • Built-in touch command (version 2.2)
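The 0:00-8:00 window can no longer balance the data produced during the day

Cause:
• IOPS bottleneck
 – shardkey: sha1, data is scattered across the disk
Solution
• Added rate limiting to moveChunk
• Balancer window restored to 0:00-24:00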
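
The slides do not describe how the moveChunk rate limit was implemented. As a rough illustration of the general idea only, a copy loop can be paced by sleeping whenever it runs ahead of a target rate. The function name and parameters below are hypothetical and not taken from MongoDB or from the modified build.

```javascript
// Hypothetical pacing helper (not MongoDB code): after copying `bytesCopied`
// bytes in `elapsedMs` milliseconds, return how long to sleep so that the
// average rate stays at or below `maxBytesPerSec`.
function throttleDelayMs(bytesCopied, elapsedMs, maxBytesPerSec) {
  const minElapsedMs = (bytesCopied / maxBytesPerSec) * 1000;
  return Math.max(0, Math.ceil(minElapsedMs - elapsedMs));
}

const MB = 1024 * 1024;
// 10 MB copied in 200 ms against a 20 MB/s cap: wait another 300 ms.
console.log(throttleDelayMs(10 * MB, 200, 20 * MB)); // 300
// Already slower than the cap: no extra delay.
console.log(throttleDelayMs(1024, 500, 20 * MB));    // 0
```

Capping migration throughput this way leaves IOPS headroom for application traffic, which is what makes an all-day balancer window tolerable.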


Memory problem?
We estimate that in two months the data will outgrow memory again
SSD in MongoDB

• No RAID! Direct HBA attachment for the best performance!
• Page faults? Memory? No longer a worry!
• Low latency

From underdog to top dog
Trial


Data scale: tens of billions

•   100+ Servers, 64G RAM, SSD * 5
•   Cluster: 20+
•   Opcounters: 2+ Billion
•   Document: 30+ Billion
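Smooth sailing? NO!

•   NUMA architecture
•   Choice of connection type
•   Cross-IDC deployment
•   How to migrate services online
NUMA architecture

Symptoms:
• Memory is paged in and out irregularly; pgscand/s and
  pgscank/s spike (sar -B)
• One CPU core at 100% utilization
• mongostat lock > 90%
• Stalls last about ten seconds (64G RAM)
Cause:
• With the default memory access policy, the problems above appear when
  memory use on a single NUMA node (especially node 0) exceeds that node's
  memory size; this is tied to Linux's behavior.
• Disabling swap does not make the problem go away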

Solution:
numactl --interleave=all ./xxx
echo 0 > /proc/sys/vm/zone_reclaim_mode


References:
http://docs.mongodb.org/manual/administration/production-notes/#production-numa
http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
http://blog.jcole.us/2012/04/16/a-brief-update-on-numa-and-mysql/
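Choice of connection type

• An incident with persistent connections
 – mongos/mongod crashed
 – Crashed again immediately on restart



pthread_create failed: the system-wide maximum had been reached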
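Why?

• One-thread-per-connection network model
• php driver versions < 1.2.10 leak connections (on timeout exceptions);
  the client timeout was set to 100ms
• Each client connects to every mongos, multiplying mongod connections X times
• mongos and mongod share the same servers

Mongod Conns(Threads) Number:
        N Web Servers * N FastCGI Process * N mongos

   e.g. 100 * 128 * 2 ~= 25K Conns(Threads) > maxConns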
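
The formula above is easy to turn into a planning check. The sketch below is a worst-case estimate under the slide's model; the function name is hypothetical and the `maxConns` value of 20000 is an assumed limit used only for illustration.

```javascript
// Worst case from the slide's formula: every FastCGI process on every web
// server holds one connection to every mongos, and each of those can fan
// out to a connection (thread) on a mongod.
function estimateMongodConns(webServers, fastcgiPerServer, mongosCount) {
  return webServers * fastcgiPerServer * mongosCount;
}

// Assumed limit, for illustration only.
const maxConns = 20000;

const viaTwoMongos = estimateMongodConns(100, 128, 2);
console.log(viaTwoMongos);            // 25600
console.log(viaTwoMongos > maxConns); // true

// If each client connects to a single mongos, the estimate halves.
const viaOneMongos = estimateMongodConns(100, 128, 1);
console.log(viaOneMongos);            // 12800
```

Because each connection costs a thread, the same number also bounds the thread count the operating system must allow.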
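How to fix it?

• Fix bug - php driver
• Tune system parameters
  ulimit [open files| max user processes]
  /proc/sys/kernel/threads-max
  /proc/sys/kernel/pid_max
  /proc/sys/vm/max_map_count

• Patch the code to remove the maxConns limit
• Each client connects to only one mongos (Zookeeper handles reliability)
• Plan connection counts ahead of time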
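Short-lived connections

• Overhead of opening and closing connections
• Overhead of creating and destroying threads (no thread cache)
• Mongos Connection Pool /
  VersionManager bug: triggers the timeout logic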
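Cross-IDC deployment - single cluster

Characteristics:
•   Multi-datacenter disaster recovery; simple architecture and deployment
•   IDCs depend on the fiber links between them
•   Distinguishes primary and secondary datacenters
•   Suited to read-heavy, write-light workloads
    – Based on the nearest-member selection strategy (built into the official 2.2 release)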
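Cross-IDC deployment - multiple clusters
• Data synchronization – QBUS (distributed message queue)
Advantages over a single cluster:

• Clusters are independent and can be adjusted flexibly
• A fiber cut does not affect writes; it only delays
  synchronization of new data
• An IDC outage needs no intervention: the service switches domain names
 (in single-cluster mode, an outage of the primary IDC requires switching the primary by hand)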
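How to migrate services online?

• Real-time oplog sync tool (bundled with version 2.2)
 – Copy the data from a secondary
 (mongodump is too slow: by the time the copy finishes, the oplog can no longer catch up)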
In Future


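Looking ahead
     • Web-based cluster management
     • Data compression
     • Multi-threaded data synchronization and migration
        – Adding a secondary
        – Adding a shard



Hoping for:
collection lock! or document lock?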
Q&A
          Thanks
                                  We Are Hiring...
Weibo: http://weibo.com/chancey
Email: chanceycn@gmail.com




    Qihoo 360
