Copyright notice: This article may be reproduced freely, but any reproduction must include a hyperlink to the original article together with the author information and this copyright notice (Author: 张华/Zhang Hua, published: 2018-08-30)
Problem
cinder-volume is very slow when deleting volumes, and while this happens other operations such as creating volumes get no response and produce no logs.
Theory
The idea behind green threads is that they do not share data: each green thread has its own private data objects and is non-blocking. When one green thread's I/O is not ready, it yields and another green thread carries on, which is how a single process can efficiently run a large number of non-blocking green threads.
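As a tiny illustration of that model (my own sketch, not cinder code), two green threads in one process interleave because each one yields whenever it sleeps or waits on I/O:
import eventlet

def worker(name):
    for i in range(3):
        print("green thread %s step %d" % (name, i))
        eventlet.sleep(0.1)   # cooperative yield point

a = eventlet.spawn(worker, 'A')
b = eventlet.spawn(worker, 'B')
a.wait()
b.wait()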
So the green-thread philosophy is not to share data objects. Green threads can still share objects through tpool.Proxy, though; for example, the eventlet.pools.Pool mechanism can be used to build a pool of httplib2.Http instances that is shared to some extent between green threads (see my blog post from five years ago - https://blog.csdn.net/quqi99/article/details/9114577 ). But this kind of sharing has a problem: when a native thread in the pool raises an exception without an explicit return, the native thread apparently cannot yield, which in turn blocks all green threads. The test program below demonstrates this:
import thread
import eventlet
import time
orig = time
from eventlet import tpool
eventlet.monkey_patch()

class MyException(Exception):
    pass

class FOO(object):
    def foo(self, char, starting_ident):
        id = thread.get_ident()
        print "native {} exec foo({})".format(char, id)
        try:
            raise MyException()
        finally:
            # REPLACE with pass to reproduce failure
            # return
            pass

def stuff(char):
    print "entering green thread"
    while True:
        print "green exec foo({})".format(char)
        f = tpool.Proxy(FOO())
        f.foo(char, thread.get_ident())
        print "green finished foo({})".format(char)
        time.sleep(1)

if __name__ == "__main__":
    g = eventlet.greenthread.spawn(stuff, 'A')
    g = eventlet.greenthread.spawn(stuff, 'B')
    print "done"
    while True:
        time.sleep(1)
The rados.Rados class comes from the Ceph Python bindings (python-rados, used alongside python-rbd), and it in turn spawns native threads to connect to rados.
If every green thread instantiates its own rados.Rados and thereby starts a native thread, those native threads synchronize data through the Python interpreter and are not non-blocking. So when one native thread runs a long task without yielding, none of the other green threads get a chance to run, and at that point no image operation is possible. That is why this patch (https://review.openstack.org/#/c/175555/ ) introduced tpool.Proxy.
However, this patch (https://review.openstack.org/#/c/197710/) reverted it, on the grounds that letting the spawned non-blocking native threads import Python modules can cause a deadlock ("According to Python documentation, code can lead to a deadlock if the spawned thread directly or indirectly attempts to import a module."). So _connect_to_rados went back to the old approach in which every green thread instantiates its own rados.Rados and thereby starts a native thread, although the earlier problem can now be mitigated by configuring rados_connect_timeout (if a green thread still cannot connect within the timeout, it yields so that other green threads can run).
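To make the difference concrete, here is a rough sketch of my own (assuming a reachable Ceph cluster at /etc/ceph/ceph.conf and the python-rados bindings; the timeout value is arbitrary): a blocking connect done directly in a green thread stalls the whole process, while dispatching the same call to eventlet's native thread pool with tpool.execute, which is essentially what the tpool.Proxy approach does under the hood, lets other green threads keep running:
import eventlet
eventlet.monkey_patch()

import rados
from eventlet import tpool

def connect_directly():
    # the C-level connect() blocks the eventlet hub, so no green thread runs
    client = rados.Rados(conffile='/etc/ceph/ceph.conf')
    client.connect(timeout=5)
    return client

def connect_via_tpool():
    # the same call runs in a native worker thread; the hub keeps scheduling
    # other green threads while we wait for the result
    client = rados.Rados(conffile='/etc/ceph/ceph.conf')
    tpool.execute(client.connect, timeout=5)
    return client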
However, only the _connect_to_rados path was switched back to the old approach; other code still goes through tpool. For example, _get_usage_info uses tpool like this:
def RBDProxy(self):
    return tpool.Proxy(self.rbd.RBD())

def _get_usage_info(self):
    total_provisioned = 0
    with RADOSClient(self) as client:
        for t in self.RBDProxy().list(client.ioctx):
            with RBDVolumeProxy(self, t, read_only=True) as v:
                ...

class RBDVolumeProxy(object):
    def __init__(self, driver, name, pool=None, snapshot=None,
                 read_only=False, remote=None, timeout=None):
        client, ioctx = driver._connect_to_rados(pool, remote, timeout)
        if snapshot is not None:
            snapshot = utils.convert_str(snapshot)
        try:
            self.volume = driver.rbd.Image(ioctx,
                                           utils.convert_str(name),
                                           snapshot=snapshot,
                                           read_only=read_only)
            self.volume = tpool.Proxy(self.volume)
        except driver.rbd.Error:
            LOG.exception("error opening rbd image %s", name)
            driver._disconnect_from_rados(client, ioctx)
            raise
In the code above:
- RBDProxy wraps rbd.RBD() in tpool.Proxy, so its methods run in non-blocking native threads.
- RBDVolumeProxy likewise wraps the driver.rbd.Image object in tpool.Proxy, so image operations also run in non-blocking native threads.
- _get_usage_info runs periodically. If a volume is deleted while it runs, RBDVolumeProxy may no longer find the image and will log an error like: Image volume-1f3aa3d5-5639-4a68-be07-14f3214320c6 is not found. _get_usage_info /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py
- Remember that RBDVolumeProxy runs in a native thread. When an ImageNotFound exception is raised there without an explicit return, we hit exactly the green-thread problem described above: that green thread is blocked, the other green threads can still yield normally, but they soon hit the same ImageNotFound exception, so more and more green threads end up blocked. (To verify this, add one more green thread running a busy loop to the test program above; the stuck green threads do not affect it - see the sketch after this list.)
- The best workaround is to set rbd_exclusive_cinder_pool=True so that the _get_usage_info above is not called at all.
- I filed a bug for this - https://bugs.launchpad.net/cinder/+bug/1789828
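To verify the point in the fourth bullet, the earlier test program could be extended with one more green thread that never touches tpool (a hypothetical addition of mine, reusing that program's names); the stuck A/B green threads do not stop it from looping:
def busy(char):
    # pure green loop, never enters the native thread pool
    while True:
        print "green {} still alive".format(char)
        time.sleep(1)

# next to the existing spawns in __main__:
# eventlet.greenthread.spawn(busy, 'C')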
The relevant logs:
2018-08-29 06:57:41.604 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] deleting rbd volume volume-2d5951be-25bf-4313-a706-593664f6cd2e delete_volume /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:977
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] error opening rbd image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2: ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 147, in __init__
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd read_only=read_only)
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "rbd.pyx", line 1392, in rbd.Image.__init__ (/build/ceph-B2ToPL/ceph-12.2.4/obj-x86_64-linux-gnu/src/pybind/rbd/pyrex/rbd.c:13545)
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd
2018-08-29 06:57:41.612 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] Image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 is not found. _get_usage_info /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:409
Testing
1, Script to create 100 volumes
#!/bin/bash -eu
. ~/stsstack-bundles/novarcv3_project
openstack project list| grep " admin "| awk '{print $2}'| xargs -l openstack quota set --volumes 200
TOKEN="`openstack token issue| grep ' id '| awk '{print $4}'`"
c_ep="`curl -s -XGET -H "X-Auth-Token: $TOKEN" "$OS_AUTH_URL/auth/catalog"| jq --raw-output '.catalog[] | select(.name | contains("cinderv3")).endpoints[] | select(.interface | contains("admin")).url'`"
echo "Cinder endpoint is $c_ep"
for i in {0..100}; do
(payload="`cat << EOF | python | sed 's/"/\\"/g'
import json
vol = {"volume": { "size": 1, "name": "vol"}}
print json.dumps(vol)
EOF`"
curl -s -X POST -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" -d "$payload" ${c_ep}/volumes) &
done
2, Delete those 100 volumes
openstack volume list| egrep -v "^\+-+|ID"| awk '{print $2}'| xargs openstack volume delete
3, On a ceph-mon node, watch the number of volumes
watch -n 1 'rbd -p cinder-ceph ls| wc -l'
4, On the cinder-volume node, watch the number of threads created under the cinder-volume process
watch -n 1 'ps -eLf| egrep "[c]inder-volume"| wc -l'
20240402 - eventlet monkey_patch must run before other imports
eventlet implements asynchronous, non-blocking socket I/O, but 'eventlet.patcher.monkey_patch()' must be run before importing any other modules so that they do not end up using the standard library (socket, time, threading, etc.) directly; otherwise some threads may block, see:
https://github.com/lathiat/prometheus-openstack-exporter/commit/2c53de710594019ac9b3099c1a2af10980a443ac
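A minimal sketch of the ordering that commit enforces (my own example, not the exporter's code): call monkey_patch() before anything else is imported, so that later imports pick up the green socket/time/threading modules; done the other way round, modules imported earlier keep references to the blocking stdlib versions.
import eventlet
eventlet.patcher.monkey_patch()   # must run before the imports below

import time                       # the green time module from here on

def poll(name):
    for _ in range(3):
        print("%s polling" % name)
        time.sleep(0.5)           # yields instead of blocking the process

pool = eventlet.GreenPool()
pool.spawn(poll, 'collector')
pool.spawn(poll, 'http-server')
pool.waitall()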
20240925 - Summary of greenthread issues
Each greenthread is supposed to have its own copy of the data and should not share objects. When sharing is unavoidable:
1, For example, when a socket is shared between greenthreads, problems arise because it is not a green socket; in that case, switching the greenthreads back to ordinary threads lets the language-level synchronization of native threads avoid the problem. See: https://blog.csdn.net/quqi99/article/details/9114577
2, If a non-green object (e.g. httplib2.Http) really must be shared between greenthreads, it is best to use eventlet.pools.Pool, which ensures the object is never used by more than one greenthread at a time (see the sketch after this list).
3, Even when eventlet.pools.Pool is used to share non-green objects (e.g. through tpool.Proxy), things can still go wrong, because the object behind tpool.Proxy in turn calls into native threads to connect to rados; once a native thread cannot yield, all greenthreads end up blocked - https://blog.csdn.net/quqi99/article/details/82220564
4, https://github.com/canonical/prometheus-openstack-exporter/issues/130
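For item 2, here is a minimal eventlet.pools.Pool sketch of my own (assuming httplib2 is installed and http://example.com is reachable): the pool hands an Http instance to one greenthread at a time, so the non-green object is never used by two greenthreads concurrently:
import eventlet
eventlet.monkey_patch()

import httplib2
from eventlet.pools import Pool

http_pool = Pool(create=lambda: httplib2.Http(timeout=10), max_size=4)

def fetch(url):
    with http_pool.item() as http:   # borrow one Http object exclusively
        resp, _content = http.request(url)
        return resp.status

gp = eventlet.GreenPool()
for status in gp.imap(fetch, ['http://example.com'] * 3):
    print(status)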
eventlet implements asynchronous, non-blocking socket I/O, but 'eventlet.patcher.monkey_patch()' must be run before importing any other modules so that they stop using the standard library directly (socket, time, threading, SimpleHTTPServer, etc.); that way, when a greenthread would otherwise block, it yields its time slice so other greenthreads can run.
Example: if python cinderclient does not run eventlet.patcher.monkey_patch() first as above, eventlet.sleep ends up being executed inside a native thread (this can also be replaced with a GreenPool, and the native sleep(0) should not be used), which raises: greenlet.error: cannot switch to a different thread
Look at this code example ( https://github.com/lathiat/prometheus-openstack-exporter/commit/0e57e4b7816492fd3bf152518798bde5e12de43f ): it does not run eventlet.patcher.monkey_patch() before importing other modules, so the DataGaterer task in prometheus-openstack-exporter may interfere with the HTTPServer connections and make access to prometheus-openstack-exporter slow - https://github.com/canonical/prometheus-openstack-exporter/issues/130
Besides prometheus-openstack-exporter, the native sleep(0) has also been found in some other OpenStack projects such as cinderclient and glance. They all use the code below, so some code paths use greenthreads while others do not, and that mixture causes problems. Full code: https://github.com/openstack/glance/commit/70e9690830653ed92d15c4a696cae8778c635206
+try:
+ from eventlet import sleep
+except ImportError:
+ from time import sleep
oslo.messaging has the same problem: 'cannot switch to a different thread' also occurred when nova-compute sent heartbeats to rabbit through oslo.messaging, see: https://bugs.launchpad.net/oslo.messaging/+bug/1934937
1, For non-wsgi services, heartbeat_in_pthread should be set to false
2, This patch sets heartbeat_in_pthread automatically based on whether 'eventlet.wsgi' is in sys.modules; eventlet.wsgi being in sys.modules means a green environment, so it sets heartbeat_in_pthread=true - https://review.opendev.org/c/openstack/oslo.messaging/+/927624/1/oslo_messaging/_drivers/impl_rabbit.py
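A rough sketch of that heuristic (my own paraphrase, not the patch's actual code): the presence of eventlet.wsgi in sys.modules is taken as the signal that the service runs in a green/wsgi environment:
import sys

def default_heartbeat_in_pthread():
    # eventlet.wsgi only appears in sys.modules when the service runs under an
    # eventlet-based wsgi stack, i.e. a green environment
    return 'eventlet.wsgi' in sys.modules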
A way to do a quick performance test:
1, Make requests to nova-api hang for 120 seconds
iptables -I OUTPUT -p tcp -m state --state NEW --dport 8774 -j DROP
2, Use siege to generate some concurrency
while true; do siege http://172.16.0.30:9183/metrics -t 5s -c 5 -d 0.1; done
20250220 - delete dirty volumes
# delete a volume that is still attached to an instance
openstack volume list
openstack volume attachment list --os-volume-api-version 3.27 --volume-id ae972c52-a137-4305-b7e5-20312f416065
openstack volume attachment delete --os-volume-api-version 3.27 <aaa>
openstack volume delete ae972c52-a137-4305-b7e5-20312f416065
openstack volume list |grep -E 'avai|dev|attaching' |awk '{print $2}' |xargs -i openstack volume attachment list --os-volume-api-version 3.27 --volume-id {} |grep attached |awk '{print $2}' |xargs -i openstack volume attachment delete --os-volume-api-version 3.27 {}
openstack volume list |grep -E 'avai|dev|attaching' |awk '{print $2}' |xargs -i openstack volume delete {}
20250618 - nova-compute unresponsive
nova-compute has the same problem: if listCaps is not wrapped in tpool.Proxy it can block all greenthreads and cause nova-compute to freeze for hours - https://review.opendev.org/c/openstack/nova/+/939317
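The pattern behind that fix can be sketched like this (a hypothetical helper of mine, not the actual nova code, and using tpool.execute rather than tpool.Proxy for brevity): push the potentially slow libvirt call into a native worker thread so it cannot stall the eventlet hub:
from eventlet import tpool

def get_capabilities(conn):
    # conn is assumed to be a libvirt.virConnect handle; getCapabilities() can
    # take a long time, so run it in eventlet's native thread pool instead of
    # blocking the green thread that called us
    return tpool.execute(conn.getCapabilities)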