Copyright notice: This article may be reproduced freely, but any reproduction must include a hyperlink to the original article together with the author information and this copyright notice (Author: 张华/Zhang Hua, published: 2018-08-30)
Problem
cinder-volume is very slow when deleting volumes, and while this happens other operations such as creating volumes get no response and produce no logs.
Theory
The idea behind green threads is that they do not share data: each green thread has its own private data objects and is non-blocking. When one green thread's I/O is not ready, it yields and another green thread carries on, which is how a single process can efficiently run a large number of non-blocking green threads.
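As a tiny illustration of that model (my own sketch, not cinder code), two green threads in one process interleave because each one yields whenever it sleeps or waits on I/O:
import eventlet

def worker(name):
    for i in range(3):
        print("green thread %s step %d" % (name, i))
        eventlet.sleep(0.1)   # cooperative yield point

a = eventlet.spawn(worker, 'A')
b = eventlet.spawn(worker, 'B')
a.wait()
b.wait()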
So the green-thread philosophy is not to share data objects. Green threads can still share objects through tpool.Proxy, though; for example, the eventlet.pools.Pool mechanism can be used to build a pool of httplib2.Http instances that is shared to some extent between green threads (see my blog post from five years ago - https://blog.csdn.net/quqi99/article/details/9114577 ). But this kind of sharing has a problem: when a native thread in the pool raises an exception without an explicit return, the native thread apparently cannot yield, which in turn blocks all green threads. The test program below demonstrates this:
import thread
import eventlet
import time
orig = time
from eventlet import tpool
eventlet.monkey_patch()

class MyException(Exception):
    pass

class FOO(object):
    def foo(self, char, starting_ident):
        id = thread.get_ident()
        print "native {} exec foo({})".format(char, id)
        try:
            raise MyException()
        finally:
            # REPLACE with pass to reproduce failure
            # return
            pass

def stuff(char):
    print "entering green thread"
    while True:
        print "green exec foo({})".format(char)
        f = tpool.Proxy(FOO())
        f.foo(char, thread.get_ident())
        print "green finished foo({})".format(char)
        time.sleep(1)

if __name__ == "__main__":
    g = eventlet.greenthread.spawn(stuff, 'A')
    g = eventlet.greenthread.spawn(stuff, 'B')
    print "done"
    while True:
        time.sleep(1)
The rados.Rados class comes from the Ceph Python bindings (python-rados, used alongside python-rbd), and it in turn spawns native threads to connect to rados.
If every green thread instantiates its own rados.Rados and thereby starts a native thread, those native threads synchronize data through the Python interpreter and are not non-blocking. So when one native thread runs a long task without yielding, none of the other green threads get a chance to run, and at that point no image operation is possible. That is why this patch (https://review.openstack.org/#/c/175555/ ) introduced tpool.Proxy.
However, this patch (https://review.openstack.org/#/c/197710/) reverted it, on the grounds that letting the spawned non-blocking native threads import Python modules can cause a deadlock ("According to Python documentation, code can lead to a deadlock if the spawned thread directly or indirectly attempts to import a module."). So _connect_to_rados went back to the old approach in which every green thread instantiates its own rados.Rados and thereby starts a native thread, although the earlier problem can now be mitigated by configuring rados_connect_timeout (if a green thread still cannot connect within the timeout, it yields so that other green threads can run).
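To make the difference concrete, here is a rough sketch of my own (assuming a reachable Ceph cluster at /etc/ceph/ceph.conf and the python-rados bindings; the timeout value is arbitrary): a blocking connect done directly in a green thread stalls the whole process, while dispatching the same call to eventlet's native thread pool with tpool.execute, which is essentially what the tpool.Proxy approach does under the hood, lets other green threads keep running:
import eventlet
eventlet.monkey_patch()

import rados
from eventlet import tpool

def connect_directly():
    # the C-level connect() blocks the eventlet hub, so no green thread runs
    client = rados.Rados(conffile='/etc/ceph/ceph.conf')
    client.connect(timeout=5)
    return client

def connect_via_tpool():
    # the same call runs in a native worker thread; the hub keeps scheduling
    # other green threads while we wait for the result
    client = rados.Rados(conffile='/etc/ceph/ceph.conf')
    tpool.execute(client.connect, timeout=5)
    return client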
However, only the _connect_to_rados path was switched back to the old approach; other code still goes through tpool. For example, _get_usage_info uses tpool like this:
def RBDProxy(self):
    return tpool.Proxy(self.rbd.RBD())

def _get_usage_info(self):
    total_provisioned = 0
    with RADOSClient(self) as client:
        for t in self.RBDProxy().list(client.ioctx):
            with RBDVolumeProxy(self, t, read_only=True) as v:
                ...

class RBDVolumeProxy(object):
    def __init__(self, driver, name, pool=None, snapshot=None,
                 read_only=False, remote=None, timeout=None):
        client, ioctx = driver._connect_to_rados(pool, remote, timeout)
        if snapshot is not None:
            snapshot = utils.convert_str(snapshot)
        try:
            self.volume = driver.rbd.Image(ioctx,
                                           utils.convert_str(name),
                                           snapshot=snapshot,
                                           read_only=read_only)
            self.volume = tpool.Proxy(self.volume)
        except driver.rbd.Error:
            LOG.exception("error opening rbd image %s", name)
            driver._disconnect_from_rados(client, ioctx)
            raise
In the code above:
- RBDProxy wraps rbd.RBD() in tpool.Proxy, so its methods run in non-blocking native threads.
- RBDVolumeProxy likewise wraps the driver.rbd.Image object in tpool.Proxy, so image operations also run in non-blocking native threads.
- _get_usage_info runs periodically. If a volume is deleted while it runs, RBDVolumeProxy may no longer find the image and will log an error like: Image volume-1f3aa3d5-5639-4a68-be07-14f3214320c6 is not found. _get_usage_info /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py
- Remember that RBDVolumeProxy runs in a native thread. When an ImageNotFound exception is raised there without an explicit return, we hit exactly the green-thread problem described above: that green thread is blocked, the other green threads can still yield normally, but they soon hit the same ImageNotFound exception, so more and more green threads end up blocked. (To verify this, add one more green thread running a busy loop to the test program above; the stuck green threads do not affect it - see the sketch after this list.)
- The best workaround is to set rbd_exclusive_cinder_pool=True so that the _get_usage_info above is not called at all.
- I filed a bug for this - https://bugs.launchpad.net/cinder/+bug/1789828
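To verify the point in the fourth bullet, the earlier test program could be extended with one more green thread that never touches tpool (a hypothetical addition of mine, reusing that program's names); the stuck A/B green threads do not stop it from looping:
def busy(char):
    # pure green loop, never enters the native thread pool
    while True:
        print "green {} still alive".format(char)
        time.sleep(1)

# next to the existing spawns in __main__:
# eventlet.greenthread.spawn(busy, 'C')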
The relevant logs:
2018-08-29 06:57:41.604 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] deleting rbd volume volume-2d5951be-25bf-4313-a706-593664f6cd2e delete_volume /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:977
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] error opening rbd image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2: ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 147, in __init__
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd read_only=read_only)
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd File "rbd.pyx", line 1392, in rbd.Image.__init__ (/build/ceph-B2ToPL/ceph-12.2.4/obj-x86_64-linux-gnu/src/pybind/rbd/pyrex/rbd.c:13545)
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd ImageNotFound: [errno 2] error opening image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 at snapshot None
2018-08-29 06:57:41.610 1586622 ERROR cinder.volume.drivers.rbd
2018-08-29 06:57:41.612 1586622 DEBUG cinder.volume.drivers.rbd [req-f885ea4a-3e98-4aad-bb3f-102240aebca1 10cbcda6a7854fa79cfc37dc1945cb6d 5d5d0f0ab738467f8ca813dd41432afa - a51502c6e125414fbba0cc95decd86c5 a51502c6e125414fbba0cc95decd86c5] Image volume-005a04e7-a113-4ebb-bd77-1d9d3221d8f2 is not found. _get_usage_info /usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py:409
Testing
1, Script to create 100 volumes
#!/bin/bash -eu
. ~/stsstack-bundles/novarcv3_project
openstack project list| grep " admin "| awk '{print $2}'| xargs -l openstack quota set --volumes 200
TOKEN="`openstack token issue| grep ' id '| awk '{print $4}'`"
c_ep="`curl -s -XGET -H "X-Auth-Token: $TOKEN" "$OS_AUTH_URL/auth/catalog"| jq --raw-output '.catalog[] | select(.name | contains("cinderv3")).endpoints[] | select(.interface | contains("admin")).url'`"
echo "Cinder endpoint is $c_ep"
for i in {0..100}; do
(payload="`cat << EOF | python | sed 's/"/\\"/g'
import json
vol = {"volume": { "size": 1, "name": "vol"}}
print json.dumps(vol)
EOF`"
curl -s -X POST -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" -d "$payload" ${c_ep}/volumes) &
done
2, Delete those 100 volumes
openstack volume list| egrep -v "^\+-+|ID"| awk '{print $2}'| xargs openstack volume delete
3, On a ceph-mon node, watch the number of volumes
watch -n 1 'rbd -p cinder-ceph ls| wc -l'
4, On the cinder-volume node, watch the number of threads created under the cinder-volume process
watch -n 1 'ps -eLf| egrep "[c]inder-volume"| wc -l'
20240402 - eventlet monkey_patch must run before other imports
eventlet implements asynchronous, non-blocking socket I/O, but 'eventlet.patcher.monkey_patch()' must be run before importing any other modules so that they do not end up using the standard library (socket, time, threading, etc.) directly; otherwise some threads may block, see:
https://github.com/lathiat/prometheus-openstack-exporter/commit/2c53de710594019ac9b3099c1a2af10980a443ac
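A minimal sketch of the ordering that commit enforces (my own example, not the exporter's code): call monkey_patch() before anything else is imported, so that later imports pick up the green socket/time/threading modules; done the other way round, modules imported earlier keep references to the blocking stdlib versions.
import eventlet
eventlet.patcher.monkey_patch()   # must run before the imports below

import time                       # the green time module from here on

def poll(name):
    for _ in range(3):
        print("%s polling" % name)
        time.sleep(0.5)           # yields instead of blocking the process

pool = eventlet.GreenPool()
pool.spawn(poll, 'collector')
pool.spawn(poll, 'http-server')
pool.waitall()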
20240925 - Summary of greenthread issues
Each greenthread is supposed to have its own copy of the data and should not share objects. When sharing is unavoidable:
1, For example, when a socket is shared between greenthreads, problems arise because it is not a green socket; in that case, switching the greenthreads back to ordinary threads lets the language-level synchronization of native threads avoid the problem. See: https://blog.csdn.net/quqi99/article/details/9114577
2, If a non-green object (e.g. httplib2.Http) really must be shared between greenthreads, it is best to use eventlet.pools.Pool, which ensures the object is never used by more than one greenthread at a time (see the sketch after this list).
3, Even when eventlet.pools.Pool is used to share non-green objects (e.g. through tpool.Proxy), things can still go wrong, because the object behind tpool.Proxy in turn calls into native threads to connect to rados; once a native thread cannot yield, all greenthreads end up blocked - https://blog.csdn.net/quqi99/article/details/82220564
4, https://github.com/canonical/prometheus-openstack-exporter/issues/130
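For item 2, here is a minimal eventlet.pools.Pool sketch of my own (assuming httplib2 is installed and http://example.com is reachable): the pool hands an Http instance to one greenthread at a time, so the non-green object is never used by two greenthreads concurrently:
import eventlet
eventlet.monkey_patch()

import httplib2
from eventlet.pools import Pool

http_pool = Pool(create=lambda: httplib2.Http(timeout=10), max_size=4)

def fetch(url):
    with http_pool.item() as http:   # borrow one Http object exclusively
        resp, _content = http.request(url)
        return resp.status

gp = eventlet.GreenPool()
for status in gp.imap(fetch, ['http://example.com'] * 3):
    print(status)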
eventlet implements asynchronous, non-blocking socket I/O, but 'eventlet.patcher.monkey_patch()' must be run before importing any other modules so that they stop using the standard library directly (socket, time, threading, SimpleHTTPServer, etc.); that way, when a greenthread would otherwise block, it yields its time slice so other greenthreads can run.
Example: if python cinderclient does not run eventlet.patcher.monkey_patch() first as above, eventlet.sleep ends up being executed inside a native thread (this can also be replaced with a GreenPool, and the native sleep(0) should not be used), which raises: greenlet.error: cannot switch to a different thread
Look at this code example ( https://github.com/lathiat/prometheus-openstack-exporter/commit/0e57e4b7816492fd3bf152518798bde5e12de43f ): it does not run eventlet.patcher.monkey_patch() before importing other modules, so the DataGaterer task in prometheus-openstack-exporter may interfere with the HTTPServer connections and make access to prometheus-openstack-exporter slow - https://github.com/canonical/prometheus-openstack-exporter/issues/130
Besides prometheus-openstack-exporter, the native sleep(0) has also been found in some other OpenStack projects such as cinderclient and glance. They all use the code below, so some code paths use greenthreads while others do not, and that mixture causes problems. Full code: https://github.com/openstack/glance/commit/70e9690830653ed92d15c4a696cae8778c635206
+try:
+ from eventlet import sleep
+except ImportError:
+ from time import sleep
oslo.messaging has the same problem: 'cannot switch to a different thread' also occurred when nova-compute sent heartbeats to rabbit through oslo.messaging, see: https://bugs.launchpad.net/oslo.messaging/+bug/1934937
1, For non-wsgi services, heartbeat_in_pthread should be set to false
2, This patch sets heartbeat_in_pthread automatically based on whether 'eventlet.wsgi' is in sys.modules; eventlet.wsgi being in sys.modules means a green environment, so it sets heartbeat_in_pthread=true - https://review.opendev.org/c/openstack/oslo.messaging/+/927624/1/oslo_messaging/_drivers/impl_rabbit.py
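A rough sketch of that heuristic (my own paraphrase, not the patch's actual code): the presence of eventlet.wsgi in sys.modules is taken as the signal that the service runs in a green/wsgi environment:
import sys

def default_heartbeat_in_pthread():
    # eventlet.wsgi only appears in sys.modules when the service runs under an
    # eventlet-based wsgi stack, i.e. a green environment
    return 'eventlet.wsgi' in sys.modules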
A way to do a quick performance test:
1, Make requests to nova-api hang for 120 seconds
iptables -I OUTPUT -p tcp -m state --state NEW --dport 8774 -j DROP
2, Use siege to generate some concurrency
while true; do siege http://172.16.0.30:9183/metrics -t 5s -c 5 -d 0.1; done
20250220 - delete dirty volumes
# delete a volume that is still attached to an instance
openstack volume list
openstack volume attachment list --os-volume-api-version 3.27 --volume-id ae972c52-a137-4305-b7e5-20312f416065
openstack volume attachment delete --os-volume-api-version 3.27 <aaa>
openstack volume delete ae972c52-a137-4305-b7e5-20312f416065
openstack volume list |grep -E 'avai|dev|attaching' |awk '{print $2}' |xargs -i openstack volume attachment list --os-volume-api-version 3.27 --volume-id {} |grep attached |awk '{print $2}' |xargs -i openstack volume attachment delete --os-volume-api-version 3.27 {}
openstack volume list |grep -E 'avai|dev|attaching' |awk '{print $2}' |xargs -i openstack volume delete {}
20250618 - nova-compute unresponsive
nova-compute has the same problem: if listCaps is not wrapped in tpool.Proxy it can block all greenthreads and cause nova-compute to freeze for hours - https://review.opendev.org/c/openstack/nova/+/939317
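The pattern behind that fix can be sketched like this (a hypothetical helper of mine, not the actual nova code, and using tpool.execute rather than tpool.Proxy for brevity): push the potentially slow libvirt call into a native worker thread so it cannot stall the eventlet hub:
from eventlet import tpool

def get_capabilities(conn):
    # conn is assumed to be a libvirt.virConnect handle; getCapabilities() can
    # take a long time, so run it in eventlet's native thread pool instead of
    # blocking the green thread that called us
    return tpool.execute(conn.getCapabilities)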