
neuron: cannot import name 'MachAr' from 'numpy' (D:\Anaconda\envs\SNN\lib\site-packages\numpy\__init__.py)
spikingjelly.clock_driven.cu_kernel_opt: cannot import name 'MachAr' from 'numpy' (D:\Anaconda\envs\SNN\lib\site-packages\numpy\__init__.py)
spikingjelly.clock_driven.tensor_cache: cannot import name 'MachAr' from 'numpy' (D:\Anaconda\envs\SNN\lib\site-packages\numpy\__init__.py)
spikingjelly.clock_driven.spike_op: cannot import name 'MachAr' from 'numpy' (D:\Anaconda\envs\SNN\lib\site-packages\numpy\__init__.py)

To fix the error `cannot import name 'MachAr' from 'numpy'`, work through the following steps.

### Cause

`MachAr` is a NumPy helper for probing floating-point machine parameters. It was deprecated (around NumPy 1.22) and removed from the public `numpy` namespace in later releases, so code or libraries that still do `from numpy import MachAr` — older SpikingJelly releases among them — fail to import when a newer NumPy is installed.

---

### Fixes

#### Option 1: Downgrade NumPy

1. **Check the installed NumPy version**:
   ```bash
   conda list numpy   # or: pip show numpy
   ```
2. **Install a compatible version**:
   ```bash
   conda install numpy=1.19.5   # via Anaconda
   # or
   pip install numpy==1.19.5    # via pip
   ```
   **Note**: if other packages require a newer NumPy, downgrading can introduce its own compatibility problems; do this inside a dedicated virtual environment.

#### Option 2: Make the import version-aware

If downgrading is not an option, the importing code can pick the import path based on the NumPy version:

```python
import numpy as np
from distutils.version import LooseVersion as Version

old_numpy = Version(np.__version__) < Version('1.20')
if old_numpy:
    from numpy import MachAr
else:
    # Fallback path; verify it exists in your NumPy build
    from numpy.core._machar import MachAr
```

**Note**: confirm where (or whether) `MachAr` still exists in the installed NumPy; if it has been removed entirely, further changes are needed.

#### Option 3: Use a replacement function

If the code only needs machine float parameters, `numpy.finfo` covers most of what `MachAr` provided:

```python
import numpy as np

# Query floating-point properties instead of MachAr
float_info = np.finfo(np.float64)
print(float_info.eps)  # machine epsilon
```

---

### Verify the fix

Re-run the code and confirm the error is gone. If it persists, check that:

1. The virtual environment actually contains the intended NumPy version.
2. Every `MachAr` reference in the code has been updated.

If the failing import lives inside a third-party library (here, SpikingJelly), prefer upgrading that library to a release that supports newer NumPy versions, or pin NumPy as above and report the issue upstream.

---
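As a quick way to confirm whether a given environment is affected before touching SpikingJelly itself, a minimal check (assuming only that NumPy is installed) is:

```python
import numpy as np

print("NumPy version:", np.__version__)
try:
    from numpy import MachAr  # removed from the public namespace in newer NumPy
    print("MachAr is importable; SpikingJelly's clock_driven modules should load.")
except ImportError:
    print("MachAr is gone; pin an older NumPy or upgrade/patch SpikingJelly.")
```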

Related questions

(unsloth) root@autodl-container-578c43a3a7-817faf39:~/autodl-tmp/llm_train/src# vllm serve ./ckpts/qwen3-0.6b --enable-auto-tool-choice --tool-call-parser hermes INFO 07-31 23:06:38 [__init__.py:235] Automatically detected platform cuda. INFO 07-31 23:06:41 [api_server.py:1755] vLLM API server version 0.10.0 INFO 07-31 23:06:41 [cli_args.py:261] non-default args: {'model_tag': './ckpts/qwen3-0.6b', 'enable_auto_tool_choice': True, 'tool_call_parser': 'hermes', 'model': './ckpts/qwen3-0.6b'} INFO 07-31 23:06:49 [config.py:1604] Using max model len 40960 WARNING 07-31 23:06:49 [config.py:1084] bitsandbytes quantization is not fully optimized yet. The speed can be slower than non-quantized models. INFO 07-31 23:06:49 [config.py:2434] Chunked prefill is enabled with max_num_batched_tokens=2048. INFO 07-31 23:06:54 [__init__.py:235] Automatically detected platform cuda. INFO 07-31 23:06:57 [core.py:572] Waiting for init message from front-end. INFO 07-31 23:06:57 [core.py:71] Initializing a V1 LLM engine (v0.10.0) with config: model='./ckpts/qwen3-0.6b', speculative_config=None, tokenizer='./ckpts/qwen3-0.6b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=40960, download_dir=None, load_format=LoadFormat.BITSANDBYTES, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=bitsandbytes, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=./ckpts/qwen3-0.6b, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} INFO 07-31 23:06:58 [parallel_state.py:1102] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 WARNING 07-31 23:06:58 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer. INFO 07-31 23:06:58 [gpu_model_runner.py:1843] Starting to load model ./ckpts/qwen3-0.6b... INFO 07-31 23:06:58 [gpu_model_runner.py:1875] Loading model from scratch... INFO 07-31 23:06:58 [cuda.py:290] Using Flash Attention backend on V1 engine. INFO 07-31 23:06:58 [bitsandbytes_loader.py:733] Loading weights with BitsAndBytes quantization. May take a while ... 
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 31.45it/s] Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] ERROR 07-31 23:06:59 [core.py:632] EngineCore failed to start. ERROR 07-31 23:06:59 [core.py:632] Traceback (most recent call last): ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 623, in run_engine_core ERROR 07-31 23:06:59 [core.py:632] engine_core = EngineCoreProc(*args, **kwargs) ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 441, in __init__ ERROR 07-31 23:06:59 [core.py:632] super().__init__(vllm_config, executor_class, log_stats, ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 77, in __init__ ERROR 07-31 23:06:59 [core.py:632] self.model_executor = executor_class(vllm_config) ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__ ERROR 07-31 23:06:59 [core.py:632] self._init_executor() ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor ERROR 07-31 23:06:59 [core.py:632] self.collective_rpc("load_model") ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc ERROR 07-31 23:06:59 [core.py:632] answer = run_method(self.driver_worker, method, args, kwargs) ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/utils/__init__.py", line 2985, in run_method ERROR 07-31 23:06:59 [core.py:632] return func(*args, **kwargs) ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 201, in load_model ERROR 07-31 23:06:59 [core.py:632] self.model_runner.load_model(eep_scale_up=eep_scale_up) ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1876, in load_model ERROR 07-31 23:06:59 [core.py:632] self.model = model_loader.load_model( ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model ERROR 07-31 23:06:59 [core.py:632] self.load_weights(model, model_config) ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/model_loader/bitsandbytes_loader.py", line 741, in load_weights ERROR 07-31 23:06:59 [core.py:632] loaded_weights = model.load_weights(qweight_iterator) ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] 
File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/qwen3.py", line 321, in load_weights ERROR 07-31 23:06:59 [core.py:632] return loader.load_weights(weights) ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 291, in load_weights ERROR 07-31 23:06:59 [core.py:632] autoloaded_weights = set(self._load_module("", self.module, weights)) ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 249, in _load_module ERROR 07-31 23:06:59 [core.py:632] yield from self._load_module(prefix, ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 222, in _load_module ERROR 07-31 23:06:59 [core.py:632] loaded_params = module_load_weights(weights) ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 404, in load_weights ERROR 07-31 23:06:59 [core.py:632] weight_loader(param, loaded_weight, shard_id) ERROR 07-31 23:06:59 [core.py:632] File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 806, in weight_loader ERROR 07-31 23:06:59 [core.py:632] assert param_data.shape == loaded_weight.shape ERROR 07-31 23:06:59 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-31 23:06:59 [core.py:632] AssertionError Process EngineCore_0: Traceback (most recent call last): File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 636, in run_engine_core raise e File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 623, in run_engine_core engine_core = EngineCoreProc(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 441, in __init__ super().__init__(vllm_config, executor_class, log_stats, File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 77, in __init__ self.model_executor = executor_class(vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 53, in __init__ self._init_executor() File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor self.collective_rpc("load_model") File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc answer = run_method(self.driver_worker, method, args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/utils/__init__.py", line 2985, in run_method return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 201, in load_model self.model_runner.load_model(eep_scale_up=eep_scale_up) File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1876, in load_model self.model = model_loader.load_model( ^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model self.load_weights(model, model_config) File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/model_loader/bitsandbytes_loader.py", line 741, in load_weights loaded_weights = model.load_weights(qweight_iterator) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/qwen3.py", line 321, in load_weights return loader.load_weights(weights) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 291, in load_weights autoloaded_weights = set(self._load_module("", self.module, weights)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 249, in _load_module yield from self._load_module(prefix, File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 222, in _load_module loaded_params = module_load_weights(weights) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 404, in load_weights weight_loader(param, loaded_weight, shard_id) File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 806, in weight_loader assert param_data.shape == loaded_weight.shape ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AssertionError Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] [rank0]:[W731 23:07:00.371522631 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) Traceback (most recent call last): File "/root/autodl-tmp/miniconda3/envs/unsloth/bin/vllm", line 8, in <module> sys.exit(main()) ^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 54, in main args.dispatch_function(args) File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 52, in cmd uvloop.run(run_server(args)) File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run return runner.run(wrapper()) ^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper return await main ^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1791, in run_server await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1811, in run_server_worker async with build_async_engine_client(args, client_config) as engine_client: File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client async with build_async_engine_client_from_engine_args( File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args async_llm = AsyncLLM.from_vllm_config( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 163, in from_vllm_config return cls( ^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 117, in __init__ self.engine_core = EngineCoreClient.make_async_mp_client( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 98, in make_async_mp_client return AsyncMPClient(*client_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 677, in __init__ super().__init__( File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 408, in __init__ with launch_core_engines(vllm_config, executor_class, File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/contextlib.py", line 144, in __exit__ next(self.gen) File "/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 697, in launch_core_engines wait_for_engine_startup( File 
"/root/autodl-tmp/miniconda3/envs/unsloth/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 750, in wait_for_engine_startup raise RuntimeError("Engine core initialization failed. " RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

(vllm) zhzx@zhzx-S2600WF-LS:/media/zhzx/ssd2/Qwen3-32B$ vllm serve "/media/zhzx/ssd2/Qwen3-32B" --host 0.0.0.0 --port 8060 --dtype bfloat16 --tensor-parallel-size 2 --cpu-offload-gb 20 --gpu-memory-utilization 0.8 --max-model-len 8126 --api-key token-abc123 --enable-prefix-caching --trust-remote-code \ > INFO 05-10 20:06:33 [__init__.py:239] Automatically detected platform cuda. INFO 05-10 20:06:36 [api_server.py:1043] vLLM API server version 0.8.5.post1 INFO 05-10 20:06:36 [api_server.py:1044] args: Namespace(subparser='serve', model_tag='/media/zhzx/ssd2/Qwen3-32B', config='', host='0.0.0.0', port=8060, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key='token-abc123', lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/media/zhzx/ssd2/Qwen3-32B', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, load_format='auto', download_dir=None, model_loader_extra_config={}, use_tqdm_on_load=True, config_format=<ConfigFormat.AUTO: 'auto'>, dtype='bfloat16', max_model_len=8126, guided_decoding_backend='auto', reasoning_parser=None, logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=2, data_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, disable_custom_all_reduce=False, block_size=None, gpu_memory_utilization=0.8, swap_space=4, kv_cache_dtype='auto', num_gpu_blocks_override=None, enable_prefix_caching=True, prefix_caching_hash_algo='builtin', cpu_offload_gb=20.0, calculate_kv_scales=False, disable_sliding_window=False, use_v2_block_manager=True, seed=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_token=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config={}, limit_mm_per_prompt={}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=None, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=None, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', speculative_config=None, ignore_patterns=[], served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, max_num_batched_tokens=None, max_num_seqs=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, num_lookahead_slots=0, scheduler_delay_factor=0.0, preemption_mode=None, num_scheduler_steps=1, multi_step_stream_outputs=True, scheduling_policy='fcfs', enable_chunked_prefill=None, disable_chunked_mm_input=False, 
scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, additional_config=None, enable_reasoning=False, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False, dispatch_function=<function ServeSubcommand.cmd at 0x7f625a071bc0>) WARNING 05-10 20:06:36 [config.py:2972] Casting torch.float16 to torch.bfloat16. INFO 05-10 20:06:42 [config.py:717] This model supports multiple tasks: {'reward', 'embed', 'generate', 'classify', 'score'}. Defaulting to 'generate'. WARNING 05-10 20:06:42 [arg_utils.py:1658] Compute Capability < 8.0 is not supported by the V1 Engine. Falling back to V0. INFO 05-10 20:06:42 [config.py:1770] Defaulting to use mp for distributed inference INFO 05-10 20:06:42 [api_server.py:246] Started engine process with PID 73230 INFO 05-10 20:06:45 [__init__.py:239] Automatically detected platform cuda. INFO 05-10 20:06:47 [llm_engine.py:240] Initializing a V0 LLM engine (v0.8.5.post1) with config: model='/media/zhzx/ssd2/Qwen3-32B', speculative_config=None, tokenizer='/media/zhzx/ssd2/Qwen3-32B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8126, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/media/zhzx/ssd2/Qwen3-32B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True, WARNING 05-10 20:06:48 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 16 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed. INFO 05-10 20:06:48 [cuda.py:240] Cannot use FlashAttention-2 backend for Volta and Turing GPUs. INFO 05-10 20:06:48 [cuda.py:289] Using XFormers backend. INFO 05-10 20:06:50 [__init__.py:239] Automatically detected platform cuda. (VllmWorkerProcess pid=73270) INFO 05-10 20:06:53 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks (VllmWorkerProcess pid=73270) INFO 05-10 20:06:53 [cuda.py:240] Cannot use FlashAttention-2 backend for Volta and Turing GPUs. (VllmWorkerProcess pid=73270) INFO 05-10 20:06:53 [cuda.py:289] Using XFormers backend. ERROR 05-10 20:06:53 [engine.py:448] Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Quadro RTX 6000 GPU has compute capability 7.5. 
You can use float16 instead by explicitly setting the dtype flag in CLI, for example: --dtype=half. ERROR 05-10 20:06:53 [engine.py:448] Traceback (most recent call last): ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine ERROR 05-10 20:06:53 [engine.py:448] engine = MQLLMEngine.from_vllm_config( ERROR 05-10 20:06:53 [engine.py:448] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config ERROR 05-10 20:06:53 [engine.py:448] return cls( ERROR 05-10 20:06:53 [engine.py:448] ^^^^ ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__ ERROR 05-10 20:06:53 [engine.py:448] self.engine = LLMEngine(*args, **kwargs) ERROR 05-10 20:06:53 [engine.py:448] ^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 275, in __init__ ERROR 05-10 20:06:53 [engine.py:448] self.model_executor = executor_class(vllm_config=vllm_config) ERROR 05-10 20:06:53 [engine.py:448] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 286, in __init__ ERROR 05-10 20:06:53 [engine.py:448] super().__init__(*args, **kwargs) ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__ ERROR 05-10 20:06:53 [engine.py:448] self._init_executor() ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 124, in _init_executor ERROR 05-10 20:06:53 [engine.py:448] self._run_workers("init_device") ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers ERROR 05-10 20:06:53 [engine.py:448] driver_worker_output = run_method(self.driver_worker, sent_method, ERROR 05-10 20:06:53 [engine.py:448] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/utils.py", line 2456, in run_method ERROR 05-10 20:06:53 [engine.py:448] return func(*args, **kwargs) ERROR 05-10 20:06:53 [engine.py:448] ^^^^^^^^^^^^^^^^^^^^^ ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 604, in init_device ERROR 05-10 20:06:53 [engine.py:448] self.worker.init_device() # type: ignore ERROR 05-10 20:06:53 [engine.py:448] ^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker.py", line 177, in init_device ERROR 05-10 20:06:53 [engine.py:448] _check_if_gpu_supports_dtype(self.model_config.dtype) ERROR 05-10 20:06:53 [engine.py:448] File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker.py", line 546, in _check_if_gpu_supports_dtype ERROR 05-10 20:06:53 [engine.py:448] raise ValueError( ERROR 05-10 20:06:53 [engine.py:448] ValueError: Bfloat16 is only supported on 
GPUs with compute capability of at least 8.0. Your Quadro RTX 6000 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the dtype flag in CLI, for example: --dtype=half. Process SpawnProcess-1: ERROR 05-10 20:06:53 [multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 73270 died, exit code: -15 INFO 05-10 20:06:53 [multiproc_worker_utils.py:124] Killing local vLLM worker processes Traceback (most recent call last): File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 450, in run_mp_engine raise e File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine engine = MQLLMEngine.from_vllm_config( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config return cls( ^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__ self.engine = LLMEngine(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 275, in __init__ self.model_executor = executor_class(vllm_config=vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 286, in __init__ super().__init__(*args, **kwargs) File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__ self._init_executor() File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 124, in _init_executor self._run_workers("init_device") File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers driver_worker_output = run_method(self.driver_worker, sent_method, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/utils.py", line 2456, in run_method return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 604, in init_device self.worker.init_device() # type: ignore ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker.py", line 177, in init_device _check_if_gpu_supports_dtype(self.model_config.dtype) File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/worker/worker.py", line 546, in _check_if_gpu_supports_dtype raise ValueError( ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Quadro RTX 6000 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the dtype flag in CLI, for example: --dtype=half. 
Traceback (most recent call last): File "/home/zhzx/miniconda3/envs/vllm/bin/vllm", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 53, in main args.dispatch_function(args) File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 27, in cmd uvloop.run(run_server(args)) File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run return __asyncio.run( ^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/asyncio/runners.py", line 195, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper return await main ^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1078, in run_server async with build_async_engine_client(args) as engine_client: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client async with build_async_engine_client_from_engine_args( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 269, in build_async_engine_client_from_engine_args raise RuntimeError( RuntimeError: Engine process failed to start. See stack trace for the root cause.
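The root cause here is stated in the error itself: the Quadro RTX 6000 (Turing, compute capability 7.5) has no bfloat16 support, so `--dtype bfloat16` must become `--dtype float16` (the message suggests `--dtype=half`, which is equivalent). The same override via the offline Python API, as a minimal sketch reusing the model path from the log:

```python
# Sketch only: request float16 on pre-Ampere GPUs (compute capability < 8.0).
from vllm import LLM

llm = LLM(
    model="/media/zhzx/ssd2/Qwen3-32B",
    dtype="float16",          # equivalent to --dtype float16 / --dtype half on the CLI
    tensor_parallel_size=2,
)
```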

运行CUDA_VISIBLE_DEVICES=0,2 dbgpt start webserver --config /app/configs/dbgpt-local-vllm.toml报错如下: =========================== VLLMDeployModelParameters =========================== name: DeepSeek-R1-Distill-Qwen-32B provider: vllm verbose: False concurrency: 100 backend: None prompt_template: None context_length: None reasoning_model: None path: models/DeepSeek-R1-Distill-Qwen-32B device: auto trust_remote_code: True download_dir: None load_format: auto config_format: auto dtype: auto kv_cache_dtype: auto seed: 0 max_model_len: None distributed_executor_backend: None pipeline_parallel_size: 1 tensor_parallel_size: 1 max_parallel_loading_workers: None block_size: None enable_prefix_caching: None swap_space: 4.0 cpu_offload_gb: 0.0 gpu_memory_utilization: 0.9 max_num_batched_tokens: None max_num_seqs: 2 max_logprobs: 20 revision: None code_revision: None tokenizer_revision: None tokenizer_mode: auto quantization: fp8 max_seq_len_to_capture: 8192 worker_cls: auto extras: None ====================================================================== 2025-08-05 07:43:32 1249bbe41ac7 dbgpt.util.code.server[4248] INFO Code server is ready INFO 08-05 07:43:36 config.py:520] This model supports multiple tasks: {'score', 'classify', 'generate', 'reward', 'embed'}. Defaulting to 'generate'. WARNING 08-05 07:43:36 arg_utils.py:1107] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False. INFO 08-05 07:43:36 config.py:1483] Chunked prefill is enabled with max_num_batched_tokens=2048. INFO 08-05 07:43:37 llm_engine.py:232] Initializing an LLM engine (v0.7.0) with config: model='/app/models/DeepSeek-R1-Distill-Qwen-32B', speculative_config=None, tokenizer='/app/models/DeepSeek-R1-Distill-Qwen-32B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/app/models/DeepSeek-R1-Distill-Qwen-32B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[2,1],"max_capture_size":2}, use_cached_outputs=False, INFO 08-05 07:43:37 cuda.py:225] Using Flash Attention backend. INFO 08-05 07:43:37 model_runner.py:1110] Starting to load model /app/models/DeepSeek-R1-Distill-Qwen-32B... 
/app/packages/dbgpt-core/src/dbgpt/util/model_utils.py:27: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()' if (hasattr(backends, "mps") and backends.mps.is_built()) or torch.has_mps: 2025-08-05 07:43:38 1249bbe41ac7 dbgpt.util.model_utils[4248] INFO Clear torch cache of device: cuda:0 2025-08-05 07:43:38 1249bbe41ac7 dbgpt.util.model_utils[4248] INFO Clear torch cache of device: cuda:1 2025-08-05 07:43:38 1249bbe41ac7 dbgpt.model.cluster.worker.embedding_worker[4248] INFO Load embeddings model: bge-large-zh-v1.5 2025-08-05 07:43:38 1249bbe41ac7 datasets[4248] INFO PyTorch version 2.5.1+cu121 available. 2025-08-05 07:43:38 1249bbe41ac7 datasets[4248] INFO Duckdb version 1.2.0 available. 2025-08-05 07:43:38 1249bbe41ac7 sentence_transformers.SentenceTransformer[4248] INFO Use pytorch device_name: cuda 2025-08-05 07:43:38 1249bbe41ac7 sentence_transformers.SentenceTransformer[4248] INFO Load pretrained SentenceTransformer: /app/models/bge-large-zh-v1.5 INFO: 127.0.0.1:42180 - "POST /api/controller/models HTTP/1.1" 200 OK 2025-08-05 07:43:39 1249bbe41ac7 dbgpt.model.cluster.worker.manager[4248] ERROR Error starting worker manager: model DeepSeek-R1-Distill-Qwen-32B@vllm(172.17.0.2:8002) start failed, Traceback (most recent call last): File "/app/packages/dbgpt-core/src/dbgpt/model/cluster/worker/manager.py", line 631, in _start_worker await self.run_blocking_func( File "/app/packages/dbgpt-core/src/dbgpt/model/cluster/worker/manager.py", line 146, in run_blocking_func return await loop.run_in_executor(self.executor, func, *args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run result = self.fn(*self.args, **self.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/app/packages/dbgpt-core/src/dbgpt/model/cluster/worker/default_worker.py", line 122, in start self.model, self.tokenizer = self.ml.loader_with_params( ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/app/packages/dbgpt-core/src/dbgpt/model/adapter/loader.py", line 70, in loader_with_params return llm_adapter.load_from_params(model_params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/app/packages/dbgpt-core/src/dbgpt/model/adapter/vllm_adapter.py", line 488, in load_from_params engine = AsyncLLMEngine.from_engine_args(engine_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 642, in from_engine_args engine = cls( ^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 592, in __init__ self.engine = self._engine_class(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 265, in __init__ super().__init__(*args, **kwargs) File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 271, in __init__ self.model_executor = executor_class(vllm_config=vllm_config, ) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 49, in __init__ self._init_executor() File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 
40, in _init_executor self.collective_rpc("load_model") File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 49, in collective_rpc answer = run_method(self.driver_worker, method, args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/utils.py", line 2208, in run_method return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/worker/worker.py", line 182, in load_model self.model_runner.load_model() File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1112, in load_model self.model = get_model(vllm_config=self.vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 12, in get_model return loader.load_model(vllm_config=vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 376, in load_model model = _initialize_model(vllm_config=vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 118, in _initialize_model return model_class(vllm_config=vllm_config, prefix=prefix) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 451, in __init__ self.model = Qwen2Model(vllm_config=vllm_config, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 149, in __init__ old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs) File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 305, in __init__ self.start_layer, self.end_layer, self.layers = make_layers( ^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 555, in make_layers [PPMissingLayer() for _ in range(start_layer)] + [ ^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 556, in maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}")) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 307, in <lambda> lambda prefix: Qwen2DecoderLayer(config=config, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 206, in __init__ self.self_attn = Qwen2Attention( ^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 134, in __init__ self.qkv_proj = QKVParallelLinear( 
^^^^^^^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 728, in __init__ super().__init__(input_size=input_size, File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 311, in __init__ self.quant_method.create_weights( File "//opt/.uv.venv/lib/python3.11/site-packages/vllm/model_executor/layers/quantization/fp8.py", line 199, in create_weights weight = ModelWeightParameter(data=torch.empty( ^^^^^^^^^^^^ File "//opt/.uv.venv/lib/python3.11/site-packages/torch/utils/_device.py", line 106, in __torch_function__ return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 70.00 MiB. GPU 0 has a total capacity of 44.53 GiB of which 15.94 MiB is free. Process 1334940 has 44.51 GiB memory in use. Of the allocated memory 44.17 GiB is allocated by PyTorch, and 14.30 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) ;model bge-large-zh-v1.5@hf(172.17.0.2:8002) start successfully INFO: Shutting down
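The OOM message shows GPU 0 already had 44.51 GiB in use by another process (PID 1334940) before DB-GPT tried to allocate 70 MiB, and the vLLM engine was configured with `tensor_parallel_size: 1` even though `CUDA_VISIBLE_DEVICES=0,2` exposes two GPUs. Likely remedies are to free or avoid the occupied card, raise `tensor_parallel_size` to 2, or lower `gpu_memory_utilization`. A quick pre-flight check (assuming PyTorch with CUDA is available in the same environment):

```python
# Report free memory on every device visible to the process before launching.
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"cuda:{i}: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```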

create LoRA network. base dim (rank): 64, alpha: 32 neuron dropout: p=None, rank dropout: p=None, module dropout: p=None create LoRA for Text Encoder: 72 modules. create LoRA for U-Net: 192 modules. enable LoRA for text encoder enable LoRA for U-Net Traceback (most recent call last): File "D:\lora_lian\sd-scripts\train_network.py", line 873, in <module> train(args) File "D:\lora_lian\sd-scripts\train_network.py", line 242, in train info = network.load_weights(args.network_weights) File "D:\lora_lian\sd-scripts\networks\lora.py", line 884, in load_weights info = self.load_state_dict(weights_sd, False) File "D:\lora_lian\python\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for LoRANetwork: size mismatch for lora_unet_mid_block_attentions_0_proj_out.lora_up.weight: copying a param with shape torch.Size([1280, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 64, 1, 1]). Traceback (most recent call last): File "D:\lora_lian\python\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "D:\lora_lian\python\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "D:\lora_lian\python\lib\site-packages\accelerate\commands\launch.py", line 1114, in <module> main() File "D:\lora_lian\python\lib\site-packages\accelerate\commands\launch.py", line 1110, in main launch_command(args) File "D:\lora_lian\python\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command simple_launcher(args) File "D:\lora_lian\python\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['D:\\lora_lian\\python\\python.exe', './sd-scripts/train_network.py', '--config_file', 'D:\\lora_lian\\toml\\autosave\\20230709-112914.toml']' returned non-zero exit status 1. Training failed / 训练失败
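The size mismatch (`[1280, 128, 1, 1]` in the checkpoint vs `[1280, 64, 1, 1]` in the freshly built network) indicates the weights passed via `--network_weights` were trained with rank 128, while this run creates the LoRA with `network_dim=64` ("base dim (rank): 64"). Either set `network_dim` to 128 to match the checkpoint, or drop `--network_weights` to train from scratch. A small sketch (placeholder path) for reading the rank stored in the file:

```python
# Hypothetical check: read the LoRA rank from the resumed checkpoint,
# then set network_dim in the training config to match it.
from safetensors.torch import load_file

state_dict = load_file("previous_lora.safetensors")  # placeholder path
for key, tensor in state_dict.items():
    if key.endswith("lora_down.weight"):
        print(key, "rank =", tensor.shape[0])
        break
```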

bash: docker: command not found [root@ac6b15bb1f77 mas]# python -m vllm.entrypoints.openai.api_server \ > --model /models/z50051264/summary/Qwen2.5-7B-awq/ \ > --max-num-seqs=256 \ > --max-model-len=4096 \ > --max-num-batched-tokens=4096 \ > --tensor-parallel-size=1 \ > --block-size=128 \ > --host=0.0.0.0 \ > --port=8080 \ > --gpu-memory-utilization=0.9 \ > --trust-remote-code \ > --served-model-name=zzz \ > --quantization awq INFO 07-25 03:09:14 [__init__.py:39] Available plugins for group vllm.platform_plugins: INFO 07-25 03:09:14 [__init__.py:41] - ascend -> vllm_ascend:register INFO 07-25 03:09:14 [__init__.py:44] All plugins in this group will be loaded. Set VLLM_PLUGINS to control which plugins to load. INFO 07-25 03:09:14 [__init__.py:235] Platform plugin ascend is activated WARNING 07-25 03:09:15 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") INFO 07-25 03:09:18 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. WARNING 07-25 03:09:19 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. WARNING 07-25 03:09:19 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. WARNING 07-25 03:09:19 [registry.py:413] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. WARNING 07-25 03:09:19 [registry.py:413] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. WARNING 07-25 03:09:19 [registry.py:413] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. WARNING 07-25 03:09:19 [registry.py:413] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. INFO 07-25 03:09:20 [api_server.py:1395] vLLM API server version 0.9.2 INFO 07-25 03:09:20 [cli_args.py:325] non-default args: {'host': '0.0.0.0', 'port': 8080, 'model': '/models/z50051264/summary/Qwen2.5-7B-awq/', 'trust_remote_code': True, 'max_model_len': 4096, 'quantization': 'awq', 'served_model_name': ['zzz'], 'block_size': 128, 'max_num_batched_tokens': 4096, 'max_num_seqs': 256} INFO 07-25 03:09:34 [config.py:841] This model supports multiple tasks: {'generate', 'classify', 'embed', 'reward'}. Defaulting to 'generate'. INFO 07-25 03:09:34 [config.py:1472] Using max model len 4096 WARNING 07-25 03:09:35 [config.py:960] ascend quantization is not fully optimized yet. The speed can be slower than non-quantized models. INFO 07-25 03:09:35 [config.py:2285] Chunked prefill is enabled with max_num_batched_tokens=4096. INFO 07-25 03:09:35 [platform.py:174] PIECEWISE compilation enabled on NPU. 
use_inductor not supported - using only ACL Graph mode INFO 07-25 03:09:35 [utils.py:321] Calculated maximum supported batch sizes for ACL graph: 66 INFO 07-25 03:09:35 [utils.py:336] Adjusted ACL graph batch sizes for Qwen2ForCausalLM model (layers: 28): 67 → 66 sizes INFO 07-25 03:09:45 [__init__.py:39] Available plugins for group vllm.platform_plugins: INFO 07-25 03:09:45 [__init__.py:41] - ascend -> vllm_ascend:register INFO 07-25 03:09:45 [__init__.py:44] All plugins in this group will be loaded. Set VLLM_PLUGINS to control which plugins to load. INFO 07-25 03:09:45 [__init__.py:235] Platform plugin ascend is activated WARNING 07-25 03:09:46 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") INFO 07-25 03:09:50 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. INFO 07-25 03:09:50 [core.py:526] Waiting for init message from front-end. WARNING 07-25 03:09:50 [registry.py:413] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. WARNING 07-25 03:09:50 [registry.py:413] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. WARNING 07-25 03:09:50 [registry.py:413] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. WARNING 07-25 03:09:50 [registry.py:413] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. WARNING 07-25 03:09:50 [registry.py:413] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. WARNING 07-25 03:09:50 [registry.py:413] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. 
INFO 07-25 03:09:50 [core.py:69] Initializing a V1 LLM engine (v0.9.2) with config: model='/models/z50051264/summary/Qwen2.5-7B-awq/', speculative_config=None, tokenizer='/models/z50051264/summary/Qwen2.5-7B-awq/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=ascend, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=zzz, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} ERROR 07-25 03:09:53 [core.py:586] EngineCore failed to start. 
ERROR 07-25 03:09:53 [core.py:586] Traceback (most recent call last): ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core ERROR 07-25 03:09:53 [core.py:586] engine_core = EngineCoreProc(*args, **kwargs) ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__ ERROR 07-25 03:09:53 [core.py:586] super().__init__(vllm_config, executor_class, log_stats, ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 75, in __init__ ERROR 07-25 03:09:53 [core.py:586] self.model_executor = executor_class(vllm_config) ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 53, in __init__ ERROR 07-25 03:09:53 [core.py:586] self._init_executor() ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 47, in _init_executor ERROR 07-25 03:09:53 [core.py:586] self.collective_rpc("init_device") ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc ERROR 07-25 03:09:53 [core.py:586] answer = run_method(self.driver_worker, method, args, kwargs) ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method ERROR 07-25 03:09:53 [core.py:586] return func(*args, **kwargs) ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 606, in init_device ERROR 07-25 03:09:53 [core.py:586] self.worker.init_device() # type: ignore ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 132, in init_device ERROR 07-25 03:09:53 [core.py:586] NPUPlatform.set_device(device) ERROR 07-25 03:09:53 [core.py:586] File "/vllm-workspace/vllm-ascend/vllm_ascend/platform.py", line 98, in set_device ERROR 07-25 03:09:53 [core.py:586] torch.npu.set_device(device) ERROR 07-25 03:09:53 [core.py:586] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/utils.py", line 80, in set_device ERROR 07-25 03:09:53 [core.py:586] torch_npu._C._npu_setDevice(device_id) ERROR 07-25 03:09:53 [core.py:586] RuntimeError: SetPrecisionMode:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:156 NPU function error: at_npu::native::AclSetCompileopt(aclCompileOpt::ACL_PRECISION_MODE, precision_mode), error code is 500001 ERROR 07-25 03:09:53 [core.py:586] [ERROR] 2025-07-25-03:09:53 (PID:977, Device:0, RankID:-1) ERR00100 PTA call acl api failed ERROR 07-25 03:09:53 [core.py:586] [Error]: The internal ACL of the system is incorrect. ERROR 07-25 03:09:53 [core.py:586] Rectify the fault based on the error information in the ascend log. ERROR 07-25 03:09:53 [core.py:586] EC0010: [PID: 977] 2025-07-25-03:09:53.177.260 Failed to import Python module [AttributeError: np.float_ was removed in the NumPy 2.0 release. Use np.float64 instead..]. ERROR 07-25 03:09:53 [core.py:586] Solution: Check that all required components are properly installed and the specified Python path matches the Python installation directory. (If the path does not match the directory, run set_env.sh in the installation package.) ERROR 07-25 03:09:53 [core.py:586] TraceBack (most recent call last): ERROR 07-25 03:09:53 [core.py:586] AOE Failed to call InitCannKB[FUNC:Initialize][FILE:python_adapter_manager.cc][LINE:47] ERROR 07-25 03:09:53 [core.py:586] Failed to initialize TeConfigInfo. 
ERROR 07-25 03:09:53 [core.py:586] [GraphOpt][InitializeInner][InitTbeFunc] Failed to init tbe.[FUNC:InitializeTeFusion][FILE:tbe_op_store_adapter.cc][LINE:1889] ERROR 07-25 03:09:53 [core.py:586] [GraphOpt][InitializeInner][InitTeFusion]: Failed to initialize TeFusion.[FUNC:InitializeInner][FILE:tbe_op_store_adapter.cc][LINE:1856] ERROR 07-25 03:09:53 [core.py:586] [SubGraphOpt][PreCompileOp][InitAdapter] InitializeAdapter adapter [tbe_op_adapter] failed! Ret [4294967295][FUNC:InitializeAdapter][FILE:op_store_adapter_manager.cc][LINE:79] ERROR 07-25 03:09:53 [core.py:586] [SubGraphOpt][PreCompileOp][Init] Initialize op store adapter failed, OpsStoreName[tbe-custom].[FUNC:Initialize][FILE:op_store_adapter_manager.cc][LINE:120] ERROR 07-25 03:09:53 [core.py:586] [FusionMngr][Init] Op store adapter manager init failed.[FUNC:Initialize][FILE:fusion_manager.cc][LINE:115] ERROR 07-25 03:09:53 [core.py:586] PluginManager InvokeAll failed.[FUNC:Initialize][FILE:ops_kernel_manager.cc][LINE:83] ERROR 07-25 03:09:53 [core.py:586] OpsManager initialize failed.[FUNC:InnerInitialize][FILE:gelib.cc][LINE:259] ERROR 07-25 03:09:53 [core.py:586] GELib::InnerInitialize failed.[FUNC:Initialize][FILE:gelib.cc][LINE:184] ERROR 07-25 03:09:53 [core.py:586] GEInitialize failed.[FUNC:GEInitialize][FILE:ge_api.cc][LINE:371] ERROR 07-25 03:09:53 [core.py:586] [Initialize][Ge]GEInitialize failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] ERROR 07-25 03:09:53 [core.py:586] [Init][Compiler]Init compiler failed[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] ERROR 07-25 03:09:53 [core.py:586] [Set][Options]OpCompileProcessor init failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] ERROR 07-25 03:09:53 [core.py:586] Process EngineCore_0: Traceback (most recent call last): File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 590, in run_engine_core raise e File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core engine_core = EngineCoreProc(*args, **kwargs) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__ super().__init__(vllm_config, executor_class, log_stats, File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 75, in __init__ self.model_executor = executor_class(vllm_config) File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 53, in __init__ self._init_executor() File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 47, in _init_executor self.collective_rpc("init_device") File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc answer = run_method(self.driver_worker, method, args, kwargs) File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method return func(*args, **kwargs) File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 606, in init_device self.worker.init_device() # type: ignore File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 132, in init_device NPUPlatform.set_device(device) File "/vllm-workspace/vllm-ascend/vllm_ascend/platform.py", line 98, in set_device torch.npu.set_device(device) File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/utils.py", line 80, in set_device torch_npu._C._npu_setDevice(device_id) RuntimeError: 
SetPrecisionMode:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:156 NPU function error: at_npu::native::AclSetCompileopt(aclCompileOpt::ACL_PRECISION_MODE, precision_mode), error code is 500001 [ERROR] 2025-07-25-03:09:53 (PID:977, Device:0, RankID:-1) ERR00100 PTA call acl api failed [Error]: The internal ACL of the system is incorrect. Rectify the fault based on the error information in the ascend log. EC0010: [PID: 977] 2025-07-25-03:09:53.177.260 Failed to import Python module [AttributeError: np.float_ was removed in the NumPy 2.0 release. Use np.float64 instead..]. Solution: Check that all required components are properly installed and the specified Python path matches the Python installation directory. (If the path does not match the directory, run set_env.sh in the installation package.) TraceBack (most recent call last): AOE Failed to call InitCannKB[FUNC:Initialize][FILE:python_adapter_manager.cc][LINE:47] Failed to initialize TeConfigInfo. [GraphOpt][InitializeInner][InitTbeFunc] Failed to init tbe.[FUNC:InitializeTeFusion][FILE:tbe_op_store_adapter.cc][LINE:1889] [GraphOpt][InitializeInner][InitTeFusion]: Failed to initialize TeFusion.[FUNC:InitializeInner][FILE:tbe_op_store_adapter.cc][LINE:1856] [SubGraphOpt][PreCompileOp][InitAdapter] InitializeAdapter adapter [tbe_op_adapter] failed! Ret [4294967295][FUNC:InitializeAdapter][FILE:op_store_adapter_manager.cc][LINE:79] [SubGraphOpt][PreCompileOp][Init] Initialize op store adapter failed, OpsStoreName[tbe-custom].[FUNC:Initialize][FILE:op_store_adapter_manager.cc][LINE:120] [FusionMngr][Init] Op store adapter manager init failed.[FUNC:Initialize][FILE:fusion_manager.cc][LINE:115] PluginManager InvokeAll failed.[FUNC:Initialize][FILE:ops_kernel_manager.cc][LINE:83] OpsManager initialize failed.[FUNC:InnerInitialize][FILE:gelib.cc][LINE:259] GELib::InnerInitialize failed.[FUNC:Initialize][FILE:gelib.cc][LINE:184] GEInitialize failed.[FUNC:GEInitialize][FILE:ge_api.cc][LINE:371] [Initialize][Ge]GEInitialize failed. 
ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] [Init][Compiler]Init compiler failed[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] [Set][Options]OpCompileProcessor init failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] Traceback (most recent call last): File "/usr/local/python3.10.17/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/usr/local/python3.10.17/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1495, in <module> uvloop.run(run_server(args)) File "/usr/local/python3.10.17/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run return loop.run_until_complete(wrapper()) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete File "/usr/local/python3.10.17/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper return await main File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1431, in run_server await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1451, in run_server_worker async with build_async_engine_client(args, client_config) as engine_client: File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 199, in __aenter__ return await anext(self.gen) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client async with build_async_engine_client_from_engine_args( File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 199, in __aenter__ return await anext(self.gen) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args async_llm = AsyncLLM.from_vllm_config( File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config return cls( File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 124, in __init__ self.engine_core = EngineCoreClient.make_async_mp_client( File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 96, in make_async_mp_client return AsyncMPClient(*client_args) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 666, in __init__ super().__init__( File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 403, in __init__ with launch_core_engines(vllm_config, executor_class, File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 142, in __exit__ next(self.gen) File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 434, in launch_core_engines wait_for_engine_startup( File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 484, in wait_for_engine_startup raise RuntimeError("Engine core initialization failed. " RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} [ERROR] 2025-07-25-03:09:59 (PID:707, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception 分析报错
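Analysis of the log above: `torch.npu.set_device()` fails while CANN initializes its operator knowledge base, and the root cause is spelled out in the EC0010 line: `np.float_ was removed in the NumPy 2.0 release. Use np.float64 instead`. The TBE/CANN Python helpers shipped with this toolkit version still reference the NumPy 1.x alias, so any NumPy 2.x installation breaks NPU setup with error code 500001. The usual fix is to keep NumPy below 2.0 in this environment; a minimal check and workaround sketch (assuming NumPy 1.26.x is acceptable to the rest of the stack):

```python
# Check whether the running NumPy still provides the 1.x aliases that the
# CANN/TBE Python helpers expect; if not, pin NumPy below 2.0.
import numpy as np

print("NumPy version:", np.__version__)

if not hasattr(np, "float_"):
    print("np.float_ is missing (NumPy >= 2.0).")
    print("Fix: pip install 'numpy<2'   # e.g. numpy==1.26.4, then restart the service")
else:
    print("np.float_ exists; look for a different root cause in the ascend logs.")
```

Restoring the alias at runtime (`np.float_ = np.float64` before torch_npu is imported) can work as a stopgap, but downgrading is safer because other CANN components may rely on further removed 1.x aliases.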

RuntimeError: Failed to infer device type, please set the environment variable VLLM_LOGGING_LEVEL=DEBUG to turn on verbose logging to help debug the issue. 2025-07-25 13:35:45 INFO 07-24 22:35:45 [__init__.py:248] No platform detected, vLLM is running on UnspecifiedPlatform 2025-07-25 13:35:45 WARNING 07-24 22:35:45 [_custom_ops.py:20] Failed to import from vllm._C with ImportError('\x01: cannot open shared object file: No such file or directory') 2025-07-25 13:37:16 INFO 07-24 22:37:16 [__init__.py:248] No platform detected, vLLM is running on UnspecifiedPlatform 2025-07-25 13:37:16 WARNING 07-24 22:37:16 [_custom_ops.py:20] Failed to import from vllm._C with ImportError('\x01: cannot open shared object file: No such file or directory') 2025-07-25 13:37:19 Traceback (most recent call last): 2025-07-25 13:37:19 File "<frozen runpy>", line 198, in _run_module_as_main 2025-07-25 13:37:19 File "<frozen runpy>", line 88, in _run_code 2025-07-25 13:37:19 File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1491, in <module> 2025-07-25 13:37:19 parser = make_arg_parser(parser) 2025-07-25 13:37:19 ^^^^^^^^^^^^^^^^^^^^^^^ 2025-07-25 13:37:19 File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/cli_args.py", line 266, in make_arg_parser 2025-07-25 13:37:19 parser = AsyncEngineArgs.add_cli_args(parser) 2025-07-25 13:37:19 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-07-25 13:37:19 File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1717, in add_cli_args 2025-07-25 13:37:19 parser = EngineArgs.add_cli_args(parser) 2025-07-25 13:37:19 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-07-25 13:37:19 File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 906, in add_cli_args 2025-07-25 13:37:19 vllm_kwargs = get_kwargs(VllmConfig) 2025-07-25 13:37:19 ^^^^^^^^^^^^^^^^^^^^^^ 2025-07-25 13:37:19 File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 285, in get_kwargs 2025-07-25 13:37:19 return copy.deepcopy(_compute_kwargs(cls)) 2025-07-25 13:37:19 ^^^^^^^^^^^^^^^^^^^^ 2025-07-25 13:37:19 File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 189, in _compute_kwargs 2025-07-25 13:37:19 default = field.default_factory() 2025-07-25 13:37:19 ^^^^^^^^^^^^^^^^^^^^^^^ 2025-07-25 13:37:19 File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__ 2025-07-25 13:37:19 s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s) 2025-07-25 13:37:19 File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 2413, in __post_init__ 2025-07-25 13:37:19 raise RuntimeError( 2025-07-25 13:37:19 RuntimeError: Failed to infer device type, please set the environment variable VLLM_LOGGING_LEVEL=DEBUG to turn on verbose logging to help debug the issue.
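This trace is a different failure: vLLM starts on `UnspecifiedPlatform`, cannot load its compiled extension `vllm._C`, and therefore cannot infer any device type. That usually points to an installation mismatch (a wheel built for a different CUDA/torch combination, or a missing platform plugin such as vllm-ascend on NPU) rather than to the model or the launch arguments. Setting `VLLM_LOGGING_LEVEL=DEBUG`, as the message suggests, shows why each platform was rejected. A quick, hedged sanity check before relaunching (CUDA assumed here; on Ascend check torch_npu and the vllm-ascend plugin instead):

```python
# Verify that torch sees the accelerator and that vLLM's native extension loads.
import importlib
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

try:
    importlib.import_module("vllm._C")
    print("vllm._C loaded successfully")
except ImportError as exc:
    print("vllm._C failed to load:", exc)
    # Typical remedy: reinstall a vLLM wheel that matches the local torch/CUDA
    # (or build from source), or install the matching platform plugin.
```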

-> llm = LLM(**dataclasses.asdict(engine_args)) (Pdb) n INFO 07-28 06:48:21 [config.py:841] This model supports multiple tasks: {'classify', 'generate', 'embed', 'reward'}. Defaulting to 'generate'. INFO 07-28 06:48:21 [config.py:1472] Using max model len 32768 INFO 07-28 06:48:22 [config.py:2285] Chunked prefill is enabled with max_num_batched_tokens=8192. INFO 07-28 06:48:22 [platform.py:174] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode INFO 07-28 06:48:22 [utils.py:321] Calculated maximum supported batch sizes for ACL graph: 66 INFO 07-28 06:48:22 [utils.py:336] Adjusted ACL graph batch sizes for Qwen2ForCausalLM model (layers: 28): 67 → 66 sizes huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using tokenizers before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) INFO 07-28 06:48:23 [core.py:526] Waiting for init message from front-end. INFO 07-28 06:48:23 [core.py:69] Initializing a V1 LLM engine (v0.9.2) with config: model='/models/z50051264/summary/Qwen2.5-7B', speculative_config=None, tokenizer='/models/z50051264/summary/Qwen2.5-7B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/models/z50051264/summary/Qwen2.5-7B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null} ERROR 07-28 06:48:29 [core.py:586] EngineCore failed to start. 
ERROR 07-28 06:48:29 [core.py:586] Traceback (most recent call last): ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core ERROR 07-28 06:48:29 [core.py:586] engine_core = EngineCoreProc(*args, **kwargs) ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__ ERROR 07-28 06:48:29 [core.py:586] super().__init__(vllm_config, executor_class, log_stats, ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 75, in __init__ ERROR 07-28 06:48:29 [core.py:586] self.model_executor = executor_class(vllm_config) ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 53, in __init__ ERROR 07-28 06:48:29 [core.py:586] self._init_executor() ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 47, in _init_executor ERROR 07-28 06:48:29 [core.py:586] self.collective_rpc("init_device") ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc ERROR 07-28 06:48:29 [core.py:586] answer = run_method(self.driver_worker, method, args, kwargs) ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method ERROR 07-28 06:48:29 [core.py:586] return func(*args, **kwargs) ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 606, in init_device ERROR 07-28 06:48:29 [core.py:586] self.worker.init_device() # type: ignore ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 132, in init_device ERROR 07-28 06:48:29 [core.py:586] NPUPlatform.set_device(device) ERROR 07-28 06:48:29 [core.py:586] File "/vllm-workspace/vllm-ascend/vllm_ascend/platform.py", line 98, in set_device ERROR 07-28 06:48:29 [core.py:586] torch.npu.set_device(device) ERROR 07-28 06:48:29 [core.py:586] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/utils.py", line 80, in set_device ERROR 07-28 06:48:29 [core.py:586] torch_npu._C._npu_setDevice(device_id) ERROR 07-28 06:48:29 [core.py:586] RuntimeError: SetPrecisionMode:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:156 NPU function error: at_npu::native::AclSetCompileopt(aclCompileOpt::ACL_PRECISION_MODE, precision_mode), error code is 500001 ERROR 07-28 06:48:29 [core.py:586] [ERROR] 2025-07-28-06:48:29 (PID:3848, Device:0, RankID:-1) ERR00100 PTA call acl api failed ERROR 07-28 06:48:29 [core.py:586] [Error]: The internal ACL of the system is incorrect. ERROR 07-28 06:48:29 [core.py:586] Rectify the fault based on the error information in the ascend log. 
ERROR 07-28 06:48:29 [core.py:586] E90000: [PID: 3848] 2025-07-28-06:48:29.892.415 Compile operator failed, cause: module '__main__' has no attribute '__spec__' File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/interface.py", line 33, in cann_kb_init ERROR 07-28 06:48:29 [core.py:586] return RouteServer.initialize(**locals()) ERROR 07-28 06:48:29 [core.py:586] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 54, in wrapper ERROR 07-28 06:48:29 [core.py:586] return func(cls, *args, **kwargs) ERROR 07-28 06:48:29 [core.py:586] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 169, in initialize ERROR 07-28 06:48:29 [core.py:586] main_mod, main_path = config_main_info() ERROR 07-28 06:48:29 [core.py:586] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/utils/common.py", line 37, in config_main_info ERROR 07-28 06:48:29 [core.py:586] main_module_name = getattr(main_module.__spec__, "name", None) ERROR 07-28 06:48:29 [core.py:586] ERROR 07-28 06:48:29 [core.py:586] TraceBack (most recent call last): ERROR 07-28 06:48:29 [core.py:586] AOE Failed to call InitCannKB[FUNC:Initialize][FILE:python_adapter_manager.cc][LINE:47] ERROR 07-28 06:48:29 [core.py:586] Failed to initialize TeConfigInfo. ERROR 07-28 06:48:29 [core.py:586] [GraphOpt][InitializeInner][InitTbeFunc] Failed to init tbe.[FUNC:InitializeTeFusion][FILE:tbe_op_store_adapter.cc][LINE:1889] ERROR 07-28 06:48:29 [core.py:586] [GraphOpt][InitializeInner][InitTeFusion]: Failed to initialize TeFusion.[FUNC:InitializeInner][FILE:tbe_op_store_adapter.cc][LINE:1856] ERROR 07-28 06:48:29 [core.py:586] [SubGraphOpt][PreCompileOp][InitAdapter] InitializeAdapter adapter [tbe_op_adapter] failed! Ret [4294967295][FUNC:InitializeAdapter][FILE:op_store_adapter_manager.cc][LINE:79] ERROR 07-28 06:48:29 [core.py:586] [SubGraphOpt][PreCompileOp][Init] Initialize op store adapter failed, OpsStoreName[tbe-custom].[FUNC:Initialize][FILE:op_store_adapter_manager.cc][LINE:120] ERROR 07-28 06:48:29 [core.py:586] [FusionMngr][Init] Op store adapter manager init failed.[FUNC:Initialize][FILE:fusion_manager.cc][LINE:115] ERROR 07-28 06:48:29 [core.py:586] PluginManager InvokeAll failed.[FUNC:Initialize][FILE:ops_kernel_manager.cc][LINE:83] ERROR 07-28 06:48:29 [core.py:586] OpsManager initialize failed.[FUNC:InnerInitialize][FILE:gelib.cc][LINE:259] ERROR 07-28 06:48:29 [core.py:586] GELib::InnerInitialize failed.[FUNC:Initialize][FILE:gelib.cc][LINE:184] ERROR 07-28 06:48:29 [core.py:586] GEInitialize failed.[FUNC:GEInitialize][FILE:ge_api.cc][LINE:371] ERROR 07-28 06:48:29 [core.py:586] [Initialize][Ge]GEInitialize failed. 
ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] ERROR 07-28 06:48:29 [core.py:586] [Init][Compiler]Init compiler failed[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] ERROR 07-28 06:48:29 [core.py:586] [Set][Options]OpCompileProcessor init failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] ERROR 07-28 06:48:29 [core.py:586] Exception in thread Thread-4: Process EngineCore_0: Traceback (most recent call last): File "/usr/local/python3.10.17/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/utils/multiprocess_util.py", line 91, in run Traceback (most recent call last): File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 590, in run_engine_core raise e File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core engine_core = EngineCoreProc(*args, **kwargs) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__ super().__init__(vllm_config, executor_class, log_stats, key, func, args, kwargs = self.task_q.get(timeout=TIMEOUT) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 75, in __init__ self.model_executor = executor_class(vllm_config) File "<string>", line 2, in get File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 53, in __init__ self._init_executor() File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 47, in _init_executor self.collective_rpc("init_device") File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc answer = run_method(self.driver_worker, method, args, kwargs) File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method return func(*args, **kwargs) File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 606, in init_device self.worker.init_device() # type: ignore File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 132, in init_device NPUPlatform.set_device(device) File "/vllm-workspace/vllm-ascend/vllm_ascend/platform.py", line 98, in set_device torch.npu.set_device(device) File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/npu/utils.py", line 80, in set_device torch_npu._C._npu_setDevice(device_id) RuntimeError: SetPrecisionMode:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:156 NPU function error: at_npu::native::AclSetCompileopt(aclCompileOpt::ACL_PRECISION_MODE, precision_mode), error code is 500001 [ERROR] 2025-07-28-06:48:29 (PID:3848, Device:0, RankID:-1) ERR00100 PTA call acl api failed [Error]: The internal ACL of the system is incorrect. Rectify the fault based on the error information in the ascend log. 
E90000: [PID: 3848] 2025-07-28-06:48:29.892.415 Compile operator failed, cause: module '__main__' has no attribute '__spec__' File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/interface.py", line 33, in cann_kb_init return RouteServer.initialize(**locals()) File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 54, in wrapper return func(cls, *args, **kwargs) File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 169, in initialize main_mod, main_path = config_main_info() File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/utils/common.py", line 37, in config_main_info main_module_name = getattr(main_module.__spec__, "name", None) TraceBack (most recent call last): AOE Failed to call InitCannKB[FUNC:Initialize][FILE:python_adapter_manager.cc][LINE:47] Failed to initialize TeConfigInfo. [GraphOpt][InitializeInner][InitTbeFunc] Failed to init tbe.[FUNC:InitializeTeFusion][FILE:tbe_op_store_adapter.cc][LINE:1889] [GraphOpt][InitializeInner][InitTeFusion]: Failed to initialize TeFusion.[FUNC:InitializeInner][FILE:tbe_op_store_adapter.cc][LINE:1856] [SubGraphOpt][PreCompileOp][InitAdapter] InitializeAdapter adapter [tbe_op_adapter] failed! Ret [4294967295][FUNC:InitializeAdapter][FILE:op_store_adapter_manager.cc][LINE:79] [SubGraphOpt][PreCompileOp][Init] Initialize op store adapter failed, OpsStoreName[tbe-custom].[FUNC:Initialize][FILE:op_store_adapter_manager.cc][LINE:120] [FusionMngr][Init] Op store adapter manager init failed.[FUNC:Initialize][FILE:fusion_manager.cc][LINE:115] PluginManager InvokeAll failed.[FUNC:Initialize][FILE:ops_kernel_manager.cc][LINE:83] OpsManager initialize failed.[FUNC:InnerInitialize][FILE:gelib.cc][LINE:259] GELib::InnerInitialize failed.[FUNC:Initialize][FILE:gelib.cc][LINE:184] GEInitialize failed.[FUNC:GEInitialize][FILE:ge_api.cc][LINE:371] [Initialize][Ge]GEInitialize failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] [Init][Compiler]Init compiler failed[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] [Set][Options]OpCompileProcessor init failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': 1} > /models/z50051264/vllm-0.10.0/benchmarks/benchmark_throughput.py(62)run_vllm() -> llm = LLM(**dataclasses.asdict(engine_args)) 初始化引擎失败是怎么回事儿???
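Answer to the question above: in this run the NumPy alias problem is gone, but CANN's knowledge base now fails with `module '__main__' has no attribute '__spec__'`. The TBE repository manager reads `__main__.__spec__` during initialization, and that attribute can be absent when the benchmark is driven from inside pdb (the engine here is created at a `(Pdb)` prompt) or another launcher that rebuilds `__main__`. Running `benchmark_throughput.py` directly, without the debugger, normally avoids the error; if the debugger is required, a hedged workaround is to give `__main__` a spec before constructing the engine:

```python
# Workaround sketch: make sure __main__.__spec__ exists before vLLM / CANN
# initialize. Only relevant when the script runs under pdb or another tool
# that recreates __main__ without a __spec__ attribute.
import importlib.machinery
import sys

main_mod = sys.modules["__main__"]
if getattr(main_mod, "__spec__", None) is None:
    main_mod.__spec__ = importlib.machinery.ModuleSpec("__main__", None)
```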

```python
import numpy
import keras
import keras_metrics
import pandas as pd
from sklearn import metrics
from sklearn import model_selection
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import TensorBoard
from keras.callbacks import EarlyStopping
import scienceplots

# Load challenge (o_xor.csv) and response (r_6xor.csv) data; files have no header row.
df1 = pd.read_csv('F:/testfile/eva/o_xor.csv', header=None)
df2 = pd.read_csv('F:/testfile/eva/r_6xor.csv', header=None)

# Use the first 10000 samples: 129 input features and 1 binary label.
X1 = df1.iloc[:10000, :129]
Y1 = df2.iloc[:10000, :1]

# Fix the random seed for reproducibility.
numpy.random.seed(42)

# TensorBoard callback: write graphs/histograms/images to ./Graph4/ipuf4.
tbCallBack = keras.callbacks.TensorBoard(log_dir='./Graph4/ipuf4', histogram_freq=1,
                                         write_graph=True, write_images=True)

# 80/20 train/test split.
train_features1, test_features1, train_labels1, test_labels1 = model_selection.train_test_split(
    X1, Y1, test_size=0.2, random_state=42)

# Create the model: a 129 -> 6 -> 6 -> 4 -> 1 fully connected network.
# Note: some Keras versions only accept lowercase activation strings ('relu')
# and require a LeakyReLU layer instead of the 'LeakyReLU' string.
model4 = Sequential()
model4.add(Dense(6, input_dim=129, activation='sigmoid'))
model4.add(Dense(6, activation='ReLU'))
model4.add(Dense(4, activation='LeakyReLU'))
model4.add(Dense(1, activation='sigmoid'))
model4.summary()

# Compile the model: binary cross-entropy loss, Adam optimizer,
# track accuracy plus precision/recall from keras_metrics.
model4.compile(loss='binary_crossentropy', optimizer='adam',
               metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall()])

# Fit the model with early stopping and TensorBoard logging.
results4 = model4.fit(train_features1, train_labels1,
                      epochs=100, batch_size=1000,
                      validation_data=(test_features1, test_labels1),
                      callbacks=[EarlyStopping(patience=100, mode='min', verbose=1), tbCallBack])

# Evaluate on the held-out test set and save the trained weights.
scores4 = model4.evaluate(test_features1, test_labels1)
model4.save_weights('puf_model_weights.h5')  # save the weight file

# Export each layer's weights and biases to CSV text files.
for i, layer in enumerate(model4.layers):
    weights, biases = layer.get_weights()
    numpy.savetxt(f'layer_{i}_weights.csv', weights, delimiter=',')  # weight matrix
    numpy.savetxt(f'layer_{i}_biases.csv', biases, delimiter=',')    # bias vector

# Print weight statistics for each layer.
print("\n=== Weight analysis ===")
for i, layer in enumerate(model4.layers):
    weights = layer.get_weights()[0]
    biases = layer.get_weights()[1]
    # Shape and summary statistics of the weight matrix.
    print(f"\nLayer {i} ({layer.name}) weight matrix shape: {weights.shape}")
    print(f"Weight value range: [{numpy.min(weights):.4f}, {numpy.max(weights):.4f}]")
    print(f"Weight mean: {numpy.mean(weights):.4f} ± {numpy.std(weights):.4f}")
    # Show sample weights of the first 3 neurons.
    print("Sample weights of the first 3 neurons:")
    for neuron_idx in range(3):
        if neuron_idx < weights.shape[1]:
            neuron_weights = weights[:, neuron_idx]
            print(f"Neuron {neuron_idx}: [", end="")
            print(", ".join([f"{w:.4f}" for w in neuron_weights[:3]]), "...]")
    # Bias statistics.
    print(f"\nBias shape: {biases.shape}")
    print(f"Bias value range: [{numpy.min(biases):.4f}, {numpy.max(biases):.4f}]")
    print("----------------------------------")

# Plot a histogram of every layer's weight distribution.
import matplotlib.pyplot as plt
plt.style.use('science')  # scienceplots style
plt.figure(figsize=(10, 6))
for i, layer in enumerate(model4.layers):
    weights = layer.get_weights()[0].flatten()
    plt.hist(weights, bins=50, alpha=0.5, label=f'Layer {i}')
plt.xlabel("Weight Value")
plt.ylabel("Frequency")
plt.legend()
plt.savefig('weight_distributions.png', dpi=300)

# Report test accuracy as a percentage.
print("\n%s: %.2f%%" % (model4.metrics_names[1], scores4[1]*100))
```

Add comments to the code above.
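The CSV files written by the export loop above can be inspected later with NumPy alone; a minimal sketch (file names as produced by the script above):

```python
import numpy

# Read back the exported weights and biases of the first Dense layer.
w0 = numpy.loadtxt('layer_0_weights.csv', delimiter=',')  # expected shape (129, 6)
b0 = numpy.loadtxt('layer_0_biases.csv', delimiter=',')   # expected shape (6,)
print(w0.shape, b0.shape, float(w0.mean()))
```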

(vllm) zhzx@zhzx-S2600WF-LS:/media/zhzx/ssd2/Qwen3-32B-AWQ$ CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.api_server \ --model /media/zhzx/ssd2/Qwen3-32B-AWQ \ --tensor-parallel-size 2 \ --quantization awq \ --trust-remote-code INFO 05-12 11:01:48 [__init__.py:239] Automatically detected platform cuda. INFO 05-12 11:01:50 [api_server.py:121] vLLM API server version 0.8.5.post1 INFO 05-12 11:01:50 [api_server.py:122] args: Namespace(host=None, port=8000, ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, log_level='debug', model='/media/zhzx/ssd2/Qwen3-32B-AWQ', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, load_format='auto', download_dir=None, model_loader_extra_config={}, use_tqdm_on_load=True, config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', max_model_len=None, guided_decoding_backend='auto', reasoning_parser=None, logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=2, data_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, disable_custom_all_reduce=False, block_size=None, gpu_memory_utilization=0.9, swap_space=4, kv_cache_dtype='auto', num_gpu_blocks_override=None, enable_prefix_caching=None, prefix_caching_hash_algo='builtin', cpu_offload_gb=0, calculate_kv_scales=False, disable_sliding_window=False, use_v2_block_manager=True, seed=None, max_logprobs=20, disable_log_stats=False, quantization='awq', rope_scaling=None, rope_theta=None, hf_token=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config={}, limit_mm_per_prompt={}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=None, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=None, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', speculative_config=None, ignore_patterns=[], served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, max_num_batched_tokens=None, max_num_seqs=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, num_lookahead_slots=0, scheduler_delay_factor=0.0, preemption_mode=None, num_scheduler_steps=1, multi_step_stream_outputs=True, scheduling_policy='fcfs', enable_chunked_prefill=None, disable_chunked_mm_input=False, scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, additional_config=None, enable_reasoning=False, disable_cascade_attn=False, disable_log_requests=False) INFO 05-12 11:01:56 [config.py:717] This model supports multiple tasks: {'score', 'reward', 'classify', 'embed', 'generate'}. Defaulting to 'generate'. WARNING 05-12 11:01:57 [config.py:830] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models. 
WARNING 05-12 11:01:57 [arg_utils.py:1658] Compute Capability < 8.0 is not supported by the V1 Engine. Falling back to V0. WARNING 05-12 11:01:57 [arg_utils.py:1525] Chunked prefill is enabled by default for models with max_model_len > 32K. Chunked prefill might not work with some features or models. If you encounter any issues, please disable by launching with --enable-chunked-prefill=False. INFO 05-12 11:01:57 [config.py:1770] Defaulting to use mp for distributed inference INFO 05-12 11:01:57 [config.py:2003] Chunked prefill is enabled with max_num_batched_tokens=2048. INFO 05-12 11:01:57 [llm_engine.py:240] Initializing a V0 LLM engine (v0.8.5.post1) with config: model='/media/zhzx/ssd2/Qwen3-32B-AWQ', speculative_config=None, tokenizer='/media/zhzx/ssd2/Qwen3-32B-AWQ', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=40960, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=awq, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/media/zhzx/ssd2/Qwen3-32B-AWQ, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False, WARNING 05-12 11:01:57 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 16 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed. (VllmWorkerProcess pid=100643) INFO 05-12 11:01:57 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks INFO 05-12 11:01:57 [cuda.py:240] Cannot use FlashAttention-2 backend for Volta and Turing GPUs. INFO 05-12 11:01:57 [cuda.py:289] Using XFormers backend. (VllmWorkerProcess pid=100643) INFO 05-12 11:01:57 [cuda.py:240] Cannot use FlashAttention-2 backend for Volta and Turing GPUs. (VllmWorkerProcess pid=100643) INFO 05-12 11:01:57 [cuda.py:289] Using XFormers backend. 
INFO 05-12 11:01:58 [utils.py:1055] Found nccl from library libnccl.so.2 (VllmWorkerProcess pid=100643) INFO 05-12 11:01:58 [utils.py:1055] Found nccl from library libnccl.so.2 INFO 05-12 11:01:58 [pynccl.py:69] vLLM is using nccl==2.21.5 (VllmWorkerProcess pid=100643) INFO 05-12 11:01:58 [pynccl.py:69] vLLM is using nccl==2.21.5 (VllmWorkerProcess pid=100643) INFO 05-12 11:01:58 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /home/zhzx/.cache/vllm/gpu_p2p_access_cache_for_0,1.json INFO 05-12 11:01:58 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /home/zhzx/.cache/vllm/gpu_p2p_access_cache_for_0,1.json INFO 05-12 11:01:58 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_91501777'), local_subscribe_addr='ipc:///tmp/c7c55d66-4bbe-451b-a9f3-3133e26fdb9b', remote_subscribe_addr=None, remote_addr_ipv6=False) (VllmWorkerProcess pid=100643) INFO 05-12 11:01:58 [parallel_state.py:1004] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1 INFO 05-12 11:01:58 [parallel_state.py:1004] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0 INFO 05-12 11:01:58 [model_runner.py:1108] Starting to load model /media/zhzx/ssd2/Qwen3-32B-AWQ... (VllmWorkerProcess pid=100643) INFO 05-12 11:01:58 [model_runner.py:1108] Starting to load model /media/zhzx/ssd2/Qwen3-32B-AWQ... Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:01<00:04, 1.51s/it] Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:02<00:02, 1.36s/it] Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:04<00:01, 1.52s/it] Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:05<00:00, 1.41s/it] Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:05<00:00, 1.43s/it] (VllmWorkerProcess pid=100643) INFO 05-12 11:02:04 [loader.py:458] Loading weights took 5.72 seconds INFO 05-12 11:02:04 [loader.py:458] Loading weights took 5.79 seconds INFO 05-12 11:02:05 [model_runner.py:1140] Model loading took 9.0568 GiB and 5.991388 seconds (VllmWorkerProcess pid=100643) INFO 05-12 11:02:05 [model_runner.py:1140] Model loading took 9.0568 GiB and 5.932729 seconds (VllmWorkerProcess pid=100643) INFO 05-12 11:02:10 [worker.py:287] Memory profiling takes 5.32 seconds (VllmWorkerProcess pid=100643) INFO 05-12 11:02:10 [worker.py:287] the current vLLM instance can use total_gpu_memory (23.64GiB) x gpu_memory_utilization (0.90) = 21.27GiB (VllmWorkerProcess pid=100643) INFO 05-12 11:02:10 [worker.py:287] model weights take 9.06GiB; non_torch_memory takes 0.13GiB; PyTorch activation peak memory takes 0.41GiB; the rest of the memory reserved for KV Cache is 11.68GiB. INFO 05-12 11:02:10 [worker.py:287] Memory profiling takes 5.46 seconds INFO 05-12 11:02:10 [worker.py:287] the current vLLM instance can use total_gpu_memory (23.64GiB) x gpu_memory_utilization (0.90) = 21.27GiB INFO 05-12 11:02:10 [worker.py:287] model weights take 9.06GiB; non_torch_memory takes 0.13GiB; PyTorch activation peak memory takes 1.40GiB; the rest of the memory reserved for KV Cache is 10.68GiB. INFO 05-12 11:02:11 [executor_base.py:112] # cuda blocks: 5469, # CPU blocks: 2048 INFO 05-12 11:02:11 [executor_base.py:117] Maximum concurrency for 40960 tokens per request: 2.14x INFO 05-12 11:02:13 [model_runner.py:1450] Capturing cudagraphs for decoding. 
This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage. Capturing CUDA graph shapes: 0%| | 0/35 [00:00<?, ?it/s](VllmWorkerProcess pid=100643) INFO 05-12 11:02:13 [model_runner.py:1450] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage. Capturing CUDA graph shapes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:29<00:00, 1.18it/s] INFO 05-12 11:02:43 [custom_all_reduce.py:195] Registering 4515 cuda graph addresses (VllmWorkerProcess pid=100643) INFO 05-12 11:02:43 [custom_all_reduce.py:195] Registering 4515 cuda graph addresses (VllmWorkerProcess pid=100643) INFO 05-12 11:02:43 [model_runner.py:1592] Graph capturing finished in 30 secs, took 0.97 GiB INFO 05-12 11:02:43 [model_runner.py:1592] Graph capturing finished in 30 secs, took 0.97 GiB INFO 05-12 11:02:43 [llm_engine.py:437] init engine (profile, create kv cache, warmup model) took 38.52 seconds INFO 05-12 11:02:43 [launcher.py:28] Available routes are: INFO 05-12 11:02:43 [launcher.py:36] Route: /openapi.json, Methods: GET, HEAD INFO 05-12 11:02:43 [launcher.py:36] Route: /docs, Methods: GET, HEAD INFO 05-12 11:02:43 [launcher.py:36] Route: /docs/oauth2-redirect, Methods: GET, HEAD INFO 05-12 11:02:43 [launcher.py:36] Route: /redoc, Methods: GET, HEAD INFO 05-12 11:02:43 [launcher.py:36] Route: /health, Methods: GET INFO 05-12 11:02:43 [launcher.py:36] Route: /generate, Methods: POST [rank0]: Traceback (most recent call last): [rank0]: File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/starlette/datastructures.py", line 668, in __getattr__ [rank0]: return self._state[key] [rank0]: ~~~~~~~~~~~^^^^^ [rank0]: KeyError: 'engine_client' [rank0]: During handling of the above exception, another exception occurred: [rank0]: Traceback (most recent call last): [rank0]: File "<frozen runpy>", line 198, in _run_module_as_main [rank0]: File "<frozen runpy>", line 88, in _run_code [rank0]: File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/api_server.py", line 177, in <module> [rank0]: asyncio.run(run_server(args)) [rank0]: File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/asyncio/runners.py", line 195, in run [rank0]: return runner.run(main) [rank0]: ^^^^^^^^^^^^^^^^ [rank0]: File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/asyncio/runners.py", line 118, in run [rank0]: return self._loop.run_until_complete(task) [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete [rank0]: return future.result() [rank0]: ^^^^^^^^^^^^^^^ [rank0]: File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/api_server.py", line 129, in run_server [rank0]: shutdown_task = await serve_http( [rank0]: ^^^^^^^^^^^^^^^^^ [rank0]: File 
"/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/launcher.py", line 46, in serve_http [rank0]: watchdog_loop(server, app.state.engine_client)) [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/home/zhzx/miniconda3/envs/vllm/lib/python3.12/site-packages/starlette/datastructures.py", line 671, in __getattr__ [rank0]: raise AttributeError(message.format(self.__class__.__name__, key)) [rank0]: AttributeError: 'State' object has no attribute 'engine_client' [rank0]:[W512 11:02:45.924296727 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) /home/zhzx/miniconda3/envs/vllm/lib/python3.12/multiprocessing/resource_tracker.py:255: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d '

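In this last trace the weights load and the CUDA graphs are captured successfully; the crash happens only when the legacy demo server (`vllm.entrypoints.api_server`) starts its watchdog and finds that `app.state.engine_client` was never set, so the problem lies in that demo entrypoint for this build rather than in the model or the AWQ quantization. Two hedged alternatives: serve through the maintained OpenAI-compatible entrypoint (`python -m vllm.entrypoints.openai.api_server --model /media/zhzx/ssd2/Qwen3-32B-AWQ --tensor-parallel-size 2 --quantization awq --trust-remote-code`, or simply `vllm serve ...` with the same flags), or bypass the HTTP server entirely for a quick local smoke test with the offline API:

```python
# Offline smoke test, assuming the same model path and two visible GPUs.
from vllm import LLM, SamplingParams

llm = LLM(model="/media/zhzx/ssd2/Qwen3-32B-AWQ",
          tensor_parallel_size=2,
          quantization="awq",
          trust_remote_code=True)

outputs = llm.generate(["Hello, please introduce yourself."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```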