Reference:
https://modelbest.feishu.cn/wiki/LZxLwp4Lzi29vXklYLFchwN5nCf
vLLM: 0.5.4
Tested on a single RTX 4090; max-model-len has to be lowered to 2000 for the model to fit and run.
CUDA_VISIBLE_DEVICES=1 vllm serve /ai/minicpmv --host 192**** --port 10868 --max-model-len 2000 --trust-remote-code --api-key token-abc123 --gpu-memory-utilization 1 --disable-frontend-multiprocessing --tensor-p
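Once the server is up, it can be checked through vLLM's OpenAI-compatible endpoint using the same API key. A minimal sketch, assuming the served model name defaults to the model path (/ai/minicpmv) and with the masked host above filled in with the real address:

# simple text-only chat completion against the served MiniCPM-V instance
curl http://192****:10868/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer token-abc123" \
  -d '{
        "model": "/ai/minicpmv",
        "messages": [{"role": "user", "content": "Hello, which model are you?"}]
      }'

If the request returns a normal chat completion response, the deployment with the reduced max-model-len is working.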