1 Problem Description
Running model training fails with the following error:
Traceback (most recent call last):
  File "/opt/Bert-VITS2/./text/chinese_bert.py", line 3, in <module>
    import torch
  File "/root/anaconda3/envs/vits/lib/python3.9/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/root/anaconda3/envs/vits/lib/python3.9/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/root/anaconda3/envs/vits/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /root/anaconda3/envs/vits/lib/python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
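2 Problem Analysis
Listing the installed packages shows the following. Note that torch and torchvision are listed without a CUDA build suffix: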
pip list
Package Version
----------------------------- ----------
......
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
......
torch 1.13.0
torchaudio 0.13.0
torchvision 0.14.0
......
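This error is caused by a mismatch between the installed torch build and the CUDA version. The PyTorch website lists which torch builds correspond to which CUDA versions.
Website: https://pytorch.org/get-started/locally/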
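Because `import torch` itself fails in this environment, the installed versions can also be inspected without importing torch. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_versions` and its prefix filter are illustrative assumptions, not part of the original setup.

```python
from importlib.metadata import distributions


def installed_versions(prefixes=("torch", "nvidia-")):
    """Return {package name: version} for installed packages whose
    name starts with one of the given prefixes (hypothetical helper)."""
    prefixes = tuple(p.lower() for p in prefixes)
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith(prefixes):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # Prints the same kind of information as `pip list`, restricted to
    # torch* and nvidia-* packages, without loading any CUDA library.
    for name, version in sorted(installed_versions().items()):
        print(f"{name:30s} {version}")
```

Run with the interpreter of the affected environment (here the `vits` conda environment), the output should match the torch and nvidia rows of the `pip list` output above.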
3 Solution
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
Listing the packages again shows the following:
pip list
Package Version
----------------------------- ------------
......
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
......
torch 1.13.0+cu117
torchaudio 0.13.0
torchvision 0.14.0+cu117
......
Running the program again no longer produces the error.
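The result can also be confirmed from Python. This is a rough sketch, not a definitive check: `cuda_tag` and `check_torch` are hypothetical helper names, and the only torch attributes used are `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
def cuda_tag(version):
    """Return the build tag after '+' in a version string, e.g. 'cu117'
    for '1.13.0+cu117', or None if the version has no '+' suffix."""
    _, sep, tag = str(version).partition("+")
    return tag if sep else None


def check_torch():
    """Try to import torch and report its build info (hypothetical helper)."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        # OSError is the exception type raised by the failed library load
        # in the traceback from section 1.
        return {"ok": False, "error": str(exc)}
    return {
        "ok": True,
        "version": torch.__version__,
        "build_tag": cuda_tag(torch.__version__),
        "cuda": torch.version.cuda,  # CUDA version this torch build targets
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(check_torch())
```

With the packages listed above, `build_tag` is expected to be `cu117`. Whether `gpu_available` is true additionally depends on the GPU driver of the machine.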