Preparation
Video tutorial
https://www.bilibili.com/video/BV1ce411J7nZ?p=14&vd_source=165c419c549bc8d0c2d71be2d7b93ccc
Materials accompanying the video
https://pan.baidu.com/wap/init?surl=AjPi7naUMcI3OGG9lDpnpQ&pwd=vai2#/home/%2FB%E7%AB%99%E5%85%AC%E5%BC%80%E8%AF%BE%E3%80%90%E8%AF%BE%E4%BB%B6%E3%80%91%2F%E6%9C%A8%E7%BE%BD%E8%80%81%E5%B8%88%E5%85%AC%E5%BC%80%E8%AF%BE%E8%AF%BE%E4%BB%B6/%2F2401276%E5%B0%8F%E6%97%B6%E6%8E%8C%E6%8F%A1%E5%BC%80%E6%BA%90%E5%A4%A7%E6%A8%A1%E5%9E%8B%E6%9C%AC%E5%9C%B0%E9%83%A8%E7%BD%B2%E5%88%B0%E5%BE%AE%E8%B0%83
Use Alibaba Cloud's Platform for AI (PAI)
PAI-DSW free trial
- https://free.aliyun.com/?spm=5176.14066474.J_5834642020.5.7b34754cmRbYhg&productCode=learn
- https://help.aliyun.com/document_detail/2261126.html
GPU spec and image version (following the reference tutorial “基于Wav2Lip+TPS-Motion-Model+CodeFormer技术实现动漫风数字人”):
- dsw-registry-vpc.cn-beijing.cr.aliyuncs.com/pai/pytorch:1.12-gpu-py39-cu113-ubuntu20.04
- Instance type: ecs.gn6v-c8g1.2xlarge, 1 * NVIDIA V100
Hands-on
Environment setup and model download
Create a conda virtual environment
conda create --name chatglm3_test python=3.11
# conda env list
/mnt/workspace> conda env list
# conda environments:
#
base /home/pai
chatglm3_test /home/pai/envs/chatglm3_test
# conda activate chatglm3_test
# If you hit "CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.", run source activate chatglm3_test instead; after that, the conda activate command works normally for switching environments
/mnt/workspace> source activate chatglm3_test
(chatglm3_test) /mnt/workspace> conda activate chatglm3_test
(chatglm3_test) /mnt/workspace> conda activate base
(base) /mnt/workspace> conda activate chatglm3_test
(chatglm3_test) /mnt/workspace>
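An alternative to the source activate workaround (standard conda usage, not what these notes did) is to initialize the shell once and reopen it:

conda init bash
# restart the shell afterwards; conda activate then works directly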
Check the highest CUDA version supported by the current driver
CUDA Version: 11.4
# nvidia-smi
(chatglm3_test) /mnt/workspace> nvidia-smi
Wed Jul 31 16:44:52 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:08.0 Off | 0 |
| N/A 34C P0 40W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
(chatglm3_test) /mnt/workspace>
Install PyTorch in the virtual environment
Go to the PyTorch previous-versions page: https://pytorch.org/get-started/previous-versions/
(chatglm3_test) /mnt/workspace> conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: -
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(chatglm3_test) /mnt/workspace>
Verify PyTorch
- If the output is True, the GPU build of PyTorch is installed and CUDA is usable.
- If the output is False, the GPU build of PyTorch is not installed or the CUDA environment is not configured correctly; re-check your steps against the tutorial.
(chatglm3_test) /mnt/workspace> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>>
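For a slightly fuller check, the small sketch below (my addition; the exact values depend on your install) also prints the PyTorch build and the CUDA version it was compiled against:

import torch

print(torch.__version__)                  # PyTorch build, e.g. 2.3.1
print(torch.version.cuda)                 # CUDA version this build targets, e.g. 11.8
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. Tesla V100-SXM2-16GB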
Download the ChatGLM3 project files
(chatglm3_test) /mnt/workspace> mkdir chatglm3
(chatglm3_test) /mnt/workspace> cd chatglm3
(chatglm3_test) /mnt/workspace/chatglm3> git clone https://github.com/THUDM/ChatGLM3.git
Cloning into 'ChatGLM3'...
remote: Enumerating objects: 1549, done.
remote: Counting objects: 100% (244/244), done.
remote: Compressing objects: 100% (149/149), done.
remote: Total 1549 (delta 124), reused 182 (delta 93), pack-reused 1305
Receiving objects: 100% (1549/1549), 17.80 MiB | 7.97 MiB/s, done.
Resolving deltas: 100% (864/864), done.
(chatglm3_test) /mnt/workspace/chatglm3> ll ChatGLM3/
total 156
drwxrwxrwx 13 root root 4096 Jul 31 17:30 ./
drwxrwxrwx 3 root root 4096 Jul 31 17:30 ../
drwxrwxrwx 2 root root 4096 Jul 31 17:30 basic_demo/
drwxrwxrwx 4 root root 4096 Jul 31 17:30 composite_demo/
-rw-rw-rw- 1 root root 2304 Jul 31 17:30 DEPLOYMENT_en.md
-rw-rw-rw- 1 root root 2098 Jul 31 17:30 DEPLOYMENT.md
drwxrwxrwx 3 root root 4096 Jul 31 17:30 finetune_demo/
drwxrwxrwx 8 root root 4096 Jul 31 17:30 .git/
drwxrwxrwx 4 root root 4096 Jul 31 17:30 .github/
-rw-rw-rw- 1 root root 175 Jul 31 17:30 .gitignore
drwxrwxrwx 4 root root 4096 Jul 31 17:30 Intel_device_demo/
drwxrwxrwx 3 root root 4096 Jul 31 17:30 langchain_demo/
-rw-rw-rw- 1 root root 11353 Jul 31 17:30 LICENSE
-rw-rw-rw- 1 root root 5178 Jul 31 17:30 MODEL_LICENSE
drwxrwxrwx 2 root root 4096 Jul 31 17:30 openai_api_demo/
-rw-rw-rw- 1 root root 7118 Jul 31 17:30 PROMPT_en.md
-rw-rw-rw- 1 root root 6885 Jul 31 17:30 PROMPT.md
-rw-rw-rw- 1 root root 23163 Jul 31 17:30 README_en.md
-rw-rw-rw- 1 root root 22179 Jul 31 17:30 README.md
-rw-rw-rw- 1 root root 498 Jul 31 17:30 requirements.txt
drwxrwxrwx 2 root root 4096 Jul 31 17:30 resources/
drwxrwxrwx 2 root root 4096 Jul 31 17:30 tensorrt_llm_demo/
drwxrwxrwx 2 root root 4096 Jul 31 17:30 tools_using_demo/
-rw-rw-rw- 1 root root 240 Jul 31 17:30 update_requirements.sh
(chatglm3_test) /mnt/workspace/chatglm3>
Upgrade pip
python -m pip install --upgrade pip
Install the project dependencies ChatGLM3 needs with pip
(chatglm3_test) /mnt/workspace/chatglm3> cd ChatGLM3/
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install -r requirements.txt
(Not recommended) Download the ChatGLM3 model weights from Hugging Face
# Install Git LFS
apt-get install git-lfs
# Initialize Git LFS
# git lfs install
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> git lfs install
Updated git hooks.
Git LFS initialized.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Use Git LFS to download the ChatGLM3-6B model weights
git clone https://huggingface.co/THUDM/chatglm3-6b
# Not accessible from here; a proxy would be needed
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> git clone https://huggingface.co/THUDM/chatglm3-6b
Cloning into 'chatglm3-6b'...
fatal: unable to access 'https://huggingface.co/THUDM/chatglm3-6b/': Failed to connect to huggingface.co port 443: Connection timed out
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3>
(base) /mnt/workspace/chatglm3/ChatGLM3> ping https://huggingface.co/
ping: https://huggingface.co/: Name or service not known
(base) /mnt/workspace/chatglm3/ChatGLM3>
(Recommended) Download the ChatGLM3 model weights from ModelScope
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install modelscope
(base) /mnt/workspace/chatglm3/ChatGLM3> mkdir /mnt/workspace/chatglm3-6b/
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelscope import snapshot_download
>>> model_dir = snapshot_download("ZhipuAI/chatglm3-6b",cache_dir="/mnt/workspace/chatglm3-6b/", revision = "v1.0.0")
2024-07-31 18:03:22,937 - modelscope - INFO - Use user-specified model revision: v1.0.0
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1.29k/1.29k [00:00<00:00, 2.54kB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 40.0/40.0 [00:00<00:00, 66.8B/s]
...
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> mv /mnt/workspace/chatglm3-6b/ZhipuAI/chatglm3-6b ./
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> ll chatglm3-6b
total 12195768
drwxrwxrwx 2 root root 4096 Jul 31 18:05 ./
drwxrwxrwx 14 root root 4096 Jul 31 18:07 ../
-rw-rw-rw- 1 root root 1317 Jul 31 18:03 config.json
-rw-rw-rw- 1 root root 2332 Jul 31 18:03 configuration_chatglm.py
-rw-rw-rw- 1 root root 40 Jul 31 18:03 configuration.json
-rw-rw-rw- 1 root root 42 Jul 31 18:03 .mdl
-rw-rw-rw- 1 root root 55596 Jul 31 18:03 modeling_chatglm.py
-rw-rw-rw- 1 root root 4133 Jul 31 18:03 MODEL_LICENSE
-rw------- 1 root root 1422 Jul 31 18:05 .msc
-rw-rw-rw- 1 root root 36 Jul 31 18:05 .mv
-rw-rw-rw- 1 root root 1827781090 Jul 31 18:03 pytorch_model-00001-of-00007.bin
-rw-rw-rw- 1 root root 1968299480 Jul 31 18:03 pytorch_model-00002-of-00007.bin
-rw-rw-rw- 1 root root 1927415036 Jul 31 18:04 pytorch_model-00003-of-00007.bin
-rw-rw-rw- 1 root root 1815225998 Jul 31 18:04 pytorch_model-00004-of-00007.bin
-rw-rw-rw- 1 root root 1968299544 Jul 31 18:04 pytorch_model-00005-of-00007.bin
-rw-rw-rw- 1 root root 1927415036 Jul 31 18:05 pytorch_model-00006-of-00007.bin
-rw-rw-rw- 1 root root 1052808542 Jul 31 18:05 pytorch_model-00007-of-00007.bin
-rw-rw-rw- 1 root root 20437 Jul 31 18:05 pytorch_model.bin.index.json
-rw-rw-rw- 1 root root 14692 Jul 31 18:05 quantization.py
-rw-rw-rw- 1 root root 4474 Jul 31 18:05 README.md
-rw-rw-rw- 1 root root 11279 Jul 31 18:05 tokenization_chatglm.py
-rw-rw-rw- 1 root root 244 Jul 31 18:05 tokenizer_config.json
-rw-rw-rw- 1 root root 1018370 Jul 31 18:05 tokenizer.model
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> rm -rf /mnt/workspace/chatglm3-6b
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3>
Run the ChatGLM3-6B model
Method 1: Interactive command-line chat
This method gives non-technical users a way to talk to the model without touching a code environment.
For this launch method the official script is cli_demo.py. Before running it, check the model load path in the script and change it to the correct address.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> ll basic_demo/cli_demo.py
-rw-rw-rw- 1 root root 2065 Jul 31 17:30 basic_demo/cli_demo.py
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3>
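The edit itself is a single line; a sketch, assuming cli_demo.py follows the same MODEL_PATH convention as the other basic_demo scripts:

#MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/chatglm3-6b')
MODEL_PATH = os.environ.get('MODEL_PATH', '../chatglm3-6b')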
After the edit, launch it with python cli_demo.py. If it starts successfully, an interactive chat session opens; typing stop exits.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> cd basic_demo/
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> python cli_demo.py
It does produce output, but responses are very slow…
Method 2: Gradio-based web chat app
Web-based chat is currently the most common way to interact with large language models. The ChatGLM3 project provides two web demos with identical functionality, built on different web frameworks. The first is the Gradio-based demo. Gradio is a Python library for quickly creating web interfaces to demo machine learning models: with a few lines of code a developer can create input and output components through which users interact with the model, for example uploading an image to test an image recognition model or typing text to test an NLP model. It makes models easy to try out and is well suited to rapid prototyping and showcases.
For this launch method the official script is web_demo_gradio.py. Before running it, again check the model load path in the script and change it to the correct address (the same one-line MODEL_PATH edit as in cli_demo.py).
Launch it directly with python. If it starts normally, a web page opens automatically and you can chat there.
# Running it errors: No module named 'peft'
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> python web_demo_gradio.py
Traceback (most recent call last):
  File "/mnt/workspace/chatglm3/ChatGLM3/basic_demo/web_demo_gradio.py", line 26, in <module>
    from peft import AutoPeftModelForCausalLM, PeftModelForCausalLM
ModuleNotFoundError: No module named 'peft'
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo>
# These should have been installed earlier
(base) /mnt/workspace/chatglm3/ChatGLM3> grep gradio requirements.txt
gradio>=4.26.0
(base) /mnt/workspace/chatglm3/ChatGLM3>
# peft really is missing
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> conda list | grep peft
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> conda list | grep gradio
gradio 4.39.0 pypi_0 pypi
gradio-client 1.1.1 pypi_0 pypi
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo>
# Install peft
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> pip install peft
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> python web_demo_gradio.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 5.60it/s]
Running on local URL:  http://127.0.0.1:7870
To create a public link, set `share=True` in `launch()`.
I then hit a timeout:
====conversation====[{'role': 'user', 'content': 'hello'}]
Traceback (most recent call last):
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/mnt/workspace/chatglm3/ChatGLM3/basic_demo/web_demo_gradio.py", line 145, in predict
    for new_token in streamer:
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/transformers/generation/streamers.py", line 223, in __next__
    value = self.text_queue.get(timeout=self.timeout)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/queue.py", line 179, in get
    raise Empty
_queue.Empty
Increase the timeout
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> vi web_demo_gradio.py
...
#streamer = TextIteratorStreamer(tokenizer, timeout=60, skip_prompt=True, skip_special_tokens=True)
streamer = TextIteratorStreamer(tokenizer, timeout=600, skip_prompt=True, skip_special_tokens=True)
...
Running it again after that works.
Method 3: Streamlit-based web chat app
The second official ChatGLM3 web demo is a Streamlit app. Streamlit is another Python library for building data science and machine learning web apps. It emphasizes simplicity and a fast development loop: developers write ordinary Python scripts and Streamlit manages the UI layout and state automatically, so they can focus on the data and model logic. Streamlit apps are typically used for data analysis, visualization, and exploratory data analysis tools.
For this launch method the official script is web_demo_streamlit.py. As before, first edit the model load path with the vim editor.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> vi web_demo_streamlit.py
...
#MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/chatglm3-6b')
MODEL_PATH = os.environ.get('MODEL_PATH', '../chatglm3-6b')
The launch command differs slightly: instead of python, use streamlit run.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> streamlit run web_demo_streamlit.py
Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Network URL: http://10.224.132.38:8501
External URL: http://39.107.58.222:8501
When I actually ran it, the reply text streamed out very slowly… a bit odd.
Method 4: Running in Jupyter Lab with a chosen virtual environment
Before deploying the ChatGLM3-6B model we created the chatglm3_test virtual environment to run it.
Besides launching from the terminal with the command line, the model can also be started in a Jupyter Lab environment. The steps:
Confirm the conda environment and install the ipykernel package in it; this package lets Jupyter Notebook use that specific environment's Python.
(base) /mnt/workspace/chatglm3/ChatGLM3> conda env list
# conda environments:
#
base * /home/pai
chatglm3_test /home/pai/envs/chatglm3_test
(base) /mnt/workspace/chatglm3/ChatGLM3> conda activate chatglm3_test
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda install ipykernel
Collecting package metadata (current_repodata.json): done
Solving environment: /
...
Add the environment to Jupyter Notebook by running the following:
# Replace chatglm3_test here with the name of the virtual environment you want to use
python -m ipykernel install --user --name=yenv_name --display-name="Python (chatglm3_test)"
# This errored: No module named ipykernel
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> python -m ipykernel install --user --name=yenv_name --display-name="Python (chatglm3_test)"
/home/pai/envs/chatglm3_test/bin/python: No module named ipykernel
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Install ipykernel with pip
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install ipykernel
# Run it again
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> python -m ipykernel install --user --name=yenv_name --display-name="Python (chatglm3_test)"
Installed kernelspec yenv_name in /root/.local/share/jupyter/kernels/yenv_name
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3>
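As an optional check (my addition), confirm the kernel is registered:

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> jupyter kernelspec list
# yenv_name should be listed alongside the default python3 kernel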
After the above, type jupyter lab in the terminal to start it.
# Jupyter command `jupyter-lab` not found.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> jupyter lab
usage: jupyter [-h] [--version] [--config-dir] [--data-dir] [--runtime-dir] [--paths] [--json] [--debug] [subcommand]

Jupyter: Interactive Computing

positional arguments:
  subcommand     the subcommand to launch

options:
  -h, --help     show this help message and exit
  --version      show the versions of core jupyter packages and exit
  --config-dir   show Jupyter config dir
  --data-dir     show Jupyter data dir
  --runtime-dir  show Jupyter runtime dir
  --paths        show all Jupyter paths. Add --json for machine-readable format.
  --json         output paths as machine-readable json
  --debug        output debug information about paths

Available subcommands: bundlerextension console dejavu events execute kernel kernelspec migrate nbclassic nbconvert nbextension notebook qtconsole run server serverextension troubleshoot trust

Jupyter command `jupyter-lab` not found.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Install jupyterlab
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install jupyterlab
# Install other required packages
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install nni
# Run jupyter lab again
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> jupyter lab
[I 2024-07-31 21:17:54.158 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-07-31 21:17:54.162 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-07-31 21:17:54.167 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-07-31 21:17:54.167 ServerApp] nni.tools.jupyter_extension | extension was successfully linked.
[I 2024-07-31 21:17:54.354 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-07-31 21:17:54.369 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-07-31 21:17:54.371 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-07-31 21:17:54.372 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-07-31 21:17:54.373 LabApp] JupyterLab extension loaded from /home/pai/envs/chatglm3_test/lib/python3.11/site-packages/jupyterlab
[I 2024-07-31 21:17:54.373 LabApp] JupyterLab application directory is /home/pai/envs/chatglm3_test/share/jupyter/lab
[I 2024-07-31 21:17:54.374 LabApp] Extension Manager is 'pypi'.
[I 2024-07-31 21:17:54.413 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-07-31 21:17:54.413 ServerApp] nni.tools.jupyter_extension | extension was successfully loaded.
[C 2024-07-31 21:17:54.414 ServerApp] Running as root is not recommended. Use --allow-root to bypass.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Retry with --allow-root; there are still some errors
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> jupyter lab --allow-root
[I 2024-07-31 21:18:05.910 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-07-31 21:18:05.915 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-07-31 21:18:05.919 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-07-31 21:18:05.919 ServerApp] nni.tools.jupyter_extension | extension was successfully linked.
[I 2024-07-31 21:18:06.115 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-07-31 21:18:06.131 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-07-31 21:18:06.134 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-07-31 21:18:06.135 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-07-31 21:18:06.136 LabApp] JupyterLab extension loaded from /home/pai/envs/chatglm3_test/lib/python3.11/site-packages/jupyterlab
[I 2024-07-31 21:18:06.136 LabApp] JupyterLab application directory is /home/pai/envs/chatglm3_test/share/jupyter/lab
[I 2024-07-31 21:18:06.137 LabApp] Extension Manager is 'pypi'.
[I 2024-07-31 21:18:06.176 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-07-31 21:18:06.176 ServerApp] nni.tools.jupyter_extension | extension was successfully loaded.
[I 2024-07-31 21:18:06.177 ServerApp] Serving notebooks from local directory: /mnt/workspace/chatglm3/ChatGLM3
[I 2024-07-31 21:18:06.177 ServerApp] Jupyter Server 2.14.2 is running at:
[I 2024-07-31 21:18:06.177 ServerApp] http://localhost:8888/lab?token=d86607de475637c21dedc034d312f35a48640fe155b0bbe2
[I 2024-07-31 21:18:06.177 ServerApp] http://127.0.0.1:8888/lab?token=d86607de475637c21dedc034d312f35a48640fe155b0bbe2
[I 2024-07-31 21:18:06.177 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2024-07-31 21:18:06.181 ServerApp] No web browser found: Error('could not locate runnable browser').
[C 2024-07-31 21:18:06.181 ServerApp]
    To access the server, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/jpserver-5341-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/lab?token=d86607de475637c21dedc034d312f35a48640fe155b0bbe2
        http://127.0.0.1:8888/lab?token=d86607de475637c21dedc034d312f35a48640fe155b0bbe2
[I 2024-07-31 21:18:06.532 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[W 2024-07-31 21:19:26.365 LabApp] Blocking request with non-local 'Host' 115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com (115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com). If the server should be accessible at that name, set ServerApp.allow_remote_access to disable the check.
[E 2024-07-31 21:19:26.376 ServerApp] Could not open static file ''
[W 2024-07-31 21:19:26.377 LabApp] 403 GET /lab?token=[secret] (@127.0.0.1) 12.72ms referer=None
[W 2024-07-31 21:19:26.771 ServerApp] Blocking request with non-local 'Host' 115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com (115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com). If the server should be accessible at that name, set ServerApp.allow_remote_access to disable the check.
[W 2024-07-31 21:19:26.774 ServerApp] 403 GET /static/lab/style/bootstrap-theme.min.css (@127.0.0.1) 3.13ms referer=https://115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com/lab?token=[secret]
Add "--ServerApp.allow_remote_access=True"
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> jupyter lab --allow-root --ServerApp.allow_remote_access=True
[I 2024-07-31 21:44:48.328 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-07-31 21:44:48.333 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-07-31 21:44:48.337 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-07-31 21:44:48.337 ServerApp] nni.tools.jupyter_extension | extension was successfully linked.
[I 2024-07-31 21:44:48.522 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-07-31 21:44:48.537 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-07-31 21:44:48.539 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-07-31 21:44:48.540 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-07-31 21:44:48.541 LabApp] JupyterLab extension loaded from /home/pai/envs/chatglm3_test/lib/python3.11/site-packages/jupyterlab
[I 2024-07-31 21:44:48.541 LabApp] JupyterLab application directory is /home/pai/envs/chatglm3_test/share/jupyter/lab
[I 2024-07-31 21:44:48.542 LabApp] Extension Manager is 'pypi'.
[I 2024-07-31 21:44:48.580 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-07-31 21:44:48.580 ServerApp] nni.tools.jupyter_extension | extension was successfully loaded.
[I 2024-07-31 21:44:48.580 ServerApp] The port 8888 is already in use, trying another port.
[I 2024-07-31 21:44:48.581 ServerApp] Serving notebooks from local directory: /mnt/workspace/chatglm3/ChatGLM3
[I 2024-07-31 21:44:48.581 ServerApp] Jupyter Server 2.14.2 is running at:
[I 2024-07-31 21:44:48.581 ServerApp] http://localhost:8889/lab?token=1d76fe80713c071a02f8343fd835d5a6d46b13bb2efa0462
[I 2024-07-31 21:44:48.581 ServerApp] http://127.0.0.1:8889/lab?token=1d76fe80713c071a02f8343fd835d5a6d46b13bb2efa0462
[I 2024-07-31 21:44:48.581 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2024-07-31 21:44:48.585 ServerApp] No web browser found: Error('could not locate runnable browser').
[C 2024-07-31 21:44:48.585 ServerApp]
    To access the server, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/jpserver-5910-open.html
    Or copy and paste one of these URLs:
        http://localhost:8889/lab?token=1d76fe80713c071a02f8343fd835d5a6d46b13bb2efa0462
        http://127.0.0.1:8889/lab?token=1d76fe80713c071a02f8343fd835d5a6d46b13bb2efa0462
[I 2024-07-31 21:44:48.918 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[I 2024-07-31 21:44:52.018 LabApp] 302 GET /lab (@127.0.0.1) 1.06ms
Open the URL
Follow the page prompts to get in:
- Use the printed token to set a password, or
- Append the printed token to the URL and open it; that should work too: https://115450-proxy-8889.dsw-gateway-cn-beijing.data.aliyun.com/lab
Create a notebook
The commands in the video tutorial and its documentation were run locally, which may differ from my environment; I ran into a problem when executing them.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('/mnt/workspace/chatglm3/ChatGLM3/chatglm3-6b', trust_remote_code=True)
model = AutoModel.from_pretrained('/mnt/workspace/chatglm3/ChatGLM3/chatglm3-6b', trust_remote_code=True, device="cuda")
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
# Error:
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
In the end I adjusted the code by following basic_demo/cli_demo.py, and it ran fine.
https://115450-proxy-8889.dsw-gateway-cn-beijing.data.aliyun.com/lab/tree/Untitled.ipynb
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('/mnt/workspace/chatglm3/ChatGLM3/chatglm3-6b', trust_remote_code=True)
model = AutoModel.from_pretrained('/mnt/workspace/chatglm3/ChatGLM3/chatglm3-6b', trust_remote_code=True, device_map="auto")
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
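To continue the conversation, pass the returned history back into the same model.chat call (a small follow-up sketch using the API already shown above):

response, history = model.chat(tokenizer, "请介绍一下你自己", history=history)
print(response)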
Explanations from the tutorial
- Load AutoTokenizer and AutoModel from transformers and point them at the model path. The term tokenizer should be familiar: when we used the GPT-series models, the tiktoken library split the prompt into tokens under a specific encoding, for example to estimate the cost of an API call. In Transformers the tokenizer does a bit more: the text fed to the large language model is wrapped in preprocessing that converts natural language into a format the model can understand and splits it into tokens (words, characters, or subword units). See the small example after this list.
- For model loading, the official code points at THUDM/chatglm3-6b, which loads the model straight from the cloud, so the code works even without a local download of chatglm3-6b; the first load is just slow (be patient) and requires a working network connection (with a proxy if necessary). Since we already downloaded the ChatGLM3-6B weights locally, we point directly at the local chatglm3-6b path for the inference test.
- As for the other parameters: model has an eval mode. A model really does two things, training and inference. Training is the heavier computation: the model runs a forward pass on the input, and in a supervised setting the prediction is compared against a label (the ground truth). If the gap is large, the model is still far from good enough, so its parameters are corrected again and again through backpropagation, using derivatives and the chain rule. Once training is done, the parameters are frozen into a static file that can be downloaded; at usage time no backpropagation is needed, only the forward inference, and calling model.eval() declares exactly that. trust_remote_code=True means trusting the remote code shipped with the model (if any), and device='cuda' loads the model onto a CUDA device for GPU acceleration; those two are self-explanatory.
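To make the tokenizer point concrete, a minimal sketch (my addition) using the local model directory:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('/mnt/workspace/chatglm3/ChatGLM3/chatglm3-6b', trust_remote_code=True)
ids = tokenizer.encode("你好")
print(ids)                    # the token ids the model actually consumes
print(tokenizer.decode(ids))  # decodes back to the original text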
Method 5: OpenAI-style API calls
The ChatGLM3-6B model also provides an OpenAI-style API. As mentioned before, now that OpenAI has effectively defined the standard for frontier AI application development, offering an OpenAI-style API lets ChatGLM3 plug seamlessly into the OpenAI development ecosystem. An "OpenAI-style API call" means invoking the ChatGLM3 model through the ChatCompletion function of the openai library; all we need to do is pass chatglm3-6b as the model parameter. Unifying the calling style also greatly improves development efficiency.
To make OpenAI-style API calls, first install the openai library and start the openai_api.py script.
Note first: OpenAI has updated the openai library to 1.x, but ChatGLM3-6B currently still requires the old 0.28 version, so make sure the environment has the right openai version.
(chatglm3_test) /mnt/workspace> conda list | grep openai
openai 1.37.1 pypi_0 pypi
(chatglm3_test) /mnt/workspace> pip install openai==0.28.1
...
(chatglm3_test) /mnt/workspace> conda list | grep openai
openai 0.28.1 pypi_0 pypi
(chatglm3_test) /mnt/workspace>
Install the tiktoken package, used to split text into tokens.
Downgrade the typing_extensions package, otherwise it errors.
Install the sentence_transformers package; the latest version is fine.
# tiktoken
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda list | grep tiktoken
tiktoken 0.7.0 pypi_0 pypi
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install tiktoken
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda list | grep tiktoken
tiktoken 0.7.0 pypi_0 pypi
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Downgrade the typing_extensions package
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda list | grep typing_extensions
typing_extensions 4.11.0 py311h06a4308_0
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install typing_extensions==4.8.0
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install typing_extensions==4.8.0
Looking in indexes: https://mirrors.cloud.aliyuncs.com/pypi/simple
Requirement already satisfied: typing_extensions==4.8.0 in /home/pai/envs/chatglm3_test/lib/python3.11/site-packages (4.8.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3>
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda list | grep typing_extensions
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> ll /home/pai/envs/chatglm3_test/lib/python3.11/site-packages | grep typing_extensions
drwxrwxrwx 2 root root 4096 Jul 31 22:27 typing_extensions-4.8.0.dist-info/
-rw-rw-rw- 1 root root 103397 Jul 31 22:27 typing_extensions.py
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Install the sentence_transformers package
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install sentence_transformers
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> ll /home/pai/envs/chatglm3_test/lib/python3.11/site-packages | grep sentence_transformers
drwxrwxrwx 9 root root 4096 Jul 31 17:35 sentence_transformers/
drwxrwxrwx 2 root root 4096 Jul 31 17:35 sentence_transformers-3.0.1.dist-info/
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3>
Once everything is installed, start it with python openai_api.py; the first start is a bit slow, be patient.
The downloaded code no longer contains openai_api.py; after checking the README on GitHub, the current script is api_server.py.
# There is no openai_api.py as mentioned in the tutorial
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> ll
total 52
drwxrwxrwx 2 root root 4096 Jul 31 17:30 ./
drwxrwxrwx 15 root root 4096 Jul 31 22:17 ../
-rw-rw-rw- 1 root root 18125 Jul 31 17:30 api_server.py
-rw-rw-rw- 1 root root 1907 Jul 31 17:30 docker-compose.yml
-rw-rw-rw- 1 root root 67 Jul 31 17:30 .env
-rw-rw-rw- 1 root root 1566 Jul 31 17:30 langchain_openai_api.py
-rw-rw-rw- 1 root root 3097 Jul 31 17:30 openai_api_request.py
-rw-rw-rw- 1 root root 6285 Jul 31 17:30 utils.py
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> # https://github.com/THUDM/ChatGLM3?tab=readme-ov-file#openai-api--zhipu-api-demo
# python api_server.py
# Edit MODEL_PATH
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> vi api_server.py
#MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/chatglm3-6b')
MODEL_PATH = os.environ.get('MODEL_PATH', '../chatglm3-6b')
# Run again: it errors that BAAI/bge-m3 is missing, and huggingface cannot be reached to download it
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 4.35it/s]
No sentence-transformers model found with name BAAI/bge-m3. Creating a new one with mean pooling.
/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
...
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/transformers/utils/hub.py", line 441, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like BAAI/bge-m3 is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo>
Download the BAAI/bge-m3 model that api_server.py needs; downloading from ModelScope is recommended:
https://www.modelscope.cn/models/Xorbits/bge-m3/files
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> vi api_server.py
...
# set Embedding Model path
EMBEDDING_PATH = os.environ.get('EMBEDDING_PATH', 'BAAI/bge-m3')
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> mkdir /mnt/workspace/bge-m3
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelscope import snapshot_download
>>> model_dir = snapshot_download("Xorbits/bge-m3",cache_dir="/mnt/workspace/bge-m3/", revision = "v1.0.0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/modelscope/hub/snapshot_download.py", line 74, in snapshot_download
    return _snapshot_download(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/modelscope/hub/snapshot_download.py", line 194, in _snapshot_download
    revision_detail = _api.get_valid_revision_detail(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/modelscope/hub/api.py", line 544, in get_valid_revision_detail
    raise NotExistError('The model: %s has no revision: %s valid are: %s!' %
modelscope.hub.errors.NotExistError: The model: Xorbits/bge-m3 has no revision: v1.0.0 valid are: [v0.0.1]!
>>> model_dir = snapshot_download("Xorbits/bge-m3",cache_dir="/mnt/workspace/bge-m3/", revision = "v0.0.1")
2024-07-31 22:58:30,265 - modelscope - INFO - Use user-specified model revision: v0.0.1
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2.00M/2.00M [00:00<00:00, 3.81MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 687/687 [00:00<00:00, 1.38kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 191/191 [00:00<00:00, 313B/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 194B/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 181/181 [00:00<00:00, 350B/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 213k/213k [00:00<00:00, 525kB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 196k/196k [00:00<00:00, 456kB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 318k/318k [00:00<00:00, 695kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2.12G/2.12G [00:11<00:00, 193MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 686B/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2.12G/2.12G [00:11<00:00, 201MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1.32k/1.32k [00:00<00:00, 2.24kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 54.0/54.0 [00:00<00:00, 107B/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 9.73MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 3.43k/3.43k [00:00<00:00, 6.67kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 964/964 [00:00<00:00, 1.94kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 16.3M/16.3M [00:00<00:00, 24.1MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1.28k/1.28k [00:00<00:00, 2.52kB/s]
>>>
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> mv /mnt/workspace/bge-m3/Xorbits/bge-m3 /mnt/workspace/chatglm3/ChatGLM3
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> ll /mnt/workspace/chatglm3/ChatGLM3/bge-m3/
total 4459532
drwxrwxrwx 4 root root 4096 Jul 31 22:59 ./
drwxrwxrwx 16 root root 4096 Jul 31 23:02 ../
drwxrwxrwx 2 root root 4096 Jul 31 22:58 1_Pooling/
-rw-rw-rw- 1 root root 2100674 Jul 31 22:58 colbert_linear.pt
-rw-rw-rw- 1 root root 687 Jul 31 22:58 config.json
-rw-rw-rw- 1 root root 123 Jul 31 22:58 config_sentence_transformers.json
-rw-rw-rw- 1 root root 181 Jul 31 22:58 configuration.json
drwxrwxrwx 2 root root 4096 Jul 31 22:58 imgs/
-rw-rw-rw- 1 root root 37 Jul 31 22:58 .mdl
-rw-rw-rw- 1 root root 2271064456 Jul 31 22:58 model.safetensors
-rw-rw-rw- 1 root root 349 Jul 31 22:58 modules.json
-rw------- 1 root root 1320 Jul 31 22:59 .msc
-rw-rw-rw- 1 root root 36 Jul 31 22:59 .mv
-rw-rw-rw- 1 root root 2271145830 Jul 31 22:59 pytorch_model.bin
-rw-rw-rw- 1 root root 1356 Jul 31 22:59 README.md
-rw-rw-rw- 1 root root 54 Jul 31 22:59 sentence_bert_config.json
-rw-rw-rw- 1 root root 5069051 Jul 31 22:59 sentencepiece.bpe.model
-rw-rw-rw- 1 root root 3516 Jul 31 22:59 sparse_linear.pt
-rw-rw-rw- 1 root root 964 Jul 31 22:59 special_tokens_map.json
-rw-rw-rw- 1 root root 1313 Jul 31 22:59 tokenizer_config.json
-rw-rw-rw- 1 root root 17098108 Jul 31 22:59 tokenizer.json
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo>
Update the bge-m3 path and run again
# set Embedding Model path
#EMBEDDING_PATH = os.environ.get('EMBEDDING_PATH', 'BAAI/bge-m3')
EMBEDDING_PATH = os.environ.get('EMBEDDING_PATH', '../bge-m3')
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 4.68it/s]
Traceback (most recent call last):
  File "/mnt/workspace/chatglm3/ChatGLM3/openai_api_demo/api_server.py", line 537, in <module>
    embedding_model = SentenceTransformer(EMBEDDING_PATH, device="cuda")
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 316, in __init__
    self.to(device)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> # Based on the earlier experience, try the same kind of change
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> vi api_server.py
#embedding_model = SentenceTransformer(EMBEDDING_PATH, device="cuda")
embedding_model = SentenceTransformer(EMBEDDING_PATH, device_map="auto")
# That errors; the argument is wrong
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 4.78it/s]
Traceback (most recent call last):
  File "/mnt/workspace/chatglm3/ChatGLM3/openai_api_demo/api_server.py", line 538, in <module>
    embedding_model = SentenceTransformer(EMBEDDING_PATH, device_map="auto")
TypeError: SentenceTransformer.__init__() got an unexpected keyword argument 'device_map'
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> # Change it to device="auto"
# load Embedding
#embedding_model = SentenceTransformer(EMBEDDING_PATH, device="cuda")
embedding_model = SentenceTransformer(EMBEDDING_PATH, device="auto")
# Judging by the parameter description, it looks like it may have to be "cuda" here
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 4.71it/s]
Traceback (most recent call last):
  File "/mnt/workspace/chatglm3/ChatGLM3/openai_api_demo/api_server.py", line 538, in <module>
    embedding_model = SentenceTransformer(EMBEDDING_PATH, device="auto")
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 316, in __init__
    self.to(device)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1137, in to
    device, dtype, non_blocking, convert_to_format = torch._C._nn._parse_to(*args, **kwargs)
RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: auto
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo>
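SentenceTransformer only accepts concrete torch device strings, so the usual way to get auto-like behavior (a sketch, not what these notes ended up doing) is to pick the device first:

import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when CUDA is unavailable
embedding_model = SentenceTransformer(EMBEDDING_PATH, device=device)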
It turned out the PyTorch installed earlier had somehow stopped working with CUDA; after reinstalling it, python api_server.py ran. (My unverified guess: one of the later pip installs replaced torch with a build that no longer matched the driver.)
# Odd: PyTorch was installed right at the start, and this returned True back then
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>>
# Reinstall
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
Collecting package metadata (current_repodata.json): done
# OK now
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>>
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00, 1.15it/s]
INFO: Started server process [10998]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
This time the response is fast.
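With api_server.py listening on port 8000, the OpenAI-style call itself looks like the sketch below (my own minimal example, not from the tutorial; it assumes the default host/port printed above and that the demo server accepts a placeholder API key):

import openai

openai.api_base = "http://127.0.0.1:8000/v1"  # local api_server.py endpoint from the log above
openai.api_key = "none"                       # placeholder; assumption: the demo server does not validate keys

# openai 0.28-style chat completion against the local ChatGLM3-6B server
response = openai.ChatCompletion.create(
    model="chatglm3-6b",
    messages=[{"role": "user", "content": "你好"}],
)
print(response.choices[0].message.content)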