ChatGLM3-6B Model Deployment and Fine-Tuning in Practice

Preparation

Video tutorial

https://www.bilibili.com/video/BV1ce411J7nZ?p=14&vd_source=165c419c549bc8d0c2d71be2d7b93ccc

Materials accompanying the video
https://pan.baidu.com/wap/init?surl=AjPi7naUMcI3OGG9lDpnpQ&pwd=vai2#/home/%2FB%E7%AB%99%E5%85%AC%E5%BC%80%E8%AF%BE%E3%80%90%E8%AF%BE%E4%BB%B6%E3%80%91%2F%E6%9C%A8%E7%BE%BD%E8%80%81%E5%B8%88%E5%85%AC%E5%BC%80%E8%AF%BE%E8%AF%BE%E4%BB%B6/%2F2401276%E5%B0%8F%E6%97%B6%E6%8E%8C%E6%8F%A1%E5%BC%80%E6%BA%90%E5%A4%A7%E6%A8%A1%E5%9E%8B%E6%9C%AC%E5%9C%B0%E9%83%A8%E7%BD%B2%E5%88%B0%E5%BE%AE%E8%B0%83

Using the "Alibaba Cloud AI Platform PAI"

PAI-DSW free trial

  • https://free.aliyun.com/?spm=5176.14066474.J_5834642020.5.7b34754cmRbYhg&productCode=learn
  • https://help.aliyun.com/document_detail/2261126.html

GPU instance type and image version (following the choices in "基于Wav2Lip+TPS-Motion-Model+CodeFormer技术实现动漫风数字人"):

  • dsw-registry-vpc.cn-beijing.cr.aliyuncs.com/pai/pytorch:1.12-gpu-py39-cu113-ubuntu20.04
  • Instance type ecs.gn6v-c8g1.2xlarge, 1 * NVIDIA V100

Hands-on

Environment setup and model download

Create a conda virtual environment

conda create --name chatglm3_test python=3.11

# conda env list
/mnt/workspace> conda env list
# conda environments:
#
base                     /home/pai
chatglm3_test            /home/pai/envs/chatglm3_test

# conda activate chatglm3_test
# If this fails with CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.,
# run `source activate chatglm3_test` instead; after that, `conda activate` works normally.
/mnt/workspace> source activate chatglm3_test
(chatglm3_test) /mnt/workspace> conda activate chatglm3_test
(chatglm3_test) /mnt/workspace> conda activate base
(base) /mnt/workspace> conda activate chatglm3_test
(chatglm3_test) /mnt/workspace> 

Check the highest CUDA version the current driver supports
CUDA Version: 11.4

# nvidia-smi
(chatglm3_test) /mnt/workspace> nvidia-smi
Wed Jul 31 16:44:52 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:08.0 Off |                    0 |
| N/A   34C    P0    40W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(chatglm3_test) /mnt/workspace> 

Install PyTorch in the virtual environment
Go to the PyTorch previous-versions page: https://pytorch.org/get-started/previous-versions/

(chatglm3_test) /mnt/workspace> conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: -
Proceed ([y]/n)? y

Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(chatglm3_test) /mnt/workspace>

Verify PyTorch

  • If the output is True, the GPU build of PyTorch is installed and CUDA is available.
  • If the output is False, either the GPU build of PyTorch is not installed or the CUDA
    environment is misconfigured; in that case, re-check your steps against the tutorial.
(chatglm3_test) /mnt/workspace> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> 

Download the ChatGLM3 project files

(chatglm3_test) /mnt/workspace> mkdir chatglm3
(chatglm3_test) /mnt/workspace> cd chatglm3
(chatglm3_test) /mnt/workspace/chatglm3> git clone https://github.com/THUDM/ChatGLM3.git
Cloning into 'ChatGLM3'...
remote: Enumerating objects: 1549, done.
remote: Counting objects: 100% (244/244), done.
remote: Compressing objects: 100% (149/149), done.
remote: Total 1549 (delta 124), reused 182 (delta 93), pack-reused 1305
Receiving objects: 100% (1549/1549), 17.80 MiB | 7.97 MiB/s, done.
Resolving deltas: 100% (864/864), done.
(chatglm3_test) /mnt/workspace/chatglm3> ll ChatGLM3/
total 156
drwxrwxrwx 13 root root  4096 Jul 31 17:30 ./
drwxrwxrwx  3 root root  4096 Jul 31 17:30 ../
drwxrwxrwx  2 root root  4096 Jul 31 17:30 basic_demo/
drwxrwxrwx  4 root root  4096 Jul 31 17:30 composite_demo/
-rw-rw-rw-  1 root root  2304 Jul 31 17:30 DEPLOYMENT_en.md
-rw-rw-rw-  1 root root  2098 Jul 31 17:30 DEPLOYMENT.md
drwxrwxrwx  3 root root  4096 Jul 31 17:30 finetune_demo/
drwxrwxrwx  8 root root  4096 Jul 31 17:30 .git/
drwxrwxrwx  4 root root  4096 Jul 31 17:30 .github/
-rw-rw-rw-  1 root root   175 Jul 31 17:30 .gitignore
drwxrwxrwx  4 root root  4096 Jul 31 17:30 Intel_device_demo/
drwxrwxrwx  3 root root  4096 Jul 31 17:30 langchain_demo/
-rw-rw-rw-  1 root root 11353 Jul 31 17:30 LICENSE
-rw-rw-rw-  1 root root  5178 Jul 31 17:30 MODEL_LICENSE
drwxrwxrwx  2 root root  4096 Jul 31 17:30 openai_api_demo/
-rw-rw-rw-  1 root root  7118 Jul 31 17:30 PROMPT_en.md
-rw-rw-rw-  1 root root  6885 Jul 31 17:30 PROMPT.md
-rw-rw-rw-  1 root root 23163 Jul 31 17:30 README_en.md
-rw-rw-rw-  1 root root 22179 Jul 31 17:30 README.md
-rw-rw-rw-  1 root root   498 Jul 31 17:30 requirements.txt
drwxrwxrwx  2 root root  4096 Jul 31 17:30 resources/
drwxrwxrwx  2 root root  4096 Jul 31 17:30 tensorrt_llm_demo/
drwxrwxrwx  2 root root  4096 Jul 31 17:30 tools_using_demo/
-rw-rw-rw-  1 root root   240 Jul 31 17:30 update_requirements.sh
(chatglm3_test) /mnt/workspace/chatglm3> 

Upgrade pip

python -m pip install --upgrade pip

Install the ChatGLM3 project dependencies with pip

(chatglm3_test) /mnt/workspace/chatglm3> cd ChatGLM3/
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install -r requirements.txt

(Not recommended) Download the ChatGLM3 model weights from Hugging Face

# Install Git LFS
apt-get install git-lfs

# Initialize Git LFS
# git lfs install
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> git lfs install
Updated git hooks.
Git LFS initialized.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Use Git LFS to download the ChatGLM3-6B model weights
git clone https://huggingface.co/THUDM/chatglm3-6b
# Unreachable from this environment; a proxy/VPN would be needed
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> git clone https://huggingface.co/THUDM/chatglm3-6b
Cloning into 'chatglm3-6b'...
fatal: unable to access 'https://huggingface.co/THUDM/chatglm3-6b/': Failed to connect to huggingface.co port 443: Connection timed out
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> 
(base) /mnt/workspace/chatglm3/ChatGLM3> ping https://huggingface.co/
ping: https://huggingface.co/: Name or service not known
# (Note: ping expects a hostname, e.g. `ping huggingface.co`, not a URL)
(base) /mnt/workspace/chatglm3/ChatGLM3> 

(Recommended) Download the ChatGLM3 model weights from ModelScope

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install modelscope
(base) /mnt/workspace/chatglm3/ChatGLM3> mkdir /mnt/workspace/chatglm3-6b/
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelscope import snapshot_download
>>> model_dir = snapshot_download("ZhipuAI/chatglm3-6b",cache_dir="/mnt/workspace/chatglm3-6b/", revision = "v1.0.0")
2024-07-31 18:03:22,937 - modelscope - INFO - Use user-specified model revision: v1.0.0
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1.29k/1.29k [00:00<00:00, 2.54kB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 40.0/40.0 [00:00<00:00, 66.8B/s]
...
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> mv /mnt/workspace/chatglm3-6b/ZhipuAI/chatglm3-6b ./
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> ll chatglm3-6b
total 12195768
drwxrwxrwx  2 root root       4096 Jul 31 18:05 ./
drwxrwxrwx 14 root root       4096 Jul 31 18:07 ../
-rw-rw-rw-  1 root root       1317 Jul 31 18:03 config.json
-rw-rw-rw-  1 root root       2332 Jul 31 18:03 configuration_chatglm.py
-rw-rw-rw-  1 root root         40 Jul 31 18:03 configuration.json
-rw-rw-rw-  1 root root         42 Jul 31 18:03 .mdl
-rw-rw-rw-  1 root root      55596 Jul 31 18:03 modeling_chatglm.py
-rw-rw-rw-  1 root root       4133 Jul 31 18:03 MODEL_LICENSE
-rw-------  1 root root       1422 Jul 31 18:05 .msc
-rw-rw-rw-  1 root root         36 Jul 31 18:05 .mv
-rw-rw-rw-  1 root root 1827781090 Jul 31 18:03 pytorch_model-00001-of-00007.bin
-rw-rw-rw-  1 root root 1968299480 Jul 31 18:03 pytorch_model-00002-of-00007.bin
-rw-rw-rw-  1 root root 1927415036 Jul 31 18:04 pytorch_model-00003-of-00007.bin
-rw-rw-rw-  1 root root 1815225998 Jul 31 18:04 pytorch_model-00004-of-00007.bin
-rw-rw-rw-  1 root root 1968299544 Jul 31 18:04 pytorch_model-00005-of-00007.bin
-rw-rw-rw-  1 root root 1927415036 Jul 31 18:05 pytorch_model-00006-of-00007.bin
-rw-rw-rw-  1 root root 1052808542 Jul 31 18:05 pytorch_model-00007-of-00007.bin
-rw-rw-rw-  1 root root      20437 Jul 31 18:05 pytorch_model.bin.index.json
-rw-rw-rw-  1 root root      14692 Jul 31 18:05 quantization.py
-rw-rw-rw-  1 root root       4474 Jul 31 18:05 README.md
-rw-rw-rw-  1 root root      11279 Jul 31 18:05 tokenization_chatglm.py
-rw-rw-rw-  1 root root        244 Jul 31 18:05 tokenizer_config.json
-rw-rw-rw-  1 root root    1018370 Jul 31 18:05 tokenizer.model
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> rm -rf /mnt/workspace/chatglm3-6b
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> 

Run the ChatGLM3-6B model

Method 1: Interactive chat on the command line

This approach gives non-technical users a way to chat with the model without touching code.
The official script for this method is cli_demo.py. Before running it, check the model loading path and change it to the correct location.

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> ll basic_demo/cli_demo.py
-rw-rw-rw- 1 root root 2065 Jul 31 17:30 basic_demo/cli_demo.py
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> 
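The actual edit was only captured in a screenshot; as a sketch, the change amounts to swapping the default model path for the local one (this assumes cli_demo.py reads the path via os.environ.get with a 'THUDM/chatglm3-6b' default, as the other official demo scripts do):

```python
import os

# Default was the hub id 'THUDM/chatglm3-6b'; point it at the locally
# downloaded weights instead (relative path taken from this walkthrough).
MODEL_PATH = os.environ.get('MODEL_PATH', '../chatglm3-6b')
TOKENIZER_PATH = os.environ.get('TOKENIZER_PATH', MODEL_PATH)
print(MODEL_PATH)
```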

After making the change, start it with python cli_demo.py. If it starts successfully, an interactive chat session opens; type stop to exit.

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> cd basic_demo/
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> python cli_demo.py 

It does produce answers, but responses are very slow…

Method 2: Gradio-based web chat application

Web-based chat is currently the most common way to interact with large language models. The ChatGLM3 project provides two web chat demos with identical functionality, built on different web frameworks. The first is based on Gradio, a Python library for quickly building web interfaces that demonstrate machine learning models: with a few lines of code, a developer can create input and output widgets through which users interact with a model, for example uploading an image to test an image classifier or typing text to test an NLP model. Gradio is well suited to rapid prototyping and model showcases.
The official script for this method is web_demo_gradio.py. Again, before running it, check the model loading path and change it to the correct location.
Launch it with plain python; if it starts normally, a web page opens where you can chat directly.

# Running it fails: No module named 'peft'
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> python web_demo_gradio.py
Traceback (most recent call last):
  File "/mnt/workspace/chatglm3/ChatGLM3/basic_demo/web_demo_gradio.py", line 26, in <module>
    from peft import AutoPeftModelForCausalLM, PeftModelForCausalLM
ModuleNotFoundError: No module named 'peft'
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> 
# The dependencies should have been installed earlier; checking requirements.txt
(base) /mnt/workspace/chatglm3/ChatGLM3> grep gradio requirements.txt
gradio>=4.26.0
(base) /mnt/workspace/chatglm3/ChatGLM3> 
# Confirmed: peft is indeed missing
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> conda list | grep peft
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> conda list | grep gradio
gradio                    4.39.0                   pypi_0    pypi
gradio-client             1.1.1                    pypi_0    pypi
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> 
# Install peft
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> pip install peft
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> python web_demo_gradio.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  5.60it/s]
Running on local URL:  http://127.0.0.1:7870

To create a public link, set `share=True` in `launch()`.

I ran into a timeout:

====conversation====
[{'role': 'user', 'content': 'hello'}]
Traceback (most recent call last):
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/mnt/workspace/chatglm3/ChatGLM3/basic_demo/web_demo_gradio.py", line 145, in predict
    for new_token in streamer:
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/transformers/generation/streamers.py", line 223, in __next__
    value = self.text_queue.get(timeout=self.timeout)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/queue.py", line 179, in get
    raise Empty
_queue.Empty

Increase the timeout:

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> vi web_demo_gradio.py 
...
# streamer = TextIteratorStreamer(tokenizer, timeout=60, skip_prompt=True, skip_special_tokens=True)
streamer = TextIteratorStreamer(tokenizer, timeout=600, skip_prompt=True, skip_special_tokens=True)
...
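The `_queue.Empty` at the bottom of the traceback is ordinary Python queue behavior: TextIteratorStreamer pulls generated tokens from a `queue.Queue` with `get(timeout=...)`, and if the model takes longer than the timeout to produce the next token, `Empty` is raised. A minimal stdlib illustration:

```python
import queue

q = queue.Queue()  # no producer thread ever puts a token in

try:
    # Like the streamer waiting for the next generated token
    q.get(timeout=0.1)
except queue.Empty:
    print("queue.Empty: no item arrived within the timeout")
```

Raising the timeout from 60 to 600 seconds simply gives the slow model more time per token before the queue gives up.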

Running it again now works.

Method 3: Streamlit-based web chat application

The second official web chat demo is built on Streamlit, another Python library for creating data-science and machine-learning web apps. It emphasizes simplicity and a fast development workflow: developers write ordinary Python scripts, and Streamlit manages the UI layout and state automatically so they can focus on the data and model logic. Streamlit apps are commonly used for data analysis, visualization, and exploratory-analysis tools.
The official script for this method is web_demo_streamlit.py. As before, first edit the model loading path with vim.

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> vi web_demo_streamlit.py
...
#MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/chatglm3-6b')
MODEL_PATH = os.environ.get('MODEL_PATH', '../chatglm3-6b')

The launch command differs slightly: instead of plain python, use streamlit run.

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/basic_demo> streamlit run web_demo_streamlit.py

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8501
Network URL: http://10.224.132.38:8501
External URL: http://39.107.58.222:8501

When I actually used it, the reply text streamed out very slowly… a bit odd.

Method 4: Run in JupyterLab under the chosen virtual environment

Before deploying the ChatGLM3-6B model, we created a chatglm3_test virtual environment to run it.
Besides launching from the terminal, the model can also be run inside JupyterLab. The steps are as follows:
Confirm the conda environment and install the ipykernel package in it; this package lets Jupyter Notebook use that environment's Python.

(base) /mnt/workspace/chatglm3/ChatGLM3> conda env list
# conda environments:
#
base                  *  /home/pai
chatglm3_test            /home/pai/envs/chatglm3_test
(base) /mnt/workspace/chatglm3/ChatGLM3> conda activate chatglm3_test
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda install ipykernel
Collecting package metadata (current_repodata.json): done
Solving environment: / 
...

Add the environment to Jupyter Notebook by running:

# Replace chatglm3_test with the name of the virtual environment you want to use
python -m ipykernel install --user --name=yenv_name --display-name="Python (chatglm3_test)"

# Fails with: No module named ipykernel
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> python -m ipykernel install --user --name=yenv_name --display-name="Python (chatglm3_test)"
/home/pai/envs/chatglm3_test/bin/python: No module named ipykernel
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Install ipykernel with pip
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install ipykernel
# Run it again
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> python -m ipykernel install --user --name=yenv_name --display-name="Python (chatglm3_test)"
Installed kernelspec yenv_name in /root/.local/share/jupyter/kernels/yenv_name
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> 

After the steps above, type jupyter lab in the terminal to start it.

# Jupyter command `jupyter-lab` not found.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> jupyter lab
usage: jupyter [-h] [--version] [--config-dir] [--data-dir] [--runtime-dir] [--paths] [--json] [--debug] [subcommand]

Jupyter: Interactive Computing

positional arguments:
  subcommand     the subcommand to launch

options:
  -h, --help     show this help message and exit
  --version      show the versions of core jupyter packages and exit
  --config-dir   show Jupyter config dir
  --data-dir     show Jupyter data dir
  --runtime-dir  show Jupyter runtime dir
  --paths        show all Jupyter paths. Add --json for machine-readable format.
  --json         output paths as machine-readable json
  --debug        output debug information about paths

Available subcommands: bundlerextension console dejavu events execute kernel
kernelspec migrate nbclassic nbconvert nbextension notebook qtconsole run
server serverextension troubleshoot trust

Jupyter command `jupyter-lab` not found.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Install jupyterlab
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install jupyterlab
# Install other required packages
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install nni

# Run jupyter lab again
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> jupyter lab
[I 2024-07-31 21:17:54.158 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-07-31 21:17:54.162 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-07-31 21:17:54.167 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-07-31 21:17:54.167 ServerApp] nni.tools.jupyter_extension | extension was successfully linked.
[I 2024-07-31 21:17:54.354 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-07-31 21:17:54.369 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-07-31 21:17:54.371 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-07-31 21:17:54.372 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-07-31 21:17:54.373 LabApp] JupyterLab extension loaded from /home/pai/envs/chatglm3_test/lib/python3.11/site-packages/jupyterlab
[I 2024-07-31 21:17:54.373 LabApp] JupyterLab application directory is /home/pai/envs/chatglm3_test/share/jupyter/lab
[I 2024-07-31 21:17:54.374 LabApp] Extension Manager is 'pypi'.
[I 2024-07-31 21:17:54.413 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-07-31 21:17:54.413 ServerApp] nni.tools.jupyter_extension | extension was successfully loaded.
[C 2024-07-31 21:17:54.414 ServerApp] Running as root is not recommended. Use --allow-root to bypass.
# Add --allow-root; there are still some errors
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> jupyter lab --allow-root
[I 2024-07-31 21:18:05.910 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-07-31 21:18:05.915 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-07-31 21:18:05.919 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-07-31 21:18:05.919 ServerApp] nni.tools.jupyter_extension | extension was successfully linked.
[I 2024-07-31 21:18:06.115 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-07-31 21:18:06.131 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-07-31 21:18:06.134 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-07-31 21:18:06.135 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-07-31 21:18:06.136 LabApp] JupyterLab extension loaded from /home/pai/envs/chatglm3_test/lib/python3.11/site-packages/jupyterlab
[I 2024-07-31 21:18:06.136 LabApp] JupyterLab application directory is /home/pai/envs/chatglm3_test/share/jupyter/lab
[I 2024-07-31 21:18:06.137 LabApp] Extension Manager is 'pypi'.
[I 2024-07-31 21:18:06.176 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-07-31 21:18:06.176 ServerApp] nni.tools.jupyter_extension | extension was successfully loaded.
[I 2024-07-31 21:18:06.177 ServerApp] Serving notebooks from local directory: /mnt/workspace/chatglm3/ChatGLM3
[I 2024-07-31 21:18:06.177 ServerApp] Jupyter Server 2.14.2 is running at:
[I 2024-07-31 21:18:06.177 ServerApp] http://localhost:8888/lab?token=d86607de475637c21dedc034d312f35a48640fe155b0bbe2
[I 2024-07-31 21:18:06.177 ServerApp]     http://127.0.0.1:8888/lab?token=d86607de475637c21dedc034d312f35a48640fe155b0bbe2
[I 2024-07-31 21:18:06.177 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2024-07-31 21:18:06.181 ServerApp] No web browser found: Error('could not locate runnable browser').
[C 2024-07-31 21:18:06.181 ServerApp] To access the server, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/jpserver-5341-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/lab?token=d86607de475637c21dedc034d312f35a48640fe155b0bbe2
        http://127.0.0.1:8888/lab?token=d86607de475637c21dedc034d312f35a48640fe155b0bbe2
[I 2024-07-31 21:18:06.532 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[W 2024-07-31 21:19:26.365 LabApp] Blocking request with non-local 'Host' 115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com (115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com). If the server should be accessible at that name, set ServerApp.allow_remote_access to disable the check.
[E 2024-07-31 21:19:26.376 ServerApp] Could not open static file ''
[W 2024-07-31 21:19:26.377 LabApp] 403 GET /lab?token=[secret] (@127.0.0.1) 12.72ms referer=None
[W 2024-07-31 21:19:26.771 ServerApp] Blocking request with non-local 'Host' 115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com (115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com). If the server should be accessible at that name, set ServerApp.allow_remote_access to disable the check.
[W 2024-07-31 21:19:26.774 ServerApp] 403 GET /static/lab/style/bootstrap-theme.min.css (@127.0.0.1) 3.13ms referer=https://115450-proxy-8888.dsw-gateway-cn-beijing.data.aliyun.com/lab?token=[secret]

Add --ServerApp.allow_remote_access=True:

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> jupyter lab --allow-root --ServerApp.allow_remote_access=True
[I 2024-07-31 21:44:48.328 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-07-31 21:44:48.333 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-07-31 21:44:48.337 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-07-31 21:44:48.337 ServerApp] nni.tools.jupyter_extension | extension was successfully linked.
[I 2024-07-31 21:44:48.522 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-07-31 21:44:48.537 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-07-31 21:44:48.539 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-07-31 21:44:48.540 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-07-31 21:44:48.541 LabApp] JupyterLab extension loaded from /home/pai/envs/chatglm3_test/lib/python3.11/site-packages/jupyterlab
[I 2024-07-31 21:44:48.541 LabApp] JupyterLab application directory is /home/pai/envs/chatglm3_test/share/jupyter/lab
[I 2024-07-31 21:44:48.542 LabApp] Extension Manager is 'pypi'.
[I 2024-07-31 21:44:48.580 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-07-31 21:44:48.580 ServerApp] nni.tools.jupyter_extension | extension was successfully loaded.
[I 2024-07-31 21:44:48.580 ServerApp] The port 8888 is already in use, trying another port.
[I 2024-07-31 21:44:48.581 ServerApp] Serving notebooks from local directory: /mnt/workspace/chatglm3/ChatGLM3
[I 2024-07-31 21:44:48.581 ServerApp] Jupyter Server 2.14.2 is running at:
[I 2024-07-31 21:44:48.581 ServerApp] http://localhost:8889/lab?token=1d76fe80713c071a02f8343fd835d5a6d46b13bb2efa0462
[I 2024-07-31 21:44:48.581 ServerApp]     http://127.0.0.1:8889/lab?token=1d76fe80713c071a02f8343fd835d5a6d46b13bb2efa0462
[I 2024-07-31 21:44:48.581 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2024-07-31 21:44:48.585 ServerApp] No web browser found: Error('could not locate runnable browser').
[C 2024-07-31 21:44:48.585 ServerApp] To access the server, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/jpserver-5910-open.html
    Or copy and paste one of these URLs:
        http://localhost:8889/lab?token=1d76fe80713c071a02f8343fd835d5a6d46b13bb2efa0462
        http://127.0.0.1:8889/lab?token=1d76fe80713c071a02f8343fd835d5a6d46b13bb2efa0462
[I 2024-07-31 21:44:48.918 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[I 2024-07-31 21:44:52.018 LabApp] 302 GET /lab (@127.0.0.1) 1.06ms

Open the URL and follow the on-page prompts to get in.

  • Use the printed token and set a password, or
  • append the token to the URL and open it; that should also work: https://115450-proxy-8889.dsw-gateway-cn-beijing.data.aliyun.com/lab

Create a notebook.
The commands in the video tutorial and its documentation were run locally, which may not match my environment; I hit a problem when I ran them:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('/mnt/workspace/chatglm3/ChatGLM3/chatglm3-6b', trust_remote_code=True)
model = AutoModel.from_pretrained('/mnt/workspace/chatglm3/ChatGLM3/chatglm3-6b', trust_remote_code=True, device="cuda")
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)

# Error:
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

In the end I adjusted the code following basic_demo/cli_demo.py, and it ran fine. (The driver here supports at most CUDA 11.4, while the conda install earlier pulled a PyTorch build targeting CUDA 11.8, which likely explains the error.)
https://115450-proxy-8889.dsw-gateway-cn-beijing.data.aliyun.com/lab/tree/Untitled.ipynb

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('/mnt/workspace/chatglm3/ChatGLM3/chatglm3-6b', trust_remote_code=True)
model = AutoModel.from_pretrained('/mnt/workspace/chatglm3/ChatGLM3/chatglm3-6b', trust_remote_code=True, device_map="auto")
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)

The tutorial's explanation

  • Load AutoTokenizer and AutoModel from transformers and point them at the model path. The
    term tokenizer should be familiar: when using the GPT-series models, the tiktoken library splits the input prompt into tokens under a particular encoding, which is also how API call costs are computed. In transformers the tokenizer does a bit more: it wraps the text sent to the large language model in some preprocessing, converting natural-language text into a format the model can understand and splitting it into tokens (words, characters, or subword units).
  • For model loading, the official code points at THUDM/chatglm3-6b, which loads the model directly from the hub. So even without downloading chatglm3-6b, this code would work; the first load is just slow, requires patience, and needs a working network connection (with a proxy if necessary). Since we already downloaded the ChatGLM3-6B weights locally, we point directly at the local storage path of the chatglm3-6b model for inference testing.
  • As for the other arguments: the model has an eval mode. A model's life has essentially two phases, training and inference. Training is the heavier computation: the model makes a forward pass over the input, and in a supervised setting the prediction is compared against a label, the ground truth. If the gap is large, the model is not capable enough yet, so its parameters are adjusted through repeated differentiation and the chain rule during backpropagation. Once training is done, the parameters are frozen into a static file that can be downloaded; at usage time there is no backward pass, only forward inference, which is exactly what model.eval() declares. trust_remote_code=True means trusting the remote code (if any), and device='cuda' loads the model onto a CUDA device for GPU acceleration; both are self-explanatory.
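To make the tokenizer's role concrete, here is a toy sketch (this is not the real ChatGLM3 tokenizer; the vocabulary is invented for illustration, while real tokenizers learn subword vocabularies from data):

```python
# Invented toy vocabulary mapping characters to ids
vocab = {"你": 1, "好": 2, "<unk>": 0}
inv_vocab = {v: k for k, v in vocab.items()}

def encode(text):
    """Text -> token ids, unknown characters map to <unk>."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]

def decode(ids):
    """Token ids -> text."""
    return "".join(inv_vocab[i] for i in ids)

print(encode("你好"))          # [1, 2]
print(decode(encode("你好")))  # 你好
```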

Method 5: OpenAI-style API calls

The ChatGLM3-6B model also offers an OpenAI-style API. As noted earlier, at a time when OpenAI has essentially defined the standards for frontier AI application development, exposing an OpenAI-style API lets the ChatGLM3 model plug seamlessly into the OpenAI development ecosystem. "OpenAI-style API call" here means invoking the ChatGLM3 model through the ChatCompletion function of the openai library; we only need to pass chatglm3-6b as the model parameter. Unifying the calling convention also greatly improves development efficiency.
To make OpenAI-style API calls, first install the openai library and run the openai_api.py script beforehand.

Note first: OpenAI has moved the openai library to 1.x, but ChatGLM3-6B currently still requires the older 0.28, so check the openai version in the current environment.

(chatglm3_test) /mnt/workspace> conda list | grep openai
openai                    1.37.1                   pypi_0    pypi
(chatglm3_test) /mnt/workspace> pip install openai==0.28.1
...
(chatglm3_test) /mnt/workspace> conda list | grep openai
openai                    0.28.1                   pypi_0    pypi
(chatglm3_test) /mnt/workspace> 
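With openai pinned to 0.28, the call against the local server uses the classic ChatCompletion interface; only the model name changes to chatglm3-6b. The sketch below builds just the request body; the commented-out client call is not executed here, and the api_base URL and port are assumptions to be checked against api_server.py:

```python
import json

# Request body in the OpenAI chat-completions format
payload = {
    "model": "chatglm3-6b",
    "messages": [{"role": "user", "content": "你好"}],
}
print(json.dumps(payload, ensure_ascii=False))

# With openai==0.28 the client call would look like (not executed here;
# api_base/port are assumptions -- check api_server.py for the actual values):
#   import openai
#   openai.api_base = "http://localhost:8000/v1"
#   openai.api_key = "none"
#   resp = openai.ChatCompletion.create(model="chatglm3-6b",
#                                       messages=payload["messages"])
#   print(resp.choices[0].message.content)
```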

Install the tiktoken package, used to split text into tokens.
Downgrade the typing_extensions package, or an error occurs.
Install the sentence_transformers package; the latest version is fine.

# tiktoken
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda list | grep tiktoken
tiktoken                  0.7.0                    pypi_0    pypi
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install tiktoken
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda list | grep tiktoken
tiktoken                  0.7.0                    pypi_0    pypi
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # Downgrade typing_extensions
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda list | grep typing_extensions
typing_extensions         4.11.0          py311h06a4308_0  
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install typing_extensions==4.8.0
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install typing_extensions==4.8.0
Looking in indexes: https://mirrors.cloud.aliyuncs.com/pypi/simple
Requirement already satisfied: typing_extensions==4.8.0 in /home/pai/envs/chatglm3_test/lib/python3.11/site-packages (4.8.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> 
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> conda list | grep typing_extensions
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> ll /home/pai/envs/chatglm3_test/lib/python3.11/site-packages | grep typing_extensions
drwxrwxrwx   2 root root      4096 Jul 31 22:27 typing_extensions-4.8.0.dist-info/
-rw-rw-rw-   1 root root    103397 Jul 31 22:27 typing_extensions.py
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> # install sentence_transformers
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> pip install sentence_transformers
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> ll /home/pai/envs/chatglm3_test/lib/python3.11/site-packages | grep sentence_transformers 
drwxrwxrwx   9 root root      4096 Jul 31 17:35 sentence_transformers/
drwxrwxrwx   2 root root      4096 Jul 31 17:35 sentence_transformers-3.0.1.dist-info/
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3> 

Once installation completes, the tutorial starts the service with python openai_api.py; the first launch is slow, so be patient. However, the downloaded code no longer contains openai_api.py. According to the README in the GitHub repo, the script is now called api_server.py.

# the openai_api.py mentioned in the tutorial does not exist
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> ll
total 52
drwxrwxrwx  2 root root  4096 Jul 31 17:30 ./
drwxrwxrwx 15 root root  4096 Jul 31 22:17 ../
-rw-rw-rw-  1 root root 18125 Jul 31 17:30 api_server.py
-rw-rw-rw-  1 root root  1907 Jul 31 17:30 docker-compose.yml
-rw-rw-rw-  1 root root    67 Jul 31 17:30 .env
-rw-rw-rw-  1 root root  1566 Jul 31 17:30 langchain_openai_api.py
-rw-rw-rw-  1 root root  3097 Jul 31 17:30 openai_api_request.py
-rw-rw-rw-  1 root root  6285 Jul 31 17:30 utils.py
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> # https://github.com/THUDM/ChatGLM3?tab=readme-ov-file#openai-api--zhipu-api-demo
# python api_server.py
# edit MODEL_PATH
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> vi api_server.py
#MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/chatglm3-6b')
MODEL_PATH = os.environ.get('MODEL_PATH', '../chatglm3-6b')

# run again: it errors because BAAI/bge-m3 is missing and huggingface.co is unreachable for download
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  4.35it/s]
No sentence-transformers model found with name BAAI/bge-m3. Creating a new one with mean pooling.
/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
...
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/transformers/utils/hub.py", line 441, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like BAAI/bge-m3 is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo>

Download the BAAI/bge-m3 model that api_server.py needs; downloading from ModelScope is recommended:
https://www.modelscope.cn/models/Xorbits/bge-m3/files

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> vi api_server.py
...
# set Embedding Model path
EMBEDDING_PATH = os.environ.get('EMBEDDING_PATH', 'BAAI/bge-m3')
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> mkdir /mnt/workspace/bge-m3
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelscope import snapshot_download
>>> model_dir = snapshot_download("Xorbits/bge-m3",cache_dir="/mnt/workspace/bge-m3/", revision = "v1.0.0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/modelscope/hub/snapshot_download.py", line 74, in snapshot_download
    return _snapshot_download(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/modelscope/hub/snapshot_download.py", line 194, in _snapshot_download
    revision_detail = _api.get_valid_revision_detail(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/modelscope/hub/api.py", line 544, in get_valid_revision_detail
    raise NotExistError('The model: %s has no revision: %s valid are: %s!' %
modelscope.hub.errors.NotExistError: The model: Xorbits/bge-m3 has no revision: v1.0.0 valid are: [v0.0.1]!
>>> model_dir = snapshot_download("Xorbits/bge-m3",cache_dir="/mnt/workspace/bge-m3/", revision = "v0.0.1")
2024-07-31 22:58:30,265 - modelscope - INFO - Use user-specified model revision: v0.0.1
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2.00M/2.00M [00:00<00:00, 3.81MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 687/687 [00:00<00:00, 1.38kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 191/191 [00:00<00:00, 313B/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 194B/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 181/181 [00:00<00:00, 350B/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 213k/213k [00:00<00:00, 525kB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 196k/196k [00:00<00:00, 456kB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 318k/318k [00:00<00:00, 695kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2.12G/2.12G [00:11<00:00, 193MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 686B/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2.12G/2.12G [00:11<00:00, 201MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1.32k/1.32k [00:00<00:00, 2.24kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 54.0/54.0 [00:00<00:00, 107B/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 9.73MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 3.43k/3.43k [00:00<00:00, 6.67kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 964/964 [00:00<00:00, 1.94kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 16.3M/16.3M [00:00<00:00, 24.1MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1.28k/1.28k [00:00<00:00, 2.52kB/s]
>>> 
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> mv /mnt/workspace/bge-m3/Xorbits/bge-m3 /mnt/workspace/chatglm3/ChatGLM3
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> ll /mnt/workspace/chatglm3/ChatGLM3/bge-m3/
total 4459532
drwxrwxrwx  4 root root       4096 Jul 31 22:59 ./
drwxrwxrwx 16 root root       4096 Jul 31 23:02 ../
drwxrwxrwx  2 root root       4096 Jul 31 22:58 1_Pooling/
-rw-rw-rw-  1 root root    2100674 Jul 31 22:58 colbert_linear.pt
-rw-rw-rw-  1 root root        687 Jul 31 22:58 config.json
-rw-rw-rw-  1 root root        123 Jul 31 22:58 config_sentence_transformers.json
-rw-rw-rw-  1 root root        181 Jul 31 22:58 configuration.json
drwxrwxrwx  2 root root       4096 Jul 31 22:58 imgs/
-rw-rw-rw-  1 root root         37 Jul 31 22:58 .mdl
-rw-rw-rw-  1 root root 2271064456 Jul 31 22:58 model.safetensors
-rw-rw-rw-  1 root root        349 Jul 31 22:58 modules.json
-rw-------  1 root root       1320 Jul 31 22:59 .msc
-rw-rw-rw-  1 root root         36 Jul 31 22:59 .mv
-rw-rw-rw-  1 root root 2271145830 Jul 31 22:59 pytorch_model.bin
-rw-rw-rw-  1 root root       1356 Jul 31 22:59 README.md
-rw-rw-rw-  1 root root         54 Jul 31 22:59 sentence_bert_config.json
-rw-rw-rw-  1 root root    5069051 Jul 31 22:59 sentencepiece.bpe.model
-rw-rw-rw-  1 root root       3516 Jul 31 22:59 sparse_linear.pt
-rw-rw-rw-  1 root root        964 Jul 31 22:59 special_tokens_map.json
-rw-rw-rw-  1 root root       1313 Jul 31 22:59 tokenizer_config.json
-rw-rw-rw-  1 root root   17098108 Jul 31 22:59 tokenizer.json
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo>
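Before pointing EMBEDDING_PATH at the moved directory, it can help to verify that the files a SentenceTransformer load will read are actually in place. A minimal sketch, assuming the file list from the directory listing above; `REQUIRED` and `missing_files` are my own illustrative names, not from the ChatGLM3 repo:

```python
from pathlib import Path

# Files from the bge-m3 listing above that sentence_transformers reads on load.
# REQUIRED and missing_files() are illustrative, not part of the ChatGLM3 repo.
REQUIRED = [
    "config.json",
    "modules.json",
    "sentence_bert_config.json",
    "tokenizer.json",
]

def missing_files(model_dir: str) -> list[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]
```

If `missing_files("/mnt/workspace/chatglm3/ChatGLM3/bge-m3")` returns an empty list, the move succeeded and the local path can be used as the embedding model.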

Update the bge-m3 path and run again:

# set Embedding Model path
#EMBEDDING_PATH = os.environ.get('EMBEDDING_PATH', 'BAAI/bge-m3')
EMBEDDING_PATH = os.environ.get('EMBEDDING_PATH', '../bge-m3')

(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  4.68it/s]
Traceback (most recent call last):
  File "/mnt/workspace/chatglm3/ChatGLM3/openai_api_demo/api_server.py", line 537, in <module>
    embedding_model = SentenceTransformer(EMBEDDING_PATH, device="cuda")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 316, in __init__
    self.to(device)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
           ^^^^^
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> # based on earlier experience, try changing the device argument
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> vi api_server.py
#embedding_model = SentenceTransformer(EMBEDDING_PATH, device="cuda")
embedding_model = SentenceTransformer(EMBEDDING_PATH, device_map="auto")

# errors out: the keyword argument is not accepted
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  4.78it/s]
Traceback (most recent call last):
  File "/mnt/workspace/chatglm3/ChatGLM3/openai_api_demo/api_server.py", line 538, in <module>
    embedding_model = SentenceTransformer(EMBEDDING_PATH, device_map="auto")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: SentenceTransformer.__init__() got an unexpected keyword argument 'device_map'
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> # change it to device="auto"
# load Embedding
#embedding_model = SentenceTransformer(EMBEDDING_PATH, device="cuda")
embedding_model = SentenceTransformer(EMBEDDING_PATH, device="auto")

# judging from the parameter description, it apparently has to be a concrete device such as cuda
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  4.71it/s]
Traceback (most recent call last):
  File "/mnt/workspace/chatglm3/ChatGLM3/openai_api_demo/api_server.py", line 538, in <module>
    embedding_model = SentenceTransformer(EMBEDDING_PATH, device="auto")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 316, in __init__
    self.to(device)
  File "/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1137, in to
    device, dtype, non_blocking, convert_to_format = torch._C._nn._parse_to(*args, **kwargs)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: auto
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> 
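The failures above come down to one point: SentenceTransformer accepts only concrete torch device strings (`cuda`, `cpu`, ...), not `"auto"`. The usual pattern is to resolve the device yourself before constructing the model. A sketch; `pick_device` is my own helper name, and in the real script the flag would come from `torch.cuda.is_available()`:

```python
# Resolve a concrete device string rather than passing "auto" to SentenceTransformer.
# pick_device() is a hypothetical helper, not part of api_server.py.

def pick_device(cuda_available: bool) -> str:
    """Map a CUDA-availability flag to a device string torch accepts."""
    return "cuda" if cuda_available else "cpu"

# e.g. in api_server.py:
#   device = pick_device(torch.cuda.is_available())
#   embedding_model = SentenceTransformer(EMBEDDING_PATH, device=device)
```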

It turned out that the previously installed PyTorch could somehow no longer use CUDA. After reinstalling it, python api_server.py ran successfully.

# strange: PyTorch was installed at the very beginning, and torch.cuda.is_available() returned True back then
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/home/pai/envs/chatglm3_test/lib/python3.11/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> 
# reinstall
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
Collecting package metadata (current_repodata.json): done
# works now
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> 
(chatglm3_test) /mnt/workspace/chatglm3/ChatGLM3/openai_api_demo> python api_server.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00,  1.15it/s]
INFO:     Started server process [10998]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

This time the server responds quickly.
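With api_server.py listening on port 8000, the service can be exercised with any HTTP client; the payload follows the OpenAI chat-completions shape the demo implements. A stdlib-only sketch, assuming the model name and endpoint from the log above; `build_chat_payload` and `chat` are my own wrapper names:

```python
import json
import urllib.request

def build_chat_payload(prompt: str, model: str = "chatglm3-6b") -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.8,
    }

def chat(prompt: str, base_url: str = "http://127.0.0.1:8000") -> str:
    """POST a prompt to the locally running api_server.py and return the reply text."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The openai==0.28.1 client installed earlier can hit the same endpoint by setting `openai.api_base` to `http://127.0.0.1:8000/v1`.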

Efficient fine-tuning
