llama-3.1
- Download
- Deployment
Download
Hugging Face
Fill out the access request on the model's detail page and wait for approval.
Click avatar -> Settings -> Access Tokens to create a token.
Configure the environment variables.
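For example, on Windows (PowerShell) the mirror endpoint and token can be set once as environment variables so they do not have to be repeated on every command; huggingface_hub reads the standard HF_ENDPOINT and HF_TOKEN variables, and the token value here is a placeholder:
$env:HF_ENDPOINT = "https://hf-mirror.com"
$env:HF_TOKEN = "xxxxx"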
Download the model:
pip install -U huggingface_hub
huggingface-cli download --resume-download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir E:\codes\model\meta-llama\Meta-Llama-3.1-8B-Instruct --local-dir-use-symlinks False --token xxxxx
On Linux:
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir /home/model/meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir-use-symlinks False --token xxxxx
Use wget to download a single file at a time:
wget --header "Authorization: Bearer your_token" https://hf-mirror.com/meta-llama/Meta-Llama-3.1-8B/resolve/main/model-00003-of-00004.safetensors
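The same single-file download can also be done from Python with huggingface_hub's hf_hub_download. This is a minimal sketch; the local_dir path is an assumption matching the Windows layout used earlier, and the token is a placeholder:
from huggingface_hub import hf_hub_download

# Downloads one weight shard; honors the HF_ENDPOINT mirror if it is set.
hf_hub_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B",
    filename="model-00003-of-00004.safetensors",
    local_dir=r"E:\codes\model\meta-llama\Meta-Llama-3.1-8B",
    token="xxxxx",
)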
Deployment
Environment: Python 3.10
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.43.2 numpy==1.26.4 bitsandbytes==0.43.3 accelerate==0.33.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
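Before loading the model, a quick sanity check confirms the CUDA build of PyTorch was installed; the expected version string assumes the pins above:
import torch

print(torch.__version__)          # expect 2.1.1+cu121 with the pins above
print(torch.cuda.is_available())  # should print True on a working CUDA setup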
from transformers import pipeline
import torch

model_id = r"E:\codes\model\meta-llama\Meta-Llama-3.1-8B-Instruct"

# Plain bfloat16 variant (needs more VRAM):
# pipeline = pipeline(
#     "text-generation",
#     model=model_id,
#     model_kwargs={"torch_dtype": torch.bfloat16},
#     device_map="auto",
# )
# 4-bit quantized variant via bitsandbytes, to fit the 8B model in less VRAM:
pipeline = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16, "quantization_config": {"load_in_4bit": True}},
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipeline(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
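If you want more control than the pipeline's dict-style quantization_config, the same 4-bit setup can be expressed with an explicit BitsAndBytesConfig and AutoModelForCausalLM. This is a sketch under the same assumptions (local model path, bitsandbytes installed), not the model card's exact recipe:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = r"E:\codes\model\meta-llama\Meta-Llama-3.1-8B-Instruct"

# Explicit 4-bit config; bnb_4bit_compute_dtype keeps matmuls in bfloat16.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")

messages = [{"role": "user", "content": "Who are you?"}]
# apply_chat_template builds the Llama 3.1 chat prompt and returns input ids.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))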