llama-3.1
- Download
- Deployment
Download
Hugging Face
Fill out the access request on the model's detail page and wait for approval.
Click avatar -> Settings -> Access Tokens to create a token.
Configure the environment variables.
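For example, on Windows (PowerShell) the mirror endpoint and token can be set once as environment variables so they do not have to be repeated on every command; huggingface_hub reads the standard HF_ENDPOINT and HF_TOKEN variables, and the token value here is a placeholder:
$env:HF_ENDPOINT = "https://hf-mirror.com"
$env:HF_TOKEN = "xxxxx"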
Download the model:
pip install -U huggingface_hub
huggingface-cli download --resume-download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir E:\codes\model\meta-llama\Meta-Llama-3.1-8B-Instruct --local-dir-use-symlinks False --token xxxxx
On Linux:
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir /home/model/meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir-use-symlinks False --token xxxxx
Use wget to download a single file at a time:
wget --header "Authorization: Bearer your_token" https://hf-mirror.com/meta-llama/Meta-Llama-3.1-8B/resolve/main/model-00003-of-00004.safetensors
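The same single-file download can also be done from Python with huggingface_hub's hf_hub_download. This is a minimal sketch; the local_dir path is an assumption matching the Windows layout used earlier, and the token is a placeholder:
from huggingface_hub import hf_hub_download

# Downloads one weight shard; honors the HF_ENDPOINT mirror if it is set.
hf_hub_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B",
    filename="model-00003-of-00004.safetensors",
    local_dir=r"E:\codes\model\meta-llama\Meta-Llama-3.1-8B",
    token="xxxxx",
)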
Deployment
Environment: Python 3.10
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.43.2 numpy==1.26.4 bitsandbytes==0.43.3 accelerate==0.33.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
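Before loading the model, a quick sanity check confirms the CUDA build of PyTorch was installed; the expected version string assumes the pins above:
import torch

print(torch.__version__)          # expect 2.1.1+cu121 with the pins above
print(torch.cuda.is_available())  # should print True on a working CUDA setup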
from transformers import pipeline
import torch

model_id = r"E:\codes\model\meta-llama\Meta-Llama-3.1-8B-Instruct"

# Plain bfloat16 variant (needs more VRAM):
# pipeline = pipeline(
#     "text-generation",
#     model=model_id,
#     model_kwargs={"torch_dtype": torch.bfloat16},
#     device_map="auto",
# )
# 4-bit quantized variant via bitsandbytes, to fit the 8B model in less VRAM:
pipeline = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16, "quantization_config": {"load_in_4bit": True}},
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipeline(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
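If you want more control than the pipeline's dict-style quantization_config, the same 4-bit setup can be expressed with an explicit BitsAndBytesConfig and AutoModelForCausalLM. This is a sketch under the same assumptions (local model path, bitsandbytes installed), not the model card's exact recipe:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = r"E:\codes\model\meta-llama\Meta-Llama-3.1-8B-Instruct"

# Explicit 4-bit config; bnb_4bit_compute_dtype keeps matmuls in bfloat16.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")

messages = [{"role": "user", "content": "Who are you?"}]
# apply_chat_template builds the Llama 3.1 chat prompt and returns input ids.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))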