InternLM (书生·浦语) LLM Hands-on Camp: Efficient Llama 3 Deployment in Practice (LMDeploy Edition)

  • Environment and model preparation
  • LMDeploy chat
  • Speed comparison: TurboMind vs. Transformers
  • LMDeploy model quantization (lite)
  • LMDeploy serving (serve)

Environment and model preparation

On InternStudio, the conda environment can be created directly:

studio-conda -t Llama3_lmdeploy -o pytorch-2.1.2

Downloading Llama 3

Symlink the model that is already available on InternStudio:

mkdir -p ~/model
ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct ~/model/Meta-Llama-3-8B-Instruct
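
Before moving on, it can help to confirm that the symlink resolves to a complete model directory. The following is a minimal sketch: the helper name missing_model_files and the list of expected files are assumptions for illustration (an HF-format checkpoint normally ships at least a config.json and tokenizer files), not something InternStudio or LMDeploy provides.

```python
from pathlib import Path

# Assumed minimum set of files for an HF-format checkpoint (illustrative only).
EXPECTED_FILES = ("config.json", "tokenizer_config.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir.

    The path is resolved first, so a dangling symlink reports every file as missing.
    """
    root = Path(model_dir).expanduser().resolve()
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("~/model/Meta-Llama-3-8B-Instruct")
    if missing:
        print(f"missing files: {missing}")
    else:
        print("model directory looks complete")
```

An empty list means every assumed file was found; anything else usually points to a wrong link target.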

LMDeploy chat

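HuggingFace and TurboMind

  • HuggingFace
    HuggingFace is a fast-growing community. More than 5,000 organizations, including Meta, Google, Microsoft, and Amazon, contribute code, datasets, and models to it. It can be thought of as an online hosting platform for deep learning models and datasets: if you have a dataset or model to share publicly and a cloud drive is inconvenient, you can host it on HuggingFace.
    Models hosted on HuggingFace are usually stored in the HuggingFace format, HF format for short.
    HuggingFace's servers are located outside China, so access from within China can be inconvenient. Users in China can instead use Alibaba's ModelScope community or the OpenXLab community built by Shanghai AI Lab; models hosted there also usually use the HF format.
  • TurboMind
    TurboMind is an efficient LLM inference engine developed by the LMDeploy team. Its main features include support for LLaMA-architecture models, continuous batching, and an extensible KV cache manager.
    The TurboMind engine can only run models in the TurboMind format. When given an HF-format model, it therefore first converts the model to the TurboMind format. In recent versions of LMDeploy this conversion happens automatically and requires no action from the user.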

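A few points that are easy to confuse:

  • TurboMind vs. LMDeploy: LMDeploy is an integrated toolkit covering the full set of compression, deployment, and serving solutions for LLM tasks. TurboMind is one of LMDeploy's inference engines, that is, a submodule. LMDeploy can also use PyTorch as its inference engine.
  • TurboMind vs. TurboMind models: TurboMind is the name of the inference engine, while a TurboMind model is a model storage format. The TurboMind engine can only run models in the TurboMind format.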

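Running the model with the Transformers library

Before using Transformers, make sure a sufficiently recent version is installed (4.40.0 is used here):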

pip install transformers==4.40.0
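
The LMDeploy log shown later in this walkthrough warns that it expects transformers in the range 4.33.0 to 4.38.2, whereas 4.40.0 is installed here. A small sketch for checking an installed version against such a range follows. in_supported_range is a hypothetical helper, the bounds are copied from that warning, and the simple parser assumes plain dotted numeric versions such as 4.40.0.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain dotted version such as '4.40.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_supported_range(version, low="4.33.0", high="4.38.2"):
    """Check low <= version <= high, comparing numeric components in order."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        installed = None
    if installed is None:
        print("transformers is not installed")
    else:
        # Pre-release or dev versions (e.g. '4.41.0.dev0') would need a real parser.
        print(installed, "within LMDeploy's stated range:", in_supported_range(installed))
```

Comparing integer tuples rather than raw strings matters here, because a string comparison would order "4.4.0" after "4.38.2".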

Run touch /root/pipeline_transformer.py, then paste the code below into the file and save it:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("/root/model/Meta-Llama-3-8B-Instruct", trust_remote_code=True)

# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and cause OOM Error.
model = AutoModelForCausalLM.from_pretrained("/root/model/Meta-Llama-3-8B-Instruct", torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()

# System prompt: "You are a friendly bot and may only answer in Chinese." User message: "Hello".
messages = [
    {"role": "system", "content": "你现在是一个友好的机器人,回答的时候只能使用中文"},
    {"role": "user", "content": "你好"},
]

# Render the messages with the model's chat template and move the token IDs to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generation stops at either the regular EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Drop the prompt tokens so that only the newly generated reply is decoded.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

Run the script:

python /root/pipeline_transformer.py

If the model produces a normal reply, as in the output below, the downloaded model is working correctly.

(Llama3_lmdeploy) root@intern-studio-061925:~# python /root/pipeline_transformer.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:41<00:00, 10.42s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
你好!很高兴见到你!如何可以帮助你?
(Llama3_lmdeploy) root@intern-studio-061925:~#
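
The last two lines of the script depend on how generate lays out its result: for a decoder-only model the returned sequence is the prompt tokens followed by the newly generated ones, so slicing at input_ids.shape[-1] leaves only the reply. The sketch below imitates that on plain Python lists. The helper name extract_reply and all token IDs except 128001 (the EOS ID reported in the log above) are made up for illustration, and the explicit terminator trimming only roughly stands in for what skip_special_tokens=True does during decoding in the real script.

```python
def extract_reply(output_ids, prompt_len, terminators):
    """Return only the generated tokens: drop the prompt prefix, then cut at the first terminator."""
    generated = output_ids[prompt_len:]
    for index, token in enumerate(generated):
        if token in terminators:
            return generated[:index]
    return generated


if __name__ == "__main__":
    prompt = [11, 12, 13]                # made-up prompt token IDs
    output = prompt + [21, 22, 128001]   # prompt, then reply tokens, then a terminator
    print(extract_reply(output, len(prompt), {128001}))
```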

Next, LMDeploy can be used for interactive chat.

Chatting with the model using LMDeploy

Run the following directly in the terminal:

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct

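Llama 3 tends to answer in English, especially for slightly more complex questions. It answers simple Chinese questions in Chinese, but once a question becomes more involved it replies entirely in English, as the session below shows.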


(Llama3_lmdeploy) root@intern-studio-061925:~#
(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct
2024-04-24 09:38:44,886 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
2024-04-24 09:38:57,493 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-24 09:39:00,136 - lmdeploy - INFO - Checking model.
2024-04-24 09:39:00,137 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.38.2], but found version: 4.40.0
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:37<00:00,  9.42s/it]
2024-04-24 09:39:48,210 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=512, num_gpu_blocks=512, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
match template: <llama3>
double enter to end input >>> 你好
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
你好<|eot_id|><|start_header_id|>assistant<|end_header_id|>
😊 !你好!
double enter to end input >>> 北京有哪些著名景点
<|start_header_id|>user<|end_header_id|>
北京有哪些著名景点<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Beijing has many famous landmarks and attractions. Here are some of the most popular ones:
1. **The Great Wall of China** (长城): A series of fortifications built across the northern borders of China, stretching over 4,000 miles.
2. **The Forbidden City** (Forbidden City): A palace complex that was the imperial palace of the Ming and Qing dynasties for over 500 years.
3. **Tiananmen Square** (天安门广场): One of the largest city squares in the world, surrounded by famous landmarks like the Forbidden City, Mao's Mausoleum, and the National Museum of China.
4. **Temple of Heaven** (天坛): A Taoist temple complex built in the 15th century, where emperors would worship and make sacrifices to heaven.
5. **Summer Palace** (颐和园): A beautiful palace complex with gardens, temples, and pavilions, built in the 18th century as a summer retreat for emperors.
6. **Ming Tombs** (明十三陵): The mausoleum of 13 Ming dynasty emperors, located about 45 miles northwest of Beijing.
7. **Hutongs** (胡同): Narrow alleys and traditional courtyard homes that are a reminder of old Beijing.
8. **Beihai Park** (北海公园): A large park with a lake, temples, and gardens, located in the heart of Beijing.
9. **National Grand Theater** (国家大剧院): A modern theater complex with a unique "egg" design, hosting various performances and events.
10. **Olympic Park** (奥林匹克公园): A large park built for the 2008 Beijing Olympics, featuring the iconic "Bird's Nest" and "Water Cube" stadiums.
These are just a few of the many amazing attractions Beijing has to offer.
double enter to end input >>>
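
The prompt echoed in the session above contains only user and assistant headers, with no system block. A minimal sketch of how a Llama 3 style prompt is assembled may make this clearer. It follows the special tokens visible in the log and treats the system prompt as optional (LMDeploy's model.py calls this field meta_instruction); the function name build_llama3_prompt is hypothetical and this is not LMDeploy's own implementation.

```python
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role):
    """Role header that opens every turn in the Llama 3 chat format."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def build_llama3_prompt(user_text, system_text=None, first_turn=True):
    """Assemble a single-turn prompt; system_text=None means no system block."""
    parts = []
    if first_turn:
        parts.append(BOS)
        if system_text is not None:
            parts.append(header("system") + system_text + EOT)
    parts.append(header("user") + user_text + EOT)
    # End with an open assistant header so the model writes the reply next.
    parts.append(header("assistant"))
    return "".join(parts)


if __name__ == "__main__":
    print(build_llama3_prompt("北京有哪些著名景点"))
    print(build_llama3_prompt("北京有哪些著名景点", system_text="Reply in Chinese."))
```

With system_text=None the result has the same shape as the prompt in the log, which gives the model no instruction about the reply language. A system prompt is one place where such an instruction could go.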

The prompt therefore needs to be modified. Open /root/lmdeploy/lmdeploy/model.py:

 (Llama3_lmdeploy) root@intern-studio-061925:~# cat /root/lmdeploy/lmdeploy/model.py
# Copyright (c) OpenMMLab. All rights reserved.
import dataclasses
import json
import uuid
from abc import abstractmethod
from typing import List, Literal, Optional

from mmengine import Registry

from lmdeploy.utils import get_logger

logger = get_logger('lmdeploy')
MODELS = Registry('model', locations=['lmdeploy.model'])


def random_uuid() -> str:
    """Return a random uuid."""
    return str(uuid.uuid4().hex)


@dataclasses.dataclass
class ChatTemplateConfig:
    """Parameters for chat template.

    Args:
        model_name (str): the name of the deployed model. Determine which chat template will be applied.
            All the chat template names: `lmdeploy list`
        system (str | None): begin of the system prompt
        meta_instruction (str | None): system prompt
        eosys (str | None): end of the system prompt
        user (str | None): begin of the user prompt
        eoh (str | None): end of the user prompt
        assistant (str | None): begin of the assistant prompt
        eoa (str | None): end of the assistant prompt
        capability: ('completion' | 'infilling' | 'chat' | 'python') = None
    """  # noqa: E501

    model_name: str
    system: Optional[str] = None
    meta_instruction: Optional[str] = None
    eosys: Optional[str] = None
    user: Optional[str] = None
    eoh: Optional[str] = None
    assistant: Optional[str] = None
    eoa: Optional[str] = None
    separator: Optional[str] = None
    capability: Optional[Literal['completion', 'infilling', 'chat',
                                 'python']] = None
    stop_words: Optional[List[str]] = None

    @property
    def chat_template(self):
        attrs = {
            key: value
            for key, value in dataclasses.asdict(self).items()
            if value is not None
        }
        attrs.pop('model_name', None)
        if self.model_name in MODELS.module_dict.keys():
            model: BaseModel = MODELS.get(self.model_name)(**attrs)
        else:
            logger.warning(
                f'Could not find {self.model_name} in registered models. '
                f'Register {self.model_name} using the BaseChatTemplate.')
            model = BaseChatTemplate(**attrs)
        return model

    def to_json(self, file_path=None):
        """Convert the dataclass instance to a JSON formatted string and
        optionally save to a file."""
        json_str = json.dumps(dataclasses.asdict(self),
                              ensure_ascii=False,
                              indent=4)
        if file_path:
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(json_str)
        return json_str

    @classmethod
    def from_json(cls, file_or_string):
        """Construct a dataclass instance from a JSON file or JSON string."""
        try:
            # Try to open the input_data as a file path
            with open(file_or_string, 'r', encoding='utf-8') as file:
                json_data = file.read()
        except FileNotFoundError:
            # If it's not a file path, assume it's a JSON string
            json_data = file_or_string
        except IOError:
            # If it's not a file path and not a valid JSON string, raise error
            raise ValueError(
                'Invalid input. Must be a file path or a valid JSON string.')
        json_data = json.loads(json_data)
        if json_data.get('model_name', None) is None:
            json_data['model_name'] = random_uuid()
        if json_data['model_name'] not in MODELS.module_dict.keys():
            MODELS.register_module(json_data['model_name'],
                                   module=BaseChatTemplate)
        return cls(**json_data)


@MODELS.register_module(name='llama')
@MODELS.register_module(name='base')
class BaseModel:
    """Base model."""

    def __init__(self,
                 session_len=2048,
                 capability='chat',
                 stop_words=None,
                 **kwargs):
        self.session_len = session_len
        self.stop_words = stop_words
        self.capability = capability

    def get_prompt(self, prompt, sequence_start=True):
        """Return the prompt that is concatenated with other elements in the
        chat template.

        Args:
            prompt (str): user's input prompt
            sequence_start (bool): indicator for the first round chat of a
                session sequence
        Returns:
            str: the concatenated prompt
        """
        return prompt

    @abstractmethod
    def messages2prompt(self, messages, sequence_start=True):
        """Return the prompt that is concatenated with other elements in the
        chat template. When messages arg is a string, return
        self.get_prompt(messages). When messages arg is a chat history, return
        translated prompt from chat history.

        Args:
            messages (str | List): user's input prompt
        Returns:
            str: the concatenated prompt
        """
        if isinstance(messages, str):
            return self.get_prompt(messages)
        # chat history processing in derived classes

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        return None


class BaseChatTemplate(BaseModel):
    """Base Chat template."""

    def __init__(self,
                 system='',
                 meta_instruction='',
                 eosys='',
                 user='',
                 eoh='',
                 assistant='',
                 eoa='',
                 separator='',
                 **kwargs):
        super().__init__(**kwargs)
        self.system = system
        self.meta_instruction = meta_instruction
        self.user = user
        self.eoh = eoh
        self.eoa = eoa
        self.separator = separator
        self.eosys = eosys
        self.assistant = assistant

    def get_prompt(self, prompt, sequence_start=True):
        """Return the prompt that is concatenated with other elements in the
        chat template.

        Args:
            prompt (str): user's input prompt
            sequence_start (bool): indicator for the first round chat of a
                session sequence
        Returns:
            str: the concatenated prompt
        """
        if self.capability == 'completion':
            return prompt
        if sequence_start:
            # None is different from ''
            if self.meta_instruction is not None:
                return f'{self.system}{self.meta_instruction}{self.eosys}' \
                    f'{self.user}{prompt}{self.eoh}' \
                    f'{self.assistant}'
            else:
                return f'{self.user}{prompt}{self.eoh}' \
                    f'{self.assistant}'
        else:
            return f'{self.separator}{self.user}{prompt}{self.eoh}' \
                f'{self.assistant}'

    def messages2prompt(self, messages, sequence_start=True):
        """Return the prompt that is concatenated with other elements in the
        chat template.

        Args:
            messages (str | List): user's input prompt
        Returns:
            str: the concatenated prompt
        """
        if isinstance(messages, str):
            return self.get_prompt(messages, sequence_start)
        box_map = dict(user=self.user,
                       assistant=self.assistant,
                       system=self.system)
        eox_map = dict(user=self.eoh,
                       assistant=self.eoa + self.separator,
                       system=self.eosys)
        ret = ''
        if self.meta_instruction is not None:
            if len(messages) and messages[0]['role'] != 'system':
                ret += f'{self.system}{self.meta_instruction}{self.eosys}'
        for message in messages:
            role = message['role']
            content = message['content']
            ret += f'{box_map[role]}{content}{eox_map[role]}'
        ret += f'{self.assistant}'
        return ret


@MODELS.register_module(name='wizardlm')
@MODELS.register_module(name='vicuna')
class Vicuna(BaseChatTemplate):
    """Chat template of vicuna model."""

    def __init__(
            self,
            meta_instruction="""A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.""",  # noqa: E501
            eosys=' ',
            user='USER: ',
            eoh=' ',
            assistant='ASSISTANT: ',
            eoa='</s>',
            stop_words=['</s>'],
            **kwargs):
        super().__init__(meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         stop_words=stop_words,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'vicuna' in path:
            return 'vicuna'
        if 'wizardlm' in path:
            return 'wizardlm'


@MODELS.register_module(name='mini-gemini-vicuna')
class MiniGemini(Vicuna):
    """Chat template of vicuna model."""

    def __init__(self, session_len=4096, **kwargs):
        super().__init__(session_len=session_len, **kwargs)

    def get_prompt(self, prompt, sequence_start=True):
        return super().get_prompt(prompt, sequence_start)[:-1]

    def messages2prompt(self, messages, sequence_start=True):
        return super().messages2prompt(messages, sequence_start)[:-1]

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'mini-gemini-7b' in path or 'mini-gemini-13b' in path:
            return 'mini-gemini-vicuna'


@MODELS.register_module(name='internlm-chat')
@MODELS.register_module(name='internlm-chat-7b')
@MODELS.register_module(name='internlm')
class InternLMChat7B(BaseChatTemplate):
    """Chat template of InternLM model."""

    def __init__(
            self,
            system='<|System|>:',
            meta_instruction="""You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.
""",  # noqa: E501
            eosys='\n',
            user='<|User|>:',
            eoh='\n',
            assistant='<|Bot|>:',
            eoa='<eoa>',
            separator='\n',
            stop_words=['<eoa>'],
            **kwargs):
        super().__init__(system=system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         stop_words=stop_words,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if all([c not in path for c in ['internlm2', '8k']]) and \
                all([c in path for c in ['internlm', 'chat']]):
            return 'internlm'


@MODELS.register_module(name='internlm-chat-20b')
@MODELS.register_module(name='internlm-chat-7b-8k')
class InternLMChat7B8K(InternLMChat7B):
    """Chat template and generation parameters of InternLM-Chat-7B-8K and
    InternLM-Chat-20B models."""

    def __init__(self, session_len=8192, **kwargs):
        super(InternLMChat7B8K, self).__init__(**kwargs)
        self.session_len = session_len


@MODELS.register_module(name='internlm-20b')
class InternLMBaseModel20B(BaseChatTemplate):
    """Generation parameters of InternLM-20B-Base model."""

    def __init__(self, session_len=4096, capability='completion', **kwargs):
        super().__init__(session_len=session_len,
                         capability=capability,
                         **kwargs)


@MODELS.register_module(
    name=['internlm2-1_8b', 'internlm2-7b', 'internlm2-20b'])
class InternLM2BaseModel7B(BaseChatTemplate):
    """Generation parameters of InternLM2-7B-Base model."""

    def __init__(self, session_len=32768, capability='completion', **kwargs):
        super().__init__(session_len=session_len,
                         capability=capability,
                         **kwargs)


@MODELS.register_module(name=[
    'internlm2-chat', 'internlm2-chat-1_8b', 'internlm2-chat-7b',
    'internlm2-chat-20b'
])
@MODELS.register_module(name='internlm2')
class InternLM2Chat7B(InternLMChat7B):
    """Chat template and generation parameters of InternLM2-Chat-7B."""

    def __init__(self,
                 session_len=32768,
                 system='<|im_start|>system\n',
                 user='<|im_start|>user\n',
                 assistant='<|im_start|>assistant\n',
                 environment='<|im_start|>environment\n',
                 plugin='<|plugin|>',
                 interpreter='<|interpreter|>',
                 eosys='<|im_end|>\n',
                 eoh='<|im_end|>\n',
                 eoa='<|im_end|>',
                 eoenv='<|im_end|>\n',
                 separator='\n',
                 stop_words=['<|im_end|>', '<|action_end|>'],
                 **kwargs):
        self.plugin = plugin
        self.interpreter = interpreter
        self.environment = environment
        self.eoenv = eoenv
        super(InternLM2Chat7B, self).__init__(session_len=session_len,
                                              system=system,
                                              user=user,
                                              assistant=assistant,
                                              eosys=eosys,
                                              eoh=eoh,
                                              eoa=eoa,
                                              separator=separator,
                                              stop_words=stop_words,
                                              **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'internlm2' in path and ('chat' in path or 'math' in path):
            return 'internlm2'

    def messages2prompt(self, messages, sequence_start=True):
        """Return the prompt that is concatenated with other elements in the
        chat template.

        Args:
            messages (str | List): user's input prompt
        Returns:
            str: the concatenated prompt
        """
        if isinstance(messages, str):
            return self.get_prompt(messages, sequence_start)
        box_map = dict(user=self.user,
                       assistant=self.assistant,
                       system=self.system,
                       environment=self.environment)
        eox_map = dict(user=self.eoh,
                       assistant=self.eoa + self.separator,
                       system=self.eosys,
                       environment=self.eoenv)
        name_map = dict(plugin=self.plugin, interpreter=self.interpreter)
        ret = ''
        if self.meta_instruction is not None:
            if len(messages) and messages[0]['role'] != 'system':
                ret += f'{self.system}{self.meta_instruction}{self.eosys}'
        for message in messages:
            role = message['role']
            content = message['content']
            begin = box_map[role].strip() + f" name={name_map[message['name']]}\n" if 'name' in message else box_map[role]
            ret += f'{begin}{content}{eox_map[role]}'
        ret += f'{self.assistant}'
        return ret


@MODELS.register_module(name='internlm-xcomposer2')
class InternLMXComposer2Chat7B(InternLMChat7B):
    """Chat template and generation parameters of InternLM-XComposer2-7b."""

    def __init__(
            self,
            session_len=4096,
            system='[UNUSED_TOKEN_146]system\n',
            meta_instruction="""You are an AI assistant whose name is InternLM-XComposer (浦语·灵笔).
- InternLM-XComposer (浦语·灵笔) is a multi-modality conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM-XComposer (浦语·灵笔) can understand and communicate fluently in the language chosen by the user such as English and 中文.
- InternLM-XComposer (浦语·灵笔) is capable of comprehending and articulating responses effectively based on the provided image.""",
            user='[UNUSED_TOKEN_146]user\n',
            assistant='[UNUSED_TOKEN_146]assistant\n',
            eosys='[UNUSED_TOKEN_145]\n',
            eoh='[UNUSED_TOKEN_145]\n',
            eoa='[UNUSED_TOKEN_145]\n',
            separator='\n',
            stop_words=['[UNUSED_TOKEN_145]'],
            **kwargs):
        super().__init__(session_len=session_len,
                         system=system,
                         meta_instruction=meta_instruction,
                         user=user,
                         assistant=assistant,
                         eosys=eosys,
                         eoh=eoh,
                         eoa=eoa,
                         separator=separator,
                         stop_words=stop_words,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'internlm' in path and 'xcomposer2' in path and '4khd' not in path:
            return 'internlm-xcomposer2'


@MODELS.register_module(name='internlm-xcomposer2-4khd')
class InternLMXComposer24khdChat7B(InternLMXComposer2Chat7B):
    """Chat template and generation parameters of InternLM-
    XComposer2-4khd-7b."""

    def __init__(self, session_len=16384, **kwargs):
        super().__init__(session_len=session_len, **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'internlm' in path and 'xcomposer2' in path and '4khd' in path:
            return 'internlm-xcomposer2-4khd'


@MODELS.register_module(name='baichuan-7b')
@MODELS.register_module(name='baichuan-base')
class Baichuan7B(BaseChatTemplate):
    """Generation parameters of Baichuan-7B base model."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


@MODELS.register_module(name='baichuan2-7b')
@MODELS.register_module(name='baichuan2')
class Baichuan2_7B(BaseChatTemplate):
    """Chat template and generation parameters of Baichuan2-7B-Base and
    Baichuan2-7B-Chat models."""

    def __init__(self,
                 user='<reserved_106>',
                 assistant='<reserved_107>',
                 **kwargs):
        super().__init__(user=user, assistant=assistant, **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'baichuan2' in path and 'chat' in path:
            return 'baichuan2'


@MODELS.register_module(name='puyu')
class Puyu(BaseChatTemplate):
    """Chat template of puyu model.This is only for internal usage in Shanghai
    AI Laboratory."""

    def __init__(self,
                 meta_instruction='',
                 system='',
                 eosys='',
                 user='',
                 eoh='',
                 assistant='',
                 eoa='',
                 stop_words=None,
                 **kwargs):
        super().__init__(meta_instruction=meta_instruction,
                         system=system,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         stop_words=stop_words,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'puyu' in model_path.lower():
            return 'puyu'


@MODELS.register_module(name=['llama2', 'llama-2', 'llama-2-chat'])
class Llama2(BaseChatTemplate):
    """Chat template of LLaMA2 model."""

    def __init__(
            self,
            system='[INST] <<SYS>>\n',
            meta_instruction="""\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.""",  # noqa: E501
            eosys='\n<</SYS>>\n\n',
            assistant=' [/INST] ',
            eoa='</s>',
            separator='<s>[INST] ',
            session_len=4096,
            **kwargs):
        super().__init__(system=system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         session_len=session_len,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'llama-2' in model_path.lower() or 'llama2' in model_path.lower():
            return 'llama2'


@MODELS.register_module(name='llama3')
class Llama3(BaseChatTemplate):
    """Chat template of LLaMA3 model."""

    def __init__(
            self,
            system='<|start_header_id|>system<|end_header_id|>\n\n',
            meta_instruction=None,
            eosys='<|eot_id|>',
            assistant='<|start_header_id|>assistant<|end_header_id|>\n\n',
            eoa='<|eot_id|>',
            user='<|start_header_id|>user<|end_header_id|>\n\n',
            eoh='<|eot_id|>',
            stop_words=['<|eot_id|>', '<|end_of_text|>'],
            session_len=8192,
            **kwargs):
        super().__init__(system=system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         assistant=assistant,
                         eoa=eoa,
                         user=user,
                         eoh=eoh,
                         stop_words=stop_words,
                         session_len=session_len,
                         **kwargs)

    def get_prompt(self, prompt, sequence_start=True):
        if sequence_start:
            return '<|begin_of_text|>' + super().get_prompt(
                prompt, sequence_start)
        return super().get_prompt(prompt, sequence_start)

    def messages2prompt(self, messages, sequence_start=True):
        if sequence_start and not isinstance(messages, str):
            return '<|begin_of_text|>' + super().messages2prompt(
                messages, sequence_start)[:-1]
        return super().messages2prompt(messages, sequence_start)[:-1]

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'llama-3-' in model_path.lower() or 'llama3-' in model_path.lower():
            return 'llama3'


@MODELS.register_module(name='qwen-14b')
@MODELS.register_module(name='qwen-7b')
@MODELS.register_module(name='qwen')
class Qwen7BChat(BaseChatTemplate):
    """Chat template for Qwen-7B-Chat."""

    def __init__(self,
                 session_len=8192,
                 system='<|im_start|>system\n',
                 meta_instruction='You are a helpful assistant.',
                 eosys='<|im_end|>\n',
                 user='<|im_start|>user\n',
                 eoh='<|im_end|>\n',
                 assistant='<|im_start|>assistant\n',
                 eoa='<|im_end|>',
                 separator='\n',
                 stop_words=['<|im_end|>'],
                 **kwargs):
        super().__init__(system=system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         stop_words=stop_words,
                         session_len=session_len,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'qwen' in model_path.lower():
            return 'qwen'


@MODELS.register_module(name='codellama')
class CodeLlama(Llama2):

    def __init__(self,
                 meta_instruction='',
                 session_len=4096,
                 suffix_first=False,
                 stop_words=None,
                 **kwargs):
        super().__init__(meta_instruction=meta_instruction,
                         session_len=session_len,
                         stop_words=stop_words,
                         **kwargs)
        caps = ['completion', 'infilling', 'chat', 'python']
        assert self.capability in caps, \
            f'{self.capability} is not supported. ' \
            f'The supported capabilities are: {caps}'
        self.meta_instruction = meta_instruction
        self.session_len = session_len
        self.suffix_first = suffix_first
        self.stop_words = stop_words
        if self.capability == 'infilling':
            if self.stop_words is None:
                self.stop_words = ['<EOT>']

    def get_prompt(self, prompt, sequence_start=True):
        if self.capability == 'infilling':
            return self._infill_prompt(prompt)
        elif self.capability == 'chat':
            return super().get_prompt(prompt, sequence_start)
        else:  # python speicalist
            return prompt

    def _infill_prompt(self, prompt):
        prefix, suffix = prompt.split('<FILL>')
        if self.suffix_first:
            # format as "<PRE> <SUF>{suf} <MID> {pre}"
            prompt = f'<PRE> <SUF>{suffix} <MID> {prefix}'
        else:
            # format as "<PRE> {pre} <SUF>{suf} <MID>"
            prompt = f'<PRE> {prefix} <SUF>{suffix} <MID>'
        return prompt

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'codellama' in model_path.lower():
            return 'codellama'


@MODELS.register_module(name='falcon')
class Falcon(BaseModel):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'falcon' in model_path.lower():
            return 'falcon'


@MODELS.register_module(name='chatglm2-6b')
@MODELS.register_module(name='chatglm')
class ChatGLM2(BaseModel):

    def __init__(self,
                 user='问:',
                 eoh='\n\n',
                 assistant='答:',
                 eoa='\n\n',
                 **kwargs):
        super().__init__(**kwargs)
        self._user = user
        self._assistant = assistant
        self._eoh = eoh
        self._eoa = eoa
        self.count = 0

    def get_prompt(self, prompt, sequence_start=True):
        """get prompt."""
        # need more check
        # https://github.com/THUDM/ChatGLM2-6B/issues/48
        # [64790, 64792] to be prepended
        self.count += 1
        ret = f'[Round {self.count}]\n\n'
        ret += f'{self._user}{prompt}{self._eoh}'
        ret += f'{self._assistant}'
        return ret

    def messages2prompt(self, messages, sequence_start=True):
        """message to prompt."""
        if isinstance(messages, str):
            return self.get_prompt(messages, sequence_start)
        ret = ''
        count = 0
        for message in messages:
            role = message['role']
            content = message['content']
            if role == 'user':
                count += 1
                ret += f'[Round {count}]\n\n'
                ret += f'{self._user}{content}{self._eoh}'
                ret += f'{self._assistant}'
            if role == 'assistant':
                ret += f'{content}'
        return ret

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'chatglm' in model_path.lower():
            return 'chatglm'


@MODELS.register_module(name=['solar', 'solar-70b'])
class SOLAR(BaseChatTemplate):
    """Chat template of SOLAR model.

    `https://huggingface.co/upstage/SOLAR-0-70b-16bit`
    """

    def __init__(self,
                 system='### System:\n',
                 eosys='\n\n',
                 user='### User:\n',
                 eoh='\n\n',
                 assistant='### Assistant:\n',
                 meta_instruction='',
                 session_len=2048,
                 **kwargs):
        super().__init__(**kwargs)
        self.system = system
        self.eosys = eosys
        self.user = user
        self.eoh = eoh
        self.assistant = assistant
        self.meta_instruction = meta_instruction
        self.session_len = session_len

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'solar' in model_path.lower():
            return 'solar'


@MODELS.register_module(name='ultracm')
@MODELS.register_module(name='ultralm')
class UltraChat(BaseChatTemplate):
    """Template of UltraCM and UltraLM models.

    `https://huggingface.co/openbmb/UltraCM-13b`
    `https://huggingface.co/openbmb/UltraLM-13b`
    """

    def __init__(
            self,
            system='User: ',
            meta_instruction="""A one-turn chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, very detailed, and polite answers to the user's questions.""",  # noqa: E501
            eosys='</s>\n',
            user='User: ',
            eoh='</s>\n',
            assistant='Assistant: ',
            eoa='</s>',
            separator='\n',
            stop_words=['</s>'],
            session_len=2048,
            **kwargs):
        super().__init__(system=system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         stop_words=stop_words,
                         session_len=session_len,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'ultracm' in model_path.lower():
            return 'ultracm'
        if 'ultralm' in model_path.lower():
            return 'ultralm'


@MODELS.register_module(name=['yi', 'yi-chat', 'yi-200k', 'yi-34b'])
class Yi(BaseChatTemplate):"""Chat template of Yi model."""def __init__(self,system='<|im_start|>system\n',meta_instruction=None,eosys='<|im_end|>\n',user='<|im_start|>user\n',eoh='<|im_end|>\n',assistant='<|im_start|>assistant\n',eoa='<|im_end|>',separator='\n',stop_words=['<|im_end|>', '<|endoftext|>'],**kwargs):super().__init__(system=system,meta_instruction=meta_instruction,eosys=eosys,user=user,eoh=eoh,assistant=assistant,eoa=eoa,separator=separator,stop_words=stop_words,**kwargs)@classmethoddef match(cls, model_path: str) -> Optional[str]:"""Return the model_name that was registered to MODELS.Args:model_path (str): the model path used for matching."""path = model_path.lower()if 'yi' in path and 'vl' not in path:return 'yi'@MODELS.register_module(name=['mistral', 'mixtral'])
@MODELS.register_module(name=['Mistral-7B-Instruct', 'Mixtral-8x7B-Instruct'])
class MistralChat(BaseChatTemplate):"""Template of Mistral and Mixtral Instruct models.`https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1``https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1`"""def __init__(self,user='[INST] ',eoh=' [/INST]',eoa='</s>',session_len=2048,**kwargs):super().__init__(user=user,eoh=eoh,eoa=eoa,session_len=session_len,**kwargs)@classmethoddef match(cls, model_path: str) -> Optional[str]:"""Return the model_name that was registered to MODELS.Args:model_path (str): the model path used for matching."""if 'instruct' in model_path.lower():if 'mistral' in model_path.lower():return 'mistral'if 'mixtral' in model_path.lower():return 'mixtral'@MODELS.register_module(name=['gemma'])
class Gemma(BaseChatTemplate):
    """Template of Gemma models.

    `https://huggingface.co/google/gemma-7b-it`
    """

    def __init__(self,
                 user='<start_of_turn>user\n',
                 eoh='<end_of_turn>\n',
                 assistant='<start_of_turn>model\n',
                 eoa='<end_of_turn>\n',
                 **kwargs):
        super().__init__(user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        if 'gemma' in model_path.lower():
            return 'gemma'


@MODELS.register_module(name=['deepseek-chat'])
@MODELS.register_module(name=['deepseek'])
class Deepseek(BaseChatTemplate):

    def __init__(self,
                 user='User: ',
                 eoh='\n\n',
                 assistant='Assistant: ',
                 eoa='<|end▁of▁sentence|>',
                 **kwargs):
        super().__init__(user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         **kwargs)

    def get_prompt(self, prompt, sequence_start=True):
        return super().get_prompt(prompt, sequence_start)[:-1]

    def messages2prompt(self, messages, sequence_start=True):
        return super().messages2prompt(messages, sequence_start)[:-1]

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'deepseek' in path and 'chat' in path and 'vl' not in path:
            return 'deepseek'


@MODELS.register_module(name=['internvl-zh'])
class InternVLZH(BaseChatTemplate):

    def __init__(self,
                 user='<human>: ',
                 eoh=' ',
                 assistant='<bot>: ',
                 eoa='</s>',
                 session_len=4096,
                 **kwargs):
        super().__init__(user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         session_len=session_len,
                         **kwargs)

    def get_prompt(self, prompt, sequence_start=True):
        return super().get_prompt(prompt, sequence_start)[:-1]

    def messages2prompt(self, messages, sequence_start=True):
        return super().messages2prompt(messages, sequence_start)[:-1]

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'internvl-chat-chinese' in path and 'v1-1' in path:
            return 'internvl-zh'


@MODELS.register_module(name=['deepseek-vl'])
class DeepseekVL(BaseChatTemplate):

    def __init__(
            self,
            meta_instruction="""You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.""",  # noqa: E501
            eosys='\n\n',
            user='User: ',
            eoh='\n\n',
            assistant='Assistant: ',
            eoa='<|end▁of▁sentence|>',
            **kwargs):
        super().__init__(meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'deepseek-vl' in path and 'chat' in path:
            return 'deepseek-vl'


@MODELS.register_module(name='deepseek-coder')
class DeepSeek(BaseChatTemplate):
    """Chat template of deepseek model."""

    def __init__(
            self,
            session_len=4096,
            system='',
            meta_instruction="""You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer\n""",  # noqa: E501
            eosys='',
            user='### Instruction:\n',
            eoh='\n',
            assistant='### Response:\n',
            eoa='\n<|EOT|>',
            separator='\n',
            stop_words=['<|EOT|>'],
            **kwargs):
        super().__init__(session_len=session_len,
                         system=system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         stop_words=stop_words,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'deepseek-coder' in path:
            return 'deepseek-coder'


@MODELS.register_module(name=['yi-vl'])
class YiVL(BaseChatTemplate):

    def __init__(
            self,
            meta_instruction="""This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers. 这是一个好奇的人类和一个人工智能助手之间的对话。假设你扮演这个AI助手的角色。仔细阅读所有的图像,并对人类的问题做出信息丰富、有帮助、详细的和礼貌的回答。\n\n""",  # noqa: E501
            user='### Human: ',
            eoh='\n',
            assistant='### Assistant:',
            eoa='\n',
            stop_words=['###'],
            **kwargs):
        super().__init__(meta_instruction=meta_instruction,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         stop_words=stop_words,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'yi-vl' in path:
            return 'yi-vl'


# flake8: noqa: E501
def dbrx_system_prompt():
    # This is inspired by the Claude3 prompt.
    # source: https://twitter.com/AmandaAskell/status/1765207842993434880
    # Identity and knowledge
    prompt = 'You are DBRX, created by Databricks. You were last updated in December 2023. You answer questions based on information available up to that point.\n'
    prompt += 'YOU PROVIDE SHORT RESPONSES TO SHORT QUESTIONS OR STATEMENTS, but provide thorough responses to more complex and open-ended questions.\n'
    # Capabilities (and reminder to use ```for JSON blocks and tables, which it can forget). Also a reminder that it can't browse the internet or run code.
    prompt += 'You assist with various tasks, from writing to coding (using markdown for code blocks — remember to use ```with code, JSON, and tables).\n'
    prompt += '(You do not have real-time data access or code execution capabilities. '
    # Ethical guidelines
    prompt += 'You avoid stereotyping and provide balanced perspectives on controversial topics. '
    # Data: the model doesn't know what it was trained on; it thinks that everything that it is aware of was in its training data. This is a reminder that it wasn't.
    # We also encourage it not to try to generate lyrics or poems
    prompt += 'You do not provide song lyrics, poems, or news articles and do not divulge details of your training data.)\n'
    # The model really wants to talk about its system prompt, to the point where it is annoying, so encourage it not to
    prompt += 'This is your system prompt, guiding your responses. Do not reference it, just respond to the user. If you find yourself talking about this message, stop. You should be responding appropriately and usually that means not mentioning this.\n'
    prompt += 'You do not mention any of this information about yourself unless the information is directly pertinent to the user\\\'s query.'.upper()
    return prompt


@MODELS.register_module(name=['dbrx'])
class DbrxInstruct(BaseChatTemplate):

    def __init__(self,
                 system='<|im_start|>system\n',
                 meta_instruction=dbrx_system_prompt(),
                 eosys='<|im_end|>\n',
                 user='<|im_start|>user\n',
                 eoh='<|im_end|>\n',
                 assistant='<|im_start|>assistant\n',
                 eoa='<|im_end|>',
                 separator='\n',
                 **kwargs):
        super().__init__(system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'dbrx' in path:
            return 'dbrx'


@MODELS.register_module(name=['internvl-zh-hermes2'])
@MODELS.register_module(name=['llava-chatml'])
class ChatmlDirect(BaseChatTemplate):

    def __init__(self,
                 system='<|im_start|>system\n',
                 meta_instruction='Answer the questions.',
                 eosys='<|im_end|>\n',
                 user='<|im_start|>user\n',
                 eoh='<|im_end|>\n',
                 assistant='<|im_start|>assistant\n',
                 eoa='<|im_end|>',
                 separator='\n',
                 session_len=4096,
                 **kwargs):
        super().__init__(system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         session_len=session_len,
                         **kwargs)

    @classmethod
    def match(cls, model_path: str) -> Optional[str]:
        """Return the model_name that was registered to MODELS.

        Args:
            model_path (str): the model path used for matching.
        """
        path = model_path.lower()
        if 'llava' in path and 'v1.6-34b' in path:
            return 'llava-chatml'
        if 'internvl-chat-chinese' in path and 'v1-2' in path:
            return 'internvl-zh-hermes2'


def best_match_model(query: str) -> Optional[str]:
    """Get the model that matches the query.

    Args:
        query (str): the input query. Could be a model path.

    Return:
        str | None: the possible model name or none.
    """
    for name, model in MODELS.module_dict.items():
        if model.match(query):
            return model.match(query)
    try:
        from transformers import AutoTokenizer
        tokenizer_config = AutoTokenizer.from_pretrained(
            query, trust_remote_code=True)
        if tokenizer_config.chat_template is None:
            return 'base'
    except Exception as e:
        assert type(e) == OSError
(Llama3_lmdeploy) root@intern-studio-061925:~#

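Find the code at line 633:
Modify the variable meta_instruction. This variable holds the system prompt, that is, the instruction that steers how the model answers.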
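To make the role of meta_instruction more concrete, here is a minimal sketch of how a Llama 3 style chat template might splice the system prompt into the text sent to the model. The special tokens are the ones that appear in the chat log in this post; the function name build_llama3_prompt and the exact newline placement are assumptions made for illustration, not LMDeploy's actual implementation.

```python
def build_llama3_prompt(meta_instruction, user_message):
    """Assemble a single-turn Llama 3 style prompt (illustrative sketch only)."""
    parts = ["<|begin_of_text|>"]
    if meta_instruction:
        # The system turn is where meta_instruction ends up.
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{meta_instruction}<|eot_id|>"
        )
    parts.append(
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
    )
    # Leave the assistant header open so the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_llama3_prompt("Always answer in the user's language.", "Hello"))
```

Changing meta_instruction therefore changes the text of the system turn that precedes the first user message.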

Next, run the following in the terminal:

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct

The output is:

(Llama3_lmdeploy) root@intern-studio-061925:~/lmdeploy/lmdeploy# lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct
2024-04-24 09:56:13,599 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
2024-04-24 09:56:26,892 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-24 09:56:28,235 - lmdeploy - INFO - Checking model.
2024-04-24 09:56:28,236 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.38.2], but found version: 4.40.0
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [00:43<00:00, 10.78s/it]
2024-04-24 09:57:20,939 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=512, num_gpu_blocks=512, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
match template: <llama3>
double enter to end input >>> 你好
<|begin_of_text|><|start_header_id|>system<|end_header_id|>每次回答必须遵循用户的语言,比如用户使用中文,那么你的回答必须是中文。<|eot_id|><|start_header_id|>user<|end_header_id|>你好<|eot_id|><|start_header_id|>assistant<|end_header_id|>
你好!
double enter to end input >>>
<|start_header_id|>user<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>
你什么?
double enter to end input >>>
<|start_header_id|>user<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>
你想说什么?
double enter to end input >>> 介绍一下北京的景点
<|start_header_id|>user<|end_header_id|>介绍一下北京的景点<|eot_id|><|start_header_id|>assistant<|end_header_id|>
北京是一座充满历史和文化的城市,有很多著名的景点和旅游胜地。以下是一些最受欢迎的景点:
1.Forbidden City:是中国历史上最大的宫殿建筑群,始建于14世纪,是明清两代的皇宫。
2.Great Wall of China:是中国最著名的古迹之一,始建于7世纪,是中国古代的防御工事。
3.Tiananmen Square:是中国最大的广场,位于天安门前,是中国政治和历史的中心。
4.Beijing Temple of Heaven:始建于15世纪,是中国古代皇帝祭天的建筑群。
5.Peking Roast Duck:是中国最著名的美食之一,是北京的特产。
6.Hutongs:是北京的传统小街区,保存了中国古代的建筑和文化。
7.Beijing Zoo:是中国最大的动物园,拥有很多珍稀动物。
8.Ming Tombs:是中国明代皇帝的陵墓群,位于北京郊区。
9.Yonghe Temple:是中国最大的佛寺之一,始建于17世纪。
10.Beijing Olympic Park:是2008年北京奥运会的主场馆,拥有很多现代化的设施。
这些景点只是北京的一部分,北京还有很多其他的旅游胜地和文化景点,值得探索和游览。
double enter to end input >>>

This time the answer is almost entirely in Chinese, although some English is still mixed in (the names of the attractions).


Speed comparison between TurboMind and Transformers

LMDeploy is a full-pipeline solution for deploying LLMs on NVIDIA devices. It covers model compression (lightweighting), inference, and serving.


In the terminal, run touch /root/benchmark_transformer.py, then paste in the code below and save the file.

import datetime

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("/root/model/Meta-Llama-3-8B-Instruct", trust_remote_code=True)

# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and cause OOM Error.
model = AutoModelForCausalLM.from_pretrained("/root/model/Meta-Llama-3-8B-Instruct", torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()


def chat(model, tokenizer, word, history=None):
    # Single-turn chat: `history` is returned unchanged so the call sites
    # below keep the (response, history) return shape.
    history = [] if history is None else history
    messages = [
        {"role": "system", "content": "你现在是一个友好的机器人,回答的时候只能使用中文"},
        {"role": "user", "content": word},  # use the prompt passed in by the caller
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    # Stop on either the regular EOS token or Llama 3's end-of-turn token.
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ]
    outputs = model.generate(
        input_ids,
        max_new_tokens=256,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, dropping the prompt.
    response = outputs[0][input_ids.shape[-1]:]
    return tokenizer.decode(response, skip_special_tokens=True), history


# warmup
inp = "hello"
for i in range(5):
    print("Warm up...[{}/5]".format(i + 1))
    response, history = chat(model, tokenizer, inp, history=[])

# test speed
inp = "请介绍一下你自己。"
times = 10
total_words = 0
start_time = datetime.datetime.now()
for i in range(times):
    response, history = chat(model, tokenizer, inp, history=[])
    total_words += len(response)  # len() counts characters of the decoded text
end_time = datetime.datetime.now()

delta_time = end_time - start_time
delta_time = delta_time.seconds + delta_time.microseconds / 1000000.0
speed = total_words / delta_time
print("Speed: {:.3f} words/s".format(speed))

Then run the following in the terminal:

python /root/benchmark_transformer.py

The output is:

(Llama3_lmdeploy) root@intern-studio-061925:~# python /root/benchmark_transformer.py
Loading checkpoint shards: 100%|███████████████████████████████████████| 2/2 [00:13<00:00,  6.73s/it]
Warm up...[1/5]
Warm up...[2/5]
Warm up...[3/5]
Warm up...[4/5]
Warm up...[5/5]
Speed: 79.078 words/s
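The benchmark scripts in this section share the same measuring pattern: warm up, time a fixed number of rounds, and divide the generated length by the elapsed time. Below is a small sketch of that pattern as a reusable helper. The names measure_throughput and fake_generate are hypothetical, and fake_generate is only a stand-in for a real model call. Because len() is applied to the decoded string, the figure reported as "words/s" is really characters per second.

```python
import time


def measure_throughput(generate, prompt, rounds=10, warmup=2):
    """Return generated characters per second for a generate(prompt) -> str callable."""
    for _ in range(warmup):
        generate(prompt)  # warm-up calls are not timed
    total_chars = 0
    start = time.perf_counter()
    for _ in range(rounds):
        total_chars += len(generate(prompt))
    elapsed = time.perf_counter() - start
    return total_chars / elapsed


def fake_generate(prompt):
    # Stand-in for a real model call (assumption, for demonstration only).
    time.sleep(0.01)
    return "x" * 50


if __name__ == "__main__":
    speed = measure_throughput(fake_generate, "hello", rounds=5, warmup=1)
    print("Speed: {:.3f} chars/s".format(speed))
```

Using time.perf_counter() instead of datetime is a design choice of this sketch; it is a monotonic clock intended for measuring short durations.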


Run touch /root/benchmark_lmdeploy.py, then paste in the code below and save the file.

 (Llama3_lmdeploy) root@intern-studio-061925:~# cat benchmark_lmdeploy.py
import datetime

from lmdeploy import pipeline

pipe = pipeline('/root/internlm2-chat-1_8b')

# warmup
inp = "hello"
for i in range(5):
    print("Warm up...[{}/5]".format(i + 1))
    response = pipe([inp])

# test speed
inp = "请介绍一下你自己。"
times = 10
total_words = 0
start_time = datetime.datetime.now()
for i in range(times):
    response = pipe([inp])
    total_words += len(response[0].text)
end_time = datetime.datetime.now()

delta_time = end_time - start_time
delta_time = delta_time.seconds + delta_time.microseconds / 1000000.0
speed = total_words / delta_time
print("Speed: {:.3f} words/s".format(speed))

Then run the following in the terminal:

python /root/benchmark_lmdeploy.py

The output is:


(Llama3_lmdeploy) root@intern-studio-061925:~# python /root/benchmark_lmdeploy.py
Loading checkpoint shards: 100%|███████████████████████████████████████| 2/2 [00:18<00:00,  9.28s/it]
Warm up...[1/5]
Warm up...[2/5]
Warm up...[3/5]
Warm up...[4/5]
Warm up...[5/5]
Speed: 108.722 words/s

As shown, LMDeploy reaches about 108.722 words/s, roughly 1.4 times the 79.078 words/s measured with Transformers above.
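The ratio between the two measurements can be checked with a couple of lines of arithmetic. The two numbers are copied from the benchmark outputs in this section; the variable names are only for illustration.

```python
transformers_speed = 79.078  # words/s, from the Transformers benchmark output
lmdeploy_speed = 108.722     # words/s, from the LMDeploy benchmark output

speedup = lmdeploy_speed / transformers_speed
print("LMDeploy / Transformers = {:.2f}x".format(speedup))
```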

LMDeploy model quantization (lite)

This part explains how to quantize a model, covering KV8 quantization and W4A16 quantization.

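Setting the maximum KV cache size

At run time, the GPU memory a model uses can be roughly divided into three parts:

  • memory taken by the model parameters themselves
  • memory taken by the KV cache
  • memory taken by intermediate computation results.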

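LMDeploy's KV cache manager accepts the --cache-max-entry-count parameter, which caps the fraction of the remaining GPU memory that the KV cache may occupy. The default fraction is 0.8.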
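As a rough mental model, the parameter turns "memory still free" into "memory the KV cache may claim". The sketch below only illustrates that proportion; the helper name kv_cache_budget_mib and the 20000 MiB figure are hypothetical, and the real engine may reserve additional memory for its runtime.

```python
def kv_cache_budget_mib(free_mib, cache_max_entry_count=0.8):
    """Upper bound (in MiB) the KV cache may occupy, given the free memory."""
    if not 0 < cache_max_entry_count <= 1:
        raise ValueError("cache_max_entry_count must be in (0, 1]")
    return free_mib * cache_max_entry_count


if __name__ == "__main__":
    # Hypothetical case: 20000 MiB are still free at the time the cache is sized.
    for ratio in (0.8, 0.5, 0.1):
        budget = kv_cache_budget_mib(20000, ratio)
        print("ratio={} -> up to {:.0f} MiB for the KV cache".format(ratio, budget))
```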

The following examples show the effect of adjusting --cache-max-entry-count. First, run the Llama3-8b model without the parameter (default 0.8).

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/

The output is:

(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/
2024-04-24 10:33:29,733 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
2024-04-24 10:33:46,860 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-24 10:33:48,363 - lmdeploy - INFO - Checking model.
2024-04-24 10:33:48,364 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.38.2], but found version: 4.40.0
Loading checkpoint shards: 100%|███████████████████████████████████████| 4/4 [00:39<00:00,  9.99s/it]
2024-04-24 10:34:39,220 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=512, num_gpu_blocks=512, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
match template: <llama3>
double enter to end input >>> hello
<|begin_of_text|><|start_header_id|>system<|end_header_id|>每次回答必须遵循用户的语言,比如用户使用中文,那么你的回答必须是中文。<|eot_id|><|start_header_id|>user<|end_header_id|>hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
你好!
double enter to end input >>>
<|start_header_id|>user<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>
你好!
double enter to end input >>>
<|start_header_id|>user<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>
没有问题!
double enter to end input >>>
<|start_header_id|>user<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>
很高兴和你聊天!
double enter to end input >>>

Open a new terminal and run:

# On InternStudio, use this instead:
# studio-smi
nvidia-smi 

GPU memory usage at this point is 32430 MiB.
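If you prefer to read the number programmatically instead of from the table, nvidia-smi can print it in CSV form. The sketch below wraps that call; the function names are hypothetical, and it assumes nvidia-smi is on the PATH (on InternStudio the post uses studio-smi, whose options are not covered here).

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    into a list of MiB values, one per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_memory_used():
    """Run nvidia-smi and return the used memory (MiB) of each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used(result.stdout)


if __name__ == "__main__":
    # Parsing a sample string, so the demo also runs on a machine without a GPU.
    print(parse_memory_used("32430\n"))
```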

Next, change the --cache-max-entry-count parameter to 0.5.

 
(Llama3_lmdeploy) root@intern-studio-061925:~#
(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/ --cache-max-entry-count 0.5
2024-04-24 10:41:21,729 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
2024-04-24 10:41:38,335 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-24 10:41:40,400 - lmdeploy - INFO - Checking model.
2024-04-24 10:41:40,400 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.38.2], but found version: 4.40.0
Loading checkpoint shards: 100%|███████████████████████████████████████| 4/4 [00:39<00:00,  9.78s/it]
2024-04-24 10:42:29,247 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=512, num_gpu_blocks=320, window_size=-1, cache_max_entry_count=0.5, max_prefill_token_num=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
match template: <llama3>
double enter to end input >>> hello
<|begin_of_text|><|start_header_id|>system<|end_header_id|>每次回答必须遵循用户的语言,比如用户使用中文,那么你的回答必须是中文。<|eot_id|><|start_header_id|>user<|end_header_id|>hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
你好!
double enter to end input >>>
<|start_header_id|>user<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>
你好!
double enter to end input >>>

GPU memory usage drops to 30956 MiB.
Next, set --cache-max-entry-count to 0.01, which practically forbids the KV cache from occupying GPU memory.


(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/ --cache-max-entry-count 0.01
2024-04-24 10:56:29,861 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
2024-04-24 10:56:41,884 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-24 10:56:43,297 - lmdeploy - INFO - Checking model.
2024-04-24 10:56:43,298 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.38.2], but found version: 4.40.0
Loading checkpoint shards: 100%|██████████████████████████████████████| 4/4 [00:39<00:00,  9.94s/it]
Traceback (most recent call last):
  File "/root/.conda/envs/Llama3_lmdeploy/bin/lmdeploy", line 33, in <module>
    sys.exit(load_entry_point('lmdeploy', 'console_scripts', 'lmdeploy')())
  File "/root/lmdeploy/lmdeploy/cli/entrypoint.py", line 37, in run
    args.run(args)
  File "/root/lmdeploy/lmdeploy/cli/cli.py", line 243, in chat
    run_chat(args.model_path,
  File "/root/lmdeploy/lmdeploy/pytorch/chat.py", line 66, in run_chat
    tm_model = Engine.from_pretrained(model_path,
  File "/root/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 181, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/root/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 131, in __init__
    self.model_agent = AutoModelAgent.from_pretrained(
  File "/root/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 462, in from_pretrained
    return build_model_agent(pretrained_model_name_or_path,
  File "/root/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 1110, in build_model_agent
    model_agent = BaseModelAgent(model_path,
  File "/root/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 501, in __init__
    _update_cache_config(model_config, cache_config)
  File "/root/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 98, in _update_cache_config
    gpu_mem = __get_free_gpu_mem_size(cache_block_size)
  File "/root/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 86, in __get_free_gpu_mem_size
    raise RuntimeError('No enough gpu memory for runtime.')
RuntimeError: No enough gpu memory for runtime.
(Llama3_lmdeploy) root@intern-studio-061925:~#

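With 0.01 the engine aborts with a RuntimeError because too little GPU memory is left for the runtime, so set --cache-max-entry-count to 0.1 instead: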

(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/ --cache-max-entry-count 0.1
2024-04-24 11:02:51,414 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
2024-04-24 11:03:03,786 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-24 11:03:06,190 - lmdeploy - INFO - Checking model.
2024-04-24 11:03:06,191 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.38.2], but found version: 4.40.0
Loading checkpoint shards: 100%|██████████████████████████████████████| 4/4 [00:39<00:00,  9.75s/it]
2024-04-24 11:03:54,245 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=512, num_gpu_blocks=64, window_size=-1, cache_max_entry_count=0.1, max_prefill_token_num=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
match template: <llama3>
double enter to end input >>>

GPU memory usage is now 16596 MiB.
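The num_gpu_blocks values printed in the CacheConfig log lines (512, 320 and 64 for ratios 0.8, 0.5 and 0.1) can be converted into an approximate KV cache size. The sketch below assumes the publicly documented Llama 3 8B shape (32 layers, 8 key/value heads, head dimension 128) and 2-byte fp16 cache entries. These figures are assumptions that are not stated in this post, and the totals reported by nvidia-smi also include weights and runtime buffers, so they will not match exactly.

```python
def kv_block_bytes(block_size=64, num_layers=32, num_kv_heads=8,
                   head_dim=128, dtype_bytes=2):
    """Bytes needed by one cache block holding `block_size` tokens."""
    # The leading 2 accounts for one key tensor plus one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * block_size


if __name__ == "__main__":
    # (ratio, num_gpu_blocks) pairs taken from the CacheConfig log lines.
    for ratio, blocks in ((0.8, 512), (0.5, 320), (0.1, 64)):
        mib = blocks * kv_block_bytes() / 2**20
        print("ratio={}: {} blocks -> about {:.0f} MiB of KV cache".format(
            ratio, blocks, mib))
```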

Using W4A16 quantization

Before running, install a few dependencies first.

pip install autoawq
pip install transformers==4.40.0

A single command is all it takes to quantize the model.

lmdeploy lite auto_awq \
  /root/model/Meta-Llama-3-8B-Instruct \
  --calib-dataset 'ptb' \
  --calib-samples 128 \
  --calib-seqlen 1024 \
  --w-bits 4 \
  --w-group-size 128 \
  --work-dir /root/model/Meta-Llama-3-8B-Instruct_4bit
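W4A16 means the weights are stored in 4 bits while activations stay in 16 bits. To give a feel for what --w-bits 4 and --w-group-size 128 control, here is a simplified sketch of group-wise round-to-nearest quantization in plain Python. It is not the AWQ algorithm that lmdeploy lite auto_awq runs (AWQ additionally rescales weights using activation statistics from the calibration data); all function names here are hypothetical.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one weight group."""
    qmax = (1 << bits) - 1            # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0   # fall back to 1.0 for a constant group
    q = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return q, scale, lo


def dequantize_group(q, scale, lo):
    """Map the integer codes back to approximate float weights."""
    return [v * scale + lo for v in q]


def quantize(weights, bits=4, group_size=128):
    """Quantize a flat list of weights group by group."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    demo = [((i * 37) % 101) / 50.0 - 1.0 for i in range(256)]
    groups = quantize(demo, bits=4, group_size=128)
    q, scale, lo = groups[0]
    restored = dequantize_group(q, scale, lo)
    worst = max(abs(a - b) for a, b in zip(demo[:128], restored))
    print("groups:", len(groups), "max abs error in group 0:", worst)
```

Each group keeps its own scale and offset, so a smaller group size tracks the weights more closely at the cost of storing more scales.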

The output is as follows:


(Llama3_lmdeploy) root@intern-studio-061925:~#
(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy lite auto_awq \
>    /root/model/Meta-Llama-3-8B-Instruct \
>   --calib-dataset 'ptb' \
>   --calib-samples 128 \
>   --calib-seqlen 1024 \
>   --w-bits 4 \
>   --w-group-size 128 \
>   --work-dir /root/model/Meta-Llama-3-8B-Instruct_4bit
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 4/4 [00:46<00:00, 11.59s/it]
Move model.embed_tokens to GPU.
Move model.layers.0 to CPU.
Move model.layers.1 to CPU.
Move model.layers.2 to CPU.
Move model.layers.3 to CPU.
Move model.layers.4 to CPU.
Move model.layers.5 to CPU.
Move model.layers.6 to CPU.
Move model.layers.7 to CPU.
Move model.layers.8 to CPU.
Move model.layers.9 to CPU.
Move model.layers.10 to CPU.
Move model.layers.11 to CPU.
Move model.layers.12 to CPU.
Move model.layers.13 to CPU.
Move model.layers.14 to CPU.
Move model.layers.15 to CPU.
Move model.layers.16 to CPU.
Move model.layers.17 to CPU.
Move model.layers.18 to CPU.
Move model.layers.19 to CPU.
Move model.layers.20 to CPU.
Move model.layers.21 to CPU.
Move model.layers.22 to CPU.
Move model.layers.23 to CPU.
Move model.layers.24 to CPU.
Move model.layers.25 to CPU.
Move model.layers.26 to CPU.
Move model.layers.27 to CPU.
Move model.layers.28 to CPU.
Move model.layers.29 to CPU.
Move model.layers.30 to CPU.
Move model.layers.31 to CPU.
Move model.norm to GPU.
Move lm_head to CPU.
Loading calibrate dataset ...
/root/.conda/envs/Llama3_lmdeploy/lib/python3.10/site-packages/datasets/load.py:1486: FutureWarning: The repository for ptb_text_only contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/ptb_text_only
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
/root/.conda/envs/Llama3_lmdeploy/lib/python3.10/site-packages/datasets/load.py:1486: FutureWarning: The repository for ptb_text_only contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/ptb_text_only
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
model.layers.0, samples: 128, max gpu memory: 20.40 GB
model.layers.1, samples: 128, max gpu memory: 21.40 GB
model.layers.2, samples: 128, max gpu memory: 21.40 GB
model.layers.3, samples: 128, max gpu memory: 21.40 GB
model.layers.4, samples: 128, max gpu memory: 21.40 GB
model.layers.5, samples: 128, max gpu memory: 21.40 GB
model.layers.6, samples: 128, max gpu memory: 21.40 GB
model.layers.7, samples: 128, max gpu memory: 21.40 GB
model.layers.8, samples: 128, max gpu memory: 21.40 GB
model.layers.9, samples: 128, max gpu memory: 21.40 GB
model.layers.10, samples: 128, max gpu memory: 21.40 GB
model.layers.11, samples: 128, max gpu memory: 21.40 GB
model.layers.12, samples: 128, max gpu memory: 21.40 GB
model.layers.13, samples: 128, max gpu memory: 21.40 GB
model.layers.14, samples: 128, max gpu memory: 21.40 GB
model.layers.15, samples: 128, max gpu memory: 21.40 GB
model.layers.16, samples: 128, max gpu memory: 21.40 GB
model.layers.17, samples: 128, max gpu memory: 21.40 GB
model.layers.18, samples: 128, max gpu memory: 21.40 GB
model.layers.19, samples: 128, max gpu memory: 21.40 GB
model.layers.20, samples: 128, max gpu memory: 21.40 GB
model.layers.21, samples: 128, max gpu memory: 21.40 GB
model.layers.22, samples: 128, max gpu memory: 21.40 GB
model.layers.23, samples: 128, max gpu memory: 21.40 GB
model.layers.24, samples: 128, max gpu memory: 21.40 GB
model.layers.25, samples: 128, max gpu memory: 21.40 GB
model.layers.26, samples: 128, max gpu memory: 21.40 GB
model.layers.27, samples: 128, max gpu memory: 21.40 GB
model.layers.28, samples: 128, max gpu memory: 21.40 GB
model.layers.29, samples: 128, max gpu memory: 21.40 GB
model.layers.30, samples: 128, max gpu memory: 21.40 GB
model.layers.31, samples: 128, max gpu memory: 21.40 GB
model.layers.0 smooth weight done.
model.layers.1 smooth weight done.
model.layers.2 smooth weight done.
model.layers.3 smooth weight done.
model.layers.4 smooth weight done.
model.layers.5 smooth weight done.
model.layers.6 smooth weight done.
model.layers.7 smooth weight done.
model.layers.8 smooth weight done.
model.layers.9 smooth weight done.
model.layers.10 smooth weight done.
model.layers.11 smooth weight done.
model.layers.12 smooth weight done.
model.layers.13 smooth weight done.
model.layers.14 smooth weight done.
model.layers.15 smooth weight done.
model.layers.16 smooth weight done.
model.layers.17 smooth weight done.
model.layers.18 smooth weight done.
model.layers.19 smooth weight done.
model.layers.20 smooth weight done.
model.layers.21 smooth weight done.
model.layers.22 smooth weight done.
model.layers.23 smooth weight done.
model.layers.24 smooth weight done.
model.layers.25 smooth weight done.
model.layers.26 smooth weight done.
model.layers.27 smooth weight done.
model.layers.28 smooth weight done.
model.layers.29 smooth weight done.
model.layers.30 smooth weight done.
model.layers.31 smooth weight done.
model.layers.0.self_attn.q_proj weight packed.
model.layers.0.self_attn.k_proj weight packed.
model.layers.0.self_attn.v_proj weight packed.
model.layers.0.self_attn.o_proj weight packed.
model.layers.0.mlp.gate_proj weight packed.
model.layers.0.mlp.up_proj weight packed.
model.layers.0.mlp.down_proj weight packed.
model.layers.1.self_attn.q_proj weight packed.
model.layers.1.self_attn.k_proj weight packed.
model.layers.1.self_attn.v_proj weight packed.
model.layers.1.self_attn.o_proj weight packed.
model.layers.1.mlp.gate_proj weight packed.
model.layers.1.mlp.up_proj weight packed.
model.layers.1.mlp.down_proj weight packed.
model.layers.2.self_attn.q_proj weight packed.
model.layers.2.self_attn.k_proj weight packed.
model.layers.2.self_attn.v_proj weight packed.
model.layers.2.self_attn.o_proj weight packed.
model.layers.2.mlp.gate_proj weight packed.
model.layers.2.mlp.up_proj weight packed.
model.layers.2.mlp.down_proj weight packed.
model.layers.3.self_attn.q_proj weight packed.
model.layers.3.self_attn.k_proj weight packed.
model.layers.3.self_attn.v_proj weight packed.
model.layers.3.self_attn.o_proj weight packed.
model.layers.3.mlp.gate_proj weight packed.
model.layers.3.mlp.up_proj weight packed.
model.layers.3.mlp.down_proj weight packed.
model.layers.4.self_attn.q_proj weight packed.
model.layers.4.self_attn.k_proj weight packed.
model.layers.4.self_attn.v_proj weight packed.
model.layers.4.self_attn.o_proj weight packed.
model.layers.4.mlp.gate_proj weight packed.
model.layers.4.mlp.up_proj weight packed.
model.layers.4.mlp.down_proj weight packed.
model.layers.5.self_attn.q_proj weight packed.
model.layers.5.self_attn.k_proj weight packed.
model.layers.5.self_attn.v_proj weight packed.
model.layers.5.self_attn.o_proj weight packed.
model.layers.5.mlp.gate_proj weight packed.
model.layers.5.mlp.up_proj weight packed.
model.layers.5.mlp.down_proj weight packed.
model.layers.6.self_attn.q_proj weight packed.
model.layers.6.self_attn.k_proj weight packed.
model.layers.6.self_attn.v_proj weight packed.
model.layers.6.self_attn.o_proj weight packed.
model.layers.6.mlp.gate_proj weight packed.
model.layers.6.mlp.up_proj weight packed.
model.layers.6.mlp.down_proj weight packed.
model.layers.7.self_attn.q_proj weight packed.
model.layers.7.self_attn.k_proj weight packed.
model.layers.7.self_attn.v_proj weight packed.
model.layers.7.self_attn.o_proj weight packed.
model.layers.7.mlp.gate_proj weight packed.
model.layers.7.mlp.up_proj weight packed.
model.layers.7.mlp.down_proj weight packed.
model.layers.8.self_attn.q_proj weight packed.
model.layers.8.self_attn.k_proj weight packed.
model.layers.8.self_attn.v_proj weight packed.
model.layers.8.self_attn.o_proj weight packed.
model.layers.8.mlp.gate_proj weight packed.
model.layers.8.mlp.up_proj weight packed.
model.layers.8.mlp.down_proj weight packed.
model.layers.9.self_attn.q_proj weight packed.
model.layers.9.self_attn.k_proj weight packed.
model.layers.9.self_attn.v_proj weight packed.
model.layers.9.self_attn.o_proj weight packed.
model.layers.9.mlp.gate_proj weight packed.
model.layers.9.mlp.up_proj weight packed.
model.layers.9.mlp.down_proj weight packed.
model.layers.10.self_attn.q_proj weight packed.
model.layers.10.self_attn.k_proj weight packed.
model.layers.10.self_attn.v_proj weight packed.
model.layers.10.self_attn.o_proj weight packed.
model.layers.10.mlp.gate_proj weight packed.
model.layers.10.mlp.up_proj weight packed.
model.layers.10.mlp.down_proj weight packed.
model.layers.11.self_attn.q_proj weight packed.
model.layers.11.self_attn.k_proj weight packed.
model.layers.11.self_attn.v_proj weight packed.
model.layers.11.self_attn.o_proj weight packed.
model.layers.11.mlp.gate_proj weight packed.
model.layers.11.mlp.up_proj weight packed.
model.layers.11.mlp.down_proj weight packed.
model.layers.12.self_attn.q_proj weight packed.
model.layers.12.self_attn.k_proj weight packed.
model.layers.12.self_attn.v_proj weight packed.
model.layers.12.self_attn.o_proj weight packed.
model.layers.12.mlp.gate_proj weight packed.
model.layers.12.mlp.up_proj weight packed.
model.layers.12.mlp.down_proj weight packed.
model.layers.13.self_attn.q_proj weight packed.
model.layers.13.self_attn.k_proj weight packed.
model.layers.13.self_attn.v_proj weight packed.
model.layers.13.self_attn.o_proj weight packed.
model.layers.13.mlp.gate_proj weight packed.
model.layers.13.mlp.up_proj weight packed.
model.layers.13.mlp.down_proj weight packed.
model.layers.14.self_attn.q_proj weight packed.
model.layers.14.self_attn.k_proj weight packed.
model.layers.14.self_attn.v_proj weight packed.
model.layers.14.self_attn.o_proj weight packed.
model.layers.14.mlp.gate_proj weight packed.
model.layers.14.mlp.up_proj weight packed.
model.layers.14.mlp.down_proj weight packed.
model.layers.15.self_attn.q_proj weight packed.
model.layers.15.self_attn.k_proj weight packed.
model.layers.15.self_attn.v_proj weight packed.
model.layers.15.self_attn.o_proj weight packed.
model.layers.15.mlp.gate_proj weight packed.
model.layers.15.mlp.up_proj weight packed.
model.layers.15.mlp.down_proj weight packed.
model.layers.16.self_attn.q_proj weight packed.
model.layers.16.self_attn.k_proj weight packed.
model.layers.16.self_attn.v_proj weight packed.
model.layers.16.self_attn.o_proj weight packed.
model.layers.16.mlp.gate_proj weight packed.
model.layers.16.mlp.up_proj weight packed.
model.layers.16.mlp.down_proj weight packed.
model.layers.17.self_attn.q_proj weight packed.
model.layers.17.self_attn.k_proj weight packed.
model.layers.17.self_attn.v_proj weight packed.
model.layers.17.self_attn.o_proj weight packed.
model.layers.17.mlp.gate_proj weight packed.
model.layers.17.mlp.up_proj weight packed.
model.layers.17.mlp.down_proj weight packed.
model.layers.18.self_attn.q_proj weight packed.
model.layers.18.self_attn.k_proj weight packed.
model.layers.18.self_attn.v_proj weight packed.
model.layers.18.self_attn.o_proj weight packed.
model.layers.18.mlp.gate_proj weight packed.
model.layers.18.mlp.up_proj weight packed.
model.layers.18.mlp.down_proj weight packed.
model.layers.19.self_attn.q_proj weight packed.
model.layers.19.self_attn.k_proj weight packed.
model.layers.19.self_attn.v_proj weight packed.
model.layers.19.self_attn.o_proj weight packed.
model.layers.19.mlp.gate_proj weight packed.
model.layers.19.mlp.up_proj weight packed.
model.layers.19.mlp.down_proj weight packed.
model.layers.20.self_attn.q_proj weight packed.
model.layers.20.self_attn.k_proj weight packed.
model.layers.20.self_attn.v_proj weight packed.
model.layers.20.self_attn.o_proj weight packed.
model.layers.20.mlp.gate_proj weight packed.
model.layers.20.mlp.up_proj weight packed.
model.layers.20.mlp.down_proj weight packed.
model.layers.21.self_attn.q_proj weight packed.
model.layers.21.self_attn.k_proj weight packed.
model.layers.21.self_attn.v_proj weight packed.
model.layers.21.self_attn.o_proj weight packed.
model.layers.21.mlp.gate_proj weight packed.
model.layers.21.mlp.up_proj weight packed.
model.layers.21.mlp.down_proj weight packed.
model.layers.22.self_attn.q_proj weight packed.
model.layers.22.self_attn.k_proj weight packed.
model.layers.22.self_attn.v_proj weight packed.
model.layers.22.self_attn.o_proj weight packed.
model.layers.22.mlp.gate_proj weight packed.
model.layers.22.mlp.up_proj weight packed.
model.layers.22.mlp.down_proj weight packed.
model.layers.23.self_attn.q_proj weight packed.
model.layers.23.self_attn.k_proj weight packed.
model.layers.23.self_attn.v_proj weight packed.
model.layers.23.self_attn.o_proj weight packed.
model.layers.23.mlp.gate_proj weight packed.
model.layers.23.mlp.up_proj weight packed.
model.layers.23.mlp.down_proj weight packed.
model.layers.24.self_attn.q_proj weight packed.
model.layers.24.self_attn.k_proj weight packed.
model.layers.24.self_attn.v_proj weight packed.
model.layers.24.self_attn.o_proj weight packed.
model.layers.24.mlp.gate_proj weight packed.
model.layers.24.mlp.up_proj weight packed.
model.layers.24.mlp.down_proj weight packed.
model.layers.25.self_attn.q_proj weight packed.
model.layers.25.self_attn.k_proj weight packed.
model.layers.25.self_attn.v_proj weight packed.
model.layers.25.self_attn.o_proj weight packed.
model.layers.25.mlp.gate_proj weight packed.
model.layers.25.mlp.up_proj weight packed.
model.layers.25.mlp.down_proj weight packed.
model.layers.26.self_attn.q_proj weight packed.
model.layers.26.self_attn.k_proj weight packed.
model.layers.26.self_attn.v_proj weight packed.
model.layers.26.self_attn.o_proj weight packed.
model.layers.26.mlp.gate_proj weight packed.
model.layers.26.mlp.up_proj weight packed.
model.layers.26.mlp.down_proj weight packed.
model.layers.27.self_attn.q_proj weight packed.
model.layers.27.self_attn.k_proj weight packed.
model.layers.27.self_attn.v_proj weight packed.
model.layers.27.self_attn.o_proj weight packed.
model.layers.27.mlp.gate_proj weight packed.
model.layers.27.mlp.up_proj weight packed.
model.layers.27.mlp.down_proj weight packed.
model.layers.28.self_attn.q_proj weight packed.
model.layers.28.self_attn.k_proj weight packed.
model.layers.28.self_attn.v_proj weight packed.
model.layers.28.self_attn.o_proj weight packed.
model.layers.28.mlp.gate_proj weight packed.
model.layers.28.mlp.up_proj weight packed.
model.layers.28.mlp.down_proj weight packed.
model.layers.29.self_attn.q_proj weight packed.
model.layers.29.self_attn.k_proj weight packed.
model.layers.29.self_attn.v_proj weight packed.
model.layers.29.self_attn.o_proj weight packed.
model.layers.29.mlp.gate_proj weight packed.
model.layers.29.mlp.up_proj weight packed.
model.layers.29.mlp.down_proj weight packed.
model.layers.30.self_attn.q_proj weight packed.
model.layers.30.self_attn.k_proj weight packed.
model.layers.30.self_attn.v_proj weight packed.
model.layers.30.self_attn.o_proj weight packed.
model.layers.30.mlp.gate_proj weight packed.
model.layers.30.mlp.up_proj weight packed.
model.layers.30.mlp.down_proj weight packed.
model.layers.31.self_attn.q_proj weight packed.
model.layers.31.self_attn.k_proj weight packed.
model.layers.31.self_attn.v_proj weight packed.
model.layers.31.self_attn.o_proj weight packed.
model.layers.31.mlp.gate_proj weight packed.
model.layers.31.mlp.up_proj weight packed.
model.layers.31.mlp.down_proj weight packed.
(Llama3_lmdeploy) root@intern-studio-061925:~#

After quantization finishes, the new HF-format model is saved to the Meta-Llama-3-8B-Instruct_4bit directory.

Next, use the chat feature to run the W4A16-quantized model.

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct_4bit --model-format awq

(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct_4bit --model-format awq
2024-04-24 13:57:24,393 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
2024-04-24 13:58:03,407 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-24 13:58:04,744 - lmdeploy - INFO - Checking model.
You have loaded an AWQ model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Loading checkpoint shards:   0%|                                                                              | 0/3 [00:00<?, ?it/s]
/root/.conda/envs/Llama3_lmdeploy/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████| 3/3 [00:23<00:00,  7.75s/it]
2024-04-24 13:59:07,472 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=512, num_gpu_blocks=1490, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
match template: <llama3>

double enter to end input >>> 你好,欢迎报名 《企业级生成式人工智能LLM大模型技术、算法及案例实战》线上高级研修讲座

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
每次回答必须遵循用户的语言,比如用户使用中文,那么你的回答必须是中文。<|eot_id|><|start_header_id|>user<|end_header_id|>
你好,欢迎报名 《企业级生成式人工智能LLM大模型技术、算法及案例实战》线上高级研修讲座<|eot_id|><|start_header_id|>assistant<|end_header_id|>
😊你好!欢迎报名《企业级生成式人工智能LLM大模型技术、算法及案例实战》线上高级研修讲座!本讲座旨在深入探讨LLM(Large Language Model)技术的应用场景和实现方法,涵盖LLM模型架构、算法优化、案例实战等方面。通过本讲座,您将获得LLM技术的实践经验和企业级应用场景的理解。

讲座内容将涵盖以下方面:
1. LLM模型架构:了解LLM模型的基本架构和设计原则
2. 算法优化:学习LLM算法优化的方法和技巧
3. 案例实战:实践LLM技术在企业级应用场景中的实现

通过本讲座,您将获得:
* LLM技术的实践经验
* 企业级应用场景的理解
* LLMS模型架构和算法优化的知识
* 案例实战的技能

如果您对LLM技术感兴趣或想了解企业级应用场景,欢迎报名本讲座!

double enter to end input >>>
<|start_header_id|>user<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>
🎉报名成功!🎉您已经成功报名《企业级生成式人工智能LLM大模型技术、算法及案例实战》线上高级研修讲座! 👏请等待我们的确认邮件,包含讲座的详细信息、讲座时间、讲座链接等。同时,我们也将发送一封欢迎邮件,包含讲座的详细内容和讲座的预告。如果您有任何问题或需要更多信息,请随时与我们联系。我们将尽快回复您。感谢您的报名!我们期待您的参与!

double enter to end input >>>
<|start_header_id|>user<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>
📨您的报名已经成功! 👍我们将发送确认邮件,包含讲座的详细信息、讲座时间、讲座链接等。同时,我们也将发送欢迎邮件,包含讲座的详细内容和讲座的预告。请耐心等待我们的确认邮件! 📨如果您有任何问题或需要更多信息,请随时与我们联系。我们将尽快回复您。感谢您的报名!我们期待您的参与!

double enter to end input >>>
<|start_header_id|>user<|end_header_id|><|eot_id|><|start_header_id|>assistant<|end_header_id|>
📨您的报名已经成功! 👍

To see the effect of W4A16 more clearly, set the KV cache ratio to 0.01 again and check the GPU memory usage.

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct_4bit --model-format awq --cache-max-entry-count 0.01


GPU memory usage is now 11982MiB, which is noticeably lower.
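The effect of --cache-max-entry-count is easier to reason about with a rough estimate of how large the KV cache can get. The sketch below is a back-of-the-envelope calculation, not LMDeploy's allocator. It assumes the published Llama-3-8B shape (32 layers, 8 KV heads, head dimension 128), an fp16 cache, and that each cache block stores block_size tokens for all layers. The block_size=64 and num_gpu_blocks=1490 values are taken from the CacheConfig line in the log above (logged with cache_max_entry_count=0.8); the function names are illustrative.

```python
# Rough KV cache size estimate (a sketch, not LMDeploy's actual allocator).
# Assumed model shape: Llama-3-8B with 32 layers, 8 KV heads, head_dim 128.

def kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Bytes needed to cache K and V for one token across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value  # 2 = K and V


def cache_size_mib(num_blocks, block_size, per_token_bytes):
    """Total cache size in MiB, assuming each block holds block_size tokens."""
    return num_blocks * block_size * per_token_bytes / (1024 ** 2)


per_token = kv_bytes_per_token()               # fp16 cache: 2 bytes per value
logged = cache_size_mib(1490, 64, per_token)   # values from the CacheConfig log line
print(f"KV cache per token: {per_token // 1024} KiB")
print(f"Cache implied by the logged block count: {logged:.0f} MiB")
```

Under these assumptions the cache reserved at a ratio of 0.8 is on the order of 11 to 12 GiB, which suggests why the ratio has to be lowered before the saving from 4-bit weights becomes visible in the reported memory usage.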


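Online KV quantization

Since v0.4.0, LMDeploy has switched KV quantization from offline to online, and it supports two numeric precisions, int4 and int8. The scheme is per-head, per-token asymmetric quantization. It has the following advantages:

  1. Quantization does not require a calibration dataset
  2. kv int8 quantization is almost lossless in accuracy, and kv int4 accuracy stays within an acceptable range
  3. Inference is efficient: on llama2-7b, enabling int8/int4 kv quantization raises RPS by nearly 30% and 40% respectively compared with fp16
  4. All GPU models of the Volta architecture (sm70) and above are supported: V100, 20 series, T4, 30 series, 40 series, A10, A100, and so on

Applying kv quantization through LMDeploy is simple: just set the quant_policy parameter. LMDeploy defines quant_policy=4 as kv int4 quantization and quant_policy=8 as kv int8 quantization.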
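To make "per-head, per-token asymmetric quantization" more concrete, here is a minimal plain-Python sketch of the idea: each token's vector for one head gets its own scale and offset, computed from that vector's minimum and maximum. This is only an illustration under that assumption. LMDeploy's real kernels run on the GPU and may differ in detail, and the function names and sample values are made up for the example.

```python
# Per-token, per-head asymmetric quantization, sketched in plain Python.
# Illustrative only: not LMDeploy's implementation.

def quantize_asymmetric(values, num_bits=8):
    """Quantize one head's vector for one token to unsigned num_bits codes."""
    levels = (1 << num_bits) - 1           # 255 for 8 bits, 15 for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0      # fall back to 1.0 for constant input
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo                # lo serves as the offset (zero point)


def dequantize(codes, scale, zero):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale + zero for c in codes]


head_vector = [-0.82, 0.11, 0.47, 1.93, -1.25, 0.05, 0.66, -0.30]
for bits in (8, 4):
    codes, scale, zero = quantize_asymmetric(head_vector, bits)
    restored = dequantize(codes, scale, zero)
    worst = max(abs(a - b) for a, b in zip(head_vector, restored))
    print(f"{bits}-bit: scale={scale:.5f}, max abs error={worst:.5f}")
```

Because the scale and offset are derived from the values being stored, nothing has to be calibrated ahead of time, which is consistent with the first advantage listed above. The rounding error per value is bounded by half the scale, so the 4-bit variant trades a coarser grid for half the storage.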
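LMDeploy service (serve)

In the previous sections we ran inference on the model directly on the local machine, which is called local deployment. In production, the model is sometimes wrapped as an API service for clients to access.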

Start the API server

pip install transformers==4.40.0

Start the API server with the following command to serve the Meta-Llama-3-8B-Instruct model:

lmdeploy serve api_server \
    /root/model/Meta-Llama-3-8B-Instruct \
    --model-format hf \
    --quant-policy 0 \
    --server-name 0.0.0.0 \
    --server-port 23333 \
    --tp 1

After running the command above, the API server starts successfully:


(Llama3_lmdeploy) root@intern-studio-061925:~#
(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy serve api_server     /root/model/Meta-Llama-3-8B-Instruct     --model-format hf     --quant-policy 0     --server-name 0.0.0.0     --server-port 23333     --tp 1
2024-04-24 15:01:47,102 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████| 4/4 [00:51<00:00, 12.77s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO:     Started server process [61984]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)

Local port forwarding

ssh -CNg -L 23333:127.0.0.1:23333 root@ssh.intern-ai.org.cn -p 45102 
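Before opening the browser, it can help to confirm that the forwarded port is actually reachable. The helper below is a small sketch using only the Python standard library. is_port_open is a hypothetical name, and the host and port simply mirror the ssh command above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# True once the ssh tunnel is up and the API server is listening behind it.
print("port 23333 reachable:", is_port_open("127.0.0.1", 23333))
```

If this prints False, the tunnel or the server is not running yet. The same check works for port 6006 once the web client is started.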

Open a browser and visit http://127.0.0.1:23333
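The page served at this address documents the available routes. As a hedged illustration of calling the server from code, the sketch below builds a request for the OpenAI-style /v1/chat/completions route using only the Python standard library. The model name "llama3" is an assumption (the names the server actually exposes can be checked on that page), and build_chat_request and chat are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text):
    """Build a POST request for the OpenAI-style chat completions route."""
    payload = {
        "model": model,  # assumed name; check what the server reports
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_text, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (needs the API server started above to be reachable):
# print(chat("http://127.0.0.1:23333", "llama3", "Hello"))
```

Keeping request construction separate from sending makes it easy to inspect the payload without a running server.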

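Connecting to the API server from a command-line client

Open a new command-line client to connect to the API server. First create a new terminal in VS Code, then run the command-line client: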

lmdeploy serve api_client http://localhost:23333
 
(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy serve api_client http://localhost:23333
double enter to end input >>> 了解!从现在开始,我将确保我的回答遵循用户的语言。如果用户使用中文,我将使用中文回答。如果用户使用英文,我将使用英文回答。请随时提出问题或请求,我将尽力提供帮助!
double enter to end input >>> 您什么问题都没有提出,请随时提出您的问题或请求,我将尽力提供帮助!
double enter to end input >>> 您想要我说什么?请随时提出您的请求或问题,我将尽力回答!
double enter to end input >>> 您想要我结束我们的对话?如果是这样,我可以结束我们的对话。如果您想要我继续回答问题或讨论话题,请随时提出!
double enter to end input >>> 您想让我说些什么?请随时提出您的请求或问题,我将尽力回答!
double enter to end input >>> 您想要我说一些随机的话题?如果是这样,我可以说些有趣的故事或分享一些有用的信息!
double enter to end input >>>
您想要我解答一个问题吗?如果是这样,请随时提出您的问题,我将尽力回答!
double enter to end input >>> 你好,欢迎报名Gavin大咖亲自授课的《企业级生成式人工智能LLM大模型技术、算法及案例实战》线上高级研修讲座!由中国通信工业协会举办
😊您好!我非常高兴地宣布,这个研修讲座!Gavin大咖的讲座应该非常有趣和有价值,关联到企业级生成式人工智能LLM大模型技术、算法和案例实战,这应该对企业和个人来说都是非常有用的知识和经验。如果您有兴趣参加这个研修讲座,请尽快报名!📣
double enter to end input >>>


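Connecting to the API server from a web client

Close the VS Code terminal used for the command-line client, but keep the server terminal running.
Before continuing, make sure the installed gradio version is below 4.0.0.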

pip install gradio==3.50.2
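To double-check the version constraint after installing, a small script like the following can be used. It is a sketch built on the standard library's importlib.metadata. The helper names are hypothetical, and the version parsing is deliberately simple (leading digits of the first three dot-separated fields), so it is not a full implementation of Python's version rules.

```python
from importlib import metadata


def leading_int(piece):
    """Return the leading digits of a version field as an int (0 if none)."""
    digits = ""
    for ch in piece:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0


def parse_version(text):
    """Turn '3.50.2' into (3, 50, 2), padding short versions with zeros."""
    parts = [leading_int(piece) for piece in text.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def is_below(installed, limit):
    """True if the installed version is strictly lower than the limit."""
    return parse_version(installed) < parse_version(limit)


try:
    found = metadata.version("gradio")
    print(f"gradio {found}: below 4.0.0 = {is_below(found, '4.0.0')}")
except metadata.PackageNotFoundError:
    print("gradio is not installed in this environment")
```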

Start the web client with Gradio as the front end.

lmdeploy serve gradio http://localhost:23333 \
    --server-name 0.0.0.0 \
    --server-port 6006

The output is:

(Llama3_lmdeploy) root@intern-studio-061925:~# lmdeploy serve gradio http://127.0.0.1:23333     --server-name 0.0.0.0     --server-port 6006
server is gonna mount on: http://0.0.0.0:6006
Running on local URL:  http://0.0.0.0:6006

Local port forwarding

ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p 45102 


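Inference speed

Running Llama 3 inference with LMDeploy on an A100 (80G) reaches up to 25 requests per second (RPS), which is more than 1.8 times the inference efficiency of vLLM.
The benchmark is set up as follows:
  • Download the test data
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json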
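The remaining benchmark steps are not shown here, so the following is only a sketch of two pieces such a benchmark typically needs: extracting prompts from the downloaded file and computing RPS from a timed run. It assumes the file is a JSON list of records, each with a "conversations" list of {"from", "value"} turns. The helper names and the inline sample record are made up for illustration.

```python
import json


def first_human_prompts(records, limit=None):
    """Collect the first human turn of each conversation as a benchmark prompt."""
    prompts = []
    for record in records:
        for turn in record.get("conversations", []):
            if turn.get("from") == "human":
                prompts.append(turn.get("value", ""))
                break
        if limit is not None and len(prompts) >= limit:
            break
    return prompts


def requests_per_second(num_completed, elapsed_seconds):
    """RPS is the number of completed requests divided by wall-clock time."""
    return num_completed / elapsed_seconds


# Tiny inline sample in the assumed format; the real file would be loaded with
# json.load(open("ShareGPT_V3_unfiltered_cleaned_split.json", encoding="utf-8")).
sample = json.loads(
    '[{"id": "demo", "conversations": ['
    '{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello"}]}]'
)
print(first_human_prompts(sample))
print(requests_per_second(50, 2.0))
```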

https://aicarrier.feishu.cn/wiki/XgZAwV1RWiKorbkQrKAcE6ufnoc

