MiniCPM-V2.6是由OpenBMB开发的一款多模态大型语言模型(MLLM),专为视觉-语言理解设计。
MiniCPM-V2.6模型能够处理图像、视频和文本输入,并提供高质量的文本输出。
MiniCPM-V 2.6模型在单图像理解方面超越了广泛使用的专有模型,如GPT-4o mini、GPT-4V、Gemini 1.5 Pro和Claude 3.5 Sonnet。
MiniCPM-V 2.6还能够执行多图像理解和上下文学习,并且在Mantis-Eval、BLINK、Mathverse mv和Sciverse mv等流行的多图像基准测试中取得了最先进的性能。
此外,MiniCPM-V 2.6还能够接受视频输入,进行对话并为时空信息提供密集的字幕,性能超过了GPT-4V、Claude 3.5 Sonnet和LLaVA-NeXT-Video-34B。
github项目地址:https://github.com/OpenBMB/MiniCPM-V。
一、环境安装
1、python环境
建议安装python版本在3.10以上。
2、pip库安装
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 torchaudio==2.1.2 --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
3、MiniCPM-V-2_6模型下载:
git lfs install
git clone https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6
4、MiniCPM-V-2_6-gguf模型下载:
git lfs install
git clone https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf
5、MiniCPM-V-2_6-int4模型下载:
git lfs install
git clone https://www.modelscope.cn/models/openbmb/minicpm-v-2_6-int4
二、功能测试
1、运行测试:
(1)python代码调用测试
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer
import osdef load_model_and_tokenizer(model_name='OpenBMB/MiniCPM-V-2_6'):model = AutoModel.from_pretrained(model_name, trust_remote_code=True,attn_implementation='sdpa',torch_dtype=torch.bfloat16).eval().cuda()tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)return model, tokenizerdef load_image(image_path):try:with Image.open(image_path).convert('RGB') as image:return imageexcept Exception as e:print(f"Error loading image: {e}")return Nonedef generate_response(model, tokenizer, image, question, sampling=False, stream=False):msgs = [{'role': 'user', 'content': [image, question]}]res = model.chat(image=None,msgs=msgs,tokenizer=tokenizer,sampling=sampling,stream=stream)if stream:generated_text = ""for new_text in res:generated_text += new_textprint(new_text, flush=True, end='')return generated_textelse:return resdef main():model_name = 'OpenBMB/MiniCPM-V-2_6'image_path = 'image.png'question = 'What is in the image?'if not os.path.exists(image_path):print(f"Image path {image_path} does not exist.")returnmodel, tokenizer = load_model_and_tokenizer(model_name)image = load_image(image_path)if image is None:returnresponse = generate_response(model, tokenizer, image, question)print(response)# if you want to use streamingprint("\nStreaming response:")generate_response(model, tokenizer, image, question, sampling=True, stream=True)if __name__ == "__main__":main()
未完......
更多详细的欢迎关注:杰哥新技术