超越GPT4V，最强多模态MiniCPM-V2.6模型分享

MiniCPM-V2.6是由OpenBMB开发的一款多模态大型语言模型（MLLM），专为视觉-语言理解设计。

MiniCPM-V2.6模型能够处理图像、视频和文本输入，并提供高质量的文本输出。

MiniCPM-V 2.6模型在单图像理解方面超越了广泛使用的专有模型，如GPT-4o mini、GPT-4V、Gemini 1.5 Pro和Claude 3.5 Sonnet。

MiniCPM-V 2.6还能够执行多图像理解和上下文学习，并且在Mantis-Eval、BLINK、Mathverse mv和Sciverse mv等流行的多图像基准测试中取得了最先进的性能。

此外，MiniCPM-V 2.6还能够接受视频输入，进行对话并为时空信息提供密集的字幕，性能超过了GPT-4V、Claude 3.5 Sonnet和LLaVA-NeXT-Video-34B。

github项目地址：https://github.com/OpenBMB/MiniCPM-V。

一、环境安装

1、python环境

建议安装python版本在3.10以上。

2、pip库安装

pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 torchaudio==2.1.2 --extra-index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

3、MiniCPM-V-2_6模型下载：

git lfs install

git clone https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6

4、MiniCPM-V-2_6-gguf模型下载：

git lfs install

git clone https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf

5、MiniCPM-V-2_6-int4模型下载：

git lfs install

git clone https://www.modelscope.cn/models/openbmb/minicpm-v-2_6-int4

二、功能测试

1、运行测试：

（1）python代码调用测试

import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer
import osdef load_model_and_tokenizer(model_name='OpenBMB/MiniCPM-V-2_6'):model = AutoModel.from_pretrained(model_name, trust_remote_code=True,attn_implementation='sdpa',torch_dtype=torch.bfloat16).eval().cuda()tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)return model, tokenizerdef load_image(image_path):try:with Image.open(image_path).convert('RGB') as image:return imageexcept Exception as e:print(f"Error loading image: {e}")return Nonedef generate_response(model, tokenizer, image, question, sampling=False, stream=False):msgs = [{'role': 'user', 'content': [image, question]}]res = model.chat(image=None,msgs=msgs,tokenizer=tokenizer,sampling=sampling,stream=stream)if stream:generated_text = ""for new_text in res:generated_text += new_textprint(new_text, flush=True, end='')return generated_textelse:return resdef main():model_name = 'OpenBMB/MiniCPM-V-2_6'image_path = 'image.png'question = 'What is in the image?'if not os.path.exists(image_path):print(f"Image path {image_path} does not exist.")returnmodel, tokenizer = load_model_and_tokenizer(model_name)image = load_image(image_path)if image is None:returnresponse = generate_response(model, tokenizer, image, question)print(response)# if you want to use streamingprint("\nStreaming response:")generate_response(model, tokenizer, image, question, sampling=True, stream=True)if __name__ == "__main__":main()

未完......

更多详细的欢迎关注：杰哥新技术