Fine-tuning Qwen2-VL for the LaTeX OCR Task: Single-Image Inference with the Original Model

flyish

Input

(input image: ./QueHuaQiuSe1.png, a traditional Chinese landscape painting; see the description below)

Output

['这是一幅中国传统山水画,描绘了一座高耸的山峰,周围环绕着树木和植被。画面下方有一片开阔的田野,远处的山峦在薄雾中若隐若现。画面上方有几行书法题字,可能是画家的签名或诗文。画面右上角和左下角有几枚印章,可能是画家或收藏者的印章。整体色调以淡雅的青绿色为主,给人一种宁静、自然的感觉。']

(Translation: "This is a traditional Chinese landscape painting depicting a towering peak surrounded by trees and vegetation. An open field lies in the lower part of the frame, and distant mountains loom through thin mist. Several lines of calligraphic inscription appear at the top, possibly the painter's signature or a poem, and a few seals in the upper-right and lower-left corners are likely those of the painter or collectors. The overall palette is a subdued blue-green, giving a tranquil, natural impression.")

Use flash-attn to reduce resource consumption:

pip install flash-attn --no-build-isolation
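FlashAttention-2 only runs on CUDA GPUs with fp16/bf16 weights, and it requires an NVIDIA GPU of compute capability 8.0 or newer (Ampere and later). A minimal pre-flight check, sketched here as an illustration (not part of the original script):

import importlib.util
import torch

# flash-attn must be importable, or from_pretrained(..., attn_implementation="flash_attention_2") will raise
print("flash_attn installed:", importlib.util.find_spec("flash_attn") is not None)

# FlashAttention-2 needs an NVIDIA GPU with compute capability >= 8.0 (Ampere or newer)
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"compute capability: {major}.{minor}")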

The code is as follows:

from PIL import Image
import requests
import torch
from torchvision import io
from typing import Dict
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from modelscope import snapshot_download

# Download a model snapshot into the modelscope cache directory
model_dir = snapshot_download("qwen/Qwen2-VL-7B-Instruct")

# Load the model onto the available devices (CPU or GPU) with automatic precision
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_dir, torch_dtype="auto", device_map="auto", attn_implementation="flash_attention_2"
)

# Print all non-callable attributes of the model and their values
print("\nModel Attributes and Their Values:")
attributes = dir(model)
for attr in attributes:
    try:
        value = getattr(model, attr)
        if not callable(value):
            print(f"{attr}: {value}")
    except AttributeError:
        continue

# Print the torch_dtype actually in use
print(f"Actual torch_dtype: {model.dtype}")

# Print the model configuration
print("\nModel Configuration:")
print(model.config)

# Load the processor (tokenizer + image processor)
processor = AutoProcessor.from_pretrained(model_dir, low_cpu_mem_usage=False)

# Alternative: load the image from a URL
# url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
# image = Image.open(requests.get(url, stream=True).raw)

# Load the image from a local file
image_path = './QueHuaQiuSe1.png'
image = Image.open(image_path)

# Define the conversation: the user's image plus a text prompt ("Describe this image.")
conversation = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "描述这张图像。"}]}
]

# Apply the chat template and append the generation prompt
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Preprocess: convert text and image into tensors the model accepts
inputs = processor(text=[text_prompt], images=[image], padding=True, return_tensors="pt")

# Move the inputs to the CUDA device
inputs = inputs.to("cuda")

# Inference: generate at most 128 new tokens
output_ids = model.generate(**inputs, max_new_tokens=128)

# Keep only the newly generated token IDs, stripping the prompt tokens
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(inputs.input_ids, output_ids)
]

# Decode the generated token IDs into human-readable text
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)

# Print the generated description
print(output_text)
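A possible variation, not in the original post: transformers' TextStreamer prints decoded tokens as they are generated, which is convenient for watching long descriptions appear incrementally. This sketch reuses the model, processor, and inputs defined above:

from transformers import TextStreamer

# Stream decoded text to stdout as tokens are generated; skip_prompt suppresses the echoed input
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**inputs, max_new_tokens=128, streamer=streamer)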

Model attributes

Downloading Model to directory: /home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.45it/s]

Model Attributes and Their Values:
T_destination: ~T_destination
__annotations__: {'dump_patches': <class 'bool'>, '_version': <class 'int'>, 'training': <class 'bool'>, '_parameters': typing.Dict[str, typing.Optional[torch.nn.parameter.Parameter]], '_buffers': typing.Dict[str, typing.Optional[torch.Tensor]], '_non_persistent_buffers_set': typing.Set[str], '_backward_pre_hooks': typing.Dict[int, typing.Callable], '_backward_hooks': typing.Dict[int, typing.Callable], '_is_full_backward_hook': typing.Optional[bool], '_forward_hooks': typing.Dict[int, typing.Callable], '_forward_hooks_with_kwargs': typing.Dict[int, bool], '_forward_hooks_always_called': typing.Dict[int, bool], '_forward_pre_hooks': typing.Dict[int, typing.Callable], '_forward_pre_hooks_with_kwargs': typing.Dict[int, bool], '_state_dict_hooks': typing.Dict[int, typing.Callable], '_load_state_dict_pre_hooks': typing.Dict[int, typing.Callable], '_state_dict_pre_hooks': typing.Dict[int, typing.Callable], '_load_state_dict_post_hooks': typing.Dict[int, typing.Callable], '_modules': typing.Dict[str, typing.Optional[ForwardRef('Module')]], 'call_super_init': <class 'bool'>, '_compiled_call_impl': typing.Optional[typing.Callable], 'forward': typing.Callable[..., typing.Any], '__call__': typing.Callable[..., typing.Any]}
__dict__: {'training': False, '_parameters': {}, '_buffers': {}, '_non_persistent_buffers_set': set(), '_backward_pre_hooks': OrderedDict(), '_backward_hooks': OrderedDict(), '_is_full_backward_hook': None, '_forward_hooks': OrderedDict(), '_forward_hooks_with_kwargs': OrderedDict(), '_forward_hooks_always_called': OrderedDict(), '_forward_pre_hooks': OrderedDict(), '_forward_pre_hooks_with_kwargs': OrderedDict(), '_state_dict_hooks': OrderedDict(), '_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_post_hooks': OrderedDict(), '_modules': {'visual': Qwen2VisionTransformerPretrainedModel((patch_embed): PatchEmbed((proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False))(rotary_pos_emb): VisionRotaryEmbedding()(blocks): ModuleList((0-31): 32 x Qwen2VLVisionBlock((norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(attn): VisionFlashAttention2((qkv): Linear(in_features=1280, out_features=3840, bias=True)(proj): Linear(in_features=1280, out_features=1280, bias=True))(mlp): VisionMlp((fc1): Linear(in_features=1280, out_features=5120, bias=True)(act): QuickGELUActivation()(fc2): Linear(in_features=5120, out_features=1280, bias=True))))(merger): PatchMerger((ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(mlp): Sequential((0): Linear(in_features=5120, out_features=5120, bias=True)(1): GELU(approximate='none')(2): Linear(in_features=5120, out_features=3584, bias=True)))
), 'model': Qwen2VLModel((embed_tokens): Embedding(152064, 3584)(layers): ModuleList((0-27): 28 x Qwen2VLDecoderLayer((self_attn): Qwen2VLFlashAttention2((q_proj): Linear(in_features=3584, out_features=3584, bias=True)(k_proj): Linear(in_features=3584, out_features=512, bias=True)(v_proj): Linear(in_features=3584, out_features=512, bias=True)(o_proj): Linear(in_features=3584, out_features=3584, bias=False)(rotary_emb): Qwen2VLRotaryEmbedding())(mlp): Qwen2MLP((gate_proj): Linear(in_features=3584, out_features=18944, bias=False)(up_proj): Linear(in_features=3584, out_features=18944, bias=False)(down_proj): Linear(in_features=18944, out_features=3584, bias=False)(act_fn): SiLU())(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)))(norm): Qwen2RMSNorm((3584,), eps=1e-06)(rotary_emb): Qwen2VLRotaryEmbedding()
), 'lm_head': Linear(in_features=3584, out_features=152064, bias=False)}, 'config': Qwen2VLConfig {"_attn_implementation_autoset": true,"_name_or_path": "/home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct","architectures": ["Qwen2VLForConditionalGeneration"],"attention_dropout": 0.0,"bos_token_id": 151643,"eos_token_id": 151645,"hidden_act": "silu","hidden_size": 3584,"image_token_id": 151655,"initializer_range": 0.02,"intermediate_size": 18944,"max_position_embeddings": 32768,"max_window_layers": 28,"model_type": "qwen2_vl","num_attention_heads": 28,"num_hidden_layers": 28,"num_key_value_heads": 4,"rms_norm_eps": 1e-06,"rope_scaling": {"mrope_section": [16,24,24],"rope_type": "default","type": "default"},"rope_theta": 1000000.0,"sliding_window": 32768,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.47.0","use_cache": true,"use_sliding_window": false,"video_token_id": 151656,"vision_config": {"in_chans": 3,"model_type": "qwen2_vl","spatial_patch_size": 14},"vision_end_token_id": 151653,"vision_start_token_id": 151652,"vision_token_id": 151654,"vocab_size": 152064
}
, 'name_or_path': '/home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct', 'warnings_issued': {}, 'generation_config': GenerationConfig {"attn_implementation": "flash_attention_2","bos_token_id": 151643,"do_sample": true,"eos_token_id": [151645,151643],"pad_token_id": 151643,"temperature": 0.01,"top_k": 1,"top_p": 0.001
}
, '_keep_in_fp32_modules': None, 'vocab_size': 152064, 'rope_deltas': None, '_is_hf_initialized': True, '_old_forward': <bound method Qwen2VLForConditionalGeneration.forward of Qwen2VLForConditionalGeneration((visual): Qwen2VisionTransformerPretrainedModel((patch_embed): PatchEmbed((proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False))(rotary_pos_emb): VisionRotaryEmbedding()(blocks): ModuleList((0-31): 32 x Qwen2VLVisionBlock((norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(attn): VisionFlashAttention2((qkv): Linear(in_features=1280, out_features=3840, bias=True)(proj): Linear(in_features=1280, out_features=1280, bias=True))(mlp): VisionMlp((fc1): Linear(in_features=1280, out_features=5120, bias=True)(act): QuickGELUActivation()(fc2): Linear(in_features=5120, out_features=1280, bias=True))))(merger): PatchMerger((ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(mlp): Sequential((0): Linear(in_features=5120, out_features=5120, bias=True)(1): GELU(approximate='none')(2): Linear(in_features=5120, out_features=3584, bias=True))))(model): Qwen2VLModel((embed_tokens): Embedding(152064, 3584)(layers): ModuleList((0-27): 28 x Qwen2VLDecoderLayer((self_attn): Qwen2VLFlashAttention2((q_proj): Linear(in_features=3584, out_features=3584, bias=True)(k_proj): Linear(in_features=3584, out_features=512, bias=True)(v_proj): Linear(in_features=3584, out_features=512, bias=True)(o_proj): Linear(in_features=3584, out_features=3584, bias=False)(rotary_emb): Qwen2VLRotaryEmbedding())(mlp): Qwen2MLP((gate_proj): Linear(in_features=3584, out_features=18944, bias=False)(up_proj): Linear(in_features=3584, out_features=18944, bias=False)(down_proj): Linear(in_features=18944, out_features=3584, bias=False)(act_fn): SiLU())(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)))(norm): Qwen2RMSNorm((3584,), eps=1e-06)(rotary_emb): Qwen2VLRotaryEmbedding())(lm_head): Linear(in_features=3584, out_features=152064, bias=False)
)>, '_hf_hook': AlignDevicesHook(execution_device=0, offload=False, io_same_device=True, offload_buffers=False, place_submodules=False, skip_keys='past_key_values'), 'forward': functools.partial(<function add_hook_to_module.<locals>.new_forward at 0x727f2aec7e20>, Qwen2VLForConditionalGeneration((visual): Qwen2VisionTransformerPretrainedModel((patch_embed): PatchEmbed((proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False))(rotary_pos_emb): VisionRotaryEmbedding()(blocks): ModuleList((0-31): 32 x Qwen2VLVisionBlock((norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(attn): VisionFlashAttention2((qkv): Linear(in_features=1280, out_features=3840, bias=True)(proj): Linear(in_features=1280, out_features=1280, bias=True))(mlp): VisionMlp((fc1): Linear(in_features=1280, out_features=5120, bias=True)(act): QuickGELUActivation()(fc2): Linear(in_features=5120, out_features=1280, bias=True))))(merger): PatchMerger((ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(mlp): Sequential((0): Linear(in_features=5120, out_features=5120, bias=True)(1): GELU(approximate='none')(2): Linear(in_features=5120, out_features=3584, bias=True))))(model): Qwen2VLModel((embed_tokens): Embedding(152064, 3584)(layers): ModuleList((0-27): 28 x Qwen2VLDecoderLayer((self_attn): Qwen2VLFlashAttention2((q_proj): Linear(in_features=3584, out_features=3584, bias=True)(k_proj): Linear(in_features=3584, out_features=512, bias=True)(v_proj): Linear(in_features=3584, out_features=512, bias=True)(o_proj): Linear(in_features=3584, out_features=3584, bias=False)(rotary_emb): Qwen2VLRotaryEmbedding())(mlp): Qwen2MLP((gate_proj): Linear(in_features=3584, out_features=18944, bias=False)(up_proj): Linear(in_features=3584, out_features=18944, bias=False)(down_proj): Linear(in_features=18944, out_features=3584, bias=False)(act_fn): SiLU())(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)))(norm): Qwen2RMSNorm((3584,), eps=1e-06)(rotary_emb): Qwen2VLRotaryEmbedding())(lm_head): Linear(in_features=3584, out_features=152064, bias=False)
)), 'to': <function Module.to at 0x727f28b1c7c0>, 'cuda': <function Module.cuda at 0x727f28b1c860>, 'hf_device_map': {'visual': 0, 'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 1, 'model.layers.12': 1, 'model.layers.13': 1, 'model.layers.14': 1, 'model.layers.15': 1, 'model.layers.16': 1, 'model.layers.17': 1, 'model.layers.18': 1, 'model.layers.19': 1, 'model.layers.20': 1, 'model.layers.21': 1, 'model.layers.22': 1, 'model.layers.23': 1, 'model.layers.24': 1, 'model.layers.25': 1, 'model.layers.26': 1, 'model.layers.27': 1, 'model.norm': 1, 'model.rotary_emb': 1, 'lm_head': 1}}
__doc__: None
__module__: transformers.models.qwen2_vl.modeling_qwen2_vl
__weakref__: None
_auto_class: None
_backward_hooks: OrderedDict()
_backward_pre_hooks: OrderedDict()
_buffers: {}
_compiled_call_impl: None
_forward_hooks: OrderedDict()
_forward_hooks_always_called: OrderedDict()
_forward_hooks_with_kwargs: OrderedDict()
_forward_pre_hooks: OrderedDict()
_forward_pre_hooks_with_kwargs: OrderedDict()
_hf_hook: AlignDevicesHook(execution_device=0, offload=False, io_same_device=True, offload_buffers=False, place_submodules=False, skip_keys='past_key_values')
_hf_peft_config_loaded: False
_is_full_backward_hook: None
_is_hf_initialized: True
/home/sss/anaconda3/lib/python3.12/site-packages/transformers/modeling_utils.py:5055: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
_is_quantized_training_enabled: False
_is_stateful: False
_keep_in_fp32_modules: None
_keys_to_ignore_on_load_missing: None
_keys_to_ignore_on_load_unexpected: None
_keys_to_ignore_on_save: None
_load_state_dict_post_hooks: OrderedDict()
_load_state_dict_pre_hooks: OrderedDict()
_modules: {'visual': Qwen2VisionTransformerPretrainedModel((patch_embed): PatchEmbed((proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False))(rotary_pos_emb): VisionRotaryEmbedding()(blocks): ModuleList((0-31): 32 x Qwen2VLVisionBlock((norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(attn): VisionFlashAttention2((qkv): Linear(in_features=1280, out_features=3840, bias=True)(proj): Linear(in_features=1280, out_features=1280, bias=True))(mlp): VisionMlp((fc1): Linear(in_features=1280, out_features=5120, bias=True)(act): QuickGELUActivation()(fc2): Linear(in_features=5120, out_features=1280, bias=True))))(merger): PatchMerger((ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)(mlp): Sequential((0): Linear(in_features=5120, out_features=5120, bias=True)(1): GELU(approximate='none')(2): Linear(in_features=5120, out_features=3584, bias=True)))
), 'model': Qwen2VLModel((embed_tokens): Embedding(152064, 3584)(layers): ModuleList((0-27): 28 x Qwen2VLDecoderLayer((self_attn): Qwen2VLFlashAttention2((q_proj): Linear(in_features=3584, out_features=3584, bias=True)(k_proj): Linear(in_features=3584, out_features=512, bias=True)(v_proj): Linear(in_features=3584, out_features=512, bias=True)(o_proj): Linear(in_features=3584, out_features=3584, bias=False)(rotary_emb): Qwen2VLRotaryEmbedding())(mlp): Qwen2MLP((gate_proj): Linear(in_features=3584, out_features=18944, bias=False)(up_proj): Linear(in_features=3584, out_features=18944, bias=False)(down_proj): Linear(in_features=18944, out_features=3584, bias=False)(act_fn): SiLU())(input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)(post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)))(norm): Qwen2RMSNorm((3584,), eps=1e-06)(rotary_emb): Qwen2VLRotaryEmbedding()
), 'lm_head': Linear(in_features=3584, out_features=152064, bias=False)}
_no_split_modules: ['Qwen2VLDecoderLayer', 'Qwen2VLVisionBlock']
_non_persistent_buffers_set: set()
_parameters: {}
_skip_keys_device_placement: past_key_values
_state_dict_hooks: OrderedDict()
_state_dict_pre_hooks: OrderedDict()
_supports_cache_class: True
_supports_flash_attn_2: True
_supports_flex_attn: False
_supports_quantized_cache: False
_supports_sdpa: True
_supports_static_cache: True
_tied_weights_keys: ['lm_head.weight']
_tp_plan: None
_version: 1
base_model_prefix: model
call_super_init: False
config: Qwen2VLConfig {"_attn_implementation_autoset": true,"_name_or_path": "/home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct","architectures": ["Qwen2VLForConditionalGeneration"],"attention_dropout": 0.0,"bos_token_id": 151643,"eos_token_id": 151645,"hidden_act": "silu","hidden_size": 3584,"image_token_id": 151655,"initializer_range": 0.02,"intermediate_size": 18944,"max_position_embeddings": 32768,"max_window_layers": 28,"model_type": "qwen2_vl","num_attention_heads": 28,"num_hidden_layers": 28,"num_key_value_heads": 4,"rms_norm_eps": 1e-06,"rope_scaling": {"mrope_section": [16,24,24],"rope_type": "default","type": "default"},"rope_theta": 1000000.0,"sliding_window": 32768,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.47.0","use_cache": true,"use_sliding_window": false,"video_token_id": 151656,"vision_config": {"in_chans": 3,"model_type": "qwen2_vl","spatial_patch_size": 14},"vision_end_token_id": 151653,"vision_start_token_id": 151652,"vision_token_id": 151654,"vocab_size": 152064
}
device: cuda:0
dtype: torch.bfloat16
dummy_inputs: {'input_ids': tensor([[7, 6, 0, 0, 1],[1, 2, 3, 0, 0],[0, 0, 0, 4, 5]])}
dump_patches: False
framework: pt
generation_config: GenerationConfig {"attn_implementation": "flash_attention_2","bos_token_id": 151643,"do_sample": true,"eos_token_id": [151645,151643],"pad_token_id": 151643,"temperature": 0.01,"top_k": 1,"top_p": 0.001
}
hf_device_map: {'visual': 0, 'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 1, 'model.layers.12': 1, 'model.layers.13': 1, 'model.layers.14': 1, 'model.layers.15': 1, 'model.layers.16': 1, 'model.layers.17': 1, 'model.layers.18': 1, 'model.layers.19': 1, 'model.layers.20': 1, 'model.layers.21': 1, 'model.layers.22': 1, 'model.layers.23': 1, 'model.layers.24': 1, 'model.layers.25': 1, 'model.layers.26': 1, 'model.layers.27': 1, 'model.norm': 1, 'model.rotary_emb': 1, 'lm_head': 1}
is_gradient_checkpointing: False
is_parallelizable: False
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
main_input_name: input_ids
model_tags: None
name_or_path: /home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct
rope_deltas: None
supports_gradient_checkpointing: True
supports_tp_plan: False
training: False
vocab_size: 152064
warnings_issued: {}
Actual torch_dtype: torch.bfloat16

Model Configuration:
Qwen2VLConfig {"_attn_implementation_autoset": true,"_name_or_path": "/home/sss/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct","architectures": ["Qwen2VLForConditionalGeneration"],"attention_dropout": 0.0,"bos_token_id": 151643,"eos_token_id": 151645,"hidden_act": "silu","hidden_size": 3584,"image_token_id": 151655,"initializer_range": 0.02,"intermediate_size": 18944,"max_position_embeddings": 32768,"max_window_layers": 28,"model_type": "qwen2_vl","num_attention_heads": 28,"num_hidden_layers": 28,"num_key_value_heads": 4,"rms_norm_eps": 1e-06,"rope_scaling": {"mrope_section": [16,24,24],"rope_type": "default","type": "default"},"rope_theta": 1000000.0,"sliding_window": 32768,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.47.0","use_cache": true,"use_sliding_window": false,"video_token_id": 151656,"vision_config": {"in_chans": 3,"model_type": "qwen2_vl","spatial_patch_size": 14},"vision_end_token_id": 151653,"vision_start_token_id": 151652,"vision_token_id": 151654,"vocab_size": 152064
}
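A few of these config values are worth decoding. With hidden_size 3584 and num_attention_heads 28, each attention head is 128-dimensional; num_key_value_heads 4 means grouped-query attention in which 7 query heads share each KV head, matching the out_features=512 (4 × 128) of k_proj and v_proj in the module dump above. A small sketch deriving these numbers from the loaded config:

cfg = model.config
head_dim = cfg.hidden_size // cfg.num_attention_heads            # 3584 // 28 = 128
gqa_groups = cfg.num_attention_heads // cfg.num_key_value_heads  # 28 // 4 = 7 query heads per KV head
kv_out = cfg.num_key_value_heads * head_dim                      # 4 * 128 = 512, matches k_proj/v_proj
print(head_dim, gqa_groups, kv_out)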

Input

(input image: a lake scene; see the description below)

Output

['这张图片展示了一个宁静的湖泊景色。湖面平静,反射出天空和远处的山峰。湖的中央有一个小岛,上面有一些树木和灌木丛。岛的周围是清澈的水面,可以看到一些水生植物。远处可以看到一些建筑物,可能是住宅区或城市的一部分。天空晴朗,呈现出淡蓝色,给人一种宁静和平静的感觉。']

(Translation: "This image shows a tranquil lake scene. The calm surface reflects the sky and distant peaks. A small island with trees and shrubs sits in the middle of the lake, surrounded by clear water in which some aquatic plants are visible. Buildings can be seen in the distance, possibly a residential area or part of a city. The sky is clear and pale blue, giving a calm, peaceful impression.")
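Note that the checkpoint's generation_config (visible in the attribute dump above) uses do_sample with temperature 0.01, top_k 1, and top_p 0.001, which is effectively greedy decoding, so repeated runs on the same image should produce near-identical descriptions. The sampling behavior can be overridden per call; the values below are illustrative, not from the original post:

# Explicit greedy decoding (deterministic)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# More varied wording via sampling (illustrative hyperparameters)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)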

