VisualGLM-6B

[Official tutorial] VisualGLM technical talk (Bilibili, by ChatGLM): https://www.bilibili.com/video/BV14L411q7fk/?spm_id_from=333.999.0.0&vd_source=4aed82e35f26bb600bc5b46e65e25c22
Slides download: https://pan.baidu.com/s/1gfdpyfT6EVnygMPDO_iwvQ?pwd=8wpc (extraction code: 8wpc)
VisualGLM-6B is an open-source multimodal dialogue language model supporting images, Chinese and English. The language model is based on ChatGLM-6B with 6.2 billion parameters; on the image side, a trained BLIP2-Qformer bridges the vision model and the language model, for roughly 7.8 billion parameters in total.

Images can be turned into tokens via a VQVAE.

The ViT is quite sensitive to quantization, while the text side tolerates it reasonably well; the 4-bit quantization is applied only to the ChatGLM part.
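In code this shows up as quantizing only the ChatGLM transformer and leaving the ViT/QFormer branch in fp16. A minimal sketch, assuming `model` is the loaded VisualGLMModel (see the inference trace below) and that the `quantize` helper is importable as shown; treat the import path as an assumption:

```python
# Minimal sketch: quantize only the ChatGLM transformer, keep the ViT/QFormer branch in fp16.
# `model` is assumed to be the VisualGLMModel loaded via AutoModel.from_pretrained;
# the import path of `quantize` is an assumption based on the VisualGLM-6B repo.
from sat.quantization.kernels import quantize

quantize(model.transformer, 4)   # 4-bit weights for the language model only
# model.mixins['eva'] (EVA ViT + QFormer) is intentionally left unquantized
```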

Model:

VisualGLMModel((mixins): ModuleDict((chatglm-final): ChatGLMFinalMixin((lm_head): Linear(in_features=4096, out_features=130528, bias=False))(chatglm-attn): ChatGLMAttnMixin((rotary_emb): RotaryEmbedding())(chatglm-layer): ChatGLMLayerMixin()(eva): ImageMixin((model): BLIP2((vit): EVAViT((mixins): ModuleDict((patch_embedding): ImagePatchEmbeddingMixin((proj): Conv2d(3, 1408, kernel_size=(14, 14), stride=(14, 14)))(pos_embedding): InterpolatedPositionEmbeddingMixin()(cls): LNFinalyMixin((ln_vision): LayerNorm((1408,), eps=1e-05, elementwise_affine=True)))(transformer): BaseTransformer((embedding_dropout): Dropout(p=0.1, inplace=False)(word_embeddings): VocabParallelEmbedding()(position_embeddings): Embedding(257, 1408)(layers): ModuleList((0): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(1): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(2): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(3): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(4): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(5): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, 
inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(6): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(7): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(8): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(9): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(10): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(11): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(12): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, 
inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(13): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(14): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(15): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(16): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(17): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(18): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(19): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, 
inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(20): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(21): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(22): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(23): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(24): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(25): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(26): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, 
inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(27): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(28): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(29): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(30): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(31): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(32): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(33): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, 
inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(34): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(35): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(36): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(37): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(38): BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False))))))(qformer): QFormer((mixins): ModuleDict()(transformer): BaseTransformer((embedding_dropout): Dropout(p=0.1, inplace=False)(word_embeddings): VocabParallelEmbedding()(position_embeddings): None(layers): ModuleList((0): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, 
inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(1): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(2): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(3): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(4): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(5): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(6): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): 
Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(7): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(8): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(9): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(10): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(11): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, 
inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False))))(final_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)))(glm_proj): Linear(in_features=768, out_features=4096, bias=True)))(auto-regressive): CachedAutoregressiveMixin())(transformer): BaseTransformer((embedding_dropout): Dropout(p=0, inplace=False)(word_embeddings): VocabParallelEmbedding()(position_embeddings): Embedding(2048, 4096)(layers): ModuleList((0): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(1): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(2): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(3): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(4): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(5): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): 
Dropout(p=0, inplace=False)))(6): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(7): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(8): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(9): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(10): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(11): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(12): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(13): 
BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(14): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(15): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(16): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(17): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(18): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(19): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(20): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), 
eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(21): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(22): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(23): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(24): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(25): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(26): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False)))(27): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): 
SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0, inplace=False))))(final_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True))
)

Inference:

model,model_args=AutoModel.from_pretrained("visualglm")->
quantize(model.transformer,)->
prompt='<img>/root/autodl-tmp/VisualGLM-6B/examples/2.jpeg</img>'->
'<img>/root/autodl-tmp/VisualGLM-6B/examples/2.jpeg</img>问:描述这张图片。\n答:'->
image_position:5,image_path:'/root/autodl-tmp/VisualGLM-6B/examples/2.jpeg',text:'<img></img>问:描述这张图片。\n答:'->
BlipImageEvalProcessor(224)->
- transform=transforms.Compose([transforms.Resize([224,224]),transforms.ToTensor(),self.normalize])->
image=processor(image.convert('RGB'))-> [3,224,224]
image=image.unsqueeze(0)->[1,3,224,224]
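A rough, self-contained equivalent of BlipImageEvalProcessor(224) for reference (the mean/std are the usual CLIP/BLIP constants; the bicubic interpolation is an assumption):

```python
from PIL import Image
from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Rough equivalent of BlipImageEvalProcessor(224): resize, to tensor, normalize.
transform = transforms.Compose([
    transforms.Resize((224, 224), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])

image = Image.open('/root/autodl-tmp/VisualGLM-6B/examples/2.jpeg').convert('RGB')
image = transform(image)        # [3, 224, 224]
image = image.unsqueeze(0)      # [1, 3, 224, 224]
```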
input0=tokenizer.encode(prompt[:image_position],add_special_tokens=False)->:<img>,[46,2265,33]
input1=[tokenizer.pad_token_id]*model.image_length->[3]*32
input2=tokenizer.encode(prompt[image_position:],)->'</img>问:描述这张图片。\n答:'[98, 2265, 33, 64286, 12, 67194, 74366, 65842, 63823, 4, 67342, 12]
inputs=sum([input0,input1,input2],[])->[46, 2265, 33, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 98, 2265, 33, 64286, 12, 67194, 74366, 65842, 63823, 4, 67342, 12]
inputs=torch.tensor(tokenizer.build_inputs_with_special_tokens(inputs))->
mask_position=len(inputs)-2->47
context_length=len(inputs)-1->48
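Putting those steps together, the prompt-to-token construction can be sketched like this (image_length = 32 and the token ids come from the trace; treat the exact encode arguments as assumptions):

```python
import torch
from transformers import AutoTokenizer

# Hedged sketch of the token construction traced above.
tokenizer = AutoTokenizer.from_pretrained('THUDM/chatglm-6b', trust_remote_code=True)

text = '<img></img>问:描述这张图片。\n答:'   # the image path has already been stripped out
image_position = 5                              # index right after '<img>'
image_length = 32                               # model.image_length: 32 image token slots

input0 = tokenizer.encode(text[:image_position], add_special_tokens=False)   # '<img>'  -> [46, 2265, 33]
input1 = [tokenizer.pad_token_id] * image_length                             # 32 placeholder ids (3)
input2 = tokenizer.encode(text[image_position:], add_special_tokens=False)   # '</img>问:...答:'

inputs = sum([input0, input1, input2], [])                                   # 3 + 32 + 12 = 47 ids
inputs = torch.tensor(tokenizer.build_inputs_with_special_tokens(inputs))    # + ChatGLM specials -> 49 ids
mask_position = len(inputs) - 2                                              # 47
context_length = len(inputs) - 1                                             # 48
```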
get_masks_and_position_ids_glm->
seq=torch.cat([])->2048
[tokenizer.eos_token_id]:[13005],top_p:0.4,top_k:100,temperature:0.8,repetition_penalty:1.2->
output=filling_sequence(model,seq,batch_size=1,get_masks_and_position_ids,strategy,pre_image,image)->
filling_sequence=sat.generation.autoregressive_sampling.filling_sequence
== build the initial tokens, attention_mask and position_ids
- tokens,attention_mask,position_ids=get_masks_and_position_ids(seq)->
-- tokens: 1x2048; attention_mask: 1x2048x2048, attention_mask.tril_() keeps the lower triangle (up to the diagonal), then 1x1x2048x2048;
-- 2d position_ids:2x2048->1x2x2048,
-- tokens:1x2048,attention_mask:1x1x2048x2048,position_ids:1x2x2048->
- tokens=tokens[...,:context_length]:1x49->
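Based on the shapes above and the usual GLM masking scheme, get_masks_and_position_ids_glm can be reconstructed roughly as follows (a hedged sketch; the repo's exact code may differ):

```python
import torch

def get_masks_and_position_ids_glm(seq, mask_position, context_length):
    # Hedged reconstruction from the traced shapes, not copied from the repo.
    tokens = seq.unsqueeze(0)                                    # [1, 2048]

    # GLM-style mask: lower-triangular overall, but the context stays fully visible.
    attention_mask = torch.ones((1, len(seq), len(seq)))
    attention_mask.tril_()
    attention_mask[..., :context_length] = 1
    attention_mask.unsqueeze_(1)                                 # [1, 1, 2048, 2048]

    # 2D position ids: row 0 = token position (frozen at mask_position past the context),
    # row 1 = block position counting 1, 2, ... over the generated part.
    position_ids = torch.zeros(2, len(seq), dtype=torch.long)
    position_ids[0, :context_length] = torch.arange(context_length)
    position_ids[0, context_length:] = mask_position
    position_ids[1, context_length:] = torch.arange(1, len(seq) - context_length + 1)
    return tokens, attention_mask, position_ids.unsqueeze(0)     # [1, 2, 2048]
```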
== initialize generation
- index:0,counter:48
== step-by-step generation
- logits,*output_per_layers=model(tokens[:,index:],position_ids=position_ids[...,index:counter+1],attention_mask=attention_mask[...,index:counter+1,:counter+1],mems,log_attention_weights)->
-- model=sat.model.official.chatglm_model.ChatGLMModel.forward
-- input_ids:1x49,position_ids:1x2x49,attention_mask:1x1x49x49->
-- super().forward(input_ids,attention_mask,position_ids,past_key_values)->
-- sat.model.base_model.BaseModel.forward()->
-- transformer.hooks.clear()->
-- transformer.hooks.update(hooks)-> this is where the QFormer gets wired in, via hooks
-- transformer()->
- logits:1x49x130528,len(output_per_layers):28 output_per_layers[0]:past_key_values,mem_kv->
- mem_kv[0]:1x49x8192,len(mem_kv):28->
== sampling
- tokens,mems=strategy.forward(logits,tokens,mems)->
-- sat.generation.sampling_strategies.base_strategy.BaseStrategy.forward()->
-- logits:1x130528,tokens:1x49,mems:28x1x49x8192->
-- logits=top_k_logits(logits,top_k,top_p)->
-- sat.generation.sampling_strategies.base_strategy.top_k_logits->
-- top_k and top_p decide which tokens are kept. When top_k > 0, the function first finds the k highest-scoring tokens in the logits, then sets the logits of every token scoring below the k-th token to a very small value, filter_value, which effectively removes all tokens less likely than the k-th one.
When top_p > 0.0, the function sorts the logits, applies softmax, and accumulates the probability of each token and its prefix. Wherever cumulative_probs exceeds top_p, the corresponding entries of sorted_indices_to_remove are set to True, meaning that token and everything after it are dropped; dropping again means setting the corresponding logits to filter_value.
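A hedged re-implementation of that filtering (the real code lives in sat.generation.sampling_strategies.base_strategy; this follows the widely used top-k/top-p recipe):

```python
import torch
import torch.nn.functional as F

def top_k_logits(logits, top_k=0, top_p=0.0, filter_value=-float('inf')):
    # Sketch of the filtering described above.
    if top_k > 0:
        # keep only the top_k highest-scoring tokens
        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]
        logits[indices_to_remove] = filter_value
    if top_p > 0.0:
        # nucleus sampling: drop tokens once the cumulative probability exceeds top_p
        sorted_logits, sorted_indices = torch.sort(logits, descending=True, dim=-1)
        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        sorted_indices_to_remove = cumulative_probs > top_p
        # always keep at least the most probable token
        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
        sorted_indices_to_remove[..., 0] = False
        indices_to_remove = sorted_indices_to_remove.scatter(-1, sorted_indices, sorted_indices_to_remove)
        logits[indices_to_remove] = filter_value
    return logits
```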
- top_k_logits is called on the logits to remove low-probability tokens; softmax then turns the remaining logits into a probability distribution, and multinomial sampling over that distribution yields the next-token prediction pred.
- probs=F.softmax(logits,dim=1)->1x130528
- pred=torch.multinomial(probs,num_samples=1)->[51]
- tokens=torch.cat((tokens,pred.view(tokens.shape[0],1)),dim=1)
- strategy.finalize(tokens,mems)->
response=tokenizer.decode(output_list[0])->
response=process_response(response)->

Note how this network lines up with the figure earlier: the ViT from BLIP-2 (a 39-layer transformer) first extracts image features, the QFormer (a 12-layer transformer) then fuses them, and ChatGLM (a 28-layer transformer) sits behind that.
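To make the data flow concrete, a schematic sketch of the visual branch with the shapes from the traces above (an illustration of the dimensions, not the repo's forward code):

```python
# Schematic of the visual branch; shapes are taken from the traces in this note.
def visual_branch(image, vit, qformer, glm_proj):
    # image:  [B, 3, 224, 224]
    enc = vit(image)        # EVA ViT, 39 layers, hidden 1408   -> [B, 257, 1408] (256 patches + CLS)
    out = qformer(enc)[0]   # QFormer, 12 layers, hidden 768    -> [B, 32, 768]   (32 query tokens)
    return glm_proj(out)    # Linear 768 -> 4096                -> [B, 32, 4096]
```

The 32 projected vectors then stand in for the 32 pad placeholders between `<img>` and `</img>` in ChatGLM's (28-layer, hidden 4096) input sequence.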

Fine-tuning:

model,args=FineTuneVisualGLMModel.from_pretrained("visualglm-6b")->
tokenizer=get_tokenizer(args)->
ChatGLMTokenizer(name_or_path='/root/.cache/huggingface/hub/models--THUDM--chatglm-6b/snapshots/1d240ba371910e9282298d4592532d7f0f3e9f3e', vocab_size=130344, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<sop>', 'eos_token': '<eop>', 'unk_token': '<unk>', 'pad_token': '<pad>', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True)
training_main()->
- train_data,val_data,test_data=make_loaders(args,hooks['create_dataset_function'])->
- model,optimizer=setup_model_untrainable_params_and_optimizer(model)->
- lr_scheduler=get_learning_rate_scheduler(optimizer)->
- summary_writer=get_sampler_writer()->
- iteration,skipped=train(model,optimizer,lr_scheduler,...)->
-- lm_loss,skipped_iter,metrics=train_step(train_data_iterator,model,optimizer,lr_scheduler)->
-- forward_ret=forward_step(data_iterator,model)->
-- tokens,labels,image,pre_image=get_batch(data_iterator...)->
-- data:[input_ids,labels,image,pre_image]:4x320,4x320,4x3x224x224,3->
-- finetune_visualglm.forward_step->sat.model.official.chatglm_model.ChatGLMModel.forward->
sat.model.base_model.BaseModel.forward->sat.model.transformer.BaseTransformer.forward->
-- hidden_states = self.hooks['word_embedding_forward'](input_ids, output_cross_layer=output_cross_layer, **kw_args)-> 4x257x1408
-- position_embeddings = self.hooks['position_embedding_forward'](position_ids, output_cross_layer=output_cross_layer, **kw_args)-> 1x257x1408
-- hidden_states = hidden_states + position_embeddings ->4x257x1408
-- hidden_states = self.embedding_dropout(hidden_states)-> 4x257x1408
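A hedged sketch of what the ViT embedding step does with these shapes: the 14x14, stride-14 conv yields 16x16 = 256 patch tokens of width 1408, a CLS token is prepended (exactly how the CLS token is added is an assumption), and the learned (257, 1408) position embedding is added:

```python
import torch
import torch.nn as nn

# Sketch of ImagePatchEmbeddingMixin + position embedding for a batch of 4 images.
B = 4
proj = nn.Conv2d(3, 1408, kernel_size=14, stride=14)
cls_token = nn.Parameter(torch.zeros(1, 1, 1408))          # assumed CLS handling
pos_embedding = nn.Embedding(257, 1408)

image = torch.randn(B, 3, 224, 224)
patches = proj(image)                                       # [4, 1408, 16, 16]
patches = patches.flatten(2).transpose(1, 2)                # [4, 256, 1408]
tokens = torch.cat([cls_token.expand(B, -1, -1), patches], dim=1)   # [4, 257, 1408]
hidden_states = tokens + pos_embedding(torch.arange(257))           # [4, 257, 1408]
```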
-- layer_ret = layer(*args, layer_id=torch.tensor(i), **kw_args, **output_cross_layer,                        output_this_layer=output_this_layer_obj, output_cross_layer=output_cross_layer_obj)->
-- sat.model.transformer.BaseTransformerLayer.forward->sat.transformer_defaults->
--- self = self.transformer.layers[kw_args['layer_id']] kwargs:'layer_id','image','output_this_layer','output_cross_layer'
self=BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False))
)
--- attention_input = self.input_layernorm(hidden_states):4x257x1408->
SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False)
)
--- attention_output=self.attention(attention_input,mask,**kw_args)->
--- sat.model.transformer.SelfAttention.forward->sat.transformer_defaults.attention_forward_default->
--- self = self.transformer.layers[kw_args['layer_id']].attention->
--- mixed_raw_layer = self.query_key_value(hidden_states)-> hidden_states:4x257x1408,mixed_raw_layer:4x257x4224->
--- (mixed_query_layer,mixed_key_layer,mixed_value_layer) = split_tensor_along_last_dim(mixed_raw_layer, 3):4x257x1408->
--- query_layer = self._transpose_for_scores(mixed_query_layer):4x16x257x88
key_layer = self._transpose_for_scores(mixed_key_layer)
value_layer = self._transpose_for_scores(mixed_value_layer)->
--- context_layer = attention_fn(query_layer, key_layer, value_layer, mask, dropout_fn, **kw_args):4x16x257x88->
--- output = self.dense(context_layer):context_layer/output:4x257x1408->
--- output=self.output_dropout(output)->
-- hidden_states = hidden_states + attention_output:4x257x1408->
-- mlp_input = self.post_attention_layernorm(hidden_states):4x257x1408->
-- mlp_output = self.mlp(mlp_input, **kw_args):4x257x1408->
-- output = hidden_states + mlp_output:4x257x1408->
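The layer trace above boils down to the standard pre-LN pattern; a minimal sketch (shapes as in the ViT case, [4, 257, 1408]):

```python
# Sketch of one BaseTransformerLayer forward as traced above (pre-LN, two residual adds).
def layer_forward(hidden_states, input_layernorm, attention, post_attention_layernorm, mlp, mask):
    attention_input = input_layernorm(hidden_states)       # [4, 257, 1408]
    attention_output = attention(attention_input, mask)    # [4, 257, 1408]
    hidden_states = hidden_states + attention_output       # residual 1
    mlp_input = post_attention_layernorm(hidden_states)    # [4, 257, 1408]
    mlp_output = mlp(mlp_input)                            # [4, 257, 1408]
    return hidden_states + mlp_output                      # residual 2
```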
--- decoder mode (the QFormer layers that carry a cross_attention block)
--- encoder_outputs = kw_args['encoder_outputs']:4x257x1408->
--- attention_output = self.cross_attention(mlp_input, **kw_args)->
--- sat.model.transformer.CrossAttention.forward->sat.transformer_defaults.cross_attention_forward_default->
CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False)
)
--- hidden_states:4x32x768, cross_attention_mask:1x1, encoder_outputs:4x257x1408->
--- mixed_query_layer = self.query(hidden_states):4x32x768, from the 32 learned query tokens
--- mixed_x_layer = self.key_value(encoder_outputs):4x257x1536, from the ViT image features
--- (mixed_key_layer, mixed_value_layer) = split_tensor_along_last_dim(mixed_x_layer, 2) :4x257x768->
--- query_layer = self._transpose_for_scores(mixed_query_layer):4x12x32x64
--- key_layer = self._transpose_for_scores(mixed_key_layer)
--- value_layer = self._transpose_for_scores(mixed_value_layer)->
--- context_layer = attention_fn(query_layer, key_layer, value_layer, cross_attention_mask, dropout_fn, cross_attention=True, **kw_args):4x12x32x64->
--- output=self.dense(context_layer):context_layer/output:4x32x768->
--- output = self.output_dropout(output):4x32x768->
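A self-contained illustration of this cross-attention step with the traced dimensions (a sketch of the shapes, not the repo's CrossAttention code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# QFormer cross-attention: queries from the 32 learned query tokens (768 = 12 heads x 64),
# keys/values from the 257 ViT features projected 1408 -> 2*768.
B, n_query, hidden, heads = 4, 32, 768, 12
query = nn.Linear(768, 768)
key_value = nn.Linear(1408, 2 * 768)
dense = nn.Linear(768, 768)

hidden_states = torch.randn(B, n_query, hidden)      # query tokens  [4, 32, 768]
encoder_outputs = torch.randn(B, 257, 1408)          # ViT features  [4, 257, 1408]

q = query(hidden_states)                              # [4, 32, 768]
k, v = key_value(encoder_outputs).chunk(2, dim=-1)    # each [4, 257, 768]

def split_heads(x):                                    # [B, L, 768] -> [B, 12, L, 64]
    return x.view(B, -1, heads, hidden // heads).transpose(1, 2)

q, k, v = split_heads(q), split_heads(k), split_heads(v)
scores = q @ k.transpose(-1, -2) / (hidden // heads) ** 0.5   # [4, 12, 32, 257]
context = F.softmax(scores, dim=-1) @ v                       # [4, 12, 32, 64]
context = context.transpose(1, 2).reshape(B, n_query, hidden) # [4, 32, 768]
output = dense(context)                                        # [4, 32, 768]
```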
--- every module inside self.transformer is run through in turn
--- model.blip2.BLIP2.forward->
--- enc = self.vit(image) image:4x3x224x224,enc:4x257x1408 ->
EVAViT((mixins): ModuleDict((patch_embedding): ImagePatchEmbeddingMixin((proj): Conv2d(3, 1408, kernel_size=(14, 14), stride=(14, 14)))(pos_embedding): InterpolatedPositionEmbeddingMixin()(cls): LNFinalyMixin((ln_vision): LayerNorm((1408,), eps=1e-05, elementwise_affine=True)))(transformer): BaseTransformer((embedding_dropout): Dropout(p=0.1, inplace=False)(word_embeddings): VocabParallelEmbedding()(position_embeddings): Embedding(257, 1408)(layers): ModuleList((0-38): 39 x BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))))
)
--- out=self.qformer(enc)[0] out:
blip2.QFormer.forward->
---- input_ids:4x32,attention_mask:1x1,cross_attention_mask:1x1->
---- sat.model.transformer.BaseTransformer.forward->
--- self.glm_proj(out) 
--- model.visualglm.ImageMixin.word_embedding_forward->
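This hook is where the image branch gets spliced into ChatGLM's input embeddings. A hedged sketch of what ImageMixin.word_embedding_forward does (the exact splice arithmetic and argument names are assumptions):

```python
import torch

def word_embedding_forward(input_ids, image, pre_image, word_embeddings, blip2, image_length=32):
    # Sketch: embed the text ids as usual, then splice the projected image features
    # over the 32 placeholder positions between '<img>' and '</img>'.
    embeds = word_embeddings(input_ids)          # [B, L, 4096]
    image_embeds = blip2(image)                  # [B, 32, 4096]  (ViT -> QFormer -> glm_proj)
    start = pre_image                            # number of text tokens before the image slots (assumed)
    return torch.cat(
        [embeds[:, :start], image_embeds, embeds[:, start + image_length:]], dim=1
    )
```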

Notably, the code does not use the three BLIP-2 training objectives; only the autoregressive LM loss is used, and there is no contrastive loss either.
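For reference, a minimal sketch of that single objective (names are illustrative; whether the logits/labels shift happens here or when the labels are built is an assumption):

```python
import torch.nn.functional as F

def lm_loss(logits, labels):
    # Autoregressive cross entropy, the only training loss used here.
    # logits: [B, L, 130528]; labels: [B, L] with -100 on the prompt / image slots,
    # so only the answer tokens contribute to the loss.
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)).float(),
        labels.view(-1),
        ignore_index=-100,
    )
```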

FineTuneVisualGLMModel((mixins): ModuleDict((chatglm-final): ChatGLMFinalMixin((lm_head): Linear(in_features=4096, out_features=130528, bias=False))(chatglm-attn): ChatGLMAttnMixin((rotary_emb): RotaryEmbedding())(chatglm-layer): ChatGLMLayerMixin()(eva): ImageMixin((model): BLIP2((vit): EVAViT((mixins): ModuleDict((patch_embedding): ImagePatchEmbeddingMixin((proj): Conv2d(3, 1408, kernel_size=(14, 14), stride=(14, 14)))(pos_embedding): InterpolatedPositionEmbeddingMixin()(cls): LNFinalyMixin((ln_vision): LayerNorm((1408,), eps=1e-05, elementwise_affine=True)))(transformer): BaseTransformer((embedding_dropout): Dropout(p=0.1, inplace=False)(word_embeddings): VocabParallelEmbedding()(position_embeddings): Embedding(257, 1408)(layers): ModuleList((0-38): 39 x BaseTransformerLayer((input_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((1408,), eps=1e-06, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False))))))(qformer): QFormer((mixins): ModuleDict()(transformer): BaseTransformer((embedding_dropout): Dropout(p=0.1, inplace=False)(word_embeddings): VocabParallelEmbedding()(position_embeddings): None(layers): ModuleList((0): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(1): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(2): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): 
RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(3): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(4): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(5): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(6): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(7): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(8): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, 
elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(9): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(10): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(cross_attention): CrossAttention((query): ColumnParallelLinear()(key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_cross_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(11): BaseTransformerLayer((input_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False))))(final_layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)))(glm_proj): Linear(in_features=768, out_features=4096, bias=True)))(lora): LoraMixin())(transformer): BaseTransformer((embedding_dropout): Dropout(p=0.1, inplace=False)(word_embeddings): VocabParallelEmbedding()(position_embeddings): Embedding(2048, 4096)(layers): ModuleList((0): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): LoraLinear((original): HackColumnParallelLinear()(matrix_A): HackParameterList((0): Parameter containing: [torch.float16 of size 10x4096 (GPU 0)](1): Parameter containing: [torch.float16 of size 10x4096 (GPU 0)](2): Parameter containing: [torch.float16 of size 10x4096 (GPU 0)])(matrix_B): HackParameterList((0): Parameter containing: [torch.float16 of size 4096x10 (GPU 0)](1): Parameter containing: [torch.float16 of size 4096x10 (GPU 0)](2): Parameter containing: [torch.float16 of size 4096x10 (GPU 0)]))(attention_dropout): Dropout(p=0.1, inplace=False)(dense): LoraLinear((original): HackRowParallelLinear()(matrix_A): HackParameterList(  (0): Parameter containing: [torch.float16 of size 10x4096 (GPU 0)])(matrix_B): HackParameterList(  (0): Parameter containing: 
[torch.float16 of size 4096x10 (GPU 0)]))(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(1-13): 13 x BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(14): BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): LoraLinear((original): HackColumnParallelLinear()(matrix_A): HackParameterList((0): Parameter containing: [torch.float16 of size 10x4096 (GPU 0)](1): Parameter containing: [torch.float16 of size 10x4096 (GPU 0)](2): Parameter containing: [torch.float16 of size 10x4096 (GPU 0)])(matrix_B): HackParameterList((0): Parameter containing: [torch.float16 of size 4096x10 (GPU 0)](1): Parameter containing: [torch.float16 of size 4096x10 (GPU 0)](2): Parameter containing: [torch.float16 of size 4096x10 (GPU 0)]))(attention_dropout): Dropout(p=0.1, inplace=False)(dense): LoraLinear((original): HackRowParallelLinear()(matrix_A): HackParameterList(  (0): Parameter containing: [torch.float16 of size 10x4096 (GPU 0)])(matrix_B): HackParameterList(  (0): Parameter containing: [torch.float16 of size 4096x10 (GPU 0)]))(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False)))(15-27): 13 x BaseTransformerLayer((input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(attention): SelfAttention((query_key_value): ColumnParallelLinear()(attention_dropout): Dropout(p=0.1, inplace=False)(dense): RowParallelLinear()(output_dropout): Dropout(p=0.1, inplace=False))(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)(mlp): MLP((dense_h_to_4h): ColumnParallelLinear()(dense_4h_to_h): RowParallelLinear()(dropout): Dropout(p=0.1, inplace=False))))(final_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True))
)
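The dump above shows where LoRA lands: rank-10 matrix_A/matrix_B pairs are attached only to the query_key_value and dense projections of ChatGLM layers 0 and 14 (the qkv projection gets three pairs, one per q/k/v chunk); everything else stays frozen. A minimal sketch of such an adapter, not SwissArmyTransformer's actual LoraMixin, and with a hypothetical attribute path in add_lora:

import torch
import torch.nn as nn

class LoraLinear(nn.Module):
    def __init__(self, original: nn.Linear, r: int = 10, alpha: int = 1):
        super().__init__()
        self.original = original                      # frozen pretrained projection
        self.original.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, original.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(original.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + scale * B(A x): the low-rank update is the only trainable part.
        return self.original(x) + (x @ self.A.T @ self.B.T) * self.scale

def add_lora(model, layer_ids=(0, 14), r=10):
    # Hypothetical layer path; the real mixin also splits qkv into three A/B pairs.
    for i in layer_ids:
        attn = model.layers[i].attention
        attn.query_key_value = LoraLinear(attn.query_key_value, r)
        attn.dense = LoraLinear(attn.dense, r)
    return model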

The core QFormer module in VisualGLM
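A rough sketch of what that module does, with assumed sizes (32 learnable query tokens, hidden 768, 12 heads, cross-attention to the 1408-d ViT features only in every other layer as in the printed structure, and a 768-to-4096 projection into ChatGLM's embedding space); LayerNorms and MLPs are omitted for brevity:

import torch
import torch.nn as nn

class QFormerSketch(nn.Module):
    def __init__(self, n_query=32, d=768, d_vit=1408, d_glm=4096, n_layers=12, n_heads=12):
        super().__init__()
        self.query_tokens = nn.Parameter(torch.zeros(1, n_query, d))
        self.self_attn = nn.ModuleList(
            nn.MultiheadAttention(d, n_heads, batch_first=True) for _ in range(n_layers))
        # Cross-attention only in even-numbered layers, matching the dump above.
        self.cross_attn = nn.ModuleDict({
            str(i): nn.MultiheadAttention(d, n_heads, kdim=d_vit, vdim=d_vit, batch_first=True)
            for i in range(0, n_layers, 2)})
        self.glm_proj = nn.Linear(d, d_glm)

    def forward(self, vit_features):                       # [batch, 257, 1408]
        q = self.query_tokens.expand(vit_features.size(0), -1, -1)
        for i, sa in enumerate(self.self_attn):
            q = q + sa(q, q, q)[0]                         # queries exchange information
            if str(i) in self.cross_attn:
                q = q + self.cross_attn[str(i)](q, vit_features, vit_features)[0]  # queries read the image
        return self.glm_proj(q)                            # [batch, 32, 4096], fed to ChatGLM

The 32 projected vectors act as a fixed-length image prefix in ChatGLM's embedding space, regardless of image resolution.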

At inference time: (figure omitted)

At training time: (figure omitted)
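In both cases the image path is the same: ViT features go through the QFormer and glm_proj, and the resulting 32 vectors are spliced into ChatGLM's input embeddings in place of placeholder positions inside a prompt along the lines of "<img>…</img>问：{question}\n答：" (template assumed from the public demo, not quoted from the code). At training time only the answer span after "答：" receives labels, consistent with the loss sketch earlier; at inference the model simply generates tokens after "答：". A rough sketch of the splice, with hypothetical argument names:

import torch

def splice_image(text_embeds, image_embeds, image_start):
    # text_embeds:  [1, seq, 4096] from ChatGLM's word embeddings
    # image_embeds: [1, 32, 4096] from the QFormer's glm_proj (see the sketch above)
    # image_start:  index of the first placeholder position right after the <img> token
    out = text_embeds.clone()
    out[:, image_start:image_start + image_embeds.size(1)] = image_embeds
    return out  # passed to ChatGLM as input embeddings instead of token ids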

 
