Analysis of the ChangerFormer network structure in open-cd

open-cd

Contents

  • open-cd
    • 1. Installation
    • 2. Source code structure analysis
      • 2.1 Backbone
      • 2.2 Backbone wrapper class
      • 2.3 Neck
      • 2.4 Decoder
      • 2.5 Test model
      • 2.6 Changer backbone
  • Summary


This open-source library is built on:
mmcv
mmseg
mmdet
mmengine

1. Installation

Problems encountered during installation:
1. PyTorch version: open-cd depends on a relatively old mmcv, so a PyTorch below 2.3 is recommended; newer versions may not be compatible with mmcv. Install PyTorch first, then mmcv. The versions I used:

pytorch                   2.1.2           py3.9_cuda12.1_cudnn8_0    pytorch
mmcv                      2.1.0                    pypi_0    pypi

mmcv installation: install it through openmim (e.g. mim install mmcv==2.1.0) so that a prebuilt wheel matching your PyTorch/CUDA build is selected, instead of letting pip compile it from source.
The same approach also resolves the following error:

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for mmcv
Running setup.py clean for mmcv
Failed to build mmcv
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (mmcv)

Reference

Afterwards, follow the referenced installation steps and install open-cd from source (installing from source is recommended; installing opencd as a third-party package makes later debugging inconvenient):

# Install OpenMMLab Toolkits as Python packages
pip install -U openmim
mim install mmengine
mim install "mmpretrain>=1.0.0rc7"
pip install "mmsegmentation>=1.2.2"
pip install "mmdet>=3.0.0"# Install Opencd
git clone https://github.com/likyoo/open-cd.git
cd open-cd
pip install -v -e .
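
To confirm the environment fits together before moving on, a quick import check can be run (a minimal sketch; the printed versions are the ones from my environment listed above):

import mmcv
import mmdet
import mmengine
import mmseg

import opencd  # available only after the source install (pip install -v -e .)

print(mmcv.__version__)   # 2.1.0 in my environment
print(mmseg.__version__)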

2. Source code structure analysis

The library's main parts are: 1. the config files, 2. the opencd model-architecture files, 3. training/inference/analysis tools, and 4. the mmlab dependencies.
This post focuses on 2, the opencd model files. The model directory contains the basic model-structure files.
Change detection largely follows the semantic-segmentation layout: an encoder, a neck, and a decoder.
If you have used mmsegmentation you will recognize this: backbone holds the backbone networks referenced by the config files, with the corresponding neck and decoder alongside, while changer_detector holds the top-level architectures such as Encoder-Decoder.
Config-driven codebases like open-cd and mmseg are well suited to reproducing models and to engineering work, but they are less friendly to research newcomers, especially those from outside computer science, who want to modify a network. Below, the architecture components are composed directly, to make them easier to understand and modify (hopefully without making things more confusing).
We take changerformer-mitb0 as the example:


Prerequisite: open-cd has been installed (see the official issues if installation fails).
The code below is taken from opencd/model/.
The registry decorator @MODELS.register_module() above each class has been removed; otherwise you get an error saying the class is already registered.

2.1 Backbone

# Copyright (c) OpenMMLab. All rights reserved.
import math
import warnings

import torch
import torch.nn as nn
import torch.utils.checkpoint as cp
from mmcv.cnn import Conv2d, build_activation_layer, build_norm_layer
from mmcv.cnn.bricks.drop import build_dropout
from mmcv.cnn.bricks.transformer import MultiheadAttention
from mmengine.model import BaseModule, ModuleList, Sequential
from mmengine.model.weight_init import (constant_init, normal_init,
                                        trunc_normal_init)

from mmseg.registry import MODELS
# The two dependencies below can be found in opencd/model/utils
from .embed import PatchEmbed
from .shape_convert import nchw_to_nlc, nlc_to_nchw


class MixFFN(BaseModule):
    """An implementation of MixFFN of Segformer.

    The differences between MixFFN & FFN:
        1. Use 1X1 Conv to replace Linear layer.
        2. Introduce 3X3 Conv to encode positional information.

    Args:
        embed_dims (int): The feature dimension. Same as
            `MultiheadAttention`. Defaults: 256.
        feedforward_channels (int): The hidden dimension of FFNs.
            Defaults: 1024.
        act_cfg (dict, optional): The activation config for FFNs.
            Default: dict(type='ReLU')
        ffn_drop (float, optional): Probability of an element to be
            zeroed in FFN. Default 0.0.
        dropout_layer (obj:`ConfigDict`): The dropout_layer used
            when adding the shortcut.
        init_cfg (obj:`mmcv.ConfigDict`): The Config for initialization.
            Default: None.
    """

    def __init__(self,
                 embed_dims,
                 feedforward_channels,
                 act_cfg=dict(type='GELU'),
                 ffn_drop=0.,
                 dropout_layer=None,
                 init_cfg=None):
        super().__init__(init_cfg)

        self.embed_dims = embed_dims
        self.feedforward_channels = feedforward_channels
        self.act_cfg = act_cfg
        self.activate = build_activation_layer(act_cfg)

        in_channels = embed_dims
        fc1 = Conv2d(
            in_channels=in_channels,
            out_channels=feedforward_channels,
            kernel_size=1,
            stride=1,
            bias=True)
        # 3x3 depth wise conv to provide positional encode information
        pe_conv = Conv2d(
            in_channels=feedforward_channels,
            out_channels=feedforward_channels,
            kernel_size=3,
            stride=1,
            padding=(3 - 1) // 2,
            bias=True,
            groups=feedforward_channels)
        fc2 = Conv2d(
            in_channels=feedforward_channels,
            out_channels=in_channels,
            kernel_size=1,
            stride=1,
            bias=True)
        drop = nn.Dropout(ffn_drop)
        layers = [fc1, pe_conv, self.activate, drop, fc2, drop]
        self.layers = Sequential(*layers)
        self.dropout_layer = build_dropout(
            dropout_layer) if dropout_layer else torch.nn.Identity()

    def forward(self, x, hw_shape, identity=None):
        out = nlc_to_nchw(x, hw_shape)
        out = self.layers(out)
        out = nchw_to_nlc(out)
        if identity is None:
            identity = x
        return identity + self.dropout_layer(out)


class EfficientMultiheadAttention(MultiheadAttention):
    """An implementation of Efficient Multi-head Attention of Segformer.

    This module is modified from MultiheadAttention which is a module from
    mmcv.cnn.bricks.transformer.

    Args:
        embed_dims (int): The embedding dimension.
        num_heads (int): Parallel attention heads.
        attn_drop (float): A Dropout layer on attn_output_weights.
            Default: 0.0.
        proj_drop (float): A Dropout layer after `nn.MultiheadAttention`.
            Default: 0.0.
        dropout_layer (obj:`ConfigDict`): The dropout_layer used
            when adding the shortcut. Default: None.
        init_cfg (obj:`mmcv.ConfigDict`): The Config for initialization.
            Default: None.
        batch_first (bool): Key, Query and Value are shape of
            (batch, n, embed_dim) or (n, batch, embed_dim). Default: False.
        qkv_bias (bool): enable bias for qkv if True. Default True.
        norm_cfg (dict): Config dict for normalization layer.
            Default: dict(type='LN').
        sr_ratio (int): The ratio of spatial reduction of Efficient Multi-head
            Attention of Segformer. Default: 1.
    """

    def __init__(self,
                 embed_dims,
                 num_heads,
                 attn_drop=0.,
                 proj_drop=0.,
                 dropout_layer=None,
                 init_cfg=None,
                 batch_first=True,
                 qkv_bias=False,
                 norm_cfg=dict(type='LN'),
                 sr_ratio=1):
        super().__init__(
            embed_dims,
            num_heads,
            attn_drop,
            proj_drop,
            dropout_layer=dropout_layer,
            init_cfg=init_cfg,
            batch_first=batch_first,
            bias=qkv_bias)

        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            self.sr = Conv2d(
                in_channels=embed_dims,
                out_channels=embed_dims,
                kernel_size=sr_ratio,
                stride=sr_ratio)
            # The ret[0] of build_norm_layer is norm name.
            self.norm = build_norm_layer(norm_cfg, embed_dims)[1]

        # handle the BC-breaking from https://github.com/open-mmlab/mmcv/pull/1418 # noqa
        from mmseg import digit_version, mmcv_version
        if mmcv_version < digit_version('1.3.17'):
            warnings.warn('The legacy version of forward function in'
                          'EfficientMultiheadAttention is deprecated in'
                          'mmcv>=1.3.17 and will no longer support in the'
                          'future. Please upgrade your mmcv.')
            self.forward = self.legacy_forward

    def forward(self, x, hw_shape, identity=None):

        x_q = x
        if self.sr_ratio > 1:
            x_kv = nlc_to_nchw(x, hw_shape)
            x_kv = self.sr(x_kv)
            x_kv = nchw_to_nlc(x_kv)
            x_kv = self.norm(x_kv)
        else:
            x_kv = x

        if identity is None:
            identity = x_q

        # Because the dataflow('key', 'query', 'value') of
        # ``torch.nn.MultiheadAttention`` is (num_query, batch,
        # embed_dims), We should adjust the shape of dataflow from
        # batch_first (batch, num_query, embed_dims) to num_query_first
        # (num_query, batch, embed_dims), and recover ``attn_output``
        # from num_query_first to batch_first.
        if self.batch_first:
            x_q = x_q.transpose(0, 1)
            x_kv = x_kv.transpose(0, 1)

        out = self.attn(query=x_q, key=x_kv, value=x_kv)[0]

        if self.batch_first:
            out = out.transpose(0, 1)

        return identity + self.dropout_layer(self.proj_drop(out))

    def legacy_forward(self, x, hw_shape, identity=None):
        """multi head attention forward in mmcv version < 1.3.17."""

        x_q = x
        if self.sr_ratio > 1:
            x_kv = nlc_to_nchw(x, hw_shape)
            x_kv = self.sr(x_kv)
            x_kv = nchw_to_nlc(x_kv)
            x_kv = self.norm(x_kv)
        else:
            x_kv = x

        if identity is None:
            identity = x_q

        # `need_weights=True` will let nn.MultiHeadAttention
        # `return attn_output, attn_output_weights.sum(dim=1) / num_heads`
        # The `attn_output_weights.sum(dim=1)` may cause cuda error. So, we set
        # `need_weights=False` to ignore `attn_output_weights.sum(dim=1)`.
        # This issue - `https://github.com/pytorch/pytorch/issues/37583` report
        # the error that large scale tensor sum operation may cause cuda error.
        out = self.attn(query=x_q, key=x_kv, value=x_kv, need_weights=False)[0]

        return identity + self.dropout_layer(self.proj_drop(out))


class TransformerEncoderLayer(BaseModule):
    """Implements one encoder layer in Segformer.

    Args:
        embed_dims (int): The feature dimension.
        num_heads (int): Parallel attention heads.
        feedforward_channels (int): The hidden dimension for FFNs.
        drop_rate (float): Probability of an element to be zeroed.
            after the feed forward layer. Default 0.0.
        attn_drop_rate (float): The drop out rate for attention layer.
            Default 0.0.
        drop_path_rate (float): stochastic depth rate. Default 0.0.
        qkv_bias (bool): enable bias for qkv if True.
            Default: True.
        act_cfg (dict): The activation config for FFNs.
            Default: dict(type='GELU').
        norm_cfg (dict): Config dict for normalization layer.
            Default: dict(type='LN').
        batch_first (bool): Key, Query and Value are shape of
            (batch, n, embed_dim) or (n, batch, embed_dim). Default: False.
        init_cfg (dict, optional): Initialization config dict.
            Default: None.
        sr_ratio (int): The ratio of spatial reduction of Efficient Multi-head
            Attention of Segformer. Default: 1.
        with_cp (bool): Use checkpoint or not. Using checkpoint will save
            some memory while slowing down the training speed. Default: False.
    """

    def __init__(self,
                 embed_dims,
                 num_heads,
                 feedforward_channels,
                 drop_rate=0.,
                 attn_drop_rate=0.,
                 drop_path_rate=0.,
                 qkv_bias=True,
                 act_cfg=dict(type='GELU'),
                 norm_cfg=dict(type='LN'),
                 batch_first=True,
                 sr_ratio=1,
                 with_cp=False):
        super().__init__()

        # The ret[0] of build_norm_layer is norm name.
        self.norm1 = build_norm_layer(norm_cfg, embed_dims)[1]

        self.attn = EfficientMultiheadAttention(
            embed_dims=embed_dims,
            num_heads=num_heads,
            attn_drop=attn_drop_rate,
            proj_drop=drop_rate,
            dropout_layer=dict(type='DropPath', drop_prob=drop_path_rate),
            batch_first=batch_first,
            qkv_bias=qkv_bias,
            norm_cfg=norm_cfg,
            sr_ratio=sr_ratio)

        # The ret[0] of build_norm_layer is norm name.
        self.norm2 = build_norm_layer(norm_cfg, embed_dims)[1]

        self.ffn = MixFFN(
            embed_dims=embed_dims,
            feedforward_channels=feedforward_channels,
            ffn_drop=drop_rate,
            dropout_layer=dict(type='DropPath', drop_prob=drop_path_rate),
            act_cfg=act_cfg)

        self.with_cp = with_cp

    def forward(self, x, hw_shape):

        def _inner_forward(x):
            x = self.attn(self.norm1(x), hw_shape, identity=x)
            x = self.ffn(self.norm2(x), hw_shape, identity=x)
            return x

        if self.with_cp and x.requires_grad:
            x = cp.checkpoint(_inner_forward, x)
        else:
            x = _inner_forward(x)
        return x


# @MODELS.register_module()
class MixVisionTransformer(BaseModule):
    """The backbone of Segformer.

    This backbone is the implementation of `SegFormer: Simple and
    Efficient Design for Semantic Segmentation with
    Transformers <https://arxiv.org/abs/2105.15203>`_.

    Args:
        in_channels (int): Number of input channels. Default: 3.
        embed_dims (int): Embedding dimension. Default: 768.
        num_stages (int): The num of stages. Default: 4.
        num_layers (Sequence[int]): The layer number of each transformer encode
            layer. Default: [3, 4, 6, 3].
        num_heads (Sequence[int]): The attention heads of each transformer
            encode layer. Default: [1, 2, 4, 8].
        patch_sizes (Sequence[int]): The patch_size of each overlapped patch
            embedding. Default: [7, 3, 3, 3].
        strides (Sequence[int]): The stride of each overlapped patch embedding.
            Default: [4, 2, 2, 2].
        sr_ratios (Sequence[int]): The spatial reduction rate of each
            transformer encode layer. Default: [8, 4, 2, 1].
        out_indices (Sequence[int] | int): Output from which stages.
            Default: (0, 1, 2, 3).
        mlp_ratio (int): ratio of mlp hidden dim to embedding dim.
            Default: 4.
        qkv_bias (bool): Enable bias for qkv if True. Default: True.
        drop_rate (float): Probability of an element to be zeroed.
            Default 0.0
        attn_drop_rate (float): The drop out rate for attention layer.
            Default 0.0
        drop_path_rate (float): stochastic depth rate. Default 0.0
        norm_cfg (dict): Config dict for normalization layer.
            Default: dict(type='LN')
        act_cfg (dict): The activation config for FFNs.
            Default: dict(type='GELU').
        pretrained (str, optional): model pretrained path. Default: None.
        init_cfg (dict or list[dict], optional): Initialization config dict.
            Default: None.
        with_cp (bool): Use checkpoint or not. Using checkpoint will save
            some memory while slowing down the training speed. Default: False.
    """

    def __init__(self,
                 in_channels=3,
                 embed_dims=64,
                 num_stages=4,
                 num_layers=[3, 4, 6, 3],
                 num_heads=[1, 2, 4, 8],
                 patch_sizes=[7, 3, 3, 3],
                 strides=[4, 2, 2, 2],
                 sr_ratios=[8, 4, 2, 1],
                 out_indices=(0, 1, 2, 3),
                 mlp_ratio=4,
                 qkv_bias=True,
                 drop_rate=0.,
                 attn_drop_rate=0.,
                 drop_path_rate=0.,
                 act_cfg=dict(type='GELU'),
                 norm_cfg=dict(type='LN', eps=1e-6),
                 pretrained=None,
                 init_cfg=None,
                 with_cp=False):
        super().__init__(init_cfg=init_cfg)

        assert not (init_cfg and pretrained), \
            'init_cfg and pretrained cannot be set at the same time'
        if isinstance(pretrained, str):
            warnings.warn('DeprecationWarning: pretrained is deprecated, '
                          'please use "init_cfg" instead')
            self.init_cfg = dict(type='Pretrained', checkpoint=pretrained)
        elif pretrained is not None:
            raise TypeError('pretrained must be a str or None')

        self.embed_dims = embed_dims
        self.num_stages = num_stages
        self.num_layers = num_layers
        self.num_heads = num_heads
        self.patch_sizes = patch_sizes
        self.strides = strides
        self.sr_ratios = sr_ratios
        self.with_cp = with_cp
        assert num_stages == len(num_layers) == len(num_heads) \
               == len(patch_sizes) == len(strides) == len(sr_ratios)

        self.out_indices = out_indices
        assert max(out_indices) < self.num_stages

        # transformer encoder
        dpr = [
            x.item()
            for x in torch.linspace(0, drop_path_rate, sum(num_layers))
        ]  # stochastic num_layer decay rule

        cur = 0
        self.layers = ModuleList()
        for i, num_layer in enumerate(num_layers):
            embed_dims_i = embed_dims * num_heads[i]
            patch_embed = PatchEmbed(
                in_channels=in_channels,
                embed_dims=embed_dims_i,
                kernel_size=patch_sizes[i],
                stride=strides[i],
                padding=patch_sizes[i] // 2,
                norm_cfg=norm_cfg)
            layer = ModuleList([
                TransformerEncoderLayer(
                    embed_dims=embed_dims_i,
                    num_heads=num_heads[i],
                    feedforward_channels=mlp_ratio * embed_dims_i,
                    drop_rate=drop_rate,
                    attn_drop_rate=attn_drop_rate,
                    drop_path_rate=dpr[cur + idx],
                    qkv_bias=qkv_bias,
                    act_cfg=act_cfg,
                    norm_cfg=norm_cfg,
                    with_cp=with_cp,
                    sr_ratio=sr_ratios[i]) for idx in range(num_layer)
            ])
            in_channels = embed_dims_i
            # The ret[0] of build_norm_layer is norm name.
            norm = build_norm_layer(norm_cfg, embed_dims_i)[1]
            self.layers.append(ModuleList([patch_embed, layer, norm]))
            cur += num_layer

    def init_weights(self):
        if self.init_cfg is None:
            for m in self.modules():
                if isinstance(m, nn.Linear):
                    trunc_normal_init(m, std=.02, bias=0.)
                elif isinstance(m, nn.LayerNorm):
                    constant_init(m, val=1.0, bias=0.)
                elif isinstance(m, nn.Conv2d):
                    fan_out = m.kernel_size[0] * m.kernel_size[
                        1] * m.out_channels
                    fan_out //= m.groups
                    normal_init(
                        m, mean=0, std=math.sqrt(2.0 / fan_out), bias=0)
        else:
            super().init_weights()

    def forward(self, x):
        outs = []

        for i, layer in enumerate(self.layers):
            x, hw_shape = layer[0](x)
            for block in layer[1]:
                x = block(x, hw_shape)
            x = layer[2](x)
            x = nlc_to_nchw(x, hw_shape)
            if i in self.out_indices:
                outs.append(x)

        return outs
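
The only non-obvious dependencies above are the two shape helpers (plus PatchEmbed from opencd/model/utils). For reference, minimal equivalents of the shape helpers, a sketch matching mmseg's shape_convert.py:

def nchw_to_nlc(x):
    """Flatten (N, C, H, W) to (N, H*W, C) for the transformer blocks."""
    assert len(x.shape) == 4
    return x.flatten(2).transpose(1, 2).contiguous()


def nlc_to_nchw(x, hw_shape):
    """Restore (N, H*W, C) back to (N, C, H, W), given hw_shape = (H, W)."""
    H, W = hw_shape
    assert len(x.shape) == 3
    B, L, C = x.shape
    assert L == H * W, 'The seq_len does not match H, W'
    return x.transpose(1, 2).reshape(B, C, H, W).contiguous()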

2.2 Backbone wrapper class

from torch import nn
from CDmodel.ly_utils.MixVisionTransformer import MixVisionTransformer


class backbone(nn.Module):
    def __init__(self):
        super(backbone, self).__init__()
        # ChangerFormer-mit_b0 settings
        self.model = MixVisionTransformer(
            in_channels=3,
            embed_dims=32,
            num_stages=4,
            num_layers=[2, 2, 2, 2],
            num_heads=[1, 2, 5, 8],
            patch_sizes=[7, 3, 3, 3],
            sr_ratios=[8, 4, 2, 1],
            out_indices=(0, 1, 2, 3),
            mlp_ratio=4,
            qkv_bias=True,
            drop_rate=0.0,
            attn_drop_rate=0.0,
            drop_path_rate=0.1)

    def forward(self, x1):
        return self.model.forward(x1)
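
A quick shape check of the wrapper (a minimal sketch, nothing open-cd-specific): with embed_dims=32 and num_heads=[1, 2, 5, 8], stage i outputs embed_dims * num_heads[i] channels, i.e. [32, 64, 160, 256], at strides 4/8/16/32:

import torch

net = backbone()
net.eval()
with torch.no_grad():
    feats = net(torch.rand(1, 3, 256, 256))
print([tuple(f.shape) for f in feats])
# [(1, 32, 64, 64), (1, 64, 32, 32), (1, 160, 16, 16), (1, 256, 8, 8)]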

2.3 Neck

# Copyright (c) Open-CD. All rights reserved.
import torch
from torch import nn
from opencd.registry import MODELS


class FeatureFusionNeck(nn.Module):
    """Feature Fusion Neck.

    Args:
        policy (str): The operation to fuse features. candidates
            are `concat`, `sum`, `diff` and `Lp_distance`.
        in_channels (Sequence(int)): Input channels.
        channels (int): Channels after modules, before conv_seg.
        out_indices (tuple[int]): Output from which layer.
    """

    def __init__(self,
                 policy='concat',
                 in_channels=None,
                 channels=None,
                 out_indices=(0, 1, 2, 3)):
        super(FeatureFusionNeck, self).__init__()
        self.policy = policy
        self.in_channels = in_channels
        self.channels = channels
        self.out_indices = out_indices

    @staticmethod
    def fusion(x1, x2, policy):
        """Specify the form of feature fusion"""

        _fusion_policies = ['concat', 'sum', 'diff', 'abs_diff']
        assert policy in _fusion_policies, 'The fusion policies {} are ' \
            'supported'.format(_fusion_policies)

        if policy == 'concat':
            x = torch.cat([x1, x2], dim=1)
        elif policy == 'sum':
            x = x1 + x2
        elif policy == 'diff':
            x = x2 - x1
        elif policy == 'abs_diff':
            x = torch.abs(x1 - x2)

        return x

    def forward(self, x1, x2):
        """Forward function."""

        assert len(x1) == len(x2), "The features x1 and x2 from the" \
            "backbone should be of equal length"

        outs = []
        for i in range(len(x1)):
            out = self.fusion(x1[i], x2[i], self.policy)
            outs.append(out)

        outs = [outs[i] for i in self.out_indices]
        return tuple(outs)
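
With the default 'concat' policy, each stage's bitemporal features are stacked along the channel axis, which is exactly why the decode head below doubles its in_channels. A quick check (a minimal sketch using the backbone's output shapes):

import torch

neck = FeatureFusionNeck(policy='concat')
f1 = [torch.rand(1, c, 256 // s, 256 // s)
      for c, s in zip([32, 64, 160, 256], [4, 8, 16, 32])]
f2 = [torch.rand_like(f) for f in f1]
fused = neck(f1, f2)
print([tuple(f.shape) for f in fused])
# [(1, 64, 64, 64), (1, 128, 32, 32), (1, 320, 16, 16), (1, 512, 8, 8)]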

2.4 Decoder

The ChangerFormer decode head is 'mmseg.SegformerHead'.
You will find it in mmsegmentation under mmseg/models/decode_heads.
Once located, delete the loss and predict/inference parts and keep what follows.

# Copyright (c) OpenMMLab. All rights reserved.
import warnings
from abc import ABCMeta, abstractmethod
from typing import List, Tuple

import torch
import torch.nn as nn
from mmcv.cnn import ConvModule
from mmengine.model import BaseModule
from torch import Tensor

from mmseg.registry import MODELS
from mmseg.structures import build_pixel_sampler
from mmseg.utils import ConfigType, SampleList
# from ..losses import accuracy
from layer.resize import resize


class BaseDecodeHead(nn.Module):
    """Base class for BaseDecodeHead.

    1. The ``init_weights`` method is used to initialize decode_head's
    model parameters. After segmentor initialization, ``init_weights``
    is triggered when ``segmentor.init_weights()`` is called externally.

    2. The ``loss`` method is used to calculate the loss of decode_head,
    which includes two steps: (1) the decode_head model performs forward
    propagation to obtain the feature maps (2) The ``loss_by_feat`` method
    is called based on the feature maps to calculate the loss.

    .. code:: text

        loss(): forward() -> loss_by_feat()

    3. The ``predict`` method is used to predict segmentation results,
    which includes two steps: (1) the decode_head model performs forward
    propagation to obtain the feature maps (2) The ``predict_by_feat`` method
    is called based on the feature maps to predict segmentation results
    including post-processing.

    .. code:: text

        predict(): forward() -> predict_by_feat()

    Args:
        in_channels (int|Sequence[int]): Input channels.
        channels (int): Channels after modules, before conv_seg.
        num_classes (int): Number of classes.
        out_channels (int): Output channels of conv_seg. Default: None.
        threshold (float): Threshold for binary segmentation in the case of
            `num_classes==1`. Default: None.
        dropout_ratio (float): Ratio of dropout layer. Default: 0.1.
        conv_cfg (dict|None): Config of conv layers. Default: None.
        norm_cfg (dict|None): Config of norm layers. Default: None.
        act_cfg (dict): Config of activation layers.
            Default: dict(type='ReLU')
        in_index (int|Sequence[int]): Input feature index. Default: -1
        ignore_index (int | None): The label index to be ignored. When using
            masked BCE loss, ignore_index should be set to None. Default: 255.
        align_corners (bool): align_corners argument of F.interpolate.
            Default: False.

    The all mlp Head of segformer.

    This head is the implementation of
    `Segformer <https://arxiv.org/abs/2105.15203>`_.

    Args:
        interpolate_mode: The interpolate mode of MLP head upsample operation.
            Default: 'bilinear'.
    """

    def __init__(self,
                 in_channels=[v * 2 for v in [32, 64, 160, 256]],
                 channels=256,
                 num_classes=2,
                 out_channels=None,
                 threshold=None,
                 dropout_ratio=0.1,
                 conv_cfg=None,
                 norm_cfg=dict(type='SyncBN', requires_grad=True),
                 act_cfg=dict(type='ReLU'),
                 in_index=[0, 1, 2, 3],
                 input_transform='multiple_select',
                 ignore_index=255,
                 interpolate_mode='bilinear',
                 align_corners=False):
        super(BaseDecodeHead, self).__init__()
        # self._init_inputs(in_channels, in_index, input_transform)
        self.in_channels = in_channels
        self.channels = channels
        self.dropout_ratio = dropout_ratio
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.act_cfg = act_cfg
        self.in_index = in_index
        self.input_transform = input_transform
        self.ignore_index = ignore_index
        self.align_corners = align_corners
        self.interpolate_mode = interpolate_mode

        num_inputs = len(self.in_channels)
        assert num_inputs == len(self.in_index)

        self.convs = nn.ModuleList()
        for i in range(num_inputs):
            self.convs.append(
                ConvModule(
                    in_channels=self.in_channels[i],
                    out_channels=self.channels,
                    kernel_size=1,
                    stride=1,
                    norm_cfg=self.norm_cfg,
                    act_cfg=self.act_cfg))

        self.fusion_conv = ConvModule(
            in_channels=self.channels * num_inputs,
            out_channels=self.channels,
            kernel_size=1,
            norm_cfg=self.norm_cfg)

        if out_channels is None:
            if num_classes == 2:
                warnings.warn('For binary segmentation, we suggest using '
                              '`out_channels = 1` to define the output '
                              'channels of segmentor, and use `threshold` '
                              'to convert `seg_logits` into a prediction '
                              'applying a threshold')
            out_channels = num_classes

        if out_channels != num_classes and out_channels != 1:
            raise ValueError(
                'out_channels should be equal to num_classes, '
                'except binary segmentation set out_channels == 1 and '
                f'num_classes == 2, but got out_channels={out_channels} '
                f'and num_classes={num_classes}')

        if out_channels == 1 and threshold is None:
            threshold = 0.3
            warnings.warn('threshold is not defined for binary, and defaults '
                          'to 0.3')

        self.num_classes = num_classes
        self.out_channels = out_channels
        self.threshold = threshold

        self.conv_seg = nn.Conv2d(channels, self.out_channels, kernel_size=1)
        if dropout_ratio > 0:
            self.dropout = nn.Dropout2d(dropout_ratio)
        else:
            self.dropout = None

    def _transform_inputs(self, inputs):
        """Transform inputs for decoder.

        Args:
            inputs (list[Tensor]): List of multi-level img features.

        Returns:
            Tensor: The transformed inputs
        """

        if self.input_transform == 'resize_concat':
            inputs = [inputs[i] for i in self.in_index]
            upsampled_inputs = [
                resize(
                    input=x,
                    size=inputs[0].shape[2:],
                    mode='bilinear',
                    align_corners=self.align_corners) for x in inputs
            ]
            inputs = torch.cat(upsampled_inputs, dim=1)
        elif self.input_transform == 'multiple_select':
            inputs = [inputs[i] for i in self.in_index]
        else:
            inputs = inputs[self.in_index]

        return inputs

    def forward(self, inputs):
        """Forward function."""
        # Receive 4 stage backbone feature map: 1/4, 1/8, 1/16, 1/32
        inputs = self._transform_inputs(inputs)
        outs = []
        for idx in range(len(inputs)):
            x = inputs[idx]
            conv = self.convs[idx]
            outs.append(
                resize(
                    input=conv(x),
                    size=inputs[0].shape[2:],
                    mode=self.interpolate_mode,
                    align_corners=self.align_corners))

        out = self.fusion_conv(torch.cat(outs, dim=1))
        out = self.cls_seg(out)

        return out

    def cls_seg(self, feat):
        """Classify each pixel."""
        if self.dropout is not None:
            feat = self.dropout(feat)
        output = self.conv_seg(feat)
        return output
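
The head's only local dependency is resize (imported here from layer.resize), which is mmseg's thin wrapper around F.interpolate. If you don't want to copy it out of mmseg.models.utils, a minimal stand-in is enough for this demo (a sketch; mmseg's version additionally warns about potentially misaligned sizes when align_corners=True):

import torch.nn.functional as F


def resize(input,
           size=None,
           scale_factor=None,
           mode='nearest',
           align_corners=None,
           warning=True):
    """Minimal stand-in for mmseg.models.utils.resize: delegate to
    F.interpolate with the same argument order."""
    return F.interpolate(input, size, scale_factor, mode, align_corners)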

2.5 Test model


import torch
from torch import nn
# from torchsummary import summary
from torchinfo import summary

from backbone1 import backbone
from neck import FeatureFusionNeck
from head import BaseDecodeHead


class siamencoderdecoder(nn.Module):
    def __init__(self):
        super(siamencoderdecoder, self).__init__()
        self.backbone = backbone()
        # self.backbone2 = backbone()
        self.neck = FeatureFusionNeck()
        self.head = BaseDecodeHead()

    def backboneforward(self, x1, x2):
        # Siamese backbone: both temporal images go through shared weights
        x1, x2 = self.backbone(x1), self.backbone(x2)
        return x1, x2

    def forward(self, x1, x2):
        x1, x2 = self.backboneforward(x1, x2)
        x = self.neck(x1, x2)
        logit = self.head(x)
        return logit


if __name__ == "__main__":
    model = siamencoderdecoder()
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    summary(model, input_size=[(1, 3, 256, 256), (1, 3, 256, 256)])

torchinfo summary output:

========================================================================================================================
Layer (type:depth-idx)                                                 Output Shape              Param #
========================================================================================================================
siamencoderdecoder                                                     [1, 2, 64, 64]            --
├─backbone: 1-1                                                        --                        (recursive)
│    └─MixVisionTransformer: 2-2                                       --                        (recursive)
│    │    └─ModuleList: 3-2                                            --                        (recursive)
├─backbone: 1-2                                                        [1, 32, 64, 64]           3,319,392
│    └─MixVisionTransformer: 2-2                                       --                        (recursive)
│    │    └─ModuleList: 3-2                                            --                        (recursive)
├─FeatureFusionNeck: 1-3                                               [1, 64, 64, 64]           --
├─BaseDecodeHead: 1-4                                                  [1, 2, 64, 64]            --
│    └─ModuleList: 2-3                                                 --                        --
│    │    └─ConvModule: 3-3                                            [1, 256, 64, 64]          16,896
│    │    └─ConvModule: 3-4                                            [1, 256, 32, 32]          33,280
│    │    └─ConvModule: 3-5                                            [1, 256, 16, 16]          82,432
│    │    └─ConvModule: 3-6                                            [1, 256, 8, 8]            131,584
│    └─ConvModule: 2-4                                                 [1, 256, 64, 64]          --
│    │    └─Conv2d: 3-7                                                [1, 256, 64, 64]          262,144
│    │    └─SyncBatchNorm: 3-8                                         [1, 256, 64, 64]          512
│    │    └─ReLU: 3-9                                                  [1, 256, 64, 64]          --
│    └─Dropout2d: 2-5                                                  [1, 256, 64, 64]          --
│    └─Conv2d: 2-6                                                     [1, 2, 64, 64]            514
========================================================================================================================
Total params: 7,166,146
Trainable params: 7,166,146
Non-trainable params: 0
Total mult-adds (G): 2.09
========================================================================================================================
Input size (MB): 1.57
Forward/backward pass size (MB): 141.75
Params size (MB): 12.29
Estimated Total Size (MB): 155.62
========================================================================================================================
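
Note that the logits come out at [1, 2, 64, 64], i.e. 1/4 of the 256x256 input: the predict/post-processing step we stripped from the mmseg head is what normally restores full resolution. A minimal way to recover a full-size change map from the siamencoderdecoder above (t1/t2 are hypothetical bitemporal input tensors):

import torch
import torch.nn.functional as F

t1, t2 = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
logits = model(t1, t2)                          # [1, 2, 64, 64]
logits = F.interpolate(logits, size=(256, 256),
                       mode='bilinear', align_corners=False)
change_map = logits.argmax(dim=1)               # [1, 256, 256]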

2.6 Changer backbone

The open-cd authors propose a feature-exchange (interaction) idea for the backbone, implemented by subclassing ResNet; the class docstring includes a usage example:

Example:
    >>> from opencd.models import IA_ResNet
    >>> import torch
    >>> self = IA_ResNet(depth=18)
    >>> self.eval()
    >>> inputs = torch.rand(1, 3, 32, 32)
    >>> level_outputs = self.forward(inputs, inputs)
    >>> for level_out in level_outputs:
    ...     print(tuple(level_out.shape))
    (1, 128, 8, 8)
    (1, 256, 4, 4)
    (1, 512, 2, 2)
    (1, 1024, 1, 1)

# Copyright (c) Open-CD. All rights reserved.
import torch
import torch.nn as nn
from mmseg.models.backbones import ResNet

from opencd.registry import MODELS


# @MODELS.register_module()
class IA_ResNet(ResNet):
    """Interaction ResNet backbone.

    Args:
        interaction_cfg (Sequence[dict]): Interaction strategies for the stages.
            The length should be the same as `num_stages`. The details can be
            found in `opencd/models/ly_utils/interaction_layer.py`.
            Default: (None, None, None, None).
        depth (int): Depth of resnet, from {18, 34, 50, 101, 152}.
        in_channels (int): Number of input image channels. Default: 3.
        stem_channels (int): Number of stem channels. Default: 64.
        base_channels (int): Number of base channels of res layer. Default: 64.
        num_stages (int): Resnet stages, normally 4. Default: 4.
        strides (Sequence[int]): Strides of the first block of each stage.
            Default: (1, 2, 2, 2).
        dilations (Sequence[int]): Dilation of each stage.
            Default: (1, 1, 1, 1).
        out_indices (Sequence[int]): Output from which stages.
            Default: (0, 1, 2, 3).
        style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two
            layer is the 3x3 conv layer, otherwise the stride-two layer is
            the first 1x1 conv layer. Default: 'pytorch'.
        deep_stem (bool): Replace 7x7 conv in input stem with 3 3x3 conv.
            Default: False.
        avg_down (bool): Use AvgPool instead of stride conv when
            downsampling in the bottleneck. Default: False.
        frozen_stages (int): Stages to be frozen (stop grad and set eval mode).
            -1 means not freezing any parameters. Default: -1.
        conv_cfg (dict | None): Dictionary to construct and config conv layer.
            When conv_cfg is None, cfg will be set to dict(type='Conv2d').
            Default: None.
        norm_cfg (dict): Dictionary to construct and config norm layer.
            Default: dict(type='BN', requires_grad=True).
        norm_eval (bool): Whether to set norm layers to eval mode, namely,
            freeze running stats (mean and var). Note: Effect on Batch Norm
            and its variants only. Default: False.
        dcn (dict | None): Dictionary to construct and config DCN conv layer.
            When dcn is not None, conv_cfg must be None. Default: None.
        stage_with_dcn (Sequence[bool]): Whether to set DCN conv for each
            stage. The length of stage_with_dcn is equal to num_stages.
            Default: (False, False, False, False).
        plugins (list[dict]): List of plugins for stages, each dict contains:

            - cfg (dict, required): Cfg dict to build plugin.
            - position (str, required): Position inside block to insert plugin,
              options: 'after_conv1', 'after_conv2', 'after_conv3'.
            - stages (tuple[bool], optional): Stages to apply plugin, length
              should be same as 'num_stages'.

            Default: None.
        multi_grid (Sequence[int]|None): Multi grid dilation rates of last
            stage. Default: None.
        contract_dilation (bool): Whether contract first dilation of each layer
            Default: False.
        with_cp (bool): Use checkpoint or not. Using checkpoint will save some
            memory while slowing down the training speed. Default: False.
        zero_init_residual (bool): Whether to use zero init for last norm layer
            in resblocks to let them behave as identity. Default: True.
        pretrained (str, optional): model pretrained path. Default: None.
        init_cfg (dict or list[dict], optional): Initialization config dict.
            Default: None.

    Example:
        >>> from opencd.models import IA_ResNet
        >>> import torch
        >>> self = IA_ResNet(depth=18)
        >>> self.eval()
        >>> inputs = torch.rand(1, 3, 32, 32)
        >>> level_outputs = self.forward(inputs, inputs)
        >>> for level_out in level_outputs:
        ...     print(tuple(level_out.shape))
        (1, 128, 8, 8)
        (1, 256, 4, 4)
        (1, 512, 2, 2)
        (1, 1024, 1, 1)
    """

    def __init__(self,
                 interaction_cfg=(None, None, None, None),
                 **kwargs):
        super().__init__(**kwargs)
        assert self.num_stages == len(interaction_cfg), \
            'The length of the `interaction_cfg` should be same as the `num_stages`.'

        # cross-correlation
        self.ccs = []
        for ia_cfg in interaction_cfg:
            if ia_cfg is None:
                ia_cfg = dict(type='TwoIdentity')
            self.ccs.append(MODELS.build(ia_cfg))
        self.ccs = nn.ModuleList(self.ccs)

    def forward(self, x1, x2):
        """Forward function."""

        def _stem_forward(x):
            if self.deep_stem:
                x = self.stem(x)
            else:
                x = self.conv1(x)
                x = self.norm1(x)
                x = self.relu(x)
            x = self.maxpool(x)
            return x

        x1 = _stem_forward(x1)
        x2 = _stem_forward(x2)

        outs = []
        for i, layer_name in enumerate(self.res_layers):
            res_layer = getattr(self, layer_name)
            x1 = res_layer(x1)
            x2 = res_layer(x2)
            x1, x2 = self.ccs[i](x1, x2)
            if i in self.out_indices:
                outs.append(torch.cat([x1, x2], dim=1))

        return tuple(outs)


# @MODELS.register_module()
class IA_ResNetV1c(IA_ResNet):
    """ResNetV1c variant described in [1]_.

    Compared with default ResNet(ResNetV1b), ResNetV1c replaces the 7x7 conv in
    the input stem with three 3x3 convs. For more details please refer to `Bag
    of Tricks for Image Classification with Convolutional Neural Networks
    <https://arxiv.org/abs/1812.01187>`_.
    """

    def __init__(self, **kwargs):
        super(IA_ResNetV1c, self).__init__(
            deep_stem=True, avg_down=False, **kwargs)


# @MODELS.register_module()
class IA_ResNetV1d(IA_ResNet):
    """ResNetV1d variant described in [1]_.

    Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in
    the input stem with three 3x3 convs. And in the downsampling block, a 2x2
    avg_pool with stride 2 is added before conv, whose stride is changed to 1.
    """

    def __init__(self, **kwargs):
        super(IA_ResNetV1d, self).__init__(
            deep_stem=True, avg_down=True, **kwargs)
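
One caveat: the default interaction_cfg entries are built as dict(type='TwoIdentity') through the opencd MODELS registry (the interaction layers live in opencd/models/ly_utils/interaction_layer.py). If you lift IA_ResNet out of opencd entirely, replace MODELS.build(ia_cfg) with a direct instantiation of a pass-through module; a minimal sketch:

from torch import nn


class TwoIdentity(nn.Module):
    """Pass-through interaction: returns both temporal features unchanged."""

    def __init__(self, *args, **kwargs):
        super().__init__()

    def forward(self, x1, x2):
        return x1, x2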

Now let's call it in the environment set up above:

import torch
from IA_ResNet import IA_ResNetV1c

backbone = IA_ResNetV1c(depth=18)
backbone.eval()
inputs = torch.rand(1, 3, 256, 256)
level_outputs = backbone.forward(inputs, inputs)
for level_out in level_outputs:
    print(tuple(level_out.shape))

# output:
# (1, 128, 64, 64)
# (1, 256, 32, 32)
# (1, 512, 16, 16)
# (1, 1024, 8, 8)
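
In the run above all four interaction slots default to TwoIdentity, so the two branches never actually exchange information. Changer's contribution is to swap part of the features between the two temporal branches at intermediate stages (the SpatialExchange / ChannelExchange modules in interaction_layer.py). A simplified paraphrase of the channel-exchange idea, not the exact opencd implementation:

import torch
from torch import nn


class ChannelExchangeSketch(nn.Module):
    """Swap every p-th channel between the two temporal feature maps."""

    def __init__(self, p=2):
        super().__init__()
        self.p = p

    def forward(self, x1, x2):
        n, c, h, w = x1.shape
        # boolean mask marking the channels to exchange
        swap = (torch.arange(c, device=x1.device) % self.p == 0).view(1, c, 1, 1)
        out1 = torch.where(swap, x2, x1)
        out2 = torch.where(swap, x1, x2)
        return out1, out2

Plugging modules like this into interaction_cfg for the later stages is how the Changer configs enable the exchange, while None slots stay as identity.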

Summary

With the approach above, you can extract any network structure from opencd by following its config files, swap in a timm backbone, or drop the modules into your own training framework (such as pytorch_segmentation) for training. It is also a good stepping stone toward learning the mmlab framework itself.
