Understanding R(2+1)D and Implementing It in the MindSpore Framework

1. Introduction to the R(2+1)D Algorithm

Paper: [1711.11248] A Closer Look at Spatiotemporal Convolutions for Action Recognition (arxiv.org)

In their CVPR 2018 paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition", Tran et al. propose R(2+1)D and show that factorizing a 3D convolution kernel into separate spatial and temporal components significantly improves accuracy. The (2+1)D convolution block in R(2+1)D splits an $N \times t \times d \times d$ 3D convolution into an $N \times 1 \times d \times d$ 2D spatial convolution and an $M \times t \times 1 \times 1$ 1D temporal convolution, where $N$ and $M$ are the numbers of kernels. The hyperparameter $M$ determines the dimensionality of the intermediate subspace onto which the signal is projected between the spatial and temporal convolutions; the paper sets $M$ to
$$M_{i} = \left\lfloor \frac{t d^{2} N_{i-1} N_{i}}{d^{2} N_{i-1} + t N_{i}} \right\rfloor$$

where $i$ indexes the $i$-th convolutional block of the residual network. Choosing $M_i$ this way keeps the parameter count of the (2+1)D block approximately equal to that of the corresponding 3D convolution.
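As a quick check of this design choice (an illustrative calculation, not code from the paper or the repository), the snippet below evaluates $M_i$ for a 3×3×3 kernel with 64 input and 64 output channels and compares the resulting parameter counts:

def mid_channels(t, d, n_in, n_out):
    # M_i from the paper: keeps the (2+1)D block's parameter count close to
    # that of the full t x d x d 3D convolution it replaces.
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)

t, d, n_in, n_out = 3, 3, 64, 64
m = mid_channels(t, d, n_in, n_out)                  # 144
params_3d = n_out * n_in * t * d * d                 # 110592 parameters for the 3D conv
params_2plus1d = m * n_in * d * d + n_out * m * t    # 110592 parameters for spatial 2D + temporal 1D
print(m, params_3d, params_2plus1d)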

[Figure: a full 3D convolution versus its (2+1)D factorization into a 2D spatial convolution followed by a 1D temporal convolution]
Compared with full 3D convolution, the (2+1)D factorization has two advantages. First, although the number of parameters is unchanged, the number of nonlinearities in the network doubles because of the extra activation function between the 2D and 1D convolutions in each block, and more nonlinearities increase the complexity of the functions the network can represent. Second, forcing the 3D convolution into separate spatial and temporal components makes optimization easier: compared with a 3D network of the same parameter count, the (2+1)D network reaches a lower training error.

The table below shows the architectures of the 18-layer and 34-layer R3D networks; replacing the 3D convolutions in R3D with (2+1)D convolutions yields the R(2+1)D network of the corresponding depth.

[Table: architectures of the 18-layer and 34-layer R3D networks]
The experiments compare the action-recognition accuracy of the different convolution types on Kinetics, as shown in the table below. All models are based on ResNet-18 and trained from scratch on 8-frame or 16-frame clip inputs; R(2+1)D outperforms all the other variants.

[Table: action-recognition accuracy of different convolution types on Kinetics]
The comparison with state-of-the-art methods on Kinetics is shown in the table below. When trained from scratch on RGB input, R(2+1)D outperforms I3D by 4.5%, and R(2+1)D pretrained on Sports-1M also beats ImageNet-pretrained I3D by 2.2%.

[Table: comparison with state-of-the-art methods on Kinetics]

2. MindSpore Implementation of R(2+1)D

Functional components

Data preprocessing

  1. GeneratorDataset reads the video dataset files and outputs three-channel frames of the specified clip length with batch_size = 16.

  2. Preprocessing includes shuffling and normalization.

  3. Data augmentation includes random cropping (the video_random_crop class), resizing (the video_resize class), and random horizontal flipping (video_random_horizontal_flip); a sketch of the combined pipeline follows this list.
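For reference, these augmentations can be chained into one transform list and attached to the dataset object. This is a minimal sketch using the VideoRandomCrop / VideoResize / VideoRandomHorizontalFlip classes from msvideo.data.transforms; the sizes and normalization statistics are the ones used in the runnable example in Part 3.

from msvideo.data.transforms import (VideoRescale, VideoResize, VideoRandomCrop,
                                     VideoRandomHorizontalFlip, VideoReOrder, VideoNormalize)

# Rescale -> resize -> random crop -> random horizontal flip -> reorder to (C, T, H, W) -> normalize.
transforms = [VideoRescale(shift=0.0),
              VideoResize([128, 171]),
              VideoRandomCrop([112, 112]),
              VideoRandomHorizontalFlip(0.5),
              VideoReOrder([3, 0, 1, 2]),
              VideoNormalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])]
dataset.transform = transforms   # `dataset` is a Kinetic400 instance, as in Part 3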

Model backbone

  1. In R2Plus1d18, the input first passes through a (2+1)D convolution module and a max-pooling layer, then through four residual blocks built from (2+1)D convolution modules, and finally through an average-pooling layer, a flatten layer, and a fully connected layer.

  2. The initial (2+1)D convolution module is a Conv3d with kernel size (1, 7, 7) followed by a Conv3d with kernel size (3, 1, 1), with Batch Normalization and ReLU layers between the convolutions.

  3. R2Plus1d18 contains four residual blocks, each stacked twice in the model. Every block consists of two (2+1)D convolution modules, and each (2+1)D convolution is a Conv3d with kernel size (1, 3, 3) followed by a Conv3d with kernel size (3, 1, 1), again with Batch Normalization and ReLU layers between the convolutions; a residual connection links the block's input and output. A standalone sketch of one such (2+1)D module follows this list.
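To make this layout concrete, here is a minimal sketch of one (2+1)D module built directly from MindSpore primitives: a (1, 3, 3) spatial Conv3d followed by a (3, 1, 1) temporal Conv3d with BatchNorm and ReLU in between. It is for illustration only and is not the repository's Inflate3D class; the channel widths 64 → 144 → 64 follow the M formula above.

import numpy as np
import mindspore as ms
from mindspore import nn

# One (2+1)D convolution: 2D spatial conv, BN, ReLU, then 1D temporal conv, BN, ReLU.
spatiotemporal_conv = nn.SequentialCell([
    nn.Conv3d(64, 144, kernel_size=(1, 3, 3), stride=(1, 1, 1),
              pad_mode='pad', padding=(0, 0, 1, 1, 1, 1), has_bias=False),
    nn.BatchNorm3d(144),
    nn.ReLU(),
    nn.Conv3d(144, 64, kernel_size=(3, 1, 1), stride=(1, 1, 1),
              pad_mode='pad', padding=(1, 1, 0, 0, 0, 0), has_bias=False),
    nn.BatchNorm3d(64),
    nn.ReLU()])

x = ms.Tensor(np.ones([1, 64, 16, 56, 56]), ms.float32)  # (N, C, T, H, W)
print(spatiotemporal_conv(x).shape)                       # (1, 64, 16, 56, 56)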

The roles of the classes used to build the model are as follows:

  • The Unit3D class implements a Conv3d → BN → ReLU → pooling sequence, where the BN, ReLU, and pooling layers are optional.
class Unit3D(nn.Cell):
    """Conv3d fused with normalization and activation blocks definition.

    Args:
        in_channels (int): The number of channels of input frame images.
        out_channels (int): The number of channels of output frame images.
        kernel_size (tuple): The size of the conv3d kernel.
        stride (Union[int, Tuple[int]]): Stride size for the first convolutional layer. Default: 1.
        pad_mode (str): Specifies padding mode. The optional values are "same", "valid", "pad".
            Default: "pad".
        padding (Union[int, Tuple[int]]): Implicit paddings on both sides of the input x.
            If `pad_mode` is "pad" and `padding` is not specified by user, then the padding
            size will be `(kernel_size - 1) // 2` for C, H, W channel.
        dilation (Union[int, Tuple[int]]): Specifies the dilation rate to use for dilated
            convolution. Default: 1.
        group (int): Splits filter into groups, in_channels and out_channels must be divisible
            by the number of groups. Default: 1.
        activation (Optional[nn.Cell]): Activation function which will be stacked on top of the
            normalization layer (if not None), otherwise on top of the conv layer. Default: nn.ReLU.
        norm (Optional[nn.Cell]): Norm layer that will be stacked on top of the convolution
            layer. Default: nn.BatchNorm3d.
        pooling (Optional[nn.Cell]): Pooling layer (if not None) will be stacked on top of all the
            former layers. Default: None.
        has_bias (bool): Whether to use Bias.

    Returns:
        Tensor, output tensor.

    Examples:
        Unit3D(in_channels=in_channels, out_channels=out_channels[0], kernel_size=(1, 1, 1))
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int]] = 3,
                 stride: Union[int, Tuple[int]] = 1,
                 pad_mode: str = 'pad',
                 padding: Union[int, Tuple[int]] = 0,
                 dilation: Union[int, Tuple[int]] = 1,
                 group: int = 1,
                 activation: Optional[nn.Cell] = nn.ReLU,
                 norm: Optional[nn.Cell] = nn.BatchNorm3d,
                 pooling: Optional[nn.Cell] = None,
                 has_bias: bool = False) -> None:
        super().__init__()
        if pad_mode == 'pad' and padding == 0:
            padding = tuple((k - 1) // 2 for k in six_padding(kernel_size))
        else:
            padding = 0
        layers = [nn.Conv3d(in_channels=in_channels,
                            out_channels=out_channels,
                            kernel_size=kernel_size,
                            stride=stride,
                            pad_mode=pad_mode,
                            padding=padding,
                            dilation=dilation,
                            group=group,
                            has_bias=has_bias)]
        if norm:
            layers.append(norm(out_channels))
        if activation:
            layers.append(activation())
        self.pooling = None
        if pooling:
            self.pooling = pooling
        self.features = nn.SequentialCell(layers)

    def construct(self, x):
        """construct unit3d"""
        output = self.features(x)
        if self.pooling:
            output = self.pooling(output)
        return output
  • The Inflate3D class uses Unit3D to implement the (2+1)D convolution module.
class Inflate3D(nn.Cell):
    """Inflate3D block definition.

    Args:
        in_channel (int): The number of channels of input frame images.
        out_channel (int): The number of channels of output frame images.
        mid_channel (int): The number of channels of inner frame images.
        kernel_size (tuple): The size of the spatial-temporal convolutional layer kernels.
        stride (Union[int, Tuple[int]]): Stride size for the second convolutional layer. Default: 1.
        conv2_group (int): Splits filter into groups for the second conv layer, in_channels and
            out_channels must be divisible by the number of groups. Default: 1.
        norm (Optional[nn.Cell]): Norm layer that will be stacked on top of the convolution
            layer. Default: nn.BatchNorm3d.
        activation (List[Optional[Union[nn.Cell, str]]]): Activation function which will be stacked
            on top of the normalization layer (if not None), otherwise on top of the conv layer.
            Default: nn.ReLU, None.
        inflate (int): Whether to inflate two conv3d layers and with different kernel size.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> from mindvision.msvideo.models.blocks import Inflate3D
        >>> Inflate3D(3, 64, 64)
    """

    def __init__(self,
                 in_channel: int,
                 out_channel: int,
                 mid_channel: int = 0,
                 stride: tuple = (1, 1, 1),
                 kernel_size: tuple = (3, 3, 3),
                 conv2_group: int = 1,
                 norm: Optional[nn.Cell] = nn.BatchNorm3d,
                 activation: List[Optional[Union[nn.Cell, str]]] = (nn.ReLU, None),
                 inflate: int = 1,
                 ):
        super(Inflate3D, self).__init__()
        if not norm:
            norm = nn.BatchNorm3d
        self.in_channel = in_channel
        if mid_channel == 0:
            self.mid_channel = (in_channel * out_channel * kernel_size[1] * kernel_size[2] * 3) // \
                (in_channel * kernel_size[1] * kernel_size[2] + 3 * out_channel)
        else:
            self.mid_channel = mid_channel
        self.inflate = inflate
        if self.inflate == 0:
            conv1_kernel_size = (1, 1, 1)
            conv2_kernel_size = (1, kernel_size[1], kernel_size[2])
        elif self.inflate == 1:
            conv1_kernel_size = (kernel_size[0], 1, 1)
            conv2_kernel_size = (1, kernel_size[1], kernel_size[2])
        elif self.inflate == 2:
            conv1_kernel_size = (1, 1, 1)
            conv2_kernel_size = (kernel_size[0], kernel_size[1], kernel_size[2])
        self.conv1 = Unit3D(self.in_channel,
                            self.mid_channel,
                            stride=(1, 1, 1),
                            kernel_size=conv1_kernel_size,
                            norm=norm,
                            activation=activation[0])
        self.conv2 = Unit3D(self.mid_channel,
                            self.mid_channel,
                            stride=stride,
                            kernel_size=conv2_kernel_size,
                            group=conv2_group,
                            norm=norm,
                            activation=activation[1])

    def construct(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x
  • The ResNet3D class implements a structure in which the input passes through a Unit3D stem, max pooling, and then four residual stages; the number of blocks stacked in each stage is configurable through its parameters.
class ResNet3D(nn.Cell):
    """ResNet3D architecture.

    Args:
        block (Optional[nn.Cell]): The block for network.
        layer_nums (Tuple[int]): The numbers of block in different layers.
        stage_channels (Tuple[int]): Output channel for every res stage.
            Default: [64, 128, 256, 512].
        stage_strides (Tuple[Tuple[int]]): Strides for every res stage.
            Default: [[1, 1, 1], [1, 2, 2], [1, 2, 2], [1, 2, 2]].
        group (int): The number of Group convolutions. Default: 1.
        base_width (int): The width of per group. Default: 64.
        norm (nn.Cell, optional): The module specifying the normalization layer to use.
            Default: None.
        down_sample (nn.Cell, optional): Residual block in every resblock, it can transfer the input
            feature into the same channel of output. Default: Unit3D.
        kwargs (dict, optional): Key arguments for "make_res_layer" and resblocks.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, T_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, 2048, 7, 7, 7)`

    Supported Platforms:
        ``GPU``

    Examples:
        >>> import numpy as np
        >>> import mindspore as ms
        >>> from mindvision.msvideo.models.backbones import ResNet3D, ResidualBlock3D
        >>> net = ResNet(ResidualBlock3D, [3, 4, 23, 3])
        >>> x = ms.Tensor(np.ones([1, 3, 16, 224, 224]), ms.float32)
        >>> output = net(x)
        >>> print(output.shape)
        (1, 2048, 7, 7)

    About ResNet:
        The ResNet is to ease the training of networks that are substantially deeper than
        those used previously. The model explicitly reformulates the layers as learning residual
        functions with reference to the layer inputs, instead of learning unreferenced functions.
    """

    def __init__(self,
                 block: Optional[nn.Cell],
                 layer_nums: Tuple[int],
                 stage_channels: Tuple[int] = (64, 128, 256, 512),
                 stage_strides: Tuple[Tuple[int]] = ((1, 1, 1),
                                                     (1, 2, 2),
                                                     (1, 2, 2),
                                                     (1, 2, 2)),
                 group: int = 1,
                 base_width: int = 64,
                 norm: Optional[nn.Cell] = None,
                 down_sample: Optional[nn.Cell] = Unit3D,
                 **kwargs) -> None:
        super().__init__()
        if not norm:
            norm = nn.BatchNorm3d
        self.norm = norm
        self.in_channels = stage_channels[0]
        self.group = group
        self.base_with = base_width
        self.down_sample = down_sample
        self.conv1 = Unit3D(3, self.in_channels, kernel_size=7, stride=2, norm=norm)
        self.max_pool = ops.MaxPool3D(kernel_size=3, strides=2, pad_mode='same')
        self.layer1 = self._make_layer(block, stage_channels[0], layer_nums[0],
                                       stride=stage_strides[0], norm=self.norm, **kwargs)
        self.layer2 = self._make_layer(block, stage_channels[1], layer_nums[1],
                                       stride=stage_strides[1], norm=self.norm, **kwargs)
        self.layer3 = self._make_layer(block, stage_channels[2], layer_nums[2],
                                       stride=stage_strides[2], norm=self.norm, **kwargs)
        self.layer4 = self._make_layer(block, stage_channels[3], layer_nums[3],
                                       stride=stage_strides[3], norm=self.norm, **kwargs)

    def _make_layer(self,
                    block: Optional[nn.Cell],
                    channel: int,
                    block_nums: int,
                    stride: Tuple[int] = (1, 2, 2),
                    norm: Optional[nn.Cell] = nn.BatchNorm3d,
                    **kwargs):
        """Block layers."""
        down_sample = None
        if stride[1] != 1 or self.in_channels != channel * block.expansion:
            down_sample = self.down_sample(self.in_channels,
                                           channel * block.expansion,
                                           kernel_size=1,
                                           stride=stride,
                                           norm=norm,
                                           activation=None)
        self.stride = stride
        bkwargs = [{} for _ in range(block_nums)]  # block specified key word args
        temp_args = kwargs.copy()
        for pname, pvalue in temp_args.items():
            if isinstance(pvalue, (list, tuple)):
                Validator.check_equal_int(len(pvalue), block_nums, f'len({pname})')
                for idx, v in enumerate(pvalue):
                    bkwargs[idx][pname] = v
                kwargs.pop(pname)
        layers = []
        layers.append(block(self.in_channels,
                            channel,
                            stride=self.stride,
                            down_sample=down_sample,
                            group=self.group,
                            base_width=self.base_with,
                            norm=norm,
                            **(bkwargs[0]),
                            **kwargs))
        self.in_channels = channel * block.expansion
        for i in range(1, block_nums):
            layers.append(block(self.in_channels,
                                channel,
                                stride=(1, 1, 1),
                                group=self.group,
                                base_width=self.base_with,
                                norm=norm,
                                **(bkwargs[i]),
                                **kwargs))
        return nn.SequentialCell(layers)

    def construct(self, x):
        """Resnet3D construct."""
        x = self.conv1(x)
        x = self.max_pool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x
  • The R2Plus1dNet class inherits from ResNet3D and mainly reuses its four residual stages, implementing the structure in which the input passes through the (2+1)D stem and max pooling, then the four residual stages, and finally the average-pooling, flatten, and fully connected layers.
class R2Plus1dNet(ResNet3D):
    """Generic R(2+1)d generator.

    Args:
        block (Optional[nn.Cell]): The block for network.
        layer_nums (Tuple[int]): The numbers of block in different layers.
        stage_channels (Tuple[int]): Output channel for every res stage. Default: (64, 128, 256, 512).
        stage_strides (Tuple[Tuple[int]]): Strides for every res stage.
            Default: ((1, 1, 1), (2, 2, 2), (2, 2, 2), (2, 2, 2)).
        conv12 (nn.Cell, optional): Conv1 and conv2 config in resblock. Default: Conv2Plus1D.
        base_width (int): The width of per group. Default: 64.
        norm (nn.Cell, optional): The module specifying the normalization layer to use. Default: None.
        num_classes (int): Number of categories in the action recognition dataset.
        keep_prob (float): Dropout probability in classification stage.
        kwargs (dict, optional): Key arguments for "make_res_layer" and resblocks.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> from mindvision.msvideo.models.backbones.r2plus1d import *
        >>> from mindvision.msvideo.models.backbones.resnet3d import ResidualBlockBase3D
        >>> data = Tensor(np.random.randn(2, 3, 16, 112, 112), dtype=mindspore.float32)
        >>> net = R2Plus1dNet(block=ResidualBlockBase3D, layer_nums=[2, 2, 2, 2])
        >>> predict = net(data)
        >>> print(predict.shape)
    """

    def __init__(self,
                 block: Optional[nn.Cell],
                 layer_nums: Tuple[int],
                 stage_channels: Tuple[int] = (64, 128, 256, 512),
                 stage_strides: Tuple[Tuple[int]] = ((1, 1, 1),
                                                     (2, 2, 2),
                                                     (2, 2, 2),
                                                     (2, 2, 2)),
                 num_classes: int = 400,
                 **kwargs) -> None:
        super().__init__(block=block,
                         layer_nums=layer_nums,
                         stage_channels=stage_channels,
                         stage_strides=stage_strides,
                         conv12=Conv2Plus1d,
                         **kwargs)
        self.conv1 = nn.SequentialCell([
            nn.Conv3d(3, 45,
                      kernel_size=(1, 7, 7),
                      stride=(1, 2, 2),
                      pad_mode='pad',
                      padding=(0, 0, 3, 3, 3, 3),
                      has_bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(),
            nn.Conv3d(45, 64,
                      kernel_size=(3, 1, 1),
                      stride=(1, 1, 1),
                      pad_mode='pad',
                      padding=(1, 1, 0, 0, 0, 0),
                      has_bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU()])
        self.avgpool = AdaptiveAvgPool3D((1, 1, 1))
        self.flatten = nn.Flatten()
        self.classifier = nn.Dense(stage_channels[-1] * block.expansion,
                                   num_classes)
        # init weights
        self._initialize_weights()

    def construct(self, x):
        """R2Plus1dNet construct."""
        x = self.conv1(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = self.flatten(x)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        """Init the weight of Conv3d and Dense in the net."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv3d):
                cell.weight.set_data(init.initializer(
                    init.HeNormal(math.sqrt(5), mode='fan_out', nonlinearity='relu'),
                    cell.weight.shape, cell.weight.dtype))
                if cell.bias:
                    cell.bias.set_data(init.initializer(
                        init.Zero(), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer(
                    init.One(), cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer(
                    init.Zero(), cell.beta.shape, cell.beta.dtype))
  • The R2Plus1d18 class inherits from R2Plus1dNet and mainly fixes the number of times each residual block is stacked, which here is two for every block.
class R2Plus1d18(R2Plus1dNet):
    """The class of R2Plus1d-18 uses the registration mechanism to register,
    need to use the yaml configuration file to call."""

    def __init__(self, **kwargs):
        super(R2Plus1d18, self).__init__(block=ResidualBlockBase3D,
                                         layer_nums=(2, 2, 2, 2),
                                         **kwargs)
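As a quick sanity check of the assembled network (a hedged usage sketch; it assumes the repository's msvideo package is installed, as in Part 3), a dummy clip can be pushed through R2Plus1d18:

import numpy as np
import mindspore as ms
from msvideo.models.r2plus1d import R2Plus1d18

net = R2Plus1d18(num_classes=400)
clip = ms.Tensor(np.random.randn(2, 3, 16, 112, 112), ms.float32)  # (N, C, T, H, W)
logits = net(clip)
print(logits.shape)  # expected: (2, 400)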

3. Runnable Example

Notebook file link

Dataset preparation

The repository uses the Kinetics400 dataset for training and validation.

Pretrained model

The pretrained model was trained on the Kinetics400 dataset. Download: r2plus1d18_kinetic400.ckpt

Environment setup

git clone https://gitee.com/yanlq46462828/zjut_mindvideo.git
cd zjut_mindvideo

# Please first install mindspore according to instructions on the official website: https://www.mindspore.cn/install
pip install -r requirements.txt
pip install -e .

Training procedure

from mindspore import nn
from mindspore import context, load_checkpoint, load_param_into_net
from mindspore.context import ParallelMode
from mindspore.communication import init, get_rank, get_group_size
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from msvideo.utils.check_param import Validator, Rel
Dataset loading

The Kinetics400 dataset is loaded through the Kinetic400 class, which is built on top of VideoDataset.

from msvideo.data.kinetics400 import Kinetic400
# Data Pipeline.
dataset = Kinetic400(path='/home/publicfile/kinetics-400',
                     split="train",
                     seq=32,
                     num_parallel_workers=1,
                     shuffle=True,
                     batch_size=6,
                     repeat_num=1)
ckpt_save_dir = './r2plus1d'
/home/publicfile/kinetics-400/cls2index.json
Data processing

The video is rescaled with VideoRescale, resized with VideoResize, randomly cropped from the resized frames with VideoRandomCrop, horizontally flipped with a given probability by VideoRandomHorizontalFlip, reordered to channel-first layout with VideoReOrder, and finally normalized with VideoNormalize.

from msvideo.data.transforms import VideoRandomCrop, VideoRandomHorizontalFlip, VideoRescale
from msvideo.data.transforms import VideoNormalize, VideoResize, VideoReOrder

transforms = [VideoRescale(shift=0.0),
              VideoResize([128, 171]),
              VideoRandomCrop([112, 112]),
              VideoRandomHorizontalFlip(0.5),
              VideoReOrder([3, 0, 1, 2]),
              VideoNormalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])]
dataset.transform = transforms
dataset_train = dataset.run()
Validator.check_int(dataset_train.get_dataset_size(), 0, Rel.GT)
step_size = dataset_train.get_dataset_size()
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-10:30:59.929.412 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
Network construction
from msvideo.models.r2plus1d import R2Plus1d18
# Create model
network = R2Plus1d18(num_classes=400)
from msvideo.schedule.lr_schedule import warmup_cosine_annealing_lr_v1
# Set learning rate scheduler.
learning_rate = warmup_cosine_annealing_lr_v1(lr=0.01,steps_per_epoch=step_size,warmup_epochs=4,max_epoch=100,t_max=100,eta_min=0)
# Define optimizer.
network_opt = nn.Momentum(network.trainable_params(),learning_rate=learning_rate,momentum=0.9,weight_decay=0.00004)
# Define loss function.
network_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
# Set the checkpoint config for the network.
ckpt_config = CheckpointConfig(save_checkpoint_steps=step_size,keep_checkpoint_max=10)
ckpt_callback = ModelCheckpoint(prefix='r2plus1d_kinetics400',directory=ckpt_save_dir,config=ckpt_config)
# Init the model.
model = Model(network, loss_fn=network_loss, optimizer=network_opt, metrics={'acc'})
# Begin to train.
print('[Start training `{}`]'.format('r2plus1d_kinetics400'))
print("=" * 80)
model.train(1,dataset_train,callbacks=[ckpt_callback, LossMonitor()],dataset_sink_mode=False)
print('[End of training `{}`]'.format('r2plus1d_kinetics400'))
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-10:41:43.490.637 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-10:41:43.498.663 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[Start training `r2plus1d_kinetics400`]
================================================================================
epoch: 1 step: 1, loss is 5.998835563659668
epoch: 1 step: 2, loss is 5.921803951263428
epoch: 1 step: 3, loss is 6.024421691894531
epoch: 1 step: 4, loss is 6.08278751373291
epoch: 1 step: 5, loss is 6.014780044555664
epoch: 1 step: 6, loss is 5.945815086364746
epoch: 1 step: 7, loss is 6.078174114227295
epoch: 1 step: 8, loss is 6.0565361976623535
epoch: 1 step: 9, loss is 5.952683448791504
epoch: 1 step: 10, loss is 6.033120632171631
epoch: 1 step: 11, loss is 6.05575704574585
epoch: 1 step: 12, loss is 5.9879350662231445
epoch: 1 step: 13, loss is 6.006839275360107
epoch: 1 step: 14, loss is 5.9968180656433105
epoch: 1 step: 15, loss is 5.971335411071777
epoch: 1 step: 16, loss is 6.0620856285095215
epoch: 1 step: 17, loss is 6.081112861633301
epoch: 1 step: 18, loss is 6.106649398803711
epoch: 1 step: 19, loss is 6.095144271850586
epoch: 1 step: 20, loss is 6.00246000289917
epoch: 1 step: 21, loss is 6.061524868011475
epoch: 1 step: 22, loss is 6.046009063720703
epoch: 1 step: 23, loss is 5.997835159301758
epoch: 1 step: 24, loss is 6.007784366607666
epoch: 1 step: 25, loss is 5.946590423583984
epoch: 1 step: 26, loss is 5.9461164474487305
epoch: 1 step: 27, loss is 5.9034929275512695
epoch: 1 step: 28, loss is 5.925591945648193
epoch: 1 step: 29, loss is 6.176599979400635
......

Evaluation procedure

from mindspore import context
from msvideo.data.kinetics400 import Kinetic400

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

# Data Pipeline.
dataset_eval = Kinetic400("/home/publicfile/kinetics-400",
                          split="val",
                          seq=32,
                          seq_mode="interval",
                          num_parallel_workers=1,
                          shuffle=False,
                          batch_size=8,
                          repeat_num=1)
/home/publicfile/kinetics-400/cls2index.json
from msvideo.data.transforms import VideoCenterCrop, VideoRescale, VideoReOrder
from msvideo.data.transforms import VideoNormalize, VideoResize

transforms = [VideoResize([128, 171]),
              VideoRescale(shift=0.0),
              VideoCenterCrop([112, 112]),
              VideoReOrder([3, 0, 1, 2]),
              VideoNormalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])]
dataset_eval.transform = transforms
dataset_eval = dataset_eval.run()
from mindspore import nn
from mindspore import context, load_checkpoint, load_param_into_net
from mindspore.train import Model
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from msvideo.utils.callbacks import EvalLossMonitor
from msvideo.models.r2plus1d import R2Plus1d18

# Create model
network = R2Plus1d18(num_classes=400)

# Define loss function.
network_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")

param_dict = load_checkpoint('/home/zhengs/r2plus1d/r2plus1d18_kinetic400.ckpt')
load_param_into_net(network, param_dict)

# Define eval_metrics.
eval_metrics = {'Loss': nn.Loss(),
                'Top_1_Accuracy': nn.Top1CategoricalAccuracy(),
                'Top_5_Accuracy': nn.Top5CategoricalAccuracy()}

# Init the model.
model = Model(network, loss_fn=network_loss, metrics=eval_metrics)
print_cb = EvalLossMonitor(model)
# Begin to eval.
print('[Start eval `{}`]'.format('r2plus1d_kinetics400'))
result = model.eval(dataset_eval,callbacks=[print_cb],dataset_sink_mode=False)
print(result)
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.745.627 [mindspore/train/model.py:1077] For EvalLossMonitor callback, {'epoch_end', 'step_end', 'epoch_begin', 'step_begin'} methods may not be supported in later version, Use methods prefixed with 'on_train' or 'on_eval' instead when using customized callbacks.
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.747.418 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.749.293 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.751.452 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[Start eval `r2plus1d_kinetics400`]
step:[    1/ 2484], metrics:[], loss:[3.070/3.070], time:1923.473 ms, 
step:[    2/ 2484], metrics:['Loss: 3.0702', 'Top_1_Accuracy: 0.3750', 'Top_5_Accuracy: 0.7500'], loss:[0.808/1.939], time:169.314 ms, 
step:[    3/ 2484], metrics:['Loss: 1.9391', 'Top_1_Accuracy: 0.5625', 'Top_5_Accuracy: 0.8750'], loss:[2.645/2.175], time:192.965 ms, 
step:[    4/ 2484], metrics:['Loss: 2.1745', 'Top_1_Accuracy: 0.5417', 'Top_5_Accuracy: 0.8750'], loss:[2.954/2.369], time:172.657 ms, 
step:[    5/ 2484], metrics:['Loss: 2.3695', 'Top_1_Accuracy: 0.5000', 'Top_5_Accuracy: 0.8438'], loss:[2.489/2.393], time:176.803 ms, 
step:[    6/ 2484], metrics:['Loss: 2.3934', 'Top_1_Accuracy: 0.4750', 'Top_5_Accuracy: 0.8250'], loss:[1.566/2.256], time:172.621 ms, 
step:[    7/ 2484], metrics:['Loss: 2.2556', 'Top_1_Accuracy: 0.4792', 'Top_5_Accuracy: 0.8333'], loss:[0.761/2.042], time:172.149 ms, 
step:[    8/ 2484], metrics:['Loss: 2.0420', 'Top_1_Accuracy: 0.5357', 'Top_5_Accuracy: 0.8571'], loss:[3.675/2.246], time:181.757 ms, 
step:[    9/ 2484], metrics:['Loss: 2.2461', 'Top_1_Accuracy: 0.4688', 'Top_5_Accuracy: 0.7969'], loss:[3.909/2.431], time:186.722 ms, 
step:[   10/ 2484], metrics:['Loss: 2.4309', 'Top_1_Accuracy: 0.4583', 'Top_5_Accuracy: 0.7639'], loss:[3.663/2.554], time:199.209 ms, 
step:[   11/ 2484], metrics:['Loss: 2.5542', 'Top_1_Accuracy: 0.4375', 'Top_5_Accuracy: 0.7375'], loss:[3.438/2.635], time:173.766 ms, 
step:[   12/ 2484], metrics:['Loss: 2.6345', 'Top_1_Accuracy: 0.4318', 'Top_5_Accuracy: 0.7159'], loss:[2.695/2.640], time:171.364 ms, 
step:[   13/ 2484], metrics:['Loss: 2.6395', 'Top_1_Accuracy: 0.4375', 'Top_5_Accuracy: 0.7292'], loss:[3.542/2.709], time:172.889 ms, 
step:[   14/ 2484], metrics:['Loss: 2.7090', 'Top_1_Accuracy: 0.4231', 'Top_5_Accuracy: 0.7308'], loss:[3.404/2.759], time:216.287 ms, 
step:[   15/ 2484], metrics:['Loss: 2.7586', 'Top_1_Accuracy: 0.4018', 'Top_5_Accuracy: 0.7232'], loss:[4.012/2.842], time:171.686 ms, 
step:[   16/ 2484], metrics:['Loss: 2.8422', 'Top_1_Accuracy: 0.3833', 'Top_5_Accuracy: 0.7167'], loss:[5.157/2.987], time:170.363 ms, 
step:[   17/ 2484], metrics:['Loss: 2.9869', 'Top_1_Accuracy: 0.3750', 'Top_5_Accuracy: 0.6875'], loss:[4.667/3.086], time:171.926 ms, 
step:[   18/ 2484], metrics:['Loss: 3.0857', 'Top_1_Accuracy: 0.3603', 'Top_5_Accuracy: 0.6618'], loss:[5.044/3.194], time:197.028 ms, 
step:[   19/ 2484], metrics:['Loss: 3.1945', 'Top_1_Accuracy: 0.3403', 'Top_5_Accuracy: 0.6458'], loss:[3.625/3.217], time:222.758 ms, 
step:[   20/ 2484], metrics:['Loss: 3.2171', 'Top_1_Accuracy: 0.3355', 'Top_5_Accuracy: 0.6513'], loss:[1.909/3.152], time:207.416 ms, 
step:[   21/ 2484], metrics:['Loss: 3.1517', 'Top_1_Accuracy: 0.3563', 'Top_5_Accuracy: 0.6625'], loss:[4.591/3.220], time:171.645 ms, 
step:[   22/ 2484], metrics:['Loss: 3.2202', 'Top_1_Accuracy: 0.3631', 'Top_5_Accuracy: 0.6667'], loss:[3.545/3.235], time:209.975 ms, 
step:[   23/ 2484], metrics:['Loss: 3.2350', 'Top_1_Accuracy: 0.3693', 'Top_5_Accuracy: 0.6591'], loss:[3.350/3.240], time:185.889 ms,

Code

The code repositories are available at:

Gitee repository
GitHub repository
