[DaSiamRPN] "Distractor-aware Siamese Networks for Visual Object Tracking"

ECCV-2018

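University of Chinese Academy of Sciences (UCAS), Institute of Automation of the Chinese Academy of Sciences (CASIA), and SenseTime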

Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
- 4.1 Features and Drawbacks in Traditional Siamese Networks
- 4.2 Distractor-aware Training
- 4.3 Distractor-aware Incremental Learning
- 4.4 DaSiamRPN for Long-term Tracking
5 Experiments
- 5.1 Datasets and Metrics
- 5.2 State-of-the-art Comparisons on VOT Datasets
- 5.3 State-of-the-art Comparisons on UAV Datasets
- 5.4 State-of-the-Art Comparisons on OTB Datasets
- 5.5 Ablation Analyses
6 Conclusion (own) / Future work

1 Background and Motivation

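Challenges in single-object tracking: occlusions, out-of-view, deformation, background cluttering and other variations.

Siamese tracking approaches can only discriminate foreground from the non-semantic backgrounds, which leads to the following drawbacks:

They can fail when the background is cluttered.
They usually have no mechanism for updating the model online.
In long-term tracking, they may handle full occlusion and out-of-view challenges poorly.

The authors focus on accurate and long-term tracking and propose Distractor-aware Siamese Networks. An effective sampling strategy is introduced in the offline training phase, and a distractor-aware module is proposed for inference. The reported gains are significant.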

2 Related Work

Siamese Networks based Tracking
Features for Tracking
Long-term Tracking

3 Advantages / Contributions

The authors find that the imbalance of the non-semantic background and semantic distractor in the training data is the main obstacle for the learning.
They propose Distractor-aware Siamese Region Proposal Networks (DaSiamRPN), which learn distractor-aware features during training and explicitly suppress distractors during online tracking at inference.
At inference they propose a local-to-global search region strategy, which clearly improves long-term tracking.

4 Method

4.1 Features and Drawbacks in Traditional Siamese Networks

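Siamese trackers are based on metric learning.

Metric learning, also called distance metric learning or similarity learning, aims to learn a distance function that captures the high-level semantics of the data. It is usually realized through an embedding function that maps the data into a new space in which similar samples are close together and dissimilar samples are far apart.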

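During training, non-semantic background occupies the majority of the samples.

As a result, the tracker struggles to discriminate more complex backgrounds.

Figure 1 of the paper illustrates this clearly.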

4.2 Distractor-aware Training

Data sampling strategy

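1) Diverse categories of positive pairs can promote the generalization ability

The ImageNet Detection and COCO Detection datasets are added, which broadens the categories of the positive pairs, as shown in Figure 2(a) of the paper.

2) Semantic negative pairs can improve the discriminative ability

Negative pairs are drawn from the same category as well as from different categories, as shown in Figure 2(b) and (c) of the paper.

Negative pairs from the same category make the network focus on fine-grained representation.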

3) Customizing effective data augmentation for visual tracking

Besides the usual translation (12 pixels), scale variations (0.85 to 1.15) and illumination changes,

motion blur is also used as a data augmentation.

25% of the pairs are converted to grayscale
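The augmentation settings listed above can be sketched as a small parameter sampler. This is only an illustration under assumptions: the function names, the horizontal box-filter blur and the blur length are hypothetical, and the post does not say how the authors implement motion blur or illumination changes.

```python
import numpy as np

def sample_augmentation(rng, max_shift=12, scale_range=(0.85, 1.15), gray_prob=0.25):
    """Draw one set of augmentation parameters for a training pair."""
    return {
        "shift": rng.integers(-max_shift, max_shift + 1, size=2),  # (dx, dy) in pixels
        "scale": float(rng.uniform(*scale_range)),
        "grayscale": bool(rng.random() < gray_prob),
    }

def to_grayscale(image):
    """Replace the three channels of an HxWx3 image by their mean."""
    gray = image.mean(axis=2, keepdims=True)
    return np.repeat(gray, 3, axis=2)

def motion_blur(image, length=5):
    """Hypothetical horizontal motion blur: box filter of `length` pixels along x."""
    kernel = np.ones(length) / length
    blur_row = lambda row: np.convolve(row, kernel, mode="same")
    return np.apply_along_axis(blur_row, 1, image)

rng = np.random.default_rng(0)
params = sample_augmentation(rng)
image = np.ones((8, 8, 3))
blurred = motion_blur(to_grayscale(image))
```

Sampling the parameters separately from applying them keeps the two images of a pair independent: each image gets its own draw.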

4.3 Distractor-aware Incremental Learning

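Incremental learning

Incremental learning means that a learning system keeps acquiring new knowledge from new samples while retaining most of what it has already learned.

The common approach is to use a cosine window to suppress the distractors (candidates close to the previous target position are penalized less, distant ones more), but this is not guaranteed to work when the motion of objects is messy.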
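A minimal sketch of the cosine-window penalty, assuming a 17 x 17 score map and a hypothetical blending weight `window_influence` (the value 0.4 is illustrative, not taken from the post):

```python
import numpy as np

def apply_cosine_window(scores, window_influence=0.4):
    """Blend an (H, W) score map with a Hanning (cosine) window centred on the map."""
    h, w = scores.shape
    window = np.outer(np.hanning(h), np.hanning(w))
    return scores * (1.0 - window_influence) + window * window_influence

scores = np.full((17, 17), 0.5)
scores[0, 0] = 0.9   # strong response far from the previous target position
scores[8, 8] = 0.8   # slightly weaker response at the centre
penalized = apply_cosine_window(scores)
peak = np.unravel_index(np.argmax(penalized), penalized.shape)
```

The corner response is the raw maximum, but after blending the centre wins. That is also the weakness noted above: a target that really does jump to the corner is penalized just like a distractor would be.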

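The authors propose a distractor-aware module to effectively transfer the general representation to the video domain.

(It is not entirely clear to me what "video domain" refers to here; presumably the specific video being tracked.)

Below is the authors' incremental learning method, the distractor-aware module.

A Siamese tracker learns a similarity metric $f(z, x)$. For background, see

[SiamFC] "Fully-Convolutional Siamese Networks for Object Tracking"
[SiamRPN] "High Performance Visual Tracking With Siamese Region Proposal Network"

In the SiamFC formulation, the exemplar $z$ and the search region $x$ are embedded by a shared network $\varphi$ and compared by cross-correlation $\star$, with a bias $b$:

$$f(z, x) = \varphi(z) \star \varphi(x) + b \cdot \mathbb{1}$$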
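The similarity metric $f(z, x)$ of a Siamese tracker can be sketched as a sliding cross-correlation between two feature maps. The snippet below is a plain NumPy illustration, not the authors' implementation: the embedding network is skipped and the features are hand-made, with sizes (6 x 6 template, 22 x 22 search region) chosen so that the response map is 17 x 17.

```python
import numpy as np

def cross_correlate(template_feat, search_feat, bias=0.0):
    """Slide a (C, h, w) template over a (C, H, W) search feature map ('valid' mode)."""
    _, h, w = template_feat.shape
    _, H, W = search_feat.shape
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            window = search_feat[:, i:i + h, j:j + w]
            response[i, j] = np.sum(window * template_feat) + bias
    return response

# Hand-made features: the template is pasted into an otherwise empty search region.
template = np.arange(1, 4 * 6 * 6 + 1, dtype=float).reshape(4, 6, 6)
search = np.zeros((4, 22, 22))
search[:, 5:11, 7:13] = template
response = cross_correlate(template, search)
peak = np.unravel_index(np.argmax(response), response.shape)
```

The peak of the response map sits at the offset where the template was pasted, which is how the tracker localizes the target inside the search region.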

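On top of this, the authors introduce hard negative samples (distractors).

There are $17 \times 17 \times 5$ proposals in each frame. NMS is used to select the potential distractors $d_i$ in each frame, which are collected into a distractor set:

$$D := \{\, d_i \mid f(z, d_i) > h,\ d_i \neq z_t \,\}$$

where $h$ is the predefined threshold,

$z_t$ is the selected target in frame $t$ (the proposal with the highest score is selected as $z_t$),

and the number of elements in this set is $|D| = n$.

In short, among the proposals left after NMS, those whose correlation score with the template $z$ exceeds the threshold $h$ (other than $z_t$ itself) are kept as potential distractors.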
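The selection step can be sketched as greedy NMS followed by a score threshold. The box format `(x1, y1, x2, y2)`, the IoU threshold and the value of `h` are assumptions made for this example only.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns the indices of the kept boxes, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

def select_target_and_distractors(boxes, scores, h=0.5, iou_threshold=0.5):
    """Top proposal after NMS is z_t; the remaining ones scoring above h are distractors."""
    keep = nms(boxes, scores, iou_threshold)
    target = keep[0]
    distractors = [i for i in keep[1:] if scores[i] > h]
    return target, distractors

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [40, 40, 50, 50]], dtype=float)
scores = np.array([0.9, 0.85, 0.7, 0.3])
target, distractors = select_target_and_distractors(boxes, scores)
```

Here the second box is suppressed as a near-duplicate of the target, the third becomes a distractor, and the fourth is dropped because its score is below `h`.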

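Next, re-rank the proposals $P$ which have top-k similarities with the exemplar: the $k$ highest-scoring proposals $p_k$ are the candidates, and the finally selected object $q$ is

$$q = \arg\max_{p_k \in P} \; f(z, p_k) - \frac{\hat{\alpha} \sum_{i=1}^{n} \alpha_i f(d_i, p_k)}{\sum_{i=1}^{n} \alpha_i}$$

The weight factor $\hat{\alpha} = 0.5$ controls the influence of the distractor term.

The weight factor $\alpha_i = 1$ can be viewed as the dual variables with sparse regularization.

A dual variable is a variable of the dual linear programming problem; it measures the value of a resource or constraint.
It represents the contribution to the objective function of each additional unit of the $i$-th resource.

$d_i$ ranges over the $n$ distractors.

$p_k$ ranges over the $k$ proposals.

The objective favors the candidate among the top-k proposals $p_k$ that is similar to the template and as dissimilar as possible to the distractors $d_i$ kept after NMS (these exclude the top-scoring $z_t$, which is presumably the target itself). One way to read it is that it widens the gap between foreground and background.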

exemplars and distractors can be viewed as positive and negative samples in correlation filters
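The re-ranking step can be sketched as follows, under the simplifying assumption that the similarity $f(a, b)$ is a plain dot product of feature vectors (in the tracker it is a cross-correlation of feature maps). The feature values are made up for the example.

```python
import numpy as np

def rerank(template, proposals, distractors, alpha_hat=0.5, alphas=None):
    """Return (best index, scores) for the distractor-aware re-ranking.

    template: (C,), proposals: (k, C), distractors: (n, C).
    f(a, b) is taken to be a dot product here, which is an assumption of this sketch.
    """
    sim_to_template = proposals @ template                  # f(z, p_k), shape (k,)
    if len(distractors) == 0:
        return int(np.argmax(sim_to_template)), sim_to_template
    if alphas is None:
        alphas = np.ones(len(distractors))                  # alpha_i = 1
    sim_to_distractors = proposals @ distractors.T          # f(d_i, p_k), shape (k, n)
    penalty = alpha_hat * (sim_to_distractors @ alphas) / alphas.sum()
    scores = sim_to_template - penalty
    return int(np.argmax(scores)), scores

z = np.array([1.0, 0.0])
proposals = np.array([[1.0, 1.0],    # matches the template, but also the distractor
                      [0.9, 0.0]])   # matches the template only
distractors = np.array([[0.0, 1.0]])
best, scores = rerank(z, proposals, distractors)
```

Without the penalty the first proposal would win on raw similarity (1.0 against 0.9); with it, the proposal that resembles the distractor is pushed down and the second one is selected.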

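The authors then speed up the formula above. Evaluating it directly requires correlating every proposal with the template and with each of the $n$ distractors. Because cross-correlation is a linear operator, the template and the distractors can be merged into a single kernel first:

$$q = \arg\max_{p_k \in P} \left( \varphi(z) - \frac{\hat{\alpha} \sum_{i=1}^{n} \alpha_i \varphi(d_i)}{\sum_{i=1}^{n} \alpha_i} \right) \star \varphi(p_k)$$

This enables the tracker to run at a speed comparable to SiamRPN.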
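The speed-up rests on linearity, which is easy to check numerically. In this sketch the correlation is again replaced by a dot product (an assumption for brevity), with $\alpha_i = 1$; the two functions compute the same scores, but the second needs only one correlation per proposal.

```python
import numpy as np

def direct_scores(z, P, D, alpha_hat=0.5):
    """f(z, p_k) minus the mean of f(d_i, p_k): n + 1 correlations per proposal."""
    return P @ z - alpha_hat * (P @ D.T).mean(axis=1)

def merged_scores(z, P, D, alpha_hat=0.5):
    """Fold the distractors into one kernel first, then correlate once per proposal."""
    kernel = z - alpha_hat * D.mean(axis=0)
    return P @ kernel

rng = np.random.default_rng(0)
z = rng.normal(size=16)          # template feature
P = rng.normal(size=(8, 16))     # k = 8 proposal features
D = rng.normal(size=(5, 16))     # n = 5 distractor features
same = np.allclose(direct_scores(z, P, D), merged_scores(z, P, D))
```

The merged kernel depends only on the template and the distractors, so it can be built once per frame and reused for all proposals.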

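A learning rate $\beta_t = \sum_{i=0}^{t-1}(\frac{\eta}{1- \eta })^{i}$ with $\eta=0.01$ is introduced, so that the target templates and distractors of all previous frames are accumulated:

$$q_{T+1} = \arg\max_{p_k \in P} \left( \frac{\sum_{t=1}^{T} \beta_t \varphi(z_t)}{\sum_{t=1}^{T} \beta_t} - \frac{\sum_{t=1}^{T} \beta_t \hat{\alpha} \sum_{i=1}^{n} \alpha_i \varphi(d_{i,t})}{\sum_{t=1}^{T} \beta_t \sum_{i=1}^{n} \alpha_i} \right) \star \varphi(p_k)$$

This is the final objective; it replaces the single-frame re-ranking objective above.

The distractor-aware features are learned during offline training, while this objective is applied online during tracking at inference.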
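To get a feel for the learning-rate weights $\beta_t$, the sketch below evaluates the geometric series and uses the weights for a running average of per-frame template features. The helper names and the toy features are assumptions made for illustration.

```python
import numpy as np

def beta(t, eta=0.01):
    """beta_t = sum_{i=0}^{t-1} (eta / (1 - eta))**i, a geometric series."""
    ratio = eta / (1.0 - eta)
    return sum(ratio ** i for i in range(t))

def accumulated_template(features, eta=0.01):
    """beta-weighted average of per-frame template features, shape (T, C) -> (C,)."""
    weights = [beta(t, eta) for t in range(1, len(features) + 1)]
    return np.average(features, axis=0, weights=weights)

betas = [beta(t) for t in (1, 2, 50)]
features = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy template features for frames 1 and 2
template = accumulated_template(features)
```

With $\eta = 0.01$ the ratio is about 0.0101, so $\beta_t$ starts at 1 and converges quickly to $(1-\eta)/(1-2\eta) \approx 1.0102$. In other words the weights are almost uniform, and the accumulated template is close to a plain mean over frames, with slightly more weight on later frames.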

4.4 DaSiamRPN for Long-term Tracking

severe out-of-view and full occlusion introduce extra challenges in long-term tracking

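The authors introduce a simple yet effective local-to-global search region strategy.

After the target is lost, DaSiamRPN's score drops noticeably (the red curve in the paper's figure). This is presumably a result of the better-trained network rather than of this strategy, which is only used at test time.

When the target is lost, DaSiamRPN enlarges its search region (an iterative local-to-global search strategy),

so that it can recapture a target that reappears outside the regular search region.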
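The local-to-global strategy can be sketched as a small state machine that switches on the tracking score. All numeric values below (search sizes, growth step, enter and leave thresholds) are illustrative assumptions, as is the function name; the post does not list them.

```python
def next_search_size(current_size, score, lost, *, local_size=255, global_size=767,
                     step=128, enter_threshold=0.8, leave_threshold=0.95):
    """Return (search_size, lost) to use for the next frame.

    A low score marks the target as lost; while lost, the search region grows by
    `step` per frame up to `global_size`. A high score returns to local search.
    """
    if not lost and score < enter_threshold:
        lost = True
    elif lost and score > leave_threshold:
        lost = False
    size = min(current_size + step, global_size) if lost else local_size
    return size, lost

size, lost, sizes = 255, False, []
for score in [0.98, 0.6, 0.5, 0.4, 0.5, 0.3, 0.97, 0.99]:
    size, lost = next_search_size(size, score, lost)
    sizes.append(size)
```

Using two thresholds gives hysteresis: the tracker only leaves the failure state on a clearly confident detection, which avoids flipping between local and global search on borderline scores.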

5 Experiments

5.1 Datasets and Metrics

Datasets

VOT2015
VOT2016
VOT2017
UAV20L with 20 long-term videos
UAV123 with 123 videos
OTB2015

Metrics

accuracy (A)
robustness (R)
expected average overlap (EAO)
OP: mean overlap precision at the threshold of 0.5;
DP: mean distance precision of 20 pixels;
Success and precision plots
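The OP and DP metrics above can be computed per sequence as in the following sketch. The `(x, y, w, h)` box format and the toy boxes are assumptions for the example; benchmark toolkits may differ in details such as how ties at the threshold are handled.

```python
import numpy as np

def overlap_precision(pred, gt, threshold=0.5):
    """Fraction of frames whose IoU with the ground truth exceeds `threshold`.

    Boxes are (x, y, w, h) arrays of shape (N, 4).
    """
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return float(np.mean(inter / union > threshold))

def distance_precision(pred, gt, threshold=20.0):
    """Fraction of frames whose centre location error is within `threshold` pixels."""
    centre = lambda b: b[:, :2] + b[:, 2:] / 2.0
    error = np.linalg.norm(centre(pred) - centre(gt), axis=1)
    return float(np.mean(error <= threshold))

gt = np.array([[0, 0, 10, 10]] * 4, dtype=float)
pred = np.array([[0, 0, 10, 10], [2, 0, 10, 10], [8, 0, 10, 10], [30, 0, 10, 10]], dtype=float)
op = overlap_precision(pred, gt)
dp = distance_precision(pred, gt)
```

In this toy sequence the third frame counts as a hit for DP (centre error of 8 pixels) but as a miss for OP (IoU of about 0.11), which shows why the two metrics are reported together.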

5.2 State-of-the-art Comparisons on VOT Datasets

DaSiamRPN leads the compared trackers by a clear margin on the VOT benchmarks (see the result tables in the paper).

5.3 State-of-the-art Comparisons on UAV Datasets

The strong performance on the long-term tracking dataset (UAV20L) can be attributed to the distractor-aware features and local-to-global search strategy.

5.4 State-of-the-Art Comparisons on OTB Datasets

All the trackers are initialized with the ground-truth object state in the first frame

5.5 Ablation Analyses

The ablation experiments show clearly where the gains of each proposed component come from.

6 Conclusion (own) / Future work

The core improvements are the distractor-aware features and local-to-global search strategy (Sections 4.3 and 4.4 of this post).
See also the blog post written by the paper's author: "ECCV视觉目标跟踪之DaSiamRPN".
