ECCV-2018
USTC (University of Science and Technology of China)
Table of Contents
- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Method
- 4.1 Features and Drawbacks in Traditional Siamese Networks
- 4.2 Distractor-aware Training
- 4.3 Distractor-aware Incremental Learning
- 4.4 DaSiamRPN for Long-term Tracking
- 5 Experiments
- 5.1 Datasets and Metrics
- 5.2 State-of-the-art Comparisons on VOT Datasets
- 5.3 State-of-the-art Comparisons on UAV Datasets
- 5.4 State-of-the-Art Comparisons on OTB Datasets
- 5.5 Ablation Analyses
- 6 Conclusion(own) / Future work
1 Background and Motivation
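Challenges in single-object tracking: occlusions, out-of-view, deformation, background cluttering and other variations
Siamese tracking approaches can only discriminate foreground from the non-semantic backgrounds. Their drawbacks:
- Performance can break down when the background is cluttered
- They usually lack a mechanism for updating the model online
- In long-term tracking, full occlusion and out-of-view challenges may be handled poorly
The authors focus on accurate and long-term tracking and propose Distractor-aware Siamese Networks: an effective sampling strategy is introduced in the offline training phase and a distractor-aware module in the inference phase, with significant gains.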
2 Related Work
- Siamese Networks based Tracking
- Features for Tracking
- Long-term Tracking
3 Advantages / Contributions
- They find that the imbalance of the non-semantic background and semantic distractor in the training data is the main obstacle for the learning.
- They propose Distractor-aware Siamese Region Proposal Networks (DaSiamRPN): training learns distractor-aware features, and online tracking at inference explicitly suppresses distractors
- They propose a local-to-global search region strategy at inference, which clearly improves long-term tracking
4 Method
4.1 Features and Drawbacks in Traditional Siamese Networks
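Traditional Siamese trackers rely on metric learning.
Metric learning, also called distance metric learning or similarity learning, aims to learn a distance function that captures the high-level semantic information of the data. The distance is usually defined through an embedding function, which maps the data into a new space where similar samples lie close together and dissimilar samples lie far apart.
During training, non-semantic background occupies the majority of the samples,
which makes it hard to discriminate more complex backgrounds.
Figure 1 shows this vividly.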
4.2 Distractor-aware Training
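Data sampling strategy
1) Diverse categories of positive pairs can promote the generalization ability
The ImageNet Detection and COCO Detection object detection datasets are added to broaden the categories of positive pairs, as shown in Figure 2(a)
2) Semantic negative pairs can improve the discriminative ability
Negative pairs are drawn not only from the same category but also from different categories, as shown in Figure 2(b) and (c)
Negative pairs from the same category make the network focus on fine-grained representation
3) Customizing effective data augmentation for visual tracking
Besides the usual translation (12 pixels), scale variations (0.85 to 1.15) and illumination changes,
motion blur is introduced as an additional augmentation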
25% of the pairs are converted to grayscale
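To make the augmentation list concrete, here is a minimal NumPy sketch. Only the 12-pixel shift, the 0.85 to 1.15 scale range and the 25% grayscale ratio come from the notes above; the blur length, the blur probability and the function names are my own assumptions, and this is not the authors' pipeline.

```python
import numpy as np

def to_grayscale(img):
    """Replace the three colour channels by their luminance, keeping shape (H, W, 3)."""
    gray = img @ np.array([0.299, 0.587, 0.114])
    return np.repeat(gray[..., None], 3, axis=2)

def motion_blur(img, length=5):
    """Horizontal motion blur: box-filter every row (length is a hypothetical choice)."""
    kernel = np.ones(length) / length
    out = np.empty(img.shape, dtype=float)
    for c in range(img.shape[2]):
        for r in range(img.shape[0]):
            out[r, :, c] = np.convolve(img[r, :, c], kernel, mode="same")
    return out

def sample_augmentation(rng):
    """Draw augmentation parameters in the ranges quoted in the notes."""
    return {
        "shift": rng.integers(-12, 13, size=2),  # translation of up to 12 pixels
        "scale": rng.uniform(0.85, 1.15),        # sampled only; resizing is left to an image library
        "gray": rng.random() < 0.25,             # 25% of the pairs become grayscale
        "blur": rng.random() < 0.5,              # assumed probability, not from the paper
    }

def augment(img, params):
    out = img.astype(float)
    shift = tuple(int(s) for s in params["shift"])
    out = np.roll(out, shift=shift, axis=(0, 1))  # crude translation that wraps around the border
    if params["gray"]:
        out = to_grayscale(out)
    if params["blur"]:
        out = motion_blur(out)
    return out
```

In a real pipeline the same parameters would normally be applied to the crop together with its bounding box, which this sketch leaves out.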
4.3 Distractor-aware Incremental Learning
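Incremental learning
Incremental learning means that a learning system can keep acquiring new knowledge from new samples while retaining most of what it has already learned.
The common approach is to use a cosine window to suppress the distractors (the closer a candidate is to the previous target position, the lower the penalty; the farther away, the higher), but this is not guaranteed to work when the motion of objects is messy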
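A small sketch of what the cosine-window penalty does to a score map. The blending rule and the `window_influence` value of 0.4 are assumptions on my side (one common way of mixing the window into the scores), not something stated in these notes.

```python
import numpy as np

def cosine_window_penalty(score_map, window_influence=0.4):
    """Blend raw scores with a Hanning (cosine) window centred on the previous target position."""
    size = score_map.shape[0]
    window = np.outer(np.hanning(size), np.hanning(size))
    return score_map * (1.0 - window_influence) + window * window_influence

scores = np.zeros((17, 17))
scores[8, 8] = 0.60    # candidate at the centre, i.e. near the previous position
scores[1, 15] = 0.70   # stronger raw response far from the centre
penalised = cosine_window_penalty(scores)
best = np.unravel_index(np.argmax(penalised), penalised.shape)
```

After the penalty the centre candidate wins even though its raw score is lower. This is exactly the behaviour that hurts when the target really does jump far away, which is the weakness pointed out above.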
The authors propose a distractor-aware module to effectively transfer the general representation to the video domain
(I did not fully understand what "video domain" refers to here)
Now for the authors' concrete incremental learning method, the distractor-aware module.
A Siamese tracker learns a similarity metric $f(z,x)$; for background, see
- 【SiamFC】《Fully-Convolutional Siamese Networks for Object Tracking》
- 【SiamRPN】《High Performance Visual Tracking With Siamese Region Proposal Network》
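As a reminder of what the similarity metric $f(z,x)$ computes in SiamFC-style trackers, here is a toy sketch: the exemplar features are slid over the search features and an inner product is taken at each position. The arrays stand in for the embeddings of $z$ and $x$; the feature sizes and the name `cross_correlation` are made up for illustration.

```python
import numpy as np

def cross_correlation(z_feat, x_feat):
    """Slide the exemplar features over the search features and return
    the map of inner products (valid positions only)."""
    _, hz, wz = z_feat.shape
    _, hx, wx = x_feat.shape
    out = np.empty((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(z_feat * x_feat[:, i:i + hz, j:j + wz])
    return out

rng = np.random.default_rng(0)
z_feat = rng.uniform(0.5, 1.5, size=(4, 3, 3))   # stand-in for the exemplar embedding
x_feat = np.zeros((4, 10, 10))                   # stand-in for the search-region embedding
x_feat[:, 3:6, 5:8] = z_feat                     # the "target" sits at offset (3, 5)
response = cross_correlation(z_feat, x_feat)
peak = np.unravel_index(np.argmax(response), response.shape)
```

The response map peaks where the search features match the exemplar, here at offset (3, 5).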
On top of this, the authors introduce hard negative samples (distractors)
There are $17 \times 17 \times 5$ proposals in each frame; NMS is used to select the potential distractors $d_i$ in each frame, as follows
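The selection rule itself did not survive as text in these notes. Reconstructed from the symbol definitions that follow ($h$, $z_t$, $|D| = n$), it should read roughly as below; treat the exact notation as my reconstruction rather than a quote from the paper.

```latex
D := \{\, d_i \;:\; f(z, d_i) > h,\ d_i \neq z_t \,\}
```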
$h$ is the predefined threshold
$z_t$ is the selected target in frame $t$; the proposal with the highest score is chosen as $z_t$
the number of this set $|D| = n$
In short, proposals whose score against the template $z$ exceeds the threshold $h$ are kept as potential distractors
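The selection step can be sketched in a few lines. I assume NMS has already been applied and that each surviving proposal comes with its similarity score against the template; the function name and the example scores are made up.

```python
import numpy as np

def select_target_and_distractors(scores, h):
    """scores: similarity f(z, proposal) for the proposals that survived NMS.
    Returns the index of the target z_t and the indices forming the distractor set D."""
    scores = np.asarray(scores, dtype=float)
    target = int(np.argmax(scores))                  # highest score -> z_t
    distractors = [i for i, s in enumerate(scores)
                   if s > h and i != target]         # f(z, d_i) > h and d_i != z_t
    return target, distractors

target, distractors = select_target_and_distractors(
    [0.91, 0.30, 0.72, 0.55, 0.68], h=0.5)
# target is proposal 0; proposals 2, 3 and 4 form D, so n = 3
```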
Next, re-rank the proposals $P$ which have top-k similarities with the exemplar: the $k$ highest-scoring proposals ($p_k$) are picked for the following step
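The re-ranking objective is missing from these notes as text. As far as I recall it from the paper, with $q$ denoting the finally selected object, it has the following form; the weights $\hat{\alpha}$ and $\alpha_i$ are the ones explained right after.

```latex
q = \mathop{\arg\max}_{p_k \in P} \; f(z, p_k) - \frac{\hat{\alpha} \sum_{i=1}^{n} \alpha_i \, f(d_i, p_k)}{\sum_{i=1}^{n} \alpha_i}
```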
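weight factor $\hat{\alpha} = 0.5$
weight factor $\alpha_i = 1$ can be viewed as the dual variables with sparse regularization
A dual variable is a variable of the dual linear programming problem and measures the value of a resource or constraint:
it gives the contribution to the objective of one additional unit of the $i$-th resource.
$d_i$ ranges over the $n$ proposals in $D$
$p_k$ ranges over the $k$ proposals in $P$
The objective pushes the similarity between the top-$k$ proposals $p_k$ (apart from the top-scoring $z_t$, which is probably the target $x$ itself) and the other proposals left after NMS as low as possible. In other words it widens the gap between foreground and background; at least that is how I read it.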
exemplars and distractors can be viewed as positive and negative samples in correlation filters
The authors then speed up the formula above
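The accelerated form is also missing here. Assuming $f(z, x) = \varphi(z) \star \varphi(x)$ (cross-correlation of the embeddings, bias ignored; $\varphi$ is my notation for the embedding), linearity of the correlation lets the distractor terms be folded into a single template. From memory of the paper it reads:

```latex
q = \mathop{\arg\max}_{p_k \in P} \left( \varphi(z) - \frac{\hat{\alpha} \sum_{i=1}^{n} \alpha_i \, \varphi(d_i)}{\sum_{i=1}^{n} \alpha_i} \right) \star \varphi(p_k)
```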
This enables the tracker to run at a speed comparable to SiamRPN
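A quick numerical check of why the speed-up is free. I replace cross-correlation by a plain inner product between embedding vectors (an assumption for brevity; both are linear, which is all that matters here) and compare the direct re-ranking score, which needs one similarity per distractor and candidate, with the version that first folds the distractors into one template.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n, k = 64, 5, 8
phi_z = rng.normal(size=dim)          # embedding of the exemplar z
phi_d = rng.normal(size=(n, dim))     # embeddings of the n distractors d_i
phi_p = rng.normal(size=(k, dim))     # embeddings of the k candidates p_k
alpha_hat, alpha = 0.5, np.ones(n)    # weight values quoted in the notes

def f(a, b):
    """Toy similarity: inner product of a with every row of b."""
    return b @ a

# Direct form: n extra similarity evaluations per candidate.
direct = f(phi_z, phi_p) - alpha_hat * sum(
    alpha[i] * f(phi_d[i], phi_p) for i in range(n)) / alpha.sum()

# Fast form: build one distractor-aware template, then correlate once.
template = phi_z - alpha_hat * (alpha @ phi_d) / alpha.sum()
fast = f(template, phi_p)

q = int(np.argmax(fast))              # index of the selected candidate
```

Both score vectors agree up to floating-point error, so the cost at tracking time is one correlation, as in plain SiamRPN.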
A learning rate $\beta_t = \sum_{i=0}^{t-1}\left(\frac{\eta}{1-\eta}\right)^{i}$ with $\eta = 0.01$ is introduced
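The resulting objective is not reproduced in these notes. As far as I remember the paper, the target and distractor templates are both accumulated over frames $t = 1, \dots, T$ with the learning-rate weights $\beta_t$, giving the expression below. I reconstructed it from memory, with $\varphi$ denoting the embedding and $d_{i,t}$ the $i$-th distractor in frame $t$, so check it against the paper before relying on it.

```latex
q_{T+1} = \mathop{\arg\max}_{p_k \in P} \left( \frac{\sum_{t=1}^{T} \beta_t \, \varphi(z_t)}{\sum_{t=1}^{T} \beta_t} - \frac{\sum_{t=1}^{T} \beta_t \, \hat{\alpha} \sum_{i=1}^{n} \alpha_i \, \varphi(d_{i,t})}{\sum_{t=1}^{T} \beta_t \sum_{i=1}^{n} \alpha_i} \right) \star \varphi(p_k)
```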
This is the final objective; it replaces the one above
Optimization is done at training time; at inference it is online tracking
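To see how such an incremental update could be kept cheap, here is a sketch that maintains running, $\beta_t$-weighted sums of the target and distractor embeddings. The class name is made up; I also assume $\alpha_i = 1$ and the same number of distractors in every frame, so the distractor term reduces to a per-frame mean. It illustrates the idea and is not the authors' code.

```python
import numpy as np

def beta(t, eta=0.01):
    """beta_t = sum_{i=0}^{t-1} (eta / (1 - eta))**i for t = 1, 2, ..."""
    r = eta / (1.0 - eta)
    return sum(r ** i for i in range(t))

class DistractorAwareTemplate:
    """Running beta-weighted averages of target and distractor embeddings."""

    def __init__(self, dim, alpha_hat=0.5):
        self.alpha_hat = alpha_hat
        self.t = 0
        self.beta_sum = 0.0
        self.target_sum = np.zeros(dim)
        self.distractor_sum = np.zeros(dim)

    def update(self, phi_zt, phi_dt):
        """phi_zt: (dim,) target embedding; phi_dt: (n, dim) distractor embeddings."""
        self.t += 1
        b = beta(self.t)
        self.beta_sum += b
        self.target_sum += b * phi_zt
        if len(phi_dt) > 0:
            self.distractor_sum += b * phi_dt.mean(axis=0)  # alpha_i = 1 -> plain mean

    def template(self):
        """Template to correlate with the candidates p_k in the next frame."""
        return (self.target_sum - self.alpha_hat * self.distractor_sum) / self.beta_sum
```

With $\eta = 0.01$ the ratio $\eta / (1 - \eta)$ is about 0.0101, so $\beta_1 = 1$ and $\beta_t$ quickly saturates near 1.0102; the weights over frames end up almost uniform.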
4.4 DaSiamRPN for Long-term Tracking
severe out-of-view and full occlusion introduce extra challenges in long-term tracking
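The authors introduce a simple yet effective local-to-global search region strategy
After the target is lost, DaSiamRPN's score drops markedly (red curve); this is probably because the network is better trained, and has nothing to do with this strategy, which is only used at test time
When the target is lost, DaSiamRPN enlarges its search region (the iterative local-to-global search strategy)
so that it can catch the target when it reappears outside the regular search region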
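The switching logic can be sketched as a tiny state machine: grow the search region step by step while the tracker is in failure mode, and fall back to the local region once the score recovers. All default values below (region sizes, step, the two thresholds) and the function name are placeholders I picked for illustration, not settings confirmed by these notes.

```python
def next_search_size(current, score, lost,
                     local_size=255, global_size=767, step=128,
                     enter_fail=0.8, leave_fail=0.95):
    """Return (search size for the next frame, updated lost flag)."""
    if not lost and score < enter_fail:
        lost = True                                  # confidence collapsed: enter failure mode
    elif lost and score > leave_fail:
        lost = False                                 # target re-detected: leave failure mode
    if lost:
        size = min(current + step, global_size)      # grow by a constant step, up to the cap
    else:
        size = local_size                            # back to the local window
    return size, lost

size, lost, sizes = 255, False, []
for score in [0.97, 0.5, 0.4, 0.3, 0.6, 0.3, 0.96]:
    size, lost = next_search_size(size, score, lost)
    sizes.append(size)
# sizes == [255, 383, 511, 639, 767, 767, 255]
```

Using two thresholds instead of one gives hysteresis, so the tracker does not flip between local and global search on every noisy score.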
5 Experiments
5.1 Datasets and Metrics
Datasets
- VOT2015
- VOT2016
- VOT2017
- UAV20L with 20 long-term videos
- UAV123 with 123 videos
- OTB2015
Metrics
- accuracy (A)
- robustness (R)
- expected average overlap (EAO)
- OP: mean overlap precision at the threshold of 0.5
- DP: mean distance precision of 20 pixels
- Success and precision plots
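For OP and DP, a minimal sketch of the per-sequence computation, assuming boxes are given as (x, y, w, h). The helper names and the four example frames are made up; benchmark toolkits then average these values over all sequences.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def centre_distance(a, b):
    """Euclidean distance in pixels between the two box centres."""
    ca = (a[0] + a[2] / 2, a[1] + a[3] / 2)
    cb = (b[0] + b[2] / 2, b[1] + b[3] / 2)
    return float(np.hypot(ca[0] - cb[0], ca[1] - cb[1]))

def op_dp(pred, gt, iou_thr=0.5, dist_thr=20.0):
    """OP: share of frames with IoU above iou_thr; DP: share with centre error within dist_thr."""
    op = np.mean([iou(p, g) > iou_thr for p, g in zip(pred, gt)])
    dp = np.mean([centre_distance(p, g) <= dist_thr for p, g in zip(pred, gt)])
    return float(op), float(dp)

gt   = [(10, 10, 40, 40), (20, 20, 40, 40), (30, 30, 40, 40), (40, 40, 40, 40)]
pred = [(10, 10, 40, 40), (25, 20, 40, 40), (45, 30, 40, 40), (90, 90, 40, 40)]
op, dp = op_dp(pred, gt)
# op == 0.5 (frames 1 and 2 overlap enough), dp == 0.75 (frames 1 to 3 are within 20 pixels)
```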
5.2 State-of-the-art Comparisons on VOT Datasets
DaSiamRPN leads by a clear margin
5.3 State-of-the-art Comparisons on UAV Datasets
The strong results on the long-term tracking dataset can be attributed to the distractor-aware features and local-to-global search strategy.
5.4 State-of-the-Art Comparisons on OTB Datasets
All the trackers are initialized with the ground-truth object state in the first frame
5.5 Ablation Analyses
The ablation experiments show clearly where each proposed component contributes
6 Conclusion(own) / Future work
- Core improvements: the distractor-aware features and local-to-global search strategy (Sections 4.3 and 4.4 of this post)
- Blog post by the authors: 《ECCV视觉目标跟踪之DaSiamRPN》