Learning Global Structure Consistency for Robust Object Tracking

Abstract

Fast appearance variations and the distractions of similar objects are two of the most challenging problems in visual object tracking. Unlike many existing trackers that focus on modeling only the target, in this work, we consider the \emph{transient variations of the whole scene}. The key insight is that the object correspondence and spatial layout of the whole scene are consistent (i.e., global structure consistency) in consecutive frames which helps to disambiguate the target from distractors. Moreover, modeling transient variations enables to localize the target under fast variations. Specifically, we propose an effective and efficient short-term model that learns to exploit the global structure consistency in a short time and thus can handle fast variations and distractors. Since short-term modeling falls short of handling occlusion and out of the views, we adopt the long-short term paradigm and use a long-term model that corrects the short-term model when it drifts away from the target or the target is not present. These two components are carefully combined to achieve the balance of stability and plasticity during tracking. We empirically verify that the proposed tracker can tackle the two challenging scenarios and validate it on large scale benchmarks. Remarkably, our tracker improves state-of-the-art-performance on VOT2018 from 0.440 to 0.460, GOT-10k from 0.611 to 0.640, and NFS from 0.619 to 0.629.


Read More

HAMBox: Delving into Online High-quality Anchors Mining for Detecting Outer Faces

Abstract

Current face detectors utilize anchors to frame a multi-task learning problem which combines classification and bounding box regression. Effective anchor design and anchor matching strategy enable face detectors to localize faces under large pose and scale variations. However, we observe that more than 80% correctly predicted bounding boxes are regressed from the unmatched anchors (the IoUs between anchors and target faces are lower than a threshold) in the inference phase. It indicates that these unmatched anchors perform excellent regression ability, but the existing methods neglect to learn from them. In this paper, we propose an Online High-quality Anchor Mining Strategy (HAMBox), which explicitly helps outer faces compensate with high-quality anchors. Our proposed HAMBox method could be a general strategy for anchor-based single-stage face detection. Experiments on various datasets, including WIDER FACE, FDDB, AFW and PASCAL Face, demonstrate the superiority of the proposed method. Furthermore, our team win the championship on the Face Detection test track of WIDER Face and Pedestrian Challenge 2019. We will release the codes with PaddlePaddle.


BFBox: Searching Face-appropriate Backbone and Feature Pyramid Network for Robust Face Detector

Abstract

本文提出的方法BFBox是基于神经网络架构搜索(NAS)的方法同时搜索适合人脸检测的特征提取器和特征金字塔。动机是我们发现了一个有趣的现象:针对图像分类任务设计的流行的特征提取器已经在通用目标检测任务上验证了其重要的兼容性,然而在人脸检测任务上却没有取得预期的效果。同时不同的特征提取器与特征金字塔的结合也不是完全正相关的。首先,本文对于比较好的特征提取器进行分析,提出了适合人脸的搜索空间;其次,提出了图1的特征金字塔注意力模块(FPN-attention Module)去加强特征提取器和特征金字塔之间的联系;最后, 采取SNAS的方法同时搜出适和人脸的特征提取器和特征金字塔结构。多个数据集上(WIDER FACE, FDDB, AFW和PASCAL Face)的实验表明了我们提出的方法的优越性。
如下图所示为检测网络的结构。网络是基于RetinaNet的结构加上我们提出的特征金字塔注意力模块(FPN-attention Module),训练超网络时采用的是随机采样的方法。


Read More

基于身份保持的条件对抗生成网络的人脸老化IPCGAN (CVPR2018)
https://github.com/dawei6875797/Face-Aging-with-Identity-Preserved-Conditional-Generative-Adversarial-Networks
PyramidBox人脸检测器 (ECCV2018)
https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/face_detection
人脸检测轻量化模型faceboxes和blazeface
https://github.com/PaddlePaddle/PaddleDetection/tree/release/0.2/configs/face_detection
[抗击肺炎] 口罩人脸检测与分类
https://www.paddlepaddle.org.cn/hub/scene/maskdetect