## Decoupled IoU Regression for Object Detection


Non-maximum suppression (NMS) is widely used in object detection pipelines to remove duplicate bounding boxes. The inconsistency between the confidence score used by NMS and the actual localization confidence significantly degrades detection performance. Prior works propose predicting the Intersection-over-Union (IoU) between each bounding box and its corresponding ground truth to improve NMS, but accurately predicting IoU remains challenging. We argue that the complex definition of IoU and feature misalignment make it difficult to predict IoU accurately. In this paper, we propose a novel Decoupled IoU Regression (DIR) model to handle these problems. DIR decouples the traditional localization confidence metric, IoU, into two new metrics: Purity and Integrity. Purity reflects the proportion of the detected bounding box occupied by the object, and Integrity refers to the completeness of the detected object area. Predicting Purity and Integrity separately divides the complex mapping between a bounding box and its IoU into two clearer mappings that can be modeled independently. In addition, a simple but effective feature realignment approach is introduced so that the IoU regressor works in a hindsight manner, which makes the target mapping more stable. DIR can be conveniently integrated into existing two-stage detectors and significantly improves their performance. With a simple implementation of DIR on Faster R-CNN, we obtain 41.9% AP on the MS COCO benchmark with a ResNet-101 backbone, outperforming previous methods by a large margin and achieving state-of-the-art results.
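The abstract defines Purity and Integrity only in words. One natural reading, assumed here rather than taken from the paper, is Purity = |B ∩ G| / |B| (the share of the detected box B occupied by the ground-truth object G) and Integrity = |B ∩ G| / |G| (the share of the object covered by the box). Under that assumption IoU can be recovered exactly from the two ratios, because 1/IoU = 1/Purity + 1/Integrity - 1. The minimal Python sketch below checks this identity for axis-aligned boxes; all function names are hypothetical.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (positive area assumed).
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two axis-aligned boxes (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(det: Box, gt: Box) -> float:
    """Standard Intersection-over-Union."""
    inter = intersection(det, gt)
    return inter / (area(det) + area(gt) - inter)


def purity(det: Box, gt: Box) -> float:
    """Assumed definition: fraction of the detected box occupied by the object."""
    return intersection(det, gt) / area(det)


def integrity(det: Box, gt: Box) -> float:
    """Assumed definition: fraction of the ground-truth object covered by the box."""
    return intersection(det, gt) / area(gt)


def iou_from_decoupled(p: float, q: float) -> float:
    """Recombine Purity p and Integrity q into IoU using 1/IoU = 1/p + 1/q - 1."""
    if p == 0.0 or q == 0.0:
        return 0.0  # no overlap, so IoU is 0
    return p * q / (p + q - p * q)


if __name__ == "__main__":
    det = (0.0, 0.0, 4.0, 4.0)  # detected box, area 16
    gt = (2.0, 0.0, 8.0, 4.0)   # ground truth, area 24; overlap area is 8
    p, q = purity(det, gt), integrity(det, gt)
    print(f"purity={p:.3f} integrity={q:.3f}")  # purity=0.500 integrity=0.333
    print(f"iou={iou(det, gt):.3f} recombined={iou_from_decoupled(p, q):.3f}")  # iou=0.250 recombined=0.250
```

Because the recombination is exact, predicting the two ratios loses no information compared with predicting IoU directly; that they are easier to learn is the paper's claim, which this sketch does not test. How the two predictions are fused into an NMS score is not stated in the abstract, so using the recombined IoU as that score is only one possible choice.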

## Deep Interactive Video Inpainting: an Invisibility Cloak for Harry Potter


In this paper, we propose a new task, deep interactive video inpainting, and an application that lets users interact with the machine. To our knowledge, this is the first deep-learning-based interactive video inpainting work that uses only free-form user input (i.e., scribbles) as guidance instead of per-frame mask annotations, which gives it academic, entertainment, and commercial value. Given a user's scribbles on a chosen frame, it simultaneously performs interactive video object segmentation and video inpainting throughout the whole video. We use a shared spatial-temporal memory module that combines the interactive video object segmentation and video inpainting tasks into an end-to-end pipeline. In our framework, past frames with object masks (either the user's scribbles or the predicted masks) form an external memory, and the current frame, acting as the query, is segmented and inpainted using the information in the shared memory. Furthermore, our method allows users to iteratively refine the segmentation results, which effectively improves the inpainting results where video object segmentation fails, so users can obtain high-quality video inpainting results even on challenging sequences. Qualitative and quantitative experimental results demonstrate the superiority of our approach.
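The abstract does not say how the shared memory is read. A common design for this kind of spatial-temporal memory, similar in spirit to Space-Time Memory Networks for video object segmentation, treats every pixel of every past frame as a key-value pair and lets each pixel of the current (query) frame attend over all of them. The NumPy sketch below shows only that read step. It is an assumption about the mechanism, not the authors' implementation; the function name `memory_read`, the array shapes, and the unscaled dot-product affinity are illustrative choices.

```python
import numpy as np


def memory_read(mem_keys: np.ndarray, mem_values: np.ndarray, query_keys: np.ndarray) -> np.ndarray:
    """Attention-style read from a spatio-temporal memory (hypothetical helper).

    mem_keys:   (T, H, W, Ck)  keys of T past frames (encoded together with their masks)
    mem_values: (T, H, W, Cv)  values of the same past frames
    query_keys: (H', W', Ck)   keys of the current (query) frame
    returns:    (H', W', Cv)   memory features retrieved for every query pixel
    """
    t, h, w, ck = mem_keys.shape
    cv = mem_values.shape[-1]
    k_m = mem_keys.reshape(t * h * w, ck)   # N memory locations
    v_m = mem_values.reshape(t * h * w, cv)
    k_q = query_keys.reshape(-1, ck)        # M query locations

    # Unscaled dot-product affinity between every query pixel and every memory pixel.
    logits = k_q @ k_m.T                             # (M, N)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over memory locations

    readout = weights @ v_m                          # (M, Cv), convex mix of memory values
    return readout.reshape(query_keys.shape[0], query_keys.shape[1], cv)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((3, 8, 8, 16))     # 3 past frames, 8x8 feature maps
    values = rng.standard_normal((3, 8, 8, 32))
    query = rng.standard_normal((8, 8, 16))       # current frame
    print(memory_read(keys, values, query).shape)  # (8, 8, 32)
```

Since the described pipeline shares one memory between segmentation and inpainting, a single read like this could in principle feed both task heads; how the paper conditions or splits the readout per task is not stated in the abstract.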

Extraction code: tgva (registered)

Link: https://pan.baidu.com/s/1JkTfnyhT6HbsW53EYfyksg Password: tgva


Extraction code: 43C3 (registered)

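I currently lead the intelligent video creation and interactive effects direction at Alibaba Digital Media & Entertainment Group. My main research areas fall into two groups: intelligent video creation (Video Summary, Video Grounding, intelligent video narration, text-to-video generation, Text Video Retrieval) and interactive face effects (face detection and tracking, face editing, face stylization, face swapping, face attributes, etc.).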

I am also hiring research interns and experienced full-time staff. If you are interested, feel free to contact me by email (or WeChat): buhui.tx@alibaba-inc.com
