## Decoupled IoU Regression for Object Detection

Abstract

Non-maximum suppression (NMS) is widely used in object detection pipelines to remove duplicate bounding boxes. The inconsistency between the confidence used for NMS and the actual localization confidence significantly degrades detection performance. Prior works propose predicting the Intersection-over-Union (IoU) between bounding boxes and their corresponding ground truths to improve NMS, but accurately predicting IoU remains a challenging problem. We argue that the complex definition of IoU and feature misalignment make accurate IoU prediction difficult. In this paper, we propose a novel Decoupled IoU Regression (DIR) model to address these problems. DIR decouples the traditional localization confidence metric, IoU, into two new metrics: Purity and Integrity. Purity reflects the proportion of the detected bounding box occupied by the object, and Integrity reflects how completely the object area is covered by the detection. Predicting Purity and Integrity separately divides the complex mapping between a bounding box and its IoU into two clearer mappings that can be modeled independently. In addition, we introduce a simple but effective feature realignment approach that lets the IoU regressor work in a hindsight manner, making the target mapping more stable. DIR can be conveniently integrated with existing two-stage detectors and significantly improves their performance. With a simple implementation of DIR on Faster R-CNN, we obtain 41.9% AP on the MS COCO benchmark with a ResNet-101 backbone, outperforming previous methods by a large margin and achieving state-of-the-art results.
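The abstract defines Purity and Integrity only in words. One natural formalization, which is an assumption here rather than a quote from the paper, is Purity = |B ∩ G| / |B| and Integrity = |B ∩ G| / |G| for a detected box B and its ground truth G. Under that reading, IoU is fully recoverable from the two metrics via 1/IoU = 1/Purity + 1/Integrity - 1. The minimal Python sketch below computes all three for axis-aligned boxes and adds a plain greedy NMS that ranks boxes by some localization score. The helper names and the (x1, y1, x2, y2) box format are hypothetical, and how DIR actually turns its two predictions into an NMS score is not specified in the abstract.

```python
# Sketch: IoU decomposed into Purity and Integrity (hypothetical helper names).
# Assumption: boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1, and
# Purity = |B ∩ G| / |B|, Integrity = |B ∩ G| / |G| for detection B, ground truth G.


def area(box):
    """Area of an axis-aligned box; negative extents are clamped to zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection(a, b):
    """Area of the overlap rectangle (0.0 when the boxes do not overlap)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(a, b):
    inter = intersection(a, b)
    return inter / (area(a) + area(b) - inter)


def purity(det, gt):
    """Fraction of the detected box that is covered by the object."""
    return intersection(det, gt) / area(det)


def integrity(det, gt):
    """Fraction of the object that the detected box covers."""
    return intersection(det, gt) / area(gt)


def iou_from_parts(p, i):
    """Recover IoU from Purity p and Integrity i.

    With inter = |B ∩ G|, |B| = inter / p and |G| = inter / i, so
    IoU = 1 / (1/p + 1/i - 1) = p * i / (p + i - p * i).
    """
    if p == 0.0 or i == 0.0:
        return 0.0  # no overlap at all
    return p * i / (p + i - p * i)


def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the best-scoring box, drop its heavy overlaps, repeat.

    `scores` is whatever ranking confidence is used, e.g. a localization
    confidence such as iou_from_parts(pred_p, pred_i) (an assumption here).
    """
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [k for k in order if iou(boxes[k], boxes[best]) <= thresh]
    return keep


gt = (0.0, 0.0, 10.0, 10.0)
det = (5.0, 0.0, 15.0, 10.0)  # shifted half a box to the right
p, i = purity(det, gt), integrity(det, gt)
print(p, i, iou(det, gt), iou_from_parts(p, i))
```

Under this reading, a low Purity flags a box that is too loose and a low Integrity flags a box that is too tight, which is one intuition for why the two targets may be easier to regress separately than IoU itself.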


## Deep Interactive Video Inpainting: an Invisibility Cloak for Harry Potter

Abstract

In this paper, we propose a new task, deep interactive video inpainting, together with an application that lets users interact with the machine. To our knowledge, this is the first deep-learning-based interactive video inpainting work that uses only free-form user input (i.e., scribbles) as guidance instead of per-frame mask annotations, which has academic, entertainment, and commercial value. Given a user's scribbles on a chosen frame, our method simultaneously performs interactive video object segmentation and video inpainting throughout the whole video. We use a shared spatial-temporal memory module that combines the interactive video object segmentation and video inpainting tasks into an end-to-end pipeline. In our framework, past frames with object masks (either the user's scribbles or the predicted masks) form an external memory, and the current frame, serving as the query, is segmented and inpainted using the information in the shared memory. Furthermore, our method allows users to iteratively refine the segmentation results, which effectively improves the inpainting where video object segmentation fails, so users can obtain high-quality video inpainting results even on challenging sequences. Qualitative and quantitative experimental results demonstrate the superiority of our approach.
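The abstract describes the shared memory only at a high level: past frames with masks are stored, and the current frame queries them. As a minimal sketch, assuming an STM-style read (dot-product similarity between query keys and memory keys, a softmax over all memory locations, then a weighted sum of memory values), the read step might look like the pure-Python code below. This read operation is an assumption, not the paper's confirmed design; the key/value encoders, and the segmentation and inpainting heads that would both consume the read-out in a shared design, are omitted, and `memory_read` plus the list-of-vectors layout are hypothetical.

```python
# Sketch of a space-time memory read (STM-style key/value attention).
# Assumption: the abstract does not define the read operation; this follows the
# dot-product + softmax read used by Space-Time Memory networks. Each memory
# entry is one spatial location of a past frame; each query entry is one
# location of the current frame. Names here are hypothetical.
import math


def softmax(xs):
    """Softmax with max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def memory_read(query_keys, memory_keys, memory_values):
    """Return one read-out vector per query location.

    query_keys:    Q key vectors from the current frame
    memory_keys:   M key vectors pooled over all memorized past frames
    memory_values: M value vectors aligned with memory_keys
    """
    value_dim = len(memory_values[0])
    reads = []
    for q in query_keys:
        # Similarity of this query location to every memory location,
        # normalized over the whole memory.
        weights = softmax([dot(q, k) for k in memory_keys])
        # Attention-weighted sum of memory values, one channel at a time.
        reads.append([sum(w * v[c] for w, v in zip(weights, memory_values))
                      for c in range(value_dim)])
    return reads


# Two memory locations; the query matches the first one more closely.
out = memory_read([[1.0, 0.0]],
                  [[1.0, 0.0], [0.0, 1.0]],
                  [[10.0], [0.0]])
print(out)
```

Because every past frame's locations are pooled into one memory, adding a newly segmented frame (or one the user has refined with scribbles) amounts to appending its keys and values; this is one reason memory-based designs suit iterative refinement, though the paper's actual mechanism may differ.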



## Alibaba Entertainment: An Automated Production Solution for Quick-View Short Videos

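As users' time becomes increasingly fragmented, the shift of video "from long to short" has become a trend, and demand for short-video consumption in feed scenarios keeps growing. Youku provides users with a large volume of high-quality video content every year, which gives it a natural advantage in turning long videos into short ones, and through algorithmic research it has made breakthroughs in the automated production of quick-view short videos.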

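AI automatic editing aims to edit videos fully or semi-automatically through algorithms, using machines' strength in batch processing to achieve mass production. This improves content production efficiency as well as the efficiency of short-video operation and distribution. At present, manual short-video production across the internet is concentrated on top-tier IPs; AI automatic editing can supply targeted content for mid-tier and long-tail licensed IPs, creating new sources of traffic growth.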

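Youku has already applied its AI algorithms to multiple business scenarios, such as highlight extraction from Youku bullet comments (danmaku), video-understanding tags, episode recaps, smart cover images, and quick-view video commentary. For example, the smart cover image capability not only supports intelligent short-video production but is also offered to UPGC creators as a basic media-asset service, used in scenarios such as Youku account uploads, Youku search, and short/mini-video recommendation.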

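In addition, a "machine production + human review + ad generation" pipeline has been built for episode recaps. Compared with purely manual recap production, the new pipeline cuts production time from days to minutes, greatly improving efficiency.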




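## Talk at Prof. Deng Cai's Lab, Zhejiang University: Research and Technical Practice in Multimodal Video Understanding & Interactive Effects

Slides: https://pan.baidu.com/s/1JkTfnyhT6HbsW53EYfyksg
Extraction code: tgva (cleared for sharing)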


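Topics covered:
- Interactive face effects (face swapping, face stylization, face editing, face attributes, etc.)
- Video condensation, video highlight extraction, video commentary, etc.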



## DataFun Summit, Knowledge Graph and Intelligent Creation Forum: Intelligent Video Production at Alibaba Entertainment

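We shared our recent progress in intelligent video production and creation, including video clipping, video remixing & derivative creation, video condensation, video commentary, and text-to-video generation.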

Slides: https://pan.baidu.com/s/1KfPKhqIxk9sKGgj9FlmCVw
Extraction code: 43C3 (cleared for sharing)


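I am honored to have joined Alibaba. Over the next few years, I hope to keep working hard, efficiently, and happily, and to produce more meaningful, valuable, and impactful research and products.
At Alibaba Entertainment, I currently lead the intelligent video creation & interactive effects direction. My main research covers two areas: intelligent video creation (Video Summary / Video Grounding / intelligent video commentary / text-to-video / Text Video Retrieval) and interactive face effects (face detection and tracking / face editing / face stylization / face swapping / face attributes, etc.).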


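I am also hiring Research Interns and experienced full-time staff. If you are interested, feel free to contact me by email (or WeChat): buhui.tx@alibaba-inc.com
I look forward to having you join us.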
