DeepSurvey2015

"cvpaper.challenge in 2015"
Hirokatsu Kataoka (AIST), Yudai Miyashita, Tomoaki Yamabe (TDU), Soma Shirakabe (AIST, Univ. of Tsukuba), Shin'ichi Sato, Hironori Hoshino, Ryo Kato, Kaori Abe, Takaaki Imanari, Naomichi Kobayashi, Shinichiro Morita, Akio Nakamura (TDU)

The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers on computer vision, pattern recognition, and related fields. For this review, we read all 602 conference papers presented at CVPR2015, the premier annual computer vision conference, held in June 2015, in order to grasp the trends in the field. We describe the characteristics of CVPR2015 and discuss the trends and leading methods in three areas, namely recognition, 3D, and imaging/image processing. Further, we propose "DeepSurvey" as a mechanism that embodies the entire process, from reading through all the papers and generating ideas to writing a paper. Beyond actually testing the survey results, researchers need the ability to view the field from a wider perspective in order to better understand its open issues.

We propose DeepSurvey (see the figure below) as a mechanism for systematizing knowledge, generating ideas, and writing papers (especially on new research problems) based on an extensive reading of papers. The DeepSurvey architecture is modeled on deep learning, which has flourished in recent years, and is composed of the following elements:

Input: The papers that have been read (knowledge)

1st ideas: Individually generate ideas (from knowledge to ideas)

1st discussion: Group discussion (consolidation of ideas)

2nd ideas: Generate more ideas based on the consolidated ideas

2nd discussion: Further refinement of ideas

1st implementation: Pick up ideas and hold a hackathon

2nd implementation: Full-scale implementation and experiment

Output: Paper
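The stage sequence above can be sketched as a chain of functions. This is a toy illustration only: Idea, ideas_from, discussion, and deep_survey are hypothetical names, the default scoring function (text length) is a placeholder for the group's judgment, and the keep counts are arbitrary. In DeepSurvey itself, these steps are carried out by people, not code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Idea:
    text: str
    score: float  # hypothetical "promise" score; in practice a group judgment


def ideas_from(sources: List[str], score_fn: Callable[[str], float]) -> List[Idea]:
    # Idea generation: each source (a paper or an earlier idea) yields one new idea.
    return [Idea(f"idea<{src}>", score_fn(src)) for src in sources]


def discussion(ideas: List[Idea], keep: int) -> List[Idea]:
    # Discussion consolidates ideas: only the `keep` highest-scoring ones survive.
    return sorted(ideas, key=lambda idea: idea.score, reverse=True)[:keep]


def deep_survey(papers: List[str], score_fn: Callable[[str], float] = len) -> str:
    ideas_1 = ideas_from(papers, score_fn)                          # 1st ideas
    kept_1 = discussion(ideas_1, keep=3)                            # 1st discussion
    ideas_2 = ideas_from([idea.text for idea in kept_1], score_fn)  # 2nd ideas
    kept_2 = discussion(ideas_2, keep=1)                            # 2nd discussion
    if not kept_2:
        raise ValueError("no ideas survived discussion; read more papers")
    prototype = f"hackathon({kept_2[0].text})"                      # 1st implementation
    experiment = f"experiment({prototype})"                         # 2nd implementation
    return f"paper[{experiment}]"                                   # Output


print(deep_survey(["a", "bbb", "cc", "dddd"]))
# prints: paper[experiment(hackathon(idea<idea<dddd>>))]
```

Top-k selection is the simplest possible consolidation rule; a softer rule that also lets weaker ideas contribute would sit closer to the Lp-pooling comparison the authors draw.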

By analogy with a general Convolutional Neural Network (CNN) [LeCun+, 1998], "ideas" correspond to convolution layers, "discussion" to pooling, and "implementation" to fully connected layers, which makes the architecture easier to understand. In "pooling" (discussion), multiple ideas are collected and the good ones are passed unchanged to the next layer; this is most similar to Lp pooling, which combines characteristics of both max pooling and average pooling. The strategy is to repeat the generation and discussion of ideas, and to proceed to implementation once the ideas have taken shape. Counting only the convolutional layers (the two idea stages) and the fully connected layers (the two implementation stages), the architecture has a four-layer configuration.
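To make the pooling analogy concrete: Lp pooling is commonly written as the power mean y = ((1/N) * sum_i x_i^p)^(1/p) over the N inputs of a pooling region (some formulations omit the 1/N, in which case p = 1 gives sum pooling instead). For non-negative inputs, p = 1 gives average pooling, and y approaches the maximum as p grows, which is why Lp pooling sits between the two. The sketch below assumes non-negative inputs; lp_pool and the idea scores are hypothetical illustrations, not part of DeepSurvey.

```python
from typing import Sequence


def lp_pool(values: Sequence[float], p: float) -> float:
    """Lp (power-mean) pooling over non-negative values.

    p = 1 gives average pooling; as p grows, the result approaches max pooling.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    if not values or any(v < 0 for v in values):
        raise ValueError("values must be non-empty and non-negative")
    peak = max(values)
    if peak == 0:
        return 0.0
    # Divide by the maximum before raising to the power p so large p cannot overflow.
    mean_pow = sum((v / peak) ** p for v in values) / len(values)
    return peak * mean_pow ** (1.0 / p)


scores = [0.2, 0.5, 0.9]  # hypothetical scores for three ideas in one discussion
for p in (1, 2, 200):
    print(p, round(lp_pool(scores, p), 3))
```

As p increases, the pooled value rises from the mean of the scores (about 0.533) toward their maximum (0.9), so a single parameter moves the "discussion" between averaging all ideas and keeping only the best one.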

References

- Hirokatsu Kataoka, Yudai Miyashita, Tomoaki Yamabe, Soma Shirakabe, Shin'ichi Sato, Hironori Hoshino, Ryo Kato, Kaori Abe, Takaaki Imanari, Naomichi Kobayashi, Shinichiro Morita, Akio Nakamura, "cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey", arXiv preprint arXiv:1605.08247, May 2016. [PDF]

- Hirokatsu Kataoka, Yudai Miyashita, Tomoaki Yamabe, Soma Shirakabe, Shin'ichi Sato, Hironori Hoshino, Ryo Kato, Kaori Abe, Takaaki Imanari, Naomichi Kobayashi, Shinichiro Morita, Akio Nakamura, "cvpaper.challenge in CVPR2015 - A review of CVPR2015", Special Talk at Pattern Recognition and Media Understanding (PRMU), Dec. 2015. [PDF] (in Japanese) [Slide]