Comic storyboard extraction via edge segment analysis by Yongtao Wang, Yafeng Zhou, Dong Liu, Zhi Tang

Multimed Tools Appl


Media Technology / Computer Networks and Communications / Hardware and Architecture / Software


Comic storyboard extraction via edge segment analysis

Yongtao Wang1 & Yafeng Zhou1 & Dong Liu1 & Zhi Tang1

Received: 31 October 2014 / Revised: 30 April 2015 / Accepted: 6 May 2015 © Springer Science+Business Media New York 2015

Abstract Comic storyboard extraction aims to decompose a comic image into several storyboards (or frames), and is the key technique for producing digital comic documents suitable for mobile reading. Previous methods either fail to detect overlapped storyboards or fail to produce storyboards without blank margins. To tackle these problems, we propose a novel comic storyboard extraction method based on edge segment analysis. First, we extract edge segments (i.e. contiguous chains of Canny edge points) from the input comic image; second, we detect line segments within each obtained edge segment with a top-down scheme; third, we detect storyboards through line segment combination and storyboard validation, and perform post-processing to handle some special cases. We test the proposed method on two datasets comprising 2237 comic pages from 11 printed comic series. Experimental results demonstrate that the proposed method achieves satisfactory results and outperforms the existing methods at both the storyboard and page levels.

Keywords Comic storyboard extraction · Quadrangle detection · Edge segment detection · Line segment detection

1 Introduction

As a special kind of entertainment publication, comics are popular among people of all ages around the world. With the development of mobile devices such as smartphones, tablets and e-book readers, more and more people read scanned copies of comic books on mobile devices. Because of the features of comic layouts and the limitations of mobile devices, such as screen size and resolution, displaying an entire comic page on a mobile device is impractical.

To satisfy the requirement of producing comic contents for mobile reading, comic images should be rearranged using page layout analysis. As shown in Fig. 1, an intuitive solution to this problem is content adaptation [17], i.e. converting the existing comic page contents into digital documents suitable for displaying on mobile devices.


DOI 10.1007/s11042-015-2680-8 * Yongtao Wang 1 Institute of Computer Science and Technology, Peking University, Beijing, China

Previous methods mainly used three strategies: (1) recursively cutting comic pages by detecting division lines; (2) segmenting storyboards using connected component detection; (3) identifying storyboard quadrangles or enclosing boxes. We will briefly review the works using each strategy.

Tanaka [15], Chan [4], Han [6] and Ishii [8] utilized division lines to cut comic images recursively. To the best of our knowledge, Tanaka et al. [15] were the first to detect division lines using the density gradient, and obtained a tree structure by recursively cutting the comic page along division lines. Finally, they determined the reading order of storyboards by traversing the tree. They assumed that the background was pure white and that storyboards were separated by white regions, which greatly limited the applicability of their method. Similarly, Chan et al. [4] proposed an approach that cuts the comic image recursively into parts along stripes (long lines with uniform color) between two storyboards. This method still cannot deal with overlapped storyboards. Han et al. [6] combined recursive x-y cut with a multi-layer perceptron to better segment comic pages. They introduced machine learning techniques, but x-y cut based algorithms cannot detect diagonal division lines. Ishii and Watanabe [8] introduced Harris corner point detection to locate storyboards accurately. In general, the division line based methods cannot handle the blank margins near the detected storyboards very well, even with Harris corner point detection.

Connected component analysis was used by Ponsard [13], Arai [2], Ngo Ho [7] and Rigaud [14]. Ponsard and Fries [13] binarized the image and segmented it with the watershed algorithm. After the watershed segmentation, they removed small foreground regions and merged overlapped foreground regions. The remaining foreground regions were considered storyboards. This method applies only to non-overlapped storyboards with complete borderlines. Arai and Tolle [2] introduced blob detection and added division line detection to separate two overlapped storyboards. Their method could not generalize to multiple overlapped storyboards. Ngo Ho et al. [7] utilized morphological operations to remove overlapped drawings. To be precise, they repeated dilation and erosion N times until the threshold was met. This method is quite ad hoc with respect to the selection of N and the structuring element. Moreover, mathematical morphological operations are ill-suited to real-time applications and distort the original shapes. Rigaud et al. [14] binarized the image with a computed threshold, and then performed connected component labelling. Next, they classified the bounding boxes of connected components into three types (storyboard, text and noise) using k-means.

Fig. 1 Illustration of comic content adaptation on a mobile device. The comic page shown on the left is segmented into seven storyboards, and then the storyboards are displayed sequentially on the mobile device according to their reading order. (Source (title, author, original publisher, publisher of this edition, volume or episode, page number): The Prince of Tennis, Takeshi Konomi, Shueisha, Ching Win, volume 1, page 99)


For overlapped storyboards, they applied mathematical morphological operations as Ngo Ho et al. [7] did. Consequently, their method shares most of the drawbacks of that approach.
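The core step shared by the connected component based methods can be sketched as follows. This is an illustrative simplification rather than the pipeline of any cited work: the function name `component_boxes`, the `min_area` noise threshold, the use of 4-connectivity, and the list-of-rows binary image (1 meaning foreground) are assumptions.

```python
from collections import deque
from typing import List, Sequence, Tuple

# (top, left, bottom, right); bottom and right are exclusive. Hypothetical type.
Box = Tuple[int, int, int, int]


def component_boxes(binary: Sequence[Sequence[int]], min_area: int = 4) -> List[Box]:
    """Label 4-connected foreground regions and return their bounding boxes.

    Regions with fewer than `min_area` pixels are discarded as noise.
    Boxes are returned in raster-scan order of their first pixel.
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []

    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            top, left, bottom, right = y0, x0, y0, x0
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                boxes.append((top, left, bottom + 1, right + 1))
    return boxes
```

Two storyboards that touch or overlap form a single connected region and are therefore returned as one bounding box, which is why the reviewed methods need an additional step, such as division line detection or morphological operations, to separate them.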

Liu et al. [11] and Li et al. [9,10] detected quadrangles by combining line segments. Li et al. [9,10] extracted line segments from the image using the LSD algorithm. Because the extracted line segments were redundant, they performed two rounds of clustering. Finally, they performed post-processing on them to get accurate storyboards. Liu et al. [11] improved the edge and line segment extraction method for comic images. They analyzed edge segments and decided whether each edge segment belonged to a non-overlapped or an overlapped storyboard. Finally, they selected four line segments as borderlines to form quadrilateral storyboards and identified the reading order. All of these methods extract edge segments and line segments, but their subsequent processing differs considerably. Such approaches assume that the storyboards are all quadrilateral polygons enclosed by borderlines composed of straight line segments. Consequently, they fail when the storyboards do not have enclosing borderlines. Notably, Liu et al. [11] used neighboring storyboards to infer storyboards without enclosing borderlines.
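The geometric step of turning four candidate borderlines into a quadrilateral storyboard can be illustrated with a short sketch. It is an assumption about one plausible way to do this, not the procedure of [9,10] or [11]: the names `line_intersection` and `quadrangle`, the requirement that the caller already knows which segment is the top, right, bottom and left border, and the convexity check are all introduced here for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def line_intersection(s1: Segment, s2: Segment, eps: float = 1e-9) -> Optional[Point]:
    """Intersect the infinite lines through two segments; None if parallel."""
    (x1, y1), (x2, y2) = s1
    (x3, y3), (x4, y4) = s2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / denom
    py = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return (px, py)


def quadrangle(top: Segment, right: Segment, bottom: Segment,
               left: Segment) -> Optional[List[Point]]:
    """Form a quadrangle from four borderline segments.

    The segments need not touch: each pair of adjacent borders is extended
    to its intersection. Corners are returned in the order top-left,
    top-right, bottom-right, bottom-left. Returns None when two adjacent
    borders are parallel or the result is not strictly convex.
    """
    corners = [
        line_intersection(top, left),
        line_intersection(top, right),
        line_intersection(bottom, right),
        line_intersection(bottom, left),
    ]
    if any(c is None for c in corners):
        return None

    # Strict convexity: all consecutive edge cross products share one sign.
    signs = set()
    for i in range(4):
        ax, ay = corners[i]
        bx, by = corners[(i + 1) % 4]
        cx, cy = corners[(i + 2) % 4]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross == 0:
            return None
        signs.add(cross > 0)
    return corners if len(signs) == 1 else None
```

Extending the borders to their intersections lets a sketch like this tolerate small gaps at the corners, but it still presupposes that all four borderlines were detected, which is the limitation noted above for storyboards without enclosing borderlines.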