ARTICLE IN PRESS
JID: ESWA [m5G;July 24, 2015;16:20]
Expert Systems With Applications xxx (2015) xxx–xxx
Contents lists available at ScienceDirect
Expert Systems With Applications journal homepage: www.elsevier.com/locate/eswa
Audio parameterization with robust frame selection for improved bird identification
Thiago M. Ventura (b,c), Allan G. de Oliveira (a,b,c), Todor D. Ganchev (a,d,∗), Josiel M. de Figueiredo (b,c), Olaf Jahn (a,e), Marinez I. Marques (a,g), Karl-L. Schuchmann (a,e,f,g)
a National Institute for Science and Technology in Wetlands (INAU), Science without Borders Program (CsF), Federal University of Mato Grosso (UFMT), Av. Fernando Corrêa da Costa 2367, Cuiabá-MT, Brazil
b Institute of Computing, Federal University of Mato Grosso, Av. Fernando Corrêa da Costa 2367, Cuiabá-MT, Brazil
c Institute of Physics, Federal University of Mato Grosso, Av. Fernando Corrêa da Costa 2367, Cuiabá-MT, Brazil
d Department of Electronics, Technical University of Varna, str. Studentska 1, 9010, Varna, Bulgaria
e Zoological Research Museum A. Koenig, Adenauerallee 160, 53113, Bonn, Germany
f University of Bonn, Regina-Pacis-Weg 3, D-53113, Bonn, Germany
g Institute of Biosciences, Federal University of Mato Grosso, Av. Fernando Corrêa da Costa 2367, Cuiabá-MT, Brazil
Article info
Keywords:
Hidden Markov Model (HMM)
Mel Frequency Cepstral Coefficients (MFCCs)
Robust frame selection
Abstract
A major challenge in the automated acoustic recognition of bird species is audio segmentation, which aims to select portions of audio that contain meaningful sound events and to eliminate segments that contain predominantly background noise or sound events of other origin. Here we report on the development of an audio parameterization method with integrated robust frame selection that applies morphological filtering to the spectrogram, treated as an image. The morphological filtering makes it possible to exclude from further processing certain audio events that could otherwise cause misclassification errors. The Mel Frequency Cepstral Coefficients (MFCCs) computed for the selected audio frames offer a good representation of the spectral information for dominant vocalizations, because the morphological filtering eliminates short bursts of noise and suppresses weak competing signals. Experimental validation of the proposed method on the identification of 40 bird species from Brazil demonstrated superior accuracy and faster operation than three traditional and recent approaches. This is expressed as a reduction of the relative error rate by 3.4% and of the overall operational time by 7.5% when compared to the second-best result. The improved frame selection robustness, precision, and operational speed facilitate applications such as multi-species identification in real-field recordings.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
Biodiversity monitoring is a prerequisite for sustainable conservation action and is particularly important in efforts to reduce the loss of species (Pereira et al., 2013). Traditionally, animal species distribution, diversity, and population density are assessed with a variety of survey methods that are costly and limited in space and time (e.g., Bibby, Burgess, Hill, & Mustoe, 2000; Jahn, 2011a, 2011b).
∗ Corresponding author at: Department of Electronics, Technical University of Varna, str. Studentska 1, 9010, Varna, Bulgaria. Tel.: +359 888096974.
E-mail addresses: email@example.com (T.M. Ventura), firstname.lastname@example.org (A.G. de Oliveira), email@example.com, firstname.lastname@example.org, email@example.com (T.D. Ganchev), firstname.lastname@example.org (J.M. de Figueiredo), email@example.com (O. Jahn), firstname.lastname@example.org (M.I. Marques), email@example.com (Karl-L. Schuchmann).
Since many animals, such as crickets, cicadas, anurans, birds, and certain mammals, are more often heard than seen, one promising non-intrusive method for monitoring their presence and activity is automated acoustic detection and identification. Remote and autonomous survey methods can provide continuous information on the presence/absence of rare and threatened species, as well as on the general status of biodiversity, in a cost-effective way (e.g., Sueur et al., 2008; Aide et al., 2013; Potamitis, Ntalampiras, Jahn, & Riede, 2014; Ganchev, Jahn, Marques, de Figueiredo, & Schuchmann, 2015). Thus, the use of new technologies is considered an opportunity for facilitating biodiversity monitoring efforts in remote and difficult-to-access areas, such as the vast Pantanal wetlands of Brazil (Schuchmann, Marques, Jahn, Ganchev, & Figueiredo, 2014).
Based on soundscapes, it is possible to identify the species that are present in an area. However, this is not a simple task, since the amount of data to be analyzed is very large, reaching the order of several terabytes per continuous annual cycle of recordings. Consequently, data processing is lengthy and computationally expensive (Oba, 2004). The principal prerequisites for large-scale application of soundscape analysis methods are increased species recognition accuracy and a reduction of the overall computational demands. For that purpose, improvements in accuracy and speed are required in both the audio parameterization and the classification methods. In the present work we focus on the audio parameterization.
http://dx.doi.org/10.1016/j.eswa.2015.07.002
0957-4174/© 2015 Elsevier Ltd. All rights reserved.
Nowadays, the statistical machine learning approach dominates the field of bioacoustics. The audio signal is first parameterized, and subsequently the statistical distribution of the audio parameters is modeled. The most widely used modeling techniques for acoustic animal identification are based on the Hidden Markov Model (HMM) (Bardeli et al., 2010; Chu & Blumstein, 2011; Potamitis et al., 2014; Trifa, Kirschel, Taylor, & Vallejo, 2008) or its single-state version, known as the Gaussian Mixture Model (GMM) (Ganchev et al., 2015; Henríquez et al., 2014). The success of GMM- and HMM-based recognition methods depends on the appropriateness of the audio parameterization process, particularly the segmentation and selection of representative portions of the species-specific sound emissions.
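To make the parameterize-then-model idea concrete, the following minimal sketch scores a sequence of already selected feature frames against one diagonal-covariance GMM per species and picks the species with the highest mean frame log-likelihood. It is an illustration only and not the implementation evaluated in this work: the function names, the two-dimensional "features", the species labels, and the hand-set model parameters are all hypothetical, and in practice the frames would be MFCC vectors and the models would be trained from data.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, gmm):
    """Log density of a GMM given as a list of (weight, mean, var) components."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify(frames, models):
    """Return the best-scoring species and the mean log-likelihood per species."""
    scores = {
        species: sum(log_gmm(f, gmm) for f in frames) / len(frames)
        for species, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical hand-set models and frames (assumptions, for illustration only).
models = {
    "species_A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "species_B": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
frames = [[0.2, -0.1], [0.9, 1.2], [0.5, 0.4]]

best, scores = identify(frames, models)
print(best)
```

Because every frame contributes equally to the score, frames dominated by noise or competing sounds pull the decision toward the wrong model, which is why the selection of representative frames matters for this family of classifiers.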