ICML 2014 Workshop on Unsupervised Learning from Bioacoustic Big Data (uLearnBio)

Aim and Scope

The general topic of uLearnBio is probabilistic machine learning from bioacoustic data, with a main focus on unsupervised learning. Unsupervised learning approaches aim at automatically acquiring "knowledge" from data for representation, analysis, etc. One of the main goals in unsupervised learning is clustering/segmentation. Clustering is one of the essential tasks in statistics and machine learning, and one of the most popular and successful approaches in cluster analysis is mixture model-based clustering. This approach is known for its well-established theoretical background and its efficient estimation algorithms, such as Expectation-Maximization (EM) algorithms. The problem of selecting the number of mixture components can be tackled with model selection criteria such as BIC, AIC, ICL, etc.
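
As an illustration of the model-based clustering approach described above, the following is a minimal sketch in Python, not a reference implementation. It assumes one-dimensional synthetic data rather than real bioacoustic features, and the function names (normal_pdf, fit_gmm_1d, bic, select_k_by_bic) are hypothetical helpers introduced only for this example. It fits a Gaussian mixture by EM and compares candidate numbers of components with BIC, here written so that lower values are better.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=200, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative helper).

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialisation (an assumption of this sketch):
    # means at evenly spaced quantiles, shared variance, equal weights.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, min_var)
    variances = [grand_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            # The variance floor guards against degenerate components.
            variances[j] = max(spread / nj, min_var)

    log_lik = sum(
        math.log(max(sum(w * normal_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in data
    )
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters in 1-D."""
    return -2.0 * log_lik + (3 * k - 1) * math.log(n)


def select_k_by_bic(data, k_values):
    """Fit one mixture per candidate k and return (best_k, {k: BIC})."""
    scores = {}
    for k in k_values:
        log_lik = fit_gmm_1d(data, k)[3]
        scores[k] = bic(log_lik, k, len(data))
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic toy data: two well-separated groups.
    toy = ([rng.gauss(-5.0, 1.0) for _ in range(100)]
           + [rng.gauss(5.0, 1.0) for _ in range(100)])
    best_k, scores = select_k_by_bic(toy, range(1, 5))
    print("selected k:", best_k)
    print("BIC per k:", scores)
```

In practice the observations would be multivariate acoustic features and the mixture would use full or diagonal covariance matrices; the structure of the E-step, the M-step and the BIC comparison stays the same.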

Another probabilistic alternative for cluster analysis is the one based on Bayesian Non-Parametrics (BNP), in particular the Infinite Gaussian Mixture Model (IGMM) formulation, Chinese Restaurant Process (CRP) mixtures and Dirichlet Process Mixtures (DPM). The non-parametric alternative avoids assuming restricted functional forms and thus allows the complexity and accuracy of the inferred model to grow as more data is observed. It also offers an alternative to the difficult problem of model selection in model-based clustering by inferring the number of clusters from the data as the learning proceeds. One of the main current concerns for all these approaches, which are confronted with the big data problem, is scaling them up.
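
To make concrete how the number of clusters can grow with the amount of data, the sketch below simulates seating under the Chinese Restaurant Process prior. It is only an illustration under stated assumptions: it draws partitions from the prior and performs no posterior inference, so it is not a full CRP mixture or DPM. The function name sample_crp and the concentration value used in the demo are hypothetical choices made for this example.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw one partition from a Chinese Restaurant Process prior.

    Customer i (0-based) joins an existing table t with probability
    size_t / (i + alpha) and opens a new table with probability
    alpha / (i + alpha). Returns (assignments, table_sizes).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for i in range(n_customers):
        # The table sizes sum to i, so u falls in an existing table's
        # interval with probability i / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(table_sizes)  # default: open a new table
        for t, size in enumerate(table_sizes):
            cumulative += size
            if u < cumulative:
                chosen = t
                break
        if chosen == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, table_sizes


if __name__ == "__main__":
    # alpha = 1.0 is an arbitrary concentration chosen for the demo.
    for n in (10, 100, 1000):
        _, sizes = sample_crp(n, alpha=1.0, seed=0)
        print(n, "customers ->", len(sizes), "tables")
```

Because the same seed replays the same seating decisions, the run with more customers extends the shorter one, so the number of occupied tables can only stay the same or increase as observations are added.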

This workshop offers an excellent framework to see how these parametric and nonparametric probabilistic models for cluster analysis perform when learning from complex real bioacoustic data. Data from bird songs and whale songs will be provided in the framework of challenges, as in our previous ICML and NIPS workshops on learning from bioacoustic data (ICML4B and NIPS4B).

In recent years, the majority of existing applications have lent themselves to advanced acoustic signal processing methodologies, and our efforts are successfully integrating robust processing and machine learning algorithms for scaled analysis of these abundant recordings. Major issues, such as data repositories and the need for standardization within the bioacoustics field, will be discussed and addressed.

uLearnBio will bring ideas on how to proceed in understanding bioacoustics in order to provide methods for biodiversity indexing. Scaled bioacoustic data science is a novel challenge for artificial intelligence that requires new methods. Big data scientists are today invited to look into these data using advanced methods to derive new knowledge about these important species. Large cabled submarine acoustic observatory deployments permit data to be acquired continuously over long time periods. For example, the Neptune submarine observatory in Canada and the Antares or Nemo neutrino detectors (see NIPS4B proceedings) pose 'big data' challenges to scientists. Automated analysis, including clustering/segmentation and structuring of acoustic signals, event detection, data mining, and machine learning to discover relationships among data streams, promises to aid scientists in making discoveries in an otherwise overwhelming quantity of acoustic data.

The topics of this workshop cover (but are not limited to):

Unsupervised Generative Learning
Latent data Models
Model-based clustering
Bayesian Non-parametric clustering
Bayesian sparse representation

Applied to

Big Bio-acoustic data clustering/structuring
Species clustering (birds, etc.)
Whale song clustering/decomposition
Bird Song clustering/decomposition


ICML 2014: The 31st International Conference on Machine Learning

26 June 2014, Beijing, China
