## Discovering Clusters in Motion
Time-Series Data: |

The following is a brief overview of our approach to discovering clusters in motion time-series data. Readers are referred to the technical report for a detailed description of the approach.

The task of cluster analysis is to decompose or partition a data set into groups so that the items in one group are similar to each other and are as different as possible from items in other groups. The numerous clustering algorithms thus far proposed typically fall into one of the following three categories: hierarchical methods, partition-based methods, and probabilistic model-based methods. The method presented in this paper belongs to the class of probabilistic model-based methods, where data is assumed to have been generated by a finite mixture model. In particular, it is assumed that the sequences of observations are generated by a finite mixture of hidden Markov models (HMMs). Over the past two decades, HMMs have become popular in a number of applications, including speech recognition, bioinformatics, and computer vision to name a few. The main reasons for HMMs popularity is their ability to describe a wide range of series data with considerable variations, and their success in practical implementations.

Our primary goal is to find clusters within a sequence database, while at the same time estimating HMMs for each cluster. Stated differently, we want to assign sequences to those clusters whose HMMs best explain them. The problem of assigning a particular sequence to each cluster can be approached in two ways: via the k-means approach, or via the Expectation Maximization (EM) approach. In the k-means approach, a particular sequence is assigned to a single cluster, and then only those sequences that are assigned to a particular cluster are used for re-estimating its HMM parameters. These two steps are repeated until convergence. In the EM approach, a particular sequence can be partially assigned to all clusters, with the degree of cluster membership determined by the a posteriori probability, which depends via Bayes rule on the cluster a priori probability and the data likelihood. The cluster HMM parameters are then re-estimated, but in contrast to k-means, the parameter estimates of a single HMM cluster are influenced by all the observation sequences with the corresponding a posteriori probabilities. These two steps are repeated until convergence.

In the previous paragraph, we have assumed that the number of mixture components M is known. The problem of estimating the "correct'' number of clusters is a difficult one: a full Bayesian solution for obtaining the posterior probability on M, requires a complex integration over the HMM parameter space, as well as knowledge about the priors on the mixture parameters and about the priors on M itself. Often this integration cannot be solved in closed form, and Monte-Carlo methods and other approximation methods are used to evaluate it. These methods however are computationally intensive. Other methods that sacrifice some accuracy for efficiency, are the penalized likelihood approaches, where the log-likelihood term is penalized by subtraction of a complexity term. We use such a method that tries to find a model with Minimum Description Length (MDL).

Last Modified:January 14, 2003