Discovering Clusters in Motion Time-Series Data


Experimental Evaluation

The EM-based HMM-clustering algorithm described in the previous section was implemented in Matlab using Kevin Murphy's HMM toolbox. Two Matlab functions were implemented: one for learning a mixture of HMMs with discrete observations, and the other for learning a mixture of HMMs with mixture-of-Gaussians observations. All the experiments reported in this section use the second function, with each observation modeled by a single (not a mixture) univariate Gaussian density.
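As a sketch of the mechanics involved, the E-step of such a mixture-of-HMMs EM assigns each sequence a posterior responsibility for every component HMM, computed from the per-sequence log-likelihoods (obtained via the forward algorithm) and the mixture weights; the M-step then re-estimates the weights from those responsibilities. A minimal NumPy sketch, with function names of our own choosing (this is illustrative, not the toolbox code):

```python
import numpy as np

def responsibilities(loglik, log_weights):
    """E-step of a mixture-of-HMMs EM: posterior p(component k | sequence s).

    loglik: (S, K) array of log P(sequence s | HMM k) from the forward algorithm.
    log_weights: (K,) array of log mixture priors.
    Returns an (S, K) array whose rows sum to 1.
    """
    log_post = loglik + log_weights                   # unnormalized log posteriors
    log_post -= log_post.max(axis=1, keepdims=True)   # stabilize the exponentials
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def update_weights(post):
    """M-step update of the mixture weights: the mean responsibility per component."""
    return post.mean(axis=0)
```

Each component HMM's parameters are then re-estimated with responsibility-weighted Baum-Welch statistics; only the weight update is shown here for brevity.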

The measure used for testing the validity of our clustering results is classification accuracy. Ground truth is available for all our experiments, so the majority class of each cluster can be associated with the cluster itself, which enables the computation of classification accuracy. When ground truth is not available, or when it is not known whether the data points cluster naturally, other validity measures should be employed.
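The majority-class accuracy just described can be computed directly from the cluster assignments; a small illustrative Python helper (hypothetical, not part of our Matlab implementation):

```python
from collections import Counter

def cluster_accuracy(true_labels, cluster_ids):
    """Classification accuracy when each cluster is labeled by its majority class."""
    clusters = {}
    for y, c in zip(true_labels, cluster_ids):
        clusters.setdefault(c, []).append(y)
    # Every point whose label matches its cluster's majority class counts as correct.
    correct = sum(Counter(members).most_common(1)[0][1]
                  for members in clusters.values())
    return correct / len(true_labels)
```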

The purpose of the experiments with simulated data is to compare the performance of the k-means and EM-based approaches for clustering sequences with HMMs. The purpose of the experiments with real data is to demonstrate the usefulness of HMM-based clustering in a number of applications.

Experiment with Synthetic Data

In this experiment, 100 sequences of length 200 are generated from a 2-component HMM mixture (50 sequences from each component). Both HMMs have two states and a 1-d Gaussian observation density, in a manner similar to \cite{Smyth97}. DTW was used for the initial clustering, and both models were initialized in the standard way: uniform priors, uniform transition matrices, and means and variances estimated using k-means. The amount of overlap between the generating models is varied by changing the mean separation between the Gaussian outputs of the two states, in the same way for both HMMs, leaving all other parameters fixed.

The HMMs' dynamics are

A1 = [0.6 0.4     A2 = [0.4 0.6
      0.4 0.6]          0.6 0.4]

and for both HMMs the standard deviations of the Gaussians are kept fixed at $\sigma_1 = \sigma_2 = 1$, the mean of the first state is kept fixed at $\mu_1 = 0$, and the mean of the second state (for both HMMs) is varied over $\mu_2 = (1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00)$, a total of nine values. This corresponds to a change in the standardized measure of separation between the two Gaussians. For each of the nine values, 50 trials were run, and for each trial, classification accuracy was computed by associating the majority class in each cluster with the cluster itself. Example sequences are shown in Fig. Note their similarity. The results are summarized in Tab. 1.
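The generating process just described can be sketched as follows; this is an illustrative NumPy version of the simulation, with the parameter values above but a function name, seed, and uniform initial-state choice that are our own assumptions:

```python
import numpy as np

def sample_hmm(A, means, sigma, T, rng):
    """Sample one length-T observation sequence from a Gaussian-output HMM.

    A: (K, K) transition matrix; the initial state is drawn uniformly;
    means: per-state observation means; sigma: shared standard deviation.
    """
    K = A.shape[0]
    states = np.empty(T, dtype=int)
    states[0] = rng.integers(K)
    for t in range(1, T):
        states[t] = rng.choice(K, p=A[states[t - 1]])
    return means[states] + sigma * rng.standard_normal(T)

rng = np.random.default_rng(0)
A1 = np.array([[0.6, 0.4], [0.4, 0.6]])
# One sequence from the first component at the largest separation, mu2 = 3.00.
seq = sample_hmm(A1, np.array([0.0, 3.0]), 1.0, 200, rng)
```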

Classification    Method    Standardized Mean Separation
Accuracy (%)                1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00
60                True        49   50   50   50   50   50   50   50   50
                  EM           0    1   10   23   38   48   49   50   47
                  k-means      1    1    8   13   27   29   30   44   49
90                True         0    0    3   28   44   50   50   50   50
                  EM           0    0    0    1   22   46   49   47   47
                  k-means      0    0    0    0   21   29   30   44   49

Tab. 1: EM vs. k-means classification accuracy. Each entry in the table is the number of trials (out of 50) for which the method achieved the specified classification accuracy, as a function of the separation between the observation Gaussian densities.

Model selection was applied only to the data generated in the 50 trials corresponding to a mean separation of 3.00. The number of clusters was varied between 1 and 6. In 46 out of the 50 trials the EM algorithm correctly found 2 HMM clusters. The k-means algorithm found the correct number of clusters in all of the trials.

Experiments with Real Data

We conducted experiments with gait data collected for the Human ID project at the Computational Perception Lab at Georgia Tech. The database is available online. It consists of video sequences (and accompanying data files) of 20 walking human subjects taken under various viewing conditions (namely, different combinations of indoor/outdoor footage, side/angle view, and far/near view). We used a subset of this database for our experiment: a total of 45 sequences of 15 subjects (3 sequences per subject), for which binary masks, extracted using a background subtraction algorithm, were available.

All sequences were taken under the same viewing conditions: indoor footage, far side view. Fig. 3 shows example frames of two out of the 45 sequences, pertaining to subjects 3 and 5. Binary masks are shown below their corresponding frames. For each binary mask we computed the aspect ratio of its bounding box, and thus for each of the 45 sequences we obtained an aspect-ratio time-series, or signal. An average smoothing filter of size five was applied to each time-series. The resulting smoothed aspect-ratio signals for subjects 3 and 5 are depicted in Fig. 4.
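The per-frame feature extraction reduces to a bounding-box computation over the binary mask followed by a moving average; a NumPy sketch of the idea (function names and the height/width convention are our assumptions, not taken from the paper's code):

```python
import numpy as np

def aspect_ratio(mask):
    """Height/width aspect ratio of the bounding box of a binary mask."""
    rows = np.flatnonzero(mask.any(axis=1))   # rows containing foreground pixels
    cols = np.flatnonzero(mask.any(axis=0))   # columns containing foreground pixels
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return height / width

def smooth(signal, w=5):
    """Size-w moving-average filter, as applied to the aspect-ratio series."""
    return np.convolve(signal, np.ones(w) / w, mode="valid")
```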

The model selection criterion was tested with 5, 10, and 15 clusters (15 being the true number of classes). The EM algorithm selected M=5, and the k-means algorithm selected M=10. Model selection underestimated the number of clusters because the aspect ratio is not the most discriminative feature, and in such cases MDL tends to simplify the model even further. In what follows, we ignore the estimated number of clusters and initialize the algorithms with the true number of classes, 15.
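A generic two-part MDL (or BIC-style) criterion of the kind alluded to here trades goodness of fit against model size, which is why a weak feature pushes the selection toward fewer clusters. The exact criterion we used is given in the technical report; the Python sketch below is only indicative:

```python
import numpy as np

def mdl_score(loglik, n_params, n_obs):
    """Two-part MDL / BIC-style description length: lower is better.

    loglik: total log-likelihood of the data under the mixture with M clusters;
    n_params: number of free parameters; n_obs: number of observations.
    """
    return -loglik + 0.5 * n_params * np.log(n_obs)

def select_num_clusters(scores_by_M):
    """Pick the number of clusters M with the smallest description length."""
    return min(scores_by_M, key=scores_by_M.get)
```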

In order to reduce the sensitivity of the HMM-clustering algorithm to the initial grouping, we tried two initializations: Smyth's method and dynamic time warping (DTW). Both methods use agglomerative hierarchical clustering, with the clusters formed by cutting the hierarchy at level 15. Smyth's method uses the Kullback-Leibler (KL) divergence as the dissimilarity measure, while DTW uses the warping distance. Using the results of these algorithms as the initial grouping, we ran the two versions of our HMM-clustering algorithm, namely k-means and EM.
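The warping distance used by the DTW initialization can be computed with the classic dynamic program; a minimal Python sketch for 1-d series (the pairwise distance matrix it yields would then be fed to agglomerative clustering, e.g. SciPy's `linkage`/`fcluster`, and the hierarchy cut at 15 clusters):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance between 1-d series."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping-path moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```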

The input time-series depicted in Fig. 4 can be modeled as sine waves plus noise. We selected a cyclic HMM structure with four states that matches this sine-wave pattern. More details on this particular structure, and the motivation for choosing it, can be found in the technical report.
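One common way to encode such a cyclic structure is a transition matrix in which each state either persists or advances to the next state, with the last state wrapping around to the first; the sketch below illustrates the idea, and the self-transition probability is an arbitrary illustrative value, not the one used in our experiments:

```python
import numpy as np

def cyclic_transition_matrix(n_states=4, p_stay=0.8):
    """Cyclic (circular left-to-right) transition matrix: each state stays with
    probability p_stay or advances to the next state, wrapping at the end."""
    A = np.eye(n_states) * p_stay
    # np.roll shifts the identity's columns right by one, so state i -> i+1 (mod n).
    A += (1.0 - p_stay) * np.roll(np.eye(n_states), 1, axis=1)
    return A
```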

The results of this experiment are summarized in Tab. 2. The results indicate that the EM and k-means procedures improve the initial clusters obtained from DTW. Smyth's procedure yields good initial clusters. EM and k-means assigned a few more sequences to clusters with similar sequences, thus reducing the number of clusters and consequently the classification accuracy. The classification accuracies of the clustering algorithms are very similar to the classification accuracy obtained using supervised learning and the nearest-neighbor leave-one-out cross-validation method. In other words, the clustering-based classifier achieved comparable performance, but without the need for the class labeling required by the supervised learning approach. This is an encouraging result.
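For reference, the leave-one-out nearest-neighbor accuracy used as the supervised baseline can be computed from any pairwise distance matrix; an illustrative NumPy sketch (the function name is ours):

```python
import numpy as np

def loo_1nn_accuracy(dist, labels):
    """Leave-one-out 1-NN accuracy from a symmetric pairwise distance matrix."""
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)   # exclude each point from its own neighbor search
    nn = d.argmin(axis=1)         # index of the nearest other point
    labels = np.asarray(labels)
    return float((labels[nn] == labels).mean())
```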


© 2003 Image and Video Computing Group - Boston University
Last Modified: January 17, 2003