Clustering Time-Series Gene Expression Data with Unequal Time Intervals

Rueda L., Bari A., Ngom A.

2nd International Conference on Bio-Inspired Models of Network, Information and Computing Systems, Budapest, Hungary, 10 - 13 December 2007, vol.5410, pp.100-123

  • Publication Type: Conference Paper / Full Text
  • Volume: 5410
  • City: Budapest
  • Country: Hungary
  • Page Numbers: pp.100-123
  • Middle East Technical University Affiliated: No


Clustering gene expression data given as time series is a challenging problem with its own particular constraints: the time points cannot be exchanged, because reordering two or more of them would produce quite different results and could lead to erroneous biological conclusions. We have focused on issues related to clustering temporal gene expression profiles and devised a novel algorithm for clustering time-series microarray data. The proposed clustering method introduces the concept of profile alignment, which is achieved by minimizing the area between two aligned profiles. The overall expression patterns in the time-series context are found by combining agglomerative clustering with profile alignment, and the optimal number of clusters is determined by a variant of a clustering index that can effectively select it for a given dataset. The effectiveness of the proposed approach is demonstrated on two well-known datasets, yeast and serum, and corroborated on a set of pre-clustered yeast genes, on which the proposed method achieves very high classification accuracy even though it is an unsupervised scheme.
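The abstract does not give the alignment formula, so the Python sketch below rests on stated assumptions rather than on the paper's exact definitions. It assumes that each profile is linearly interpolated between its (possibly unequally spaced) time points, that alignment is a vertical shift of one profile, and that the "area" between two profiles is measured as the integral of their squared difference. Under these assumptions the optimal shift is the time-weighted mean of the difference, and the remaining area has a closed form on each linear segment. The function names (`align_shift`, `aligned_area`, `agglomerative`) are hypothetical. The average-linkage merge rule is an illustrative choice, not necessarily the authors'. The number of clusters `k` is passed in directly instead of being chosen by a clustering index.

```python
from typing import List, Sequence


def align_shift(t: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical helper: shift a minimising the integral of (x - (y + a))^2.

    Assumes x and y are linearly interpolated between time points t, so the
    minimiser is the time-weighted mean of d = x - y (trapezoidal rule, which
    is exact for a piecewise-linear d).
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for k in range(len(t) - 1):
        h = t[k + 1] - t[k]  # interval length; intervals may be unequal
        total += h * (d[k] + d[k + 1]) / 2.0
    return total / (t[-1] - t[0])


def aligned_area(t: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical distance: squared-difference area after aligning y to x."""
    a = align_shift(t, x, y)
    e = [xi - yi - a for xi, yi in zip(x, y)]  # residual at each time point
    area = 0.0
    for k in range(len(t) - 1):
        h = t[k + 1] - t[k]
        u, v = e[k], e[k + 1]
        # Exact integral of a squared linear segment from u to v over length h.
        area += h * (u * u + u * v + v * v) / 3.0
    return area


def agglomerative(t: Sequence[float], profiles: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering on aligned_area (illustrative)."""
    n = len(profiles)
    dist = [[aligned_area(t, profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_sum = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = pair_sum / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


if __name__ == "__main__":
    T = [0, 1, 3, 7]  # unequal sampling intervals (made-up example times)
    PROFILES = [
        [0, 1, 2, 1],  # shape A
        [5, 6, 7, 6],  # shape A shifted up by 5
        [2, 1, 0, 1],  # shape B
        [3, 2, 1, 2],  # shape B shifted up by 1
    ]
    print(agglomerative(T, PROFILES, k=2))  # [[0, 1], [2, 3]]
```

Because the time vector `T` is irregular (gaps of 1, 2 and 4 units), each segment contributes to the shift and the area in proportion to its length. This is one way a sketch can respect unequal time intervals. Profiles that differ only by a constant offset end up at distance zero and are merged first.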