Evolving Intelligent Systems

Evolving Clustering: An Asset for Evolving Systems

Abdelhamid Bouchachia


Evolving clustering (EC), which can be encountered in the literature under different names such as incremental clustering, dynamic clustering, online clustering, etc., has been studied for decades and remains an active area of research. EC is mainly intended for two situations: (i) data streams, where data arrive at some speed and the algorithm must process them under time and space constraints; (ii) large static datasets, where data cannot be processed at once due to the space limitations of the machine. Accordingly, the spectrum of EC applications is broad and versatile in the data mining and pattern recognition fields. The present contribution aims at shedding light on the principal avenues of evolving clustering. In particular, three aspects are presented: (i) the concept definition of EC; (ii) its relevance to evolving systems; (iii) a short summary of EC methods.


Evolving systems (ESs) are systems that develop their structure, their functionality and their internal knowledge representation through continuous learning from data and interaction with the environment [24]. A clear distinction, however, must be made between evolving systems and evolutionary computation. In the latter, the data are static and the goal is to find the optimal solution in an iterative manner and a close-ended, offline setting by evolving a population of possible solutions. In the former case, the setting is rather open-ended, meaning that data arrive dynamically over time and the solution must be continually refined. Note that here we have one current solution, not a population of solutions.

The scale of evolving systems may range from complete systems to single algorithms. The literature pertaining to data mining, pattern recognition and machine learning shows that different names have been used to mean evolving systems: online learning, incremental learning, open-ended learning, single-pass learning, etc. Over recent years, tasks like clustering, classification, feature selection and association rule mining have been re-thought from an incremental perspective to fit the requirements of data stream processing. The present contribution is concerned in particular with incremental clustering. Note that in the rest of this contribution, incremental, online and evolving will be used interchangeably, since the first two are established names.

The goal of clustering is to uncover the hidden structure of data. In other terms, it aims at partitioning unlabeled data objects into groups, called clusters, in a way that similar objects are assigned to the same group and distinct ones are assigned to different groups. Ideally, a good clustering algorithm should observe two main criteria: compactness, so that objects within a cluster are as similar as possible, and separation, so that the clusters themselves are as distinct as possible.

A sheer mass of offline clustering algorithms based (at least partly) on these two criteria have been proposed in the literature [35]. We can distinguish two types of clustering algorithms: (i) hard partitioning, where each data object belongs to exactly one cluster, and (ii) soft partitioning, where objects belong to all clusters to a certain degree. In addition, there are two classes of algorithms: (i) partitional, where data are split into a predefined number of clusters according to some criteria, and (ii) hierarchical, where the output takes the form of a tree (dendrogram) and clustering can be performed either in a divisive (top-down) or agglomerative (bottom-up) manner [23].

However, these offline algorithms do not fit the setting of evolving systems. Hence, online versions of various algorithms have been developed. It is easy to see that online clustering (OC) is a vital asset for evolving systems dedicated to dynamic optimization, prediction, diagnosis, identification and many other system modeling fields. Moreover, because clustering is often just one step in a multi-step complex system, its importance grows: in applications like control, artificial vision, surveillance, etc., clustering quality is central to the well-functioning of the whole multi-step system. Adding the time dimension to such applications renders clustering dynamic and hence challenging. For instance, in ubiquitous environments, dynamic clustering of distributed, parallel, heterogeneous and spatio-temporal data is a particularly interesting problem for a number of applications.

The concept of “dynamic” here refers to time and change in an open-ended environment. Online clustering is very relevant when storage capacities (memory) are very limited and when data arrive over long periods of time. More details will follow in Sec. II.

The rest of this contribution is organized in three sections. Section II describes the concept of online clustering, its requirements and generic steps. Section III highlights some of the existing online clustering approaches without claiming any exhaustiveness. Section IV concludes this contribution.


Online clustering is based on the idea of dynamic change. It is meant for organizing data that arrive sequentially over time into clusters. Ideally, the clusters are compact and well separated while time and space constraints are respected. The time constraint defines the (pseudo) real-time nature of data processing: data should be clustered online as they become available. The space constraint, on the other hand, refers to the necessity of freeing memory after data have been processed, so that the algorithm will not be able to see those data (or most of them) again in the future.

Usually online clustering is required to cope with non-stationary, changing environments by employing adaptive mechanisms to accommodate changes in the data. That is, the algorithm should self-adapt to new data samples, which may convey a changing situation, and at the same time should keep in memory only the relevant information (e.g., the model) learned in the remote past, using it as prior experience to bias clustering in the future, with the overall goal of increasing the system's effectiveness.

Therefore, online clustering relies on several assumptions, among which online tuning is fundamental. In a nutshell, online clustering should fulfill the following conditions, which actually apply to online learning in general [6], [8], [9]:

These properties are not always satisfied by the existing online clustering algorithms. It is important here to note that incrementality can mainly take two forms: point-based incrementality, where samples are presented and processed one at a time, and batch-based incrementality, where data arrive and are processed in chunks (batches).

Moreover, OC embraces different scenarios of incrementality:

A. Generic Stages of OC

Disregarding details that may be specific to the computational paradigm and peculiarities underlying a given algorithm, an online clustering algorithm usually consists of the following generic stages, which describe the Leader algorithm [21] and are followed by almost all OC algorithms.

1) Matching: Depending on the form, this operation may differ slightly. In the point-based form, the incoming example is compared to the existing clusters to find the closest one. This requires a distance measure (or a similarity measure, conditional probability, etc.). In the batch-based form, this stage may not be relevant.

2) Accommodation of new data: The new sample is used to update the winner, provided that the sample lies within the boundary (region of influence) of the winner. Otherwise, a new cluster is initialized with that sample. In the batch-based form, batches are clustered independently, yielding a set of local clusters.

3) Refinement of the model: If the number of clusters exceeds a pre-fixed quantity, either one cluster is deleted according to some criterion (age, staleness, etc.) or a pair of clusters is merged into one based on some measure. In the batch-based form, the local clusters generated are merged with the existing ones. In some situations, clusters may be split to meet certain quality constraints (e.g., to avoid strong overlap between clusters, to enhance compactness, etc.).
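The three stages can be sketched compactly. The following is a minimal, illustrative Python sketch of a Leader-style point-based clusterer; the distance threshold `theta`, the cluster budget `max_clusters` and the staleness criterion (lowest sample count) are assumptions of this illustration, not prescriptions of [21].

```python
import math

class LeaderClusterer:
    """Minimal Leader-style online clusterer (illustrative sketch)."""

    def __init__(self, theta, max_clusters=10):
        self.theta = theta              # region of influence (distance threshold)
        self.max_clusters = max_clusters
        self.centroids = []             # one centroid per cluster
        self.counts = []                # samples absorbed by each cluster

    def _dist(self, a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def observe(self, x):
        # 1) Matching: find the closest existing cluster.
        if self.centroids:
            w = min(range(len(self.centroids)),
                    key=lambda k: self._dist(x, self.centroids[k]))
            # 2) Accommodation: update the winner if x lies in its region.
            if self._dist(x, self.centroids[w]) <= self.theta:
                self.counts[w] += 1
                n = self.counts[w]
                self.centroids[w] = [c + (xi - c) / n
                                     for c, xi in zip(self.centroids[w], x)]
                return w
        # Otherwise a new cluster is born from x.
        self.centroids.append(list(x))
        self.counts.append(1)
        # 3) Refinement: drop the stalest cluster if the budget is exceeded.
        if len(self.centroids) > self.max_clusters:
            stale = self.counts.index(min(self.counts))
            del self.centroids[stale], self.counts[stale]
        return len(self.centroids) - 1
```

Feeding it two well-separated groups of points yields two clusters whose centroids track the running means of their members.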

Clearly, online clustering may not be optimal but is just an approximation that may deviate from the true structure of the data. The main reason, often raised in the literature, is the order of data presentation, especially in the point-based clustering case [12]. Of course, only local optimality may be achieved, since the algorithm sees the data only once. Algorithms considering data drift use the treatment techniques discussed earlier.


The following summary of the methods is rather illustrative and is by no means exhaustive. It is organized according to three clustering models; other (hybrid) models are not mentioned here.

A. Neural Models

One of the first incremental clustering algorithms was the Leader algorithm [21] mentioned earlier. It uses a threshold to determine whether a new pattern can be assigned to an existing cluster or whether it should form a new one by itself. The Leader algorithm has gained popularity because of its neural network implementation, the ART network [19]. Similar to ART, other incremental neural networks have been proposed, like generalized fuzzy min-max neural networks [17], MaxNet [35], evolving self-organizing maps [11], [30], [31], growing neural gas [16] and many variants thereof. All of these methods are based on the concepts of competitive learning and vector quantization.

B. Mixture-based Models

An overview of these methods can be found in [7]. There are mainly two categories of online growing Gaussian mixture models: refinement-based methods and learning methods.

1) Refinement-based methods: Usually, refinement of the clustering model after accommodating a data sample is realized by means of merge and/or split operations that combine similar components of the GMM. Many merge criteria have been used, e.g., Hotelling's T2-statistic [32], fidelity [13], weighting [20], the Chernoff bound [20], the Kullback-Leibler measure [29], mutual information [37] and minimum description length [3].
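For illustration, a merge test based on the Kullback-Leibler measure can be sketched as follows. The closed-form KL divergence between two Gaussians is standard, but the symmetrisation and the threshold `tau` are simplifying assumptions of this sketch rather than the exact criteria of the cited works.

```python
import numpy as np

def kl_gauss(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = np.asarray(mu1) - np.asarray(mu0)
    return 0.5 * (np.trace(inv1 @ cov0)
                  + diff @ inv1 @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def should_merge(comp_a, comp_b, tau=0.5):
    """Merge two GMM components if their symmetrised KL divergence is small."""
    mu_a, cov_a = comp_a
    mu_b, cov_b = comp_b
    sym_kl = 0.5 * (kl_gauss(mu_a, cov_a, mu_b, cov_b)
                    + kl_gauss(mu_b, cov_b, mu_a, cov_a))
    return sym_kl < tau
```

Two nearly coincident components pass the test and can be fused (e.g., by moment matching), while well-separated components are kept apart.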

2) Learning methods: In this category of methods, the idea is to apply an online variant of the Expectation-Maximization algorithm [28] to train the Gaussian mixtures. Examples of this class of methods can be found in [34], [26], [33].
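The flavour of these methods can be conveyed by a one-dimensional sketch of a recursive, single-pass Gaussian mixture update in the spirit of [33], [34]; the learning rate `alpha`, the in-place parameter lists and the renormalization step are assumptions of this illustration, not the exact update rules of those works.

```python
import math

def online_gmm_step(x, weights, means, variances, alpha=0.05):
    """One online update of a 1-D Gaussian mixture from a single sample x."""
    # E-step: responsibilities of each component for x.
    dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    resp = [d / total for d in dens]
    # M-step: recursive (exponentially weighted) parameter updates.
    for k, r in enumerate(resp):
        weights[k] += alpha * (r - weights[k])
        rho = alpha * r
        means[k] += rho * (x - means[k])
        variances[k] += rho * ((x - means[k]) ** 2 - variances[k])
    # Keep the mixing weights a valid distribution.
    s = sum(weights)
    for k in range(len(weights)):
        weights[k] /= s
    return weights, means, variances
```

Each sample shifts the parameters of the components in proportion to their responsibility for it, so the mixture tracks the stream without revisiting past data.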

C. Objective Function-based Models

Usually, objective function-based clustering proceeds in an iterative manner to optimize the goal function, which does not readily lend itself to an online implementation. To surmount this difficulty, authors tend to introduce simplifications and assumptions to avoid iterating over the data. A set of algorithms based on K-means and Fuzzy C-means have been proposed, as in [4], [18], [1], [14], where the partition matrix and the prototypes are recursively updated as new data become available. Other online clustering algorithms can be more or less directly based on an objective function, such as [2], [5], [38].


The present contribution is a very brief survey of evolving clustering as an important asset for evolving systems. Online clustering can be applied to single tasks as well as to complex systems where data arrive over time in a continuous stream. The relevance of online clustering to evolving systems and its characterization have been shown. A great number of online clustering algorithms have been proposed over recent years, relying on various computational intelligence machineries. Online clustering in the dynamic environments inhabited by evolving systems is expected to remain a challenging problem.


[1] C. Aggarwal, J. Han, J. Wang, and P. Yu. A framework for clustering evolving data streams. In Proceedings of the 29th International Conference on Very Large Data Bases, pages 81–92, 2003.
[2] P. Angelov and D. Filev. Flexible models with evolving structure. Int. J. Intell. Syst., 19(4):327–340, 2004.
[3] O. Arandjelovic and R. Cipolla. Incremental learning of temporally coherent Gaussian mixture models. In Proceedings of the 16th British Machine Vision Conference, pages 759–768, 2005.
[4] L. Bottou and Y. Bengio. Convergence properties of the k-means algorithm. Advances in Neural Information Processing Systems, 7:585–592, 1995.
[5] H. Boubacar, S. Lecoeuche, and S. Maouche. Self-adaptive kernel machine: online clustering in RKHS. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 3, pages 1977–1982, 2005.
[6] A. Bouchachia. Incremental learning. In Encyclopedia of Data Warehousing and Mining, pages 1006–1012. Idea Group, 2009.
[7] A. Bouchachia. An evolving classification cascade with self-learning. Evolving Systems, 1(3):143–160, 2010.
[8] A. Bouchachia. Fuzzy classification in dynamic environments. Soft Comput., 15(5):1009–1022, 2011.
[9] A. Bouchachia. Incremental learning with multi-level adaptation. Neurocomputing, 74(11):1785–1799, 2011.
[10] A. Bouchachia and R. Mittermeir. Towards fuzzy incremental classifiers. Soft Computing, 11(2):193–207, 2006.
[11] D. Deng and N. Kasabov. On-line pattern analysis by evolving self-organizing maps. Neurocomputing, 51:87–103, 2003.
[12] I. Dagher, M. Georgiopoulos, G. Heileman, and G. Bebis. An ordering algorithm for pattern presentation in fuzzy ARTMAP that tends to improve generalization performance. IEEE Transactions on Neural Networks, 10(4):768–778, 1999.
[13] A. Declercq and J. Piater. Online learning of Gaussian mixture models - a two-level approach. In Proceedings of the Third International Conference on Computer Vision Theory and Applications, pages 605–611, 2008.
[14] D. Dovzan and I. Skrjanc. Recursive clustering based on a Gustafson-Kessel algorithm. Evolving Systems, 2(1):15–24, 2011.
[15] D. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:139–172, 1987.
[16] B. Fritzke. A growing neural gas network learns topologies. In Advances in Neural Information Processing Systems, pages 625–632, 1995.
[17] B. Gabrys and A. Bargiela. General fuzzy min-max neural network for clustering and classification. IEEE Trans. on Neural Networks, 11(3):769–783, 2000.
[18] O. Georgieva and F. Klawonn. Dynamic data assigning assessment clustering of streaming data. Appl. Soft Comput., 8(4):1305–1313, 2008.
[19] S. Grossberg. Nonlinear neural networks: principles, mechanisms, and architectures. Neural Networks, 1:17–61, 1988.
[20] P. Hall and Y. Hicks. A method to add Gaussian mixture models. Technical report, University of Bath, 2004.
[21] J. Hartigan. Clustering Algorithms. John Wiley and Sons, New York, 1975.
[22] J. Weng, Y. Zhang, and W. Hwang. Candid covariance-free incremental principal component analysis. IEEE Trans. Pattern Anal. Mach. Intell., 25(8):1034–1040, 2003.
[23] A. Jain, M. Murty, and P. Flynn. Data clustering: a review. ACM Comput. Surv., 31:264–323, 1999.
[24] N. Kasabov. Foundations of Neural Networks, Fuzzy Systems and Knowledge Engineering. MIT Press, 1996.
[25] L. Kuncheva. Classifier ensembles for changing environments. In Proceedings of the 5th International Workshop on Multiple Classifier Systems, pages 1–15, 2004.
[26] D. Lee. Effective Gaussian mixture learning for video background subtraction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):827–832, 2005.
[27] S. Ozawa, S. Pang, and N. Kasabov. On-line feature selection for adaptive evolving connectionist systems. International Journal of Innovative Computing, Information and Control, 2(1):181–192, 2006.
[28] R. Neal and G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, pages 355–368. MIT Press, 1999.
[29] S. Roberts, C. Holmes, and D. Denison. Minimum-entropy data partitioning using reversible jump Markov chain Monte Carlo. IEEE Trans. Pattern Anal. Mach. Intell., 23(8):909–914, 2001.
[30] N. Rougier and Y. Boniface. Dynamic self-organising map. Neurocomputing, 74(11):1840–1847, 2011.
[31] J. Rubio. SOFMLS: online self-organizing fuzzy modified least-squares network. IEEE Trans. Fuzzy Systems, 17(6):1296–1309, 2009.
[32] M. Song and H. Wang. Highly efficient incremental estimation of Gaussian mixture models for online data stream clustering. In Intelligent Computing: Theory and Applications, pages 174–183. SPIE, 2005.
[33] C. Stauffer and W. Grimson. Adaptive background mixture models for real-time tracking. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 246–252, 1999.
[34] C. Stauffer and W. Grimson. Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):747–757, 2000.
[35] S. Theodoridis and K. Koutroumbas. Pattern Recognition. Academic Press, 2006.
[36] A. Tsymbal. The problem of concept drift: definitions and related work. Technical Report TCD-CS-2004-15, Department of Computer Science, Trinity College Dublin, 2004.
[37] Z. Yang and M. Zwolinski. Mutual information theory for adaptive mixture models. IEEE Trans. Pattern Anal. Mach. Intell., 23(4):396–403, 2001.
[38] W. Yu and X. Li. On-line fuzzy modeling via clustering and support vector machines. Inf. Sci., 178(22):4264–4279, 2008.
[39] H. Zhao, P. Yuen, and J. Kwok. A novel incremental principal component analysis and its application for face recognition. IEEE Trans. Syst. Man Cybern. B Cybern., 36(4):873–886, 2006.