Evolving Intelligent Systems

Evolving clustering (EC), which appears in the literature under different names such as incremental clustering, dynamic clustering and online clustering, has been studied for decades and remains an active area of research. EC is mainly intended for two situations: (i) data streams, where data arrive at some rate and the algorithm must process them under time and space constraints; and (ii) large static datasets that cannot be processed at once due to the machine's memory limitations. Consequently, the spectrum of EC applications in the data mining and pattern recognition fields is broad and versatile. The present contribution aims at shedding light on the principal avenues of evolving clustering. In particular, three aspects are presented: (i) the concept definition of EC; (ii) its relevance to evolving systems; and (iii) a short summary of EC methods.

Evolving systems (ESs) are systems that develop their structure, their functionality and their internal knowledge representation through continuous learning from data and interaction with the environment [24]. A clear distinction, however, must be made between evolving systems and evolutionary computation. In the latter, the data are static and the goal is to find the optimal solution iteratively, in a close-ended offline setting, by evolving a population of candidate solutions. In the former, the setting is open-ended: data arrive dynamically over time and the solution must be continually refined. Note that here there is one current solution, not a population of solutions.

The scale of evolving systems may range from complete systems to single algorithms. The literature on data mining, pattern recognition and machine learning shows that different names have been used to mean evolving systems: online learning, incremental learning, open-ended learning, single-pass learning, etc. In recent years, tasks like clustering, classification, feature selection and association rule mining have been re-thought from an incremental perspective to fit the requirements of data stream processing. The present contribution is concerned in particular with incremental clustering. Note that in the rest of this contribution, incremental, online and evolving will be used interchangeably, since the first two are established names.

The goal of clustering is to uncover the hidden structure of data. In other terms, it aims at partitioning unlabeled data objects into groups, called clusters, such that similar objects are assigned to the same group and dissimilar ones are assigned to different groups. Ideally, a good clustering algorithm should observe two main criteria:

- Minimization of intra-cluster distance
- Maximization of the inter-cluster distance
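The two criteria above can be made concrete with a small, self-contained sketch (not taken from the text; all names are illustrative): intra-cluster distance is measured here as the mean distance of points to their cluster centroid, and inter-cluster distance as the minimum distance between centroids.

```python
import math

def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def intra_inter(clusters):
    """Return (mean intra-cluster distance, min inter-centroid distance)."""
    cents = [centroid(c) for c in clusters]
    intra = sum(dist(p, cents[k]) for k, c in enumerate(clusters) for p in c) \
            / sum(len(c) for c in clusters)
    inter = min(dist(cents[i], cents[j])
                for i in range(len(cents)) for j in range(i + 1, len(cents)))
    return intra, inter

clusters = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (6.0, 5.0)]]
intra, inter = intra_inter(clusters)  # a good partition: small intra, large inter
```

A clustering algorithm driven by these criteria would try to minimize the first value while maximizing the second.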

A sheer mass of offline clustering algorithms based (fully or partly) on these two criteria has been proposed in the literature [35]. We can distinguish two types of clustering: (i) hard partitioning, where each data object belongs to exactly one cluster, and (ii) soft partitioning, where objects belong to all clusters to a certain degree. In addition, there are two classes of algorithms: (i) partitional, where data are split into a predefined number of clusters according to some criterion, and (ii) hierarchical, where the output takes the form of a tree (dendrogram) and clustering is performed either in a divisive (top-down) or agglomerative (bottom-up) manner [23].

However, these offline algorithms do not fit the setting of evolving systems; hence, online versions of various algorithms have been developed. It is easy to see that online clustering (OC) is a vital asset for evolving systems dedicated to dynamic optimization, prediction, diagnosis and identification, among many other system modeling fields. Moreover, because clustering is often just one step in a multi-step complex system, its importance goes beyond the clustering task itself. In applications like control, artificial vision and surveillance, clustering quality is central to the well-functioning of the whole multi-step system. Adding the time dimension to such applications renders clustering dynamic and hence challenging. For instance, in ubiquitous environments, dynamic clustering of distributed, parallel, heterogeneous and spatio-temporal data is a particularly interesting problem for a number of applications.

The concept of “dynamic” here refers to time and change in an open-ended environment. Online clustering is very relevant when storage capacities (memory) are very limited and when data arrive over long periods of time. More details will follow in Sec. II.

The rest of this contribution is organized as follows. Section II describes the concept of online clustering, its requirements and its generic steps. Section III highlights some of the existing online clustering approaches without claiming exhaustiveness. The final section concludes this contribution.

Online clustering is based on the idea of dynamic change. It is meant for organizing data that arrive sequentially over time into clusters. Ideally, the clusters are compact and well separated, taking account of time and space constraints. The time constraint defines the (pseudo) real-time nature of data processing: data should be clustered online as they become available. The space constraint, on the other hand, refers to the necessity of freeing memory after data have been processed, such that the algorithm will not see those data (or most of them) again in the future.

Usually, online clustering is required to cope with non-stationary, changing environments by employing adaptive mechanisms to accommodate changes in the data. That is, the algorithm should self-adapt to new data samples, which may convey a changing situation, while keeping in memory only the relevant information (e.g., the model) learned in the past and using it as experience to bias clustering in the future, with the overall goal of increasing the system's effectiveness.

Therefore, online clustering relies on several assumptions among which online tuning is fundamental. In a nutshell, online clustering should fulﬁll the following conditions that actually apply to online learning in general [6], [8], [9]:

- OC should accommodate plasticity by adapting the current model using the new data.
- OC should ensure stability by avoiding forgetting the previously induced model (as long as it is consistent with the data).
- OC should use only new data; once processed, data should not be used again in subsequent stages.
- OC should enable the new data to refer either to the already known structure of the model or to a new structure.
- OC does not assume any prior knowledge about the data and the associated statistical properties.
- OC does not assume any prior knowledge about the topological structure of the model. The algorithm must call for self-organization so that it can incrementally tune its structure.

These properties are not always satisﬁed by the existing online clustering algorithms. It is important here to note that incrementality can take mainly two forms:

- Point-based: data objects are individually and sequentially presented to the clustering algorithm.
- Batch-based: batches of data objects are sequentially presented to the system.

The first scenario is the truly incremental one, because it works on a point basis. The second scheme may even be applied with re-clustering, as encountered in many studies.

Moreover, OC embraces different scenarios of incrementality:

- New features: It may happen that future data examples are additionally described by new features. Moreover, not all features describing the data may be worth using during the clustering process: features that are discriminative at time t may not be so at time t+1. Hence, incremental feature selection and reduction methods are required to conduct clustering based on the meaningful features [10], [22], [27], [39].
- New data: the usual and main scheme of incrementality is of course the one corresponding to arrival of data online in a stream. OC has to accommodate new data as they become available [2], [15], [17], [21].
- Evolving clusters: the structure of the clustering model changes over time as new data are accommodated. The structure can be dynamic due to non-stationarity, which may emerge from the data drift phenomenon. There are different types of drift (random, gradual, abrupt and cyclic), operating at various change rates (slow, medium, fast), and many methods for handling each type of drift [8], [9], [25], [36].

Disregarding details specific to the computational paradigm and the peculiarities of a particular algorithm, an online clustering algorithm usually consists of the following generic stages, which describe the Leader algorithm [21] that almost all OC algorithms follow.

1) Matching: Depending on the form, this operation may differ slightly. In the point-based form, the incoming example is compared to the existing clusters to find the closest one, which requires a distance measure (or a similarity measure, a conditional probability, etc.). In the batch-based form, this stage may not be relevant.

2) Accommodation of new data: The new sample is used to update the winner, provided that the sample lies within the boundary (region of influence) of the winner. Otherwise, a new cluster is initialized with that sample. In the batch-based form, batches are clustered independently, yielding a set of local clusters.

3) Refinement of the model: If the number of clusters exceeds a pre-fixed quantity, either one cluster is deleted according to some criterion (age, staleness, etc.) or a pair of clusters is merged into one based on some measure. In the batch-based form, the newly generated local clusters are merged with the existing ones. In some situations, clusters may be split to fit certain quality constraints (e.g., to avoid strong overlap between clusters, to enhance compactness, etc.).
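The three generic stages can be sketched in a minimal point-based loop. This is an illustrative reading of the Leader-style scheme, not a reference implementation; the class name, the radius threshold and the staleness-based deletion rule are all assumptions made here for concreteness.

```python
import math

class OnlineLeader:
    """Point-based online clustering sketch: match, accommodate, refine."""

    def __init__(self, radius=1.5, max_clusters=10):
        self.radius = radius                # region of influence of a cluster
        self.max_clusters = max_clusters    # pre-fixed model size
        self.clusters = []                  # dicts: center, count, last_seen
        self.t = 0

    def learn_one(self, x):
        self.t += 1
        # 1) Matching: find the closest existing cluster.
        best, best_d = None, float("inf")
        for c in self.clusters:
            d = math.dist(x, c["center"])
            if d < best_d:
                best, best_d = c, d
        # 2) Accommodation: update the winner if x lies in its region,
        #    otherwise initialize a new cluster with x.
        if best is not None and best_d <= self.radius:
            best["count"] += 1
            best["center"] = tuple(m + (xi - m) / best["count"]
                                   for m, xi in zip(best["center"], x))
            best["last_seen"] = self.t
        else:
            self.clusters.append({"center": tuple(x), "count": 1,
                                  "last_seen": self.t})
        # 3) Refinement: if the cap is exceeded, delete the stalest cluster.
        if len(self.clusters) > self.max_clusters:
            self.clusters.remove(min(self.clusters,
                                     key=lambda c: c["last_seen"]))

model = OnlineLeader(radius=1.5)
for x in [(0, 0), (0.5, 0.2), (5, 5), (5.2, 4.9)]:
    model.learn_one(x)
# two well-separated groups yield two clusters
```

Note how the sample is discarded after updating the running center, which is exactly what the space constraint of Sec. II demands.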

Clearly, online clustering may not be optimal; it is just an approximation that may deviate from the true structure of the data. The main reason, often raised in the literature, is the order of data presentation, especially in the point-based case [12]. Moreover, only local optimality can be achieved, since the algorithm sees each datum only once. Algorithms that consider data drift use the treatment techniques discussed earlier.

The following summary of methods is illustrative and by no means exhaustive. It is organized according to three families of clustering models: neural-network-based methods, Gaussian mixture models and objective-function-based methods. Other (hybrid) models are not covered here.

One of the ﬁrst incremental clustering algorithms was the Leader algorithm [21] mentioned earlier. It uses a threshold to determine whether a new pattern can be assigned to an existing cluster or whether it should form a new one by itself. The Leader algorithm has gained popularity because of its neural network implementation, the ART network [19]. Similar to ART, other incremental neural networks have been proposed like generalized fuzzy min-max neural networks [17], MaxNet [35], evolving self-organizing maps [11], [30], [31], growing neural gas [16] and many variants thereof. All of these methods are based on the concept of competitive learning and vector quantization.

An overview of these neural methods can be found in [7]. Online growing Gaussian mixture models (GMMs) fall mainly into two categories: refinement-based methods and learning methods.

1) Refinement-based methods: Usually, refinement of the clustering model after accommodating a data sample is realized by means of merge and/or split operations that combine similar components of the GMM. Many merge criteria have been used, e.g., Hotelling's T²-statistic [32], fidelity [13], weighting [20], the Chernoff bound [20], the Kullback-Leibler measure [29], mutual information [37] and minimum description length [3].
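Once a merge criterion has selected two Gaussian components, a common way to combine them is moment matching, which preserves the pair's total weight, mean and variance. Below is a hedged one-dimensional sketch of that combination step (the function name is illustrative and the criterion itself is assumed to have been applied beforehand):

```python
def merge_components(w1, mu1, var1, w2, mu2, var2):
    """Moment-matching merge of two 1-D Gaussian mixture components.

    Returns the (weight, mean, variance) of a single Gaussian with the
    same total weight, first moment and second moment as the pair.
    """
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    # Second moment of the two-component mixture, re-centered on mu.
    var = (w1 * (var1 + mu1 ** 2) + w2 * (var2 + mu2 ** 2)) / w - mu ** 2
    return w, mu, var

# Two equal-weight unit-variance components at 0 and 2 merge into one
# component at 1 whose variance also absorbs the spread of the means.
w, mu, var = merge_components(0.5, 0.0, 1.0, 0.5, 2.0, 1.0)
```

The extra variance term (here, 2.0 instead of 1.0) is what makes the merged component cover both originals rather than merely averaging them.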

2) Learning methods: In this category, the idea is to apply an online variant of the Expectation-Maximization (EM) algorithm [28] to train the Gaussian mixture. Examples of this class of methods can be found in [34], [26], [33].
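A minimal sketch in the spirit of the incremental EM view of [28]: for each incoming sample, the E-step computes the components' responsibilities under the current mixture, and the M-step moves running sufficient statistics towards them with a decaying step size. This is a one-dimensional, fixed-variance illustration; the function names, the 1/t step-size schedule and the initialization are assumptions made here, not taken from the cited works.

```python
import math

def gauss(x, mu, var):
    """1-D Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def online_em(stream, init_means, var=1.0):
    """Single-pass stochastic EM for a fixed-variance 1-D mixture."""
    k = len(init_means)
    s0 = [1.0 / k] * k                    # running weight statistics
    s1 = [m / k for m in init_means]      # running weighted-sum statistics
    for t, x in enumerate(stream, start=2):
        eta = 1.0 / t                     # decaying step size
        means = [s1[j] / s0[j] for j in range(k)]
        # E-step: responsibilities of each component for x.
        p = [s0[j] * gauss(x, means[j], var) for j in range(k)]
        z = sum(p)
        r = [pj / z for pj in p]
        # M-step: interpolate sufficient statistics towards the new sample.
        for j in range(k):
            s0[j] = (1.0 - eta) * s0[j] + eta * r[j]
            s1[j] = (1.0 - eta) * s1[j] + eta * r[j] * x
    return [s1[j] / s0[j] for j in range(k)], s0

# Poorly initialized means drift towards the two modes of the stream.
means, weights = online_em([0.0, 5.0] * 20, init_means=[-1.0, 6.0])
```

Each sample is seen exactly once, which is what distinguishes this scheme from the batch EM iterations it approximates.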

Usually, objective-function-based clustering proceeds iteratively to optimize the goal function, which does not lend itself to an online implementation. To surmount this difficulty, authors tend to introduce simplifications and assumptions that avoid iterating over the data. A set of algorithms based on K-means and Fuzzy C-means has been proposed, e.g., in [4], [18], [1], [14]; the partition matrix and the prototypes are recursively updated as new data become available. Other online clustering algorithms are more or less directly based on an objective function, such as [2], [5], [38].
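The recursive prototype update just mentioned can be sketched as an online K-means step in the spirit of [4]: each new sample moves its nearest prototype by a per-prototype step size that decreases with the number of assigned samples, so no pass over past data is needed. The function name and the prototype initialization below are illustrative assumptions.

```python
import math

def online_kmeans(stream, prototypes):
    """Single-pass K-means: running-mean update of the winning prototype."""
    protos = [list(p) for p in prototypes]
    counts = [0] * len(protos)
    for x in stream:
        # Assign the sample to its nearest prototype.
        j = min(range(len(protos)), key=lambda i: math.dist(x, protos[i]))
        counts[j] += 1
        lr = 1.0 / counts[j]              # step size of a running mean
        protos[j] = [m + lr * (xi - m) for m, xi in zip(protos[j], x)]
    return protos

protos = online_kmeans([(0, 0), (0, 1), (4, 4), (4, 5)],
                       [(0.0, 0.5), (4.0, 4.5)])
# each prototype ends at the mean of the samples assigned to it
```

With the 1/n step size, each prototype is exactly the running mean of its assigned samples, which is what makes the recursion equivalent to recomputing cluster centers without revisiting old data.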

The present contribution is a very brief survey of evolving clustering as an important asset for evolving systems. Online clustering can be applied to single tasks as well as to complex systems where data arrive over time in a continuous stream. The relevance of online clustering to evolving systems has been shown, and its characteristics have been outlined. A great number of online clustering algorithms have been proposed over recent years, relying on various computational intelligence machineries. Online clustering in dynamic environments inhabited by evolving systems is expected to remain a challenging problem.

[1] C. Aggarwal, J. Han, J. Wang, and P. Yu. A framework for clustering evolving data streams. In Proceedings of the 29th International Conference on Very Large Data Bases - Volume 29, pages 81–92, 2003.

[2] P. Angelov and D. Filev. Flexible models with evolving structure. Int. J. Intell. Syst., 19(4):327–340, 2004.

[3] O. Arandjelovic and R. Cipolla. Incremental learning of temporally coherent Gaussian mixture models. In Proceedings of the 16th British Machine Vision Conference, pages 759–768, 2005.

[4] L. Bottou and Y. Bengio. Convergence properties of the k-means algorithm. Advances in Neural Information Processing Systems, 7:585–592, 1995.

[5] H. Boubacar, S. Lecoeuche, and S. Maouche. Self-adaptive kernel machine: online clustering in RKHS. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 3, pages 1977–1982, 2005.

[6] A. Bouchachia. Incremental learning. In Encyclopedia of Data Warehousing and Mining, pages 1006–1012. Idea Group, 2009.

[7] A. Bouchachia. An evolving classification cascade with self-learning. Evolving Systems, 1(3):143–160, 2010.

[8] A. Bouchachia. Fuzzy classification in dynamic environments. Soft Comput., 15(5):1009–1022, 2011.

[9] A. Bouchachia. Incremental learning with multi-level adaptation. Neurocomputing, 74(11):1785–1799, 2011.

[10] A. Bouchachia and R. Mittermeir. Towards fuzzy incremental classifiers. Soft Computing, 11(2):193–207, 2006.

[11] D. Deng and N. Kasabov. On-line pattern analysis by evolving self-organizing maps. Neurocomputing, 51:87–103, 2003.

[12] I. Dagher, M. Georgiopoulos, G. Heileman, and G. Bebis. An ordering algorithm for pattern presentation in fuzzy ARTMAP that tends to improve generalization performance. IEEE Transactions on Neural Networks, 10(4):768–778, 1999.

[13] A. Declercq and J. Piater. Online learning of Gaussian mixture models - a two-level approach. In Proceedings of the Third International Conference on Computer Vision Theory and Applications, pages 605–611, 2008.

[14] D. Dovzan and I. Skrjanc. Recursive clustering based on a Gustafson-Kessel algorithm. Evolving Systems, 2(1):15–24, 2011.

[15] D. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:139–172, 1987.

[16] B. Fritzke. A growing neural gas network learns topologies. In Advances in Neural Information Processing Systems, pages 625–632, 1995.

[17] B. Gabrys and A. Bargiela. General fuzzy min-max neural network for clustering and classification. IEEE Trans. on Neural Networks, 11(3):769–783, 2000.

[18] O. Georgieva and F. Klawonn. Dynamic data assigning assessment clustering of streaming data. Appl. Soft Comput., 8(4):1305–1313, 2008.

[19] S. Grossberg. Nonlinear neural networks: principles, mechanisms, and architectures. Neural Networks, 1:17–61, 1988.

[20] P. Hall and Y. Hicks. A method to add Gaussian mixture models. Technical report, University of Bath, 2004.

[21] J. Hartigan. Clustering Algorithms. John Wiley and Sons, New York, 1975.

[22] J. Weng, Y. Zhang, and W. Hwang. Candid covariance-free incremental principal component analysis. IEEE Trans. Pattern Anal. Mach. Intell., 25(8):1034–1040, 2003.

[23] A. Jain, M. Murty, and P. Flynn. Data clustering: a review. ACM Comput. Surv., 31:264–323, 1999.

[24] N. Kasabov. Foundations of Neural Networks, Fuzzy Systems and Knowledge Engineering. MIT Press, 1996.

[25] L. Kuncheva. Classifier ensembles for changing environments. In Proc. of the 5th International Workshop on Multiple Classifier Systems, pages 1–15, 2004.

[26] D. Lee. Effective Gaussian mixture learning for video background subtraction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):827–832, 2005.

[27] S. Ozawa, S. Pang, and N. Kasabov. On-line feature selection for adaptive evolving connectionist systems. International Journal of Innovative Computing, Information and Control, 2(1):181–192, 2006.

[28] R. Neal and G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, pages 355–368. MIT Press, 1999.

[29] S. Roberts, C. Holmes, and D. Denison. Minimum-entropy data partitioning using reversible jump Markov chain Monte Carlo. IEEE Trans. Pattern Anal. Mach. Intell., 23(8):909–914, 2001.

[30] N. Rougier and Y. Boniface. Dynamic self-organising map. Neurocomputing, 74(11):1840–1847, 2011.

[31] J. Rubio. SOFMLS: online self-organizing fuzzy modified least-squares network. IEEE T. Fuzzy Systems, 17(6):1296–1309, 2009.

[32] M. Song and H. Wang. Highly efficient incremental estimation of Gaussian mixture models for online data stream clustering. In Intelligent Computing: Theory and Applications, pages 174–183. SPIE, 2005.

[33] C. Stauffer and W. Grimson. Adaptive background mixture models for real-time tracking. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:246–252, 1999.

[34] C. Stauffer and W. Grimson. Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):747–757, 2000.

[35] S. Theodoridis and K. Koutroumbas. Pattern Recognition. Academic Press, 2006.

[36] A. Tsymbal. The problem of concept drift: definitions and related work. Technical Report TCD-CS-2004-15, Dept. of Computer Science, Trinity College, Dublin, 2004.

[37] Z. Yang and M. Zwolinski. Mutual information theory for adaptive mixture models. IEEE Trans. Pattern Anal. Mach. Intell., 23(4):396–403, 2001.

[38] W. Yu and X. Li. On-line fuzzy modeling via clustering and support vector machines. Inf. Sci., 178(22):4264–4279, 2008.

[39] H. Zhao, P. Yuen, and J. Kwok. A novel incremental principal component analysis and its application for face recognition. IEEE Trans. Syst. Man Cybern. B Cybern., 36(4):873–886, 2006.