Abstract:
Existing density-based data stream clustering algorithms have difficulty discovering clusters with different densities and identifying clusters with bridges and outliers. A novel stream clustering algorithm based on the shared nearest neighbor density is proposed. In this algorithm, the shared nearest neighbor density is defined on the shared nearest neighbor graph and takes into account both the degree to which a data object is surrounded by its nearest neighbors and the degree to which it is demanded by the surrounding data objects. As a result, the clustering result is not affected by variations in density. The average distance of a data object and the cluster density are defined to identify outliers and clusters with bridges. An updating algorithm over the sliding window is designed to keep the clusters on the shared nearest neighbor graph up to date. Theoretical analysis and experimental results demonstrate that the algorithm clusters effectively and achieves better clustering quality.
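
To illustrate the idea of a shared nearest neighbor density, the following is a minimal Python sketch. It is not the definition used in this paper, which the abstract does not spell out. It assumes the commonly used shared nearest neighbor similarity (two objects are linked only if each is in the other's k-nearest-neighbor list, and the link weight is the number of neighbors they share) and assumes that the density of an object is the sum of its link weights. The function names `knn_lists`, `snn_similarity`, and `snn_density` are hypothetical.

```python
import math


def knn_lists(points, k):
    """Return, for each point index, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of shared neighbors, counted only when i and j list each other."""
    if i in neighbors[j] and j in neighbors[i]:
        return len(neighbors[i] & neighbors[j])
    return 0


def snn_density(neighbors, i):
    """Assumed density: sum of SNN similarities between i and its k nearest neighbors."""
    return sum(snn_similarity(neighbors, i, j) for j in neighbors[i])


# Four mutually close points and one distant point (illustrative data only).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
neighbors = knn_lists(points, k=3)
densities = [snn_density(neighbors, i) for i in range(len(points))]
```

Because the similarity requires mutual membership in the neighbor lists, a point that lists others as neighbors but is not listed by them in return receives a low density under this assumed definition, regardless of the absolute distances involved.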