Evolving data stream clustering algorithm based on shared nearest neighbor density

  • Abstract: Existing density-based data stream clustering algorithms have difficulty discovering clusters of different densities, and distinguishing outliers and clusters that are bridged by a few data objects. This paper proposes an evolving data stream clustering algorithm based on the shared nearest neighbor density. The shared nearest neighbor density is defined on the shared nearest neighbor graph and combines two environmental factors: the degree to which a data object is surrounded by similar nearest neighbors, and the degree to which it is needed by the objects around it. As a result, the clustering result is not affected by density variation. The average distance of a data object and the cluster density are defined to identify outliers and bridges between clusters. An update algorithm under the sliding window model is designed to maintain the clusters in the shared nearest neighbor graph as the stream evolves. Theoretical analysis and experimental results verify the clustering effectiveness and clustering quality of the algorithm.
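
The abstract does not give the paper's exact definition of shared nearest neighbor density, so the following is only a minimal sketch of the classic shared nearest neighbor idea it builds on. It assumes that two objects are linked only when each appears in the other's k-nearest-neighbor list, that their similarity is the number of neighbors they share, and that an object's density is the number of its neighbors with similarity at or above a threshold. The function names and the parameters `k` and `min_shared` are hypothetical, and the paper's two environmental factors, sliding window update, and bridge detection are not modeled.

```python
from math import dist  # Python 3.8+


def knn_lists(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of shared neighbors, counted only if i and j list each other."""
    if j in neighbors[i] and i in neighbors[j]:
        return len(neighbors[i] & neighbors[j])
    return 0


def snn_density(points, k, min_shared):
    """Assumed density: neighbors whose SNN similarity reaches min_shared."""
    neighbors = knn_lists(points, k)
    return [
        sum(1 for j in neighbors[i]
            if snn_similarity(neighbors, i, j) >= min_shared)
        for i in range(len(points))
    ]


# A tight square, the same square scaled by 10, and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (100, 100), (100, 110), (110, 100), (110, 110),
       (50, 50)]
print(snn_density(pts, k=3, min_shared=2))
# prints [3, 3, 3, 3, 3, 3, 3, 3, 0]
```

In this toy input the tight and the loose square receive the same density, because the measure depends on neighbor rank rather than absolute distance, while the isolated point gets zero because none of its nearest neighbors list it in return.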

     
