Clustering algorithm for categorical data considering sorting by weight

Abstract: Some clustering algorithms are sensitive to the order in which data are input. To address this problem, a non-interference sequence index is defined, and a method is proposed that uses this index to sort categorical data by weight. Based on this method, CABOSFV_C, an efficient clustering algorithm for categorical data that is affected by the data input order, is improved into a clustering algorithm that considers sorting by weight (CABOSFV_CSW), which eliminates the sensitivity to the data input order. Experiments on UCI benchmark data sets show that, when clustering categorical data, CABOSFV_CSW with ascending sorting by weight improves accuracy and markedly improves stability compared with the original CABOSFV_C algorithm and other algorithms affected by the data input order.
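The general idea of removing order sensitivity by sorting before clustering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the definition of the non-interference sequence index or the steps of CABOSFV_C, so `frequency_weight` is a hypothetical stand-in weight (the sum of per-attribute value frequencies) and `one_pass_cluster` is a generic one-pass "leader" scheme, not the paper's algorithm. The sketch only shows why a deterministic ascending sort by weight makes a one-pass clustering independent of the input order.

```python
from collections import Counter
from itertools import permutations


def mismatches(x, y):
    """Number of attributes on which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y))


def one_pass_cluster(records, threshold):
    """Generic one-pass clustering (NOT CABOSFV_C): each record joins the
    first cluster whose first member differs on at most `threshold`
    attributes, otherwise it starts a new cluster. Order-sensitive."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if mismatches(cluster[0], rec) <= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def frequency_weight(records):
    """Hypothetical weight (NOT the paper's non-interference sequence
    index): sum, over attributes, of how often the record's value
    occurs in that attribute across the whole data set."""
    counts = [Counter(col) for col in zip(*records)]
    return {rec: sum(c[v] for c, v in zip(counts, rec)) for rec in records}


def sorted_cluster(records, threshold):
    """Sort ascending by weight (ties broken by the record itself),
    then run the one-pass clustering on the sorted sequence."""
    weights = frequency_weight(records)
    ordered = sorted(records, key=lambda r: (weights[r], r))
    return one_pass_cluster(ordered, threshold)


data = [("a", "a", "a"), ("a", "a", "b"), ("a", "b", "b")]

# Collect the distinct results obtained over every possible input order.
plain = {repr(one_pass_cluster(list(p), 1)) for p in permutations(data)}
fixed = {repr(sorted_cluster(list(p), 1)) for p in permutations(data)}
print(len(plain), len(fixed))
```

Because the sort key depends only on the set of records and not on their arrival order, every permutation of the input is reduced to the same sequence before clustering. Whether the resulting clusters are also more accurate depends on the choice of weight, which is where the paper's index would come in.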

     
