Clustering algorithm of categorical data in consideration of sorting by weight
-
-
Abstract
Aimed at solving the problem that part of clustering algorithms are sensitive to the data input order, a non-interference sequence index was defined, and an approach applying the non-interference sequence was proposed to sort categorical data by weight. Based on this approach, a new clustering algorithm considering sorting by weight (CABOSFV_CSW) was presented to improve CABOSFV_C, which is an efficient clustering algorithm for categorical data but sensitive to the data input order. This approach eliminates sensitivity to the data input order. UCI benchmark data sets were used to compare the proposed CABOSFV_CSW algorithm with traditional CABOSFV_C algorithm and other algorithms sensitive to the data input order. Empirical tests show that the new CABOSFV_CSW clustering algorithm for categorical data improves the accuracy and increases the stability effectively.
-
-