WU Sen, WEI Gui-ying, BAI Chen, ZHANG Gui-qiong. Clustering algorithm based on set dissimilarity for high dimensional data of categorical attributes[J]. Chinese Journal of Engineering, 2010, 32(8): 1085-1089. DOI: 10.13374/j.issn1001-053x.2010.08.045

Clustering algorithm based on set dissimilarity for high dimensional data of categorical attributes

  • A clustering algorithm based on set dissimilarity is proposed for high-dimensional categorical data. By defining set dissimilarity and set reduction, the algorithm avoids computing the distance between each pair of objects and instead computes the overall dissimilarity of all the objects in a set directly. It greatly reduces high-dimensional categorical data without loss of computational accuracy and obtains the clustering result in a single scan of the data. The time complexity of the algorithm is nearly linear. An example with real data shows that the clustering algorithm is effective.
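
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a set is "reduced" to one summary entry per attribute (the shared value, or a marker once the objects disagree), and that the set dissimilarity is the fraction of attributes on which the objects disagree. The names `MIXED`, `merge`, `dissimilarity`, `cluster` and the `threshold` parameter are hypothetical and chosen for illustration.

```python
from typing import Hashable, List, Sequence

# Sentinel for an attribute on which the objects of a set disagree (assumption).
MIXED = object()


def merge(summary: Sequence[Hashable], record: Sequence[Hashable]) -> List[Hashable]:
    """Reduced summary of a set after adding one more object."""
    return [
        v if v is not MIXED and v == r else MIXED
        for v, r in zip(summary, record)
    ]


def dissimilarity(summary: Sequence[Hashable]) -> float:
    """Assumed set dissimilarity: share of attributes with disagreement."""
    return sum(v is MIXED for v in summary) / len(summary)


def cluster(records: Sequence[Sequence[Hashable]], threshold: float) -> List[int]:
    """Assign each record to a cluster in one pass over the data."""
    summaries: List[List[Hashable]] = []  # one reduced summary per cluster
    labels: List[int] = []
    for record in records:
        best_idx, best_summary, best_score = None, None, None
        for idx, summary in enumerate(summaries):
            candidate = merge(summary, record)
            score = dissimilarity(candidate)
            # Keep the cluster whose dissimilarity stays lowest and within bound.
            if score <= threshold and (best_score is None or score < best_score):
                best_idx, best_summary, best_score = idx, candidate, score
        if best_idx is None:
            # No existing cluster can absorb the record: start a new one.
            summaries.append(list(record))
            labels.append(len(summaries) - 1)
        else:
            summaries[best_idx] = best_summary
            labels.append(best_idx)
    return labels
```

In this sketch each record is compared against per-cluster summaries rather than against every other record, so the cost grows with the number of records times the number of clusters, which is one way a single-scan, near-linear method could be organized. The result depends on the order of the records and on the assumed `threshold`.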
