A method for preprocessing an incomplete information table

Abstract: This paper studies three problems in preprocessing an incomplete information table with continuous attributes: filling in incomplete data, reducing redundant attributes, and discretizing continuous attributes. Using rough set theory, an addition operation on partition intervals is defined from the consistent correspondence between condition attributes and decision attributes in a consistent information table, and this operation is used to fill in the incomplete data. Based on the concept of classes, a discernibility vector is defined, and redundant attributes are deleted by means of an addition operation on discernibility vectors. Continuous attributes are discretized according to the dependency between condition attributes and decision attributes and the concept of relative information entropy. A numerical example and experimental results show that the method is effective and feasible.
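The abstract does not give the formula for its entropy-based discretization criterion, so the following is only an illustrative sketch of the general idea of choosing a cut point on a continuous condition attribute by how well it separates the decision attribute. It assumes plain Shannon conditional entropy and a single binary cut, which may differ from the relative information entropy used in the paper; the function names `entropy`, `conditional_entropy`, and `best_cut` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of decision labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(values, decisions, cut):
    """Weighted entropy of the decision attribute after splitting at `cut`.

    Assumption: a single binary cut (value <= cut versus value > cut);
    this is a stand-in, not the paper's own criterion.
    """
    left = [d for v, d in zip(values, decisions) if v <= cut]
    right = [d for v, d in zip(values, decisions) if v > cut]
    total = len(decisions)
    return (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)


def best_cut(values, decisions):
    """Return the midpoint between adjacent distinct values that gives
    the lowest conditional entropy of the decision attribute."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return min(candidates, key=lambda c: conditional_entropy(values, decisions, c))


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    values = [1.0, 2.0, 3.0, 4.0]
    decisions = ["a", "a", "b", "b"]
    print(best_cut(values, decisions))
```

In this sketch a lower conditional entropy means the cut leaves less uncertainty about the decision attribute. `best_cut` raises `ValueError` when the attribute has fewer than two distinct values, since no candidate cut exists in that case.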