Abstract:
This paper discusses the construction of a high-quality, expandable, and complete multi-language semantic unit knowledge base for a multi-language machine translation system, one that is free of discardable entries, repetition, and abnormal ambiguity. During construction, a type feature classification method is adopted to reduce the computational complexity: it halves the computation required for repetition removal and reduces the trash-removal computation to O(βN), where N is the scale of the semantic unit knowledge base and β is bounded, with β < C for a constant C. All of the algorithms can be run concurrently on a multi-core processor with constant efficiency. The recomposition of a semantic unit and methods for expanding the semantic unit knowledge base when new natural languages are added are also discussed. The knowledge base can be used not only for the multi-language machine translation system but also as a basic knowledge base for natural language understanding and processing.