Formation method of a high-quality semantic unit base for a multi-language machine translation system

    Abstract: This paper discusses the construction of a high-quality multi-language unified semantic unit knowledge base for a machine translation system that translates among multiple natural languages. The knowledge base is expandable and complete, and it is free of discardable units, repetition, and abnormal ambiguity. During construction, a type-feature classification method is adopted to reduce computational complexity: it halves the computation needed for repetition removal and reduces the computation for removing discardable units to O(βN), where N is the size of the semantic unit base and β is a bounded number with β < C for a constant C. All of the algorithms can be implemented in parallel on a multi-core processor with constant efficiency. The further decomposition of semantic units and the methods for expanding the knowledge base when more natural languages are added are also discussed. The knowledge base can be used not only in multi-language machine translation systems but also as a basic knowledge base for natural language understanding and processing.

     
