Question
Please translate the following into English:
中文分词是中文信息处理的基础.在自然语言理解、语言文字研究、中文文本自动标引、信息检索、机器翻译等领域中,中文分词具有不可替代的作用.因此,中文分词的研究至关重要.
但是,中文分词的研究水平已经远落后于与它关联的相关技术,成为制约其它技术发展的瓶颈.中文分词的研究过程中遇到了以下问题:语言学方面的困难,新词的不断出现,歧义的判别,分词的标准不统一等;计算机方面的困难,没有合理的自然语言形式模型,没有有效方式对语义进行理解以及形式化等.这些问题将会制约着中文分词的发展.
本文在综合分析现有的中文分词研究成果,重点对基于图的中文分词进行研究,提出了基于S-EK图最短路径的中文分词.研究的主要内容如下:
1.对中文分词的主要的算法进行了研究,比较和分析了常用的三种分词算法:基于字符串匹配的分词算法,基于统计的分词算法和基于知识理解的分词算法,并对它们之间的优缺点进行了总结.最后文章还给出了中文分词的评测标准及其意义.
2.重点在有向图和中文分词结合方面进行了深入研究,对N-最短路径中文分词的算法中的有向图进行改进,提出了S-EK图,并采用N-元统计模型计算出一个词在一定的语境下的概率,并对该值做了平滑处理,把最后的结果作为S-EK图的边的权值.
3.基于S-EK图的优点提出了S-EK最短路径算法.该算法在与N-最短路径算法和Dijkstra算法进行对比,实验和理论推导均证明该算法有一定的优点和价值.
关键词:中文分词;信息处理;S-EK图;最短路径;统计模型
Best answer
Chinese word segmentation is the foundation of Chinese information processing. It plays an irreplaceable role in natural language understanding, linguistic research, automatic indexing of Chinese text, information retrieval, machine translation, and related fields. Research on Chinese word segmentation is therefore of great importance.
However, research on Chinese word segmentation has fallen far behind the technologies that depend on it and has become a bottleneck restricting their development. The research faces two kinds of difficulty. On the linguistic side: the continual emergence of new words, the resolution of ambiguity, and the lack of a unified segmentation standard. On the computational side: the absence of a sound formal model of natural language and of an effective way to understand and formalize semantics. These problems will continue to constrain the development of Chinese word segmentation.
Building on a comprehensive analysis of existing research on Chinese word segmentation, this paper focuses on graph-based segmentation and proposes a segmentation method based on the shortest path in an S-EK graph. The main contents of the study are as follows:
1. The main algorithms for Chinese word segmentation are studied. Three commonly used classes of algorithm are compared and analyzed (those based on string matching, on statistics, and on knowledge understanding), and their respective advantages and disadvantages are summarized. Finally, the paper presents the evaluation criteria for Chinese word segmentation and their significance.
2. The combination of directed graphs and Chinese word segmentation is studied in depth. The directed graph used in the N-shortest-path segmentation algorithm is improved, yielding the proposed S-EK graph. An N-gram statistical model is used to compute the probability of a word in a given context; this value is smoothed, and the result is used as the weight of the corresponding edge in the S-EK graph.
3. Drawing on the advantages of the S-EK graph, the S-EK shortest-path algorithm is proposed. It is compared with the N-shortest-path algorithm and Dijkstra's algorithm, and both experiments and theoretical derivation show that it has certain advantages and value.
Keywords: Chinese word segmentation; information processing; S-EK graph; shortest path; statistical model
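For readers unfamiliar with the graph-based approach that the abstract refers to, the general idea can be sketched in a few lines of Python. This is not the S-EK algorithm, which the abstract does not define. It is only a generic word-graph segmenter under simplifying assumptions: nodes are the positions between characters, each candidate word is an edge, and the edge weight is the negative log of an add-one smoothed unigram probability (the simplest N-gram case). The word counts in `COUNTS` and the names `edge_weight` and `segment` are made up for illustration.

```python
import math

# Hypothetical toy word counts; a real system would estimate these from a corpus.
COUNTS = {"中文": 8, "分词": 6, "中": 2, "文": 2, "分": 2, "词": 2,
          "是": 10, "基础": 5, "基": 1, "础": 1}
TOTAL = sum(COUNTS.values())
VOCAB = len(COUNTS)
MAX_WORD_LEN = max(len(w) for w in COUNTS)


def edge_weight(word):
    """Negative log of the add-one smoothed unigram probability of `word`."""
    prob = (COUNTS.get(word, 0) + 1) / (TOTAL + VOCAB + 1)
    return -math.log(prob)


def segment(sentence):
    """Return the minimum-weight segmentation of `sentence` as a list of words.

    Nodes are the positions 0..n between characters; an edge i -> j stands
    for the candidate word sentence[i:j]. Every edge points forward, so one
    left-to-right pass finds the shortest path from node 0 to node n.
    """
    n = len(sentence)
    best = [math.inf] * (n + 1)   # best[j]: cheapest path cost from 0 to j
    back = [0] * (n + 1)          # back[j]: predecessor of j on that path
    best[0] = 0.0
    for i in range(n):
        for j in range(i + 1, min(n, i + MAX_WORD_LEN) + 1):
            word = sentence[i:j]
            # Single characters are always allowed so a path always exists.
            if j - i > 1 and word not in COUNTS:
                continue
            cost = best[i] + edge_weight(word)
            if cost < best[j]:
                best[j] = cost
                back[j] = i
    # Walk the back-pointers from the end to recover the words.
    words = []
    j = n
    while j > 0:
        words.append(sentence[back[j]:j])
        j = back[j]
    return words[::-1]
```

With these toy counts, `segment("中文分词是基础")` returns `['中文', '分词', '是', '基础']`, because each two-character word costs less than the two single-character edges it replaces. Since the graph is acyclic, this single pass gives the same result that Dijkstra's algorithm would on the same weights.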