Supervising organization: 中国农业机械化科学研究院集团有限公司

Sponsor: 北京卓众出版有限公司

邸小康,张辉,秦晓婧,等.融合新词发现和改进TextRank算法的农业领域关键词提取算法[J].农业工程,2023,13(6):21-25. DOI: 10.19998/j.cnki.2095-1795.2023.06.004
Citation: DI Xiaokang, ZHANG Hui, QIN Xiaojing, et al. Agricultural keyword extraction algorithm combining new word discovery and improved TextRank[J]. Agricultural Engineering, 2023, 13(6): 21-25. DOI: 10.19998/j.cnki.2095-1795.2023.06.004

Agricultural Keyword Extraction Algorithm Combining New Word Discovery and Improved TextRank

  • Abstract: Extracting technical-term keywords from agricultural-domain text is difficult. To address this, the paper proposes a keyword extraction method that combines new word discovery with an improved TextRank algorithm. The method uses information entropy to estimate the word-formation probability of candidate strings in the text, which surfaces domain-specific proper nouns and new words; after manual review, these are added to the word segmentation dictionary. Building on the expanded dictionary, the method changes how TextRank computes node values when constructing the word graph by adding word-position and part-of-speech weights, and then extracts keywords according to each word's combined weight. In comparative experiments, the method's F-value was on average 7.5% higher than that of the traditional TF-IDF algorithm and 9.8% higher than that of the TextRank algorithm, which indicates a degree of practical value.
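The abstract does not give the formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. `boundary_entropy` uses left/right neighbour entropy, a common information-entropy signal in new word discovery; `weighted_textrank` treats the position and part-of-speech weights as a node prior in a personalized PageRank over a co-occurrence graph. The function names, the tag set (`"n"`, `"v"`), the weight values, the window size, and the title boost are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def boundary_entropy(text, candidate):
    """Return (left, right) neighbour entropy of `candidate` in `text`.

    High entropy on both sides suggests the string is used freely in many
    contexts, i.e. it behaves like a standalone word (assumed criterion).
    """
    left, right = Counter(), Counter()
    start = text.find(candidate)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(candidate)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(candidate, start + 1)

    def entropy(counts):
        total = sum(counts.values())
        if total == 0:
            return 0.0
        return sum((c / total) * math.log2(total / c) for c in counts.values())

    return entropy(left), entropy(right)


def weighted_textrank(words, pos_tags, window=3, damping=0.85, iters=50,
                      pos_weight=None, title_len=0, title_boost=2.0):
    """Score words with TextRank biased by position and part-of-speech priors.

    `pos_weight`, `title_len`, and `title_boost` are illustrative assumptions.
    """
    pos_weight = pos_weight or {"n": 1.0, "v": 0.6}

    # Undirected co-occurrence graph: words within `window` tokens are linked.
    graph = defaultdict(Counter)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w][words[j]] += 1
                graph[words[j]][w] += 1

    # Node prior = part-of-speech weight x position weight, then normalized.
    prior = {}
    for i, (w, tag) in enumerate(zip(words, pos_tags)):
        position = title_boost if i < title_len else 1.0
        prior[w] = max(prior.get(w, 0.0), pos_weight.get(tag, 0.3) * position)
    total = sum(prior.values())
    prior = {w: p / total for w, p in prior.items()}

    # Power iteration; the prior replaces TextRank's uniform (1 - d) term.
    score = dict(prior)
    for _ in range(iters):
        new_score = {}
        for w in prior:
            rank = sum(score[u] * graph[u][w] / sum(graph[u].values())
                       for u in graph[w])
            new_score[w] = (1 - damping) * prior[w] + damping * rank
        score = new_score
    return score


# Usage on an already segmented and tagged toy sentence (hypothetical data).
tokens = ["小麦", "锈病", "防治", "小麦", "锈病", "发生", "规律"]
tags = ["n", "n", "v", "n", "n", "v", "n"]
scores = weighted_textrank(tokens, tags, title_len=3)
print(sorted(scores, key=scores.get, reverse=True)[:3])
print(boundary_entropy("小麦锈病防治与小麦锈病发生", "锈病"))
```

In a pipeline like the one the abstract describes, candidates with high boundary entropy would be reviewed manually and added to the segmenter's dictionary before the ranking step, so that domain terms arrive at the graph as single tokens.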

     
