HFCAS OpenIR
本体与条件随机场结合的涉农商品名称抽取与类别标注
Alternative title: Agriculture-related product name extraction and category labeling based on ontology and conditional random field
2017-01-01
Journal: 计算机应用 (Journal of Computer Applications)
ISSN: 1001-9081
Abstract: Traditional information extraction methods based on the Conditional Random Field (CRF) need a large labeled training corpus when applied to agriculture-related product name extraction and category labeling; manual labeling is costly and extraction precision is low. To address this problem, a method that combines an agricultural ontology with a CRF is proposed, in which the automatic extraction and classification of agriculture-related product names is treated as a sequence labeling task. First, the raw data is word-segmented, and word, part-of-speech, geographical-attribute and ontology-concept features are selected. Then, the CRF model parameters are trained with an improved quasi-Newton algorithm and decoding is performed with the Viterbi algorithm. Four groups of comparative experiments were completed, seven categories were identified, and the CRF was compared experimentally with the Hidden Markov Model (HMM) and the Maximum Entropy Markov Model (MEMM). Finally, the CRF was applied to supply and demand trend analysis of agricultural products. With a suitable feature template, adding ontology concepts raised the overall precision of the CRF open test by 10.20%, recall by 59.78% and F-score by 37.17%. This demonstrates the feasibility and effectiveness of combining an ontology with a CRF for extracting agriculture-related product names and categories, and its practical value for matching agricultural supply and demand.
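The decoding step named in the abstract can be illustrated with a minimal sketch. It builds the four feature types the abstract lists (word, part of speech, geographical attribute, ontology concept) and runs Viterbi decoding over a linear-chain score. Everything concrete here is an assumption for illustration only: the tag set, the `ONTOLOGY` entries, the English toy tokens, and the hand-set weights in `EMIT`, `TRANS` and `START`. The record does not enumerate the paper's seven categories or its feature templates, and a real CRF would learn these weights during training (the quasi-Newton training step is not shown).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical BIO tag set for one illustrative category.
TAGS = ["O", "B-FRUIT", "I-FRUIT"]

# Hypothetical ontology lookup: surface word -> concept name.
ONTOLOGY: Dict[str, str] = {"apple": "Fruit", "fuji": "Fruit"}

Token = Tuple[str, str, bool]  # (word, part of speech, is a place name)


def token_features(word: str, pos: str, is_place: bool) -> List[str]:
    """Build word, POS, geographical and ontology-concept features."""
    return [
        f"word={word}",
        f"pos={pos}",
        f"geo={is_place}",
        f"concept={ONTOLOGY.get(word, 'NONE')}",
    ]


# Hand-set illustrative weights; a trained CRF would learn these.
EMIT: Dict[Tuple[str, str], float] = {
    ("concept=Fruit", "B-FRUIT"): 2.0,
    ("concept=Fruit", "I-FRUIT"): 1.5,
    ("concept=NONE", "O"): 1.0,
    ("geo=True", "O"): 0.5,
}
TRANS: Dict[Tuple[str, str], float] = {
    ("B-FRUIT", "I-FRUIT"): 1.0,
    ("O", "I-FRUIT"): -5.0,  # penalize I- directly after O
}
START: Dict[str, float] = {"I-FRUIT": -5.0}  # penalize starting with I-


def emit_score(feats: List[str], tag: str) -> float:
    """Sum the weights of all (feature, tag) pairs that fire."""
    return sum(EMIT.get((f, tag), 0.0) for f in feats)


def viterbi(sentence: List[Token]) -> List[str]:
    """Return the highest-scoring tag sequence for the sentence."""
    feats = [token_features(*tok) for tok in sentence]
    # best[i][tag] = (best score of a path ending in tag at i, previous tag)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {t: (START.get(t, 0.0) + emit_score(feats[0], t), None) for t in TAGS}
    ]
    for i in range(1, len(sentence)):
        row: Dict[str, Tuple[float, Optional[str]]] = {}
        for t in TAGS:
            prev, score = max(
                ((p, best[i - 1][p][0] + TRANS.get((p, t), 0.0)) for p in TAGS),
                key=lambda pair: pair[1],
            )
            row[t] = (score + emit_score(feats[i], t), prev)
        best.append(row)
    # Follow back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(sentence) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


sentence = [
    ("anhui", "NR", True),
    ("fuji", "NN", False),
    ("apple", "NN", False),
    ("wanted", "VV", False),
]
print(viterbi(sentence))  # ['O', 'B-FRUIT', 'I-FRUIT', 'O']
```

In this toy setup the ontology-concept feature is what marks "fuji apple" as a product span, which is the kind of signal the abstract credits for the recall gain when labeled training data is scarce.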
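Keywords: conditional random field; agricultural ontology; agriculture-related product name; supply and demand trend; sequence labeling
Indexed by: CSCD
Language: Chinese
CSCD record number: CSCD:5897675
Citation statistics
Times cited: 2 [CSCD]
Document type: Journal article
Item identifier: http://ir.hfcas.ac.cn:8080/handle/334002/64993
Collection: Hefei Institutes of Physical Science, Chinese Academy of Sciences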
Recommended citation
GB/T 7714: . 本体与条件随机场结合的涉农商品名称抽取与类别标注[J]. 计算机应用, 2017, 037.
APA: (2017). 本体与条件随机场结合的涉农商品名称抽取与类别标注. 计算机应用, 037.
MLA: "本体与条件随机场结合的涉农商品名称抽取与类别标注". 计算机应用 037 (2017).