File name: InformationExtractionAlgorithms
Upload date: 2013-07-27
File size: 3.24 MB
Downloads: 0
Description:
A paper on web page information extraction. Abstract: The paper proposes and implements a main-content extraction algorithm based on the text density of a web page. The algorithm relies mainly on the proportion of Chinese characters in each line of a Chinese web page's source to distinguish content lines from non-content lines. It is assisted by related algorithms for identifying pseudo content blocks in the source, which help separate the real main content from noise, so that the main text of a Chinese web page can be extracted. Experimental results show that the method is feasible and achieves fairly high accuracy and generality.
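The per-line idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names `chinese_ratio` and `extract_body`, the 0.3 threshold, and the choice of the basic CJK Unified Ideographs range as the definition of a "Chinese character" are all assumptions made for illustration. The auxiliary pseudo content block identification step mentioned in the abstract is not covered here.

```python
import re

# Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")
# Naive tag matcher, good enough for a sketch; not a full HTML parser.
TAG = re.compile(r"<[^>]+>")


def chinese_ratio(line: str) -> float:
    """Fraction of non-whitespace characters in a source line that are Chinese."""
    chars = [c for c in line if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if CJK.match(c)) / len(chars)


def extract_body(html: str, threshold: float = 0.3) -> str:
    """Keep source lines whose Chinese-character ratio reaches the threshold.

    The threshold value is a hypothetical default, not taken from the paper.
    """
    kept = []
    for line in html.splitlines():
        if chinese_ratio(line) >= threshold:
            # Strip markup from the lines classified as content.
            kept.append(TAG.sub("", line).strip())
    return "\n".join(kept)
```

A navigation line such as `<div class="nav"><a href="/index.html">首页</a></div>` is mostly markup, so its ratio is low and it is dropped, while a paragraph line that is mostly Chinese text passes the threshold. In practice the threshold would need tuning against real pages.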
File list:
Information Extraction Algorithms and Its Application Based on Word Density in a Webpage.pdf