File name: Nutch-Web
Upload date: 2012-11-16
File size: 324.92 KB
Description:
This paper compares representative open-source web crawling software (Nutch, Heritrix, WCT, and Web-Harvest). Based on that comparison, it proposes a targeted website harvesting system built on Nutch and discusses four key issues in detail: selecting the initial seed websites, managing the harvest process, denoising web page content, and discovering new seed websites.
Files in download:
Nutch-Web.caj