Resource list
SearchEngine
- A complete implementation of a simple search engine on the Java platform.
Search.test1
- Mainly a test that uses ASP.NET to download files from the web and parse Word, Excel, and PDF files into text files. Limitation: Office 2000 must be installed.
joyhtml-0.2.2
- HTML body-text extraction; uses matching to extract the main content of a page.
miniSearch
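- Search, developed in April 2006. From the start, Search set out to build a specialized search engine that gives users the most useful information in the shortest time. It currently covers web pages in three areas: "beauty tips", "miscellany", and "travel information". Note: we offer custom index-building services for specialized search engines in any industry. You provide the URLs to be indexed, and we provide web crawling to enrich your search engine database; the crawled content can be embedded in this search system. Fees are based on the number of URLs provided. Companies, webmasters, and individuals are welcome to contact us.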
bigingiukhinthngminh
- ANN & GA. In most industrial applications, liquid level control is of paramount importance, especially in the petrochemical, pharmaceutical, and food processing industries.
5
- Chapter 3 code from the book "Write Your Own Search Engine" (自己动手写搜索引擎), taken from the accompanying CD. The full set is too large to upload at once, so it is uploaded in separate parts.
heritrix.rar
- A web crawler. Users can use it to fetch the resources they want from the web, and developers can extend its components to implement their own crawling logic.
heritrix2.rar
- Heritrix is a crawler framework into which interchangeable components can be plugged. Its execution is recursive and consists mainly of the following steps: 1. Choose one of the scheduled URIs. 2. Fetch the URI. 3. Analyze and archive the results. 4. Select discovered URIs of interest and add them to the scheduled queue. 5. Mark the URI as processed.
SolrEXP
- A good example that took quite a long time to write; feel free to use it as a reference.
wtxx
- A course project that strips useless information from downloaded web pages and indexes them with a local Lucene-based search engine. You can enter a keyword and find the files that contain it.
crawler_without_ring_vs2008_PQ
- A web crawler that crawls related pages from the web and displays them.
Char04
- Web search engine code containing various crawling algorithms and related subroutines. The code implements an eDonkey network crawling system that avoids being added to the central server's blacklist and gets around the limit on the number of results returned when the crawler searches the server.
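
Note on the heritrix2.rar entry: the five crawl steps it lists can be sketched as a small loop. This is only an illustrative Python sketch, not Heritrix code (Heritrix itself is a Java framework). The function name `crawl` and the `fetch`, `extract_links`, and `is_interesting` callables are hypothetical stand-ins, and the loop uses an explicit queue rather than literal recursion.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(seeds: Iterable[str],
          fetch: Callable[[str], str],
          extract_links: Callable[[str], List[str]],
          is_interesting: Callable[[str], bool],
          max_pages: int = 100) -> Dict[str, str]:
    """Generic crawl loop; every callable here is a hypothetical stand-in."""
    frontier = deque(seeds)   # scheduled URIs waiting to be crawled
    processed = set()         # URIs that have already been handled
    archive = {}              # uri -> fetched content

    while frontier and len(archive) < max_pages:
        uri = frontier.popleft()              # step 1: choose a scheduled URI
        if uri in processed:
            continue
        content = fetch(uri)                  # step 2: fetch the URI
        archive[uri] = content                # step 3: analyze and archive
        for link in extract_links(content):   # step 4: queue URIs of interest
            if is_interesting(link) and link not in processed:
                frontier.append(link)
        processed.add(uri)                    # step 5: mark the URI as processed
    return archive


if __name__ == "__main__":
    # Tiny in-memory "web" so the sketch runs without network access.
    pages = {"a": ["b", "c"], "b": ["a"], "c": []}
    result = crawl(["a"],
                   fetch=lambda u: " ".join(pages[u]),
                   extract_links=str.split,
                   is_interesting=lambda u: u in pages)
    print(sorted(result))
```

Passing the fetcher and link extractor in as parameters loosely mirrors the entry's point about interchangeable components: each stage can be swapped without touching the loop.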