Resource list
hao123_5.0
- Source code for the hao123 web directory (URL navigation) site.
Lucene_Course
- Lucene e-book (PDF) covering Lucene usage from beginner to advanced level.
spider_engine
- Parses web page source code, extracts the URLs and hashes them, and passes them to the client program for deduplication. The URLs are then stored in the client machine's database, and the program traverses the web using the URL list in that database.
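The pipeline in the spider_engine entry (extract URLs, hash them, drop duplicates, then traverse from the stored list) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the project's code: SHA-1 is assumed as the hash, an in-memory set stands in for the client database, and a hypothetical `fetch` callable replaces real HTTP downloads. Deduplication also happens in-process here rather than in a separate client program.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def url_hash(url):
    """Hash a URL to a fixed-length key (SHA-1 is an assumption)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def crawl(seed, fetch, max_pages=100):
    """Breadth-first traversal with hash-based deduplication.

    `fetch` is a hypothetical callable returning a page's HTML or None;
    the `seen` set stands in for the client database of URL hashes.
    """
    seen = {url_hash(seed)}
    queue = deque([seed])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            key = url_hash(link)
            if key not in seen:  # only enqueue URLs whose hash is new
                seen.add(key)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # A tiny in-memory "web" used instead of real downloads.
    pages = {
        "http://a.example/": '<a href="/b">b</a> <a href="http://c.example/">c</a>',
        "http://a.example/b": '<a href="/">home</a>',
        "http://c.example/": '<a href="http://a.example/b">b</a>',
    }
    print(crawl("http://a.example/", pages.get))
    # → ['http://a.example/', 'http://a.example/b', 'http://c.example/']
```

Hashing each URL to a fixed-length key keeps the "already seen" lookup cheap and uniform regardless of URL length, which is presumably why the entry hashes URLs before deduplicating them.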
WebCrawler
- A simple web crawler that also computes a weight for each page.
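The WebCrawler entry does not say how page weights are computed. Assuming a PageRank-style link analysis (an assumption the entry does not confirm), the weights could be sketched with power iteration as below; `links` is a hypothetical adjacency map from each page to the pages it links to, and 0.85 is the conventional damping factor.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share (1 - d) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
    print({p: round(r, 3) for p, r in sorted(ranks.items())})
    # → {'a': 0.388, 'b': 0.215, 'c': 0.397}
```

The ranks always sum to 1, so they can be read directly as relative page weights.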
deploy
- The system centrally manages frequently changing information such as company updates, corporate news, new product releases, promotions, and industry news. It classifies this information by shared attributes and publishes it to the website in a systematic, standardized way. It also provides news search and links to related partner websites.
SearchEngine1.0
- Implements the most basic search engine functions: downloading web pages, building an inverted index, and keyword querying. The program is built on the libcurl library.
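A minimal Python sketch of the inverted index and keyword query described in the SearchEngine1.0 entry follows. The original is built on libcurl and is presumably C or C++; this sketch skips downloading and assumes the pages are already in memory as a hypothetical `docs` dictionary. Treating multi-keyword queries as boolean AND is also an assumption.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; real engines need language-specific tokenizers."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def query(index, keywords):
    """Return the IDs of documents containing every keyword (boolean AND)."""
    terms = tokenize(keywords)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))  # intersect posting lists
    return sorted(result)


if __name__ == "__main__":
    docs = {
        1: "Lucene is a search library",
        2: "A web crawler downloads pages",
        3: "Search engine with a web crawler",
    }
    index = build_inverted_index(docs)
    print(query(index, "web crawler"))  # → [2, 3]
    print(query(index, "search"))       # → [1, 3]
```

The index is built once; each query then only touches the posting lists of its own terms instead of scanning every document.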
JWikiDocs-1.0.tar
- A tool for crawling and downloading Wikipedia documents.
Chinesewordsegmentationalgorithm
- A Chinese word segmentation algorithm that, like Kingsoft PowerWord, automatically segments the words of a sentence when the mouse hovers over it.
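The entry does not name its segmentation method. One common dictionary-based approach is forward maximum matching, sketched below as an illustration only, not as this project's algorithm; the dictionary is a hypothetical toy word list, and characters not covered by any word fall back to single-character tokens.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Forward maximum matching: at each position, take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a match (or one char).
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    toy_dictionary = {"中文", "分词", "算法", "中文分词"}  # hypothetical word list
    print(forward_max_match("中文分词算法", toy_dictionary))  # → ['中文分词', '算法']
```

Preferring the longest match is what keeps "中文分词" as one word here instead of splitting it into "中文" and "分词".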
kooxoo
- Source code for online content collection (web scraping); early kooxoo code, provided for study and research.
sphinx-0.9.8-rc2-chinese
- A search engine that works together with MySQL to search website content. It is very fast: queries can complete in 0.00XX seconds.
forictclas
- 1. Under VS2008, just extract the archive and run it. 2. This is the source code of ICTCLAS, the Chinese word segmentation system from the Chinese Academy of Sciences; I fixed some bugs before uploading it. 3. After launching the program, enter a Chinese string to segment it.
LITATERcHECKmOTHED
- This document describes how to search for information on the web effectively. It is very practical for people doing research.