Resource list
translation
- Implementation of an English part-of-speech tagging method for a computer English-Chinese machine translation system.
beiyes
- A probabilistic Chinese word segmentation algorithm based on Bayesian networks.
word_vc
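- VC++ implementations of a dictionary-based Chinese word segmentation algorithm, a probabilistic Chinese word segmentation algorithm based on Bayesian networks, and a text similarity comparison algorithm.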
smallseg_0.6.tar
- Source code for a simple Chinese word segmentation system that implements language-model-based segmentation logic.
GBK12+GBK14+GBK16
- GBK bitmap (dot-matrix) fonts, including GBK12, GBK14, and GBK16, plus ASC12, ASC14, and ASC16.
dct-compression
- Audio compression using the DCT, implemented in MATLAB.
g723
- Compresses and decompresses speech according to the G723 standard. Verified and works well.
itpp-3.10.4
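- A powerful C++ library that combines MATLAB-style functionality with the speed of C; well suited to signal processing and related fields.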
ROSTDM
- Web page text scraper. By configuring XML, it can batch-scrape arbitrary data from any website.
CRF++-0.46.tar
- Conditional Random Field (CRF) is an important sequence learning model, widely used across the fields of natural language processing. CRF++ is an efficient implementation of CRF that is highly extensible and feature-rich.
tm_0.3
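- R-Project is open-source statistical software with its own language, R, which is similar to the S language. This package is a text mining (tm) package implemented in R; it includes code and sample data.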
contextextract
- Reads the characters in an article and builds a dictionary that contains each character found in the article along with its number of occurrences.
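
The character-counting idea described for contextextract can be sketched roughly as follows. This is a minimal Python illustration, not the package's own code: the function name `count_characters` is hypothetical, and skipping whitespace is an assumption made here for readability.

```python
from collections import Counter


def count_characters(text: str) -> dict:
    """Map each character in `text` to its number of occurrences.

    Hypothetical helper for illustration only; whitespace is skipped
    by assumption, which the original package may or may not do.
    """
    return dict(Counter(ch for ch in text if not ch.isspace()))


if __name__ == "__main__":
    sample = "中文分词 分词算法"
    counts = count_characters(sample)
    # Print the characters from most to least frequent.
    for ch, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(ch, n)
```

A plain dictionary keyed by character is enough for per-character counts; counting words instead would first require a segmentation step such as the ones listed above.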