搜索资源列表
语音合成语料库管理系统的研究与设计
- 本文主要叙述了语料及其管理系统的研究与设计用最新的开发工具和现有软件达到系统的设计 目标-This paper describes the corpus and its management system research and design using the latest development tools and existing software system to achieve the design goal
v.206(预处理)
- lex语法分析,对BNC语料库进行文本标注前的预处理,将与SGML标注与文本词性标注无关的删除掉-this is lex syntax analyzing,annotate with BNC syntax LIB.
CJCorpus
- 一个日汉平行的双语语料库,含有4053个句子-a parallel to the Japanese and Chinese bilingual corpus, containing 4,053 Sentence
TestCorpusyuliaoguanli
- 1. 这是一个简单的语料库管理系统 2. 可以添加和删除语料文件,统计语料中的字数 3. 可以查找语料中的汉字串以及重叠形式 4. 语料文件存放在corpus目录下,查询结果保存在跟语料库相同目录下 5. corpus目录下有4个文本文件(其中test1, test2是两个小文件)供测试用 6. 只能处理文本文件,GB内码-1. This is a simple Corpus management system 2. They can add and delete corpu
Wordsegmentation2
- NLP技术实现,对语料库进行自动统计生成分词词典,对训练集进行分词,列出所有的分词可能并计算每种可能的概率。请使用者自行加入语料库和测试集。-NLP technology to automatically Corpus Health Statistics ingredients dictionary, the training set for segmentation, list all the sub-term may calculate the probability of each pos
Form2-26
- 从已标注的语料库中提取数据,转存在EXCEL中
quanwenjiansuo
- 全文检索程序,最长匹配,可以立刻找到所有出现的句子,需要语料库,(例如人民日报)。-text retrieval procedures, the longest match, can immediately find all the sentences need to Corpus (for example, the People's Daily).
zzbds.rar
- 用正则表达式处理语料库,最多可以处理500个句子,如果想拥有更多功能可以注意使用V2.0,Corpus with the regular expression processing, can handle up to 500 sentences, if you want to have more features may take note of the use of V2.0
chinese
- 中文信息处理基础 第一讲VC环境编程简介 第二讲文件处理 第三讲字符编码 第四讲字频统计 第五讲文本断句 第六讲语料库-Basic information first deal with English-speaking environment for programming VC brief introduction stresses the second file handle character encoding the third stresses t
Chinese--NER
- 基于CRF的中文机构名识别系统。使用北京大学1998年的人民日报语料库作为训练语料。除常用的特征模板,已经词性特征外,使用词语的最后一个字作为特征,提高了机构名识别的准确率, 调用了CRF++程序包训练模型。-CRF-based name recognition system of Chinese institutions. People' s Daily, Peking University in 1998 with corpus as training data. In additio
chinese-text
- 文本分类语料库,经过编辑手工整理与分类的新闻语料与对应的分类信息。其分类体系包括几十个分类节点,网页规模约为十万篇文档-Text classification corpus, edited manually compiled and classification of news corpus and the corresponding classification information. Their classification system includes dozens of classi
Unsupervise
- 利用隐马尔可夫模型实现词性标注。此为无监督模型。 内含语料库和测试集。方便大家学习。-The use of Hidden Markov Model to achieve part of speech tagging. This is no oversight model. Corpus and the test set contains. To facilitate them to learn.
SpamFiltering
- 该程序实现的是一个垃圾邮件过滤系统,方法采用的是NAIVE Bayes,语料库用的是LINspam—public,程序中有使用说明,希望大家一起探讨改进一下.-The program is a spam filtering system, methods used NAIVE Bayes, Corpus used LINspam-public, the procedures in use, hoping to improve what we explore.
fenci
- 自己下载一个语料库,根据程序,计算权重,然后对语料库进行分词-Download a corpus itself, according to the procedures for calculating the weights, and then carried out on sub-word corpus
PU123ACorpora.tar
- 这是一个供做垃圾邮件方面东西的朋友的语料库,很好用的,望对大家有帮助-This is a place for things to do in junk e-mail a friend corpus, well used, hope helpful to everyone
11
- 关于语音识别中语料库的建立与整理,以及分析统计-Speech Recognition Corpus on the establishment and finishing, as well as the analysis of statistical
22
- 关于语音识别中语料库的建立与整理,以及分析统计-Speech Recognition Corpus on the establishment and finishing, as well as the analysis of statistical
3
- 关于语音识别中语料库的建立与整理,以及分析统计-Speech Recognition Corpus on the establishment and finishing, as well as the analysis of statistical
4
- 关于语音识别中语料库的建立与整理,以及分析统计-Speech Recognition Corpus on the establishment and finishing, as well as the analysis of statistical
aclImdb_v1.tar
- 英文影评语料库,用于英文情感分析。包含训练集和测试集,均为标注数据。(English movie reviews corpus)