R4
- Short-text dataset drawn from the data used in many published papers: English text, already stemmed, with stop words removed and otherwise cleaned.
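The stemming and stop-word removal described for R4 can be sketched roughly as below. This is not how R4 itself was produced. It is a minimal illustration that assumes a tiny hypothetical stop-word list and a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists have hundreds of entries.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on", "said"}

# Hypothetical suffix rules, tried in order; a crude stand-in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The banks reported higher rates and falling exports"))
# → ['bank', 'report', 'higher', 'rate', 'fall', 'export']
```

Stop words are filtered before stemming so the list only has to contain surface forms. A suffix stripper this naive will mangle many words, which is why real pipelines use an established stemmer.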
InfoRetri
- Text classification based on naive Bayes (the author's English summary says it is based on libsvm), covering stop-word removal, word segmentation, feature extraction, and classification.
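The classification step can be illustrated with a multinomial naive Bayes classifier over bag-of-words counts. This is a from-scratch sketch under assumptions, not the InfoRetri code. The class name `NaiveBayesClassifier` is hypothetical, documents are assumed to be already tokenized, and add-one (Laplace) smoothing is one common choice among several.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes over bag-of-words with add-one smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_docs = Counter(labels)          # number of documents per class
        self.word_counts = defaultdict(Counter)    # class -> word -> count (the features)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, tokens: list[str], label: str) -> float:
        # log P(c) + sum of log P(w | c), smoothed over the training vocabulary
        score = math.log(self.class_docs[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, tokens: list[str]) -> str:
        return max(self.class_docs, key=lambda c: self.log_posterior(tokens, c))


train = [["oil", "price", "crude"], ["crude", "barrel", "opec"],
         ["wheat", "grain", "harvest"], ["grain", "export", "corn"]]
labels = ["crude", "crude", "grain", "grain"]
model = NaiveBayesClassifier().fit(train, labels)
print(model.predict(["crude", "oil", "export"]))  # → crude
```

Working in log space avoids underflow when many small word probabilities are multiplied. An SVM-based variant, as the English summary suggests, would instead train a linear classifier on the same count or TF-IDF features.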
kctp
- Data preprocessing code: word segmentation, removal of symbols and punctuation, stop-word removal, and similar steps.
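If the input is Chinese text, as the original description suggests, segmentation needs its own algorithm because words are not separated by spaces. The sketch below assumes that case and uses forward maximum matching against a hypothetical toy dictionary. It is not necessarily what kctp does; real projects usually rely on a dedicated segmenter with a large dictionary.

```python
import re

# Hypothetical toy dictionary and stop-word list, for illustration only.
DICTIONARY = {"文本", "分类", "数据", "预处理", "我们", "需要"}
STOP_WORDS = {"我们", "的"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)

# Anything that is not a CJK ideograph, ASCII letter, or digit is treated as a symbol.
SYMBOLS = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def fmm_segment(text: str) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i : i + size]
            if size == 1 or piece in DICTIONARY:
                words.append(piece)
                i += size
                break
    return words


def preprocess(text: str) -> list[str]:
    tokens = []
    for chunk in SYMBOLS.split(text):  # symbol removal: split on punctuation/whitespace
        if chunk:
            tokens.extend(fmm_segment(chunk))
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


print(preprocess("我们需要数据的预处理，文本分类！"))
# → ['需要', '数据', '预处理', '文本', '分类']
```

Removing symbols first keeps punctuation from ever reaching the segmenter. Stop words are filtered last because they can only be recognized after segmentation.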