Resource search results
stop
- Removes stop words from English documents, deleting a set of high-frequency words from the text.
vb
- Connects to a database, segments text into words, removes stop words, and computes term weights.
stopword-list
- Text must be preprocessed before classification or clustering. The first preprocessing step is word segmentation, during which stop words need to be removed. This file is the stop word list.
Text-classification
- Word frequency statistics for text classification: word segmentation, stemming, stop word removal, and word frequency calculation, with a user interface.
R4
- R4 short-text dataset in English, drawn from the datasets used in major papers. The text has already been stemmed, had its stop words removed, and been refined.
InfoRetri
- Text classification based on naive Bayes (the uploader's English description instead names libsvm), including stop word removal, word segmentation, feature extraction, and classification.
stopword
- This code shows how stop words are removed and displays the documents after stop word removal.
THULAC_lite_java_v1
- Chinese text segmentation: word segmentation, word frequency statistics, and stop word removal. Supports UTF-8 encoding only.
kctp
- This code implements data preprocessing, including word segmentation, symbol removal, and stop word removal.
segmentation
- Segments text into words and uses a stop word list to remove stop words, punctuation, and similar tokens.
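
Most of the entries above describe the same preprocessing pipeline: split text into tokens, drop punctuation, filter out stop words, then count word frequencies. A minimal Python sketch of that pipeline for English text might look like the following. It is not taken from any of the listed resources; the `STOP_WORDS` set and the function names `preprocess` and `word_frequencies` are hypothetical, chosen only for illustration, and a real project would load a much larger stop word list from a file.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, keep only alphanumeric tokens, and drop stop words."""
    # The regex keeps runs of letters and digits, so punctuation is discarded.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each remaining token occurs after preprocessing."""
    return Counter(preprocess(text, stop_words))


if __name__ == "__main__":
    sample = "The cat is in the hat, and the hat is red."
    print(preprocess(sample))
    print(word_frequencies(sample).most_common(2))
```

This whitespace-and-regex tokenizer only suits languages that separate words with spaces. Chinese text would first need a dedicated word segmenter, and stemming, which some entries mention, is left out here.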