Search results: resource list
wawatextcluster
- WaWa's Chinese text clustering, mainly using the k-means algorithm. Written in C#.
main. Program for computing similarity between texts
- A program that computes the similarity between texts, for use in text clustering. It works from already-known feature vectors for each text and measures similarity with the cosine of the angle between them.
WawaTextCluster
- Source code for a text clustering algorithm.
K-means_clustering_demo
- K-means clustering algorithm: a VC++ graphical demo program.
Dbscan
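- A density-based text clustering implementation (DBSCAN), written in C. Suitable for students who are just starting to learn natural language processing.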
TDIDF_Demo
- An implementation of a text clustering program based on k-means. Hope it helps!
reuters21578
- An English corpus that can be used for text classification and clustering. It is a corpus commonly shared across the text classification field.
LJParser
- Material on clustering algorithms, with a corpus and a training text set, provided for study.
111
- Source code for hierarchical clustering; of some use for text clustering.
TextClusteringKmeans
- Reads text from text files, segments it into words, removes stop words, then clusters the texts with k-means.
myfirst1
- Implements Gibbs sampling. Can be used to process text, cluster texts, and analyze text topics. Open with VS2010.
1
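- Research and implementation of text clustering on the WEKA platform. Text clustering is an important research branch of text mining and is the application of clustering methods to text processing. This paper discusses and summarizes in some depth the text clustering process based on the vector space model and, using a text corpus and a data mining tool, studies and implements that process. It first presents the idea and process of text clustering, reviews existing results in the field, and lists foundational work on feature representation, feature extraction, and related topics. It also reviews existing text clustering algorithms and commonly used metrics for evaluating clustering quality. Building on these results, the paper uses the 20 Newsgroup text corpus,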
K-Means_Text_Cluster
- A Python implementation of K-means text clustering (texts included), applied to person-name disambiguation.
optics_cos
- An OPTICS clustering algorithm based on cosine distance, which can be used for text clustering.
VSM
- Construction of the VSM (vector space model) matrix and probability calculation over the co-occurrence matrix, for text clustering and similar tasks.
toolkit_for_words_En
- Handles stop words and words sharing a stem in English text without changing the structure of the article. Suitable as preprocessing for text classification, text clustering, and recommendation.
DBSCAN Clustering
- A MATLAB implementation of dbscancluster that can be used for text clustering.
words_1025_dic.txt
- dbscan. Please do not download for now; it contains errors and will be tidied up later. (dbscan and word2vec for Chinese words)
English
- Contains the original English documents together with the documents produced by the preprocessing steps (removing special symbols, tokenization, stemming, similarity computation). There are 500 English documents in total.
EnglishChuLi
- A text preprocessing program written in Python, containing the implementation code for every step: removing punctuation, removing stop words, similarity computation, PCA dimensionality reduction, clustering, and visualization. It runs in PyCharm with a Python 3 development environment.
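
Several entries in this list compute the similarity between texts as the cosine between their feature vectors. The following is a minimal sketch of that calculation, not code taken from any of the listed packages. The function name `cosine_similarity`, the sample vectors, and the choice to return 0.0 for an all-zero vector are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat an all-zero vector as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical term-count vectors for three documents over a 4-word vocabulary.
doc_a = [2, 1, 0, 0]
doc_b = [4, 2, 0, 0]  # same direction as doc_a, so similarity is close to 1
doc_c = [0, 0, 3, 1]  # shares no terms with doc_a, so similarity is 0

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Because the cosine depends only on direction, a document and a longer document with the same word proportions score as nearly identical, which is one reason the measure is common in text clustering.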