Search resource list
Text clustering in Java using TF/IDF weights
- Text clustering implemented in Java, drawing on mathematical and statistical techniques: TF/IDF weighting, cosine similarity between texts, and k-means clustering of the data.
WordSimilarityJAVA-
- Computes lexical semantic similarity based on HowNet, written in Java. Includes the paper "Lexical Semantic Similarity Computation Based on HowNet".
java
- 1. Uses the standard Levenshtein distance algorithm to compute the similarity of two files. 2. The program is simple to use: select several files, then press Analyze to analyze two ...
(java)wenbenjulei
- Java source code for computing text distance and text similarity; test documents are included.
WordSimilarity
- The dictionary files are placed in the dict directory and are provided by the Chinese Natural Language Processing Open Platform. The algorithm parameters are the experimental parameters from "Lexical Semantic Similarity Computation Based on HowNet"; they are defined as private static constants in the class WordSimilarity and can be modified as needed.
words-similarity--using-web-search--
- We propose a novel approach to computing semantic similarity using lexico-syntactic patterns automatically extracted from text snippets.
ComputerDecision
- Computes the cosine similarity of texts for text classification. The closer the similarity of two texts is to 1, the more likely they are to be assigned to the same class.
CosineSimilarAlgorithm
- Compares how similar two texts are, mainly by means of cosine similarity.
English-sentence-sim
- Similarity computation for English text: word form, word order, and word meaning are each weighted to produce the similarity result.
src
- Source code, written in Java, for software that performs queries based on text-content similarity.
SimHash
- Demonstrates use of the simhash algorithm, which can be applied to web page deduplication, text similarity computation, and similar tasks.
TextSimilarity
- Computes the similarity of two vectors with the cosine amplitude method, yielding the cosine similarity.
java-string-similarity-master
- Code for measuring the similarity between programs or entered text, so that similar items can be found.
VSM
- Java code that uses the vector space model to compute the similarity of two texts.
Text duplicate checking
- Class description. Name: Contrast. Description: compares the similarity of two texts using several methods. Comparison methods: 1. EditDistance (edit distance); 2. CosineSimilarAlgorithm (cosine similarity); 3. JianDanMoHu (fuzzy matching); 4. combined comparison, which runs all three methods and takes the average. Method: String getDegree(text 1, text 2, method id) returns the similarity percentage as a string.
Kmeans
- Algorithm idea: extract the TF/IDF weights of each document, then measure the similarity of two documents by the cosine of the angle between their multidimensional weight vectors; text clustering can then be done with the standard k-means algorithm. The source code is implemented in Java.
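
The algorithm idea in the last entry (TF/IDF weights compared by cosine similarity) can be illustrated with a minimal Java sketch. This is not the code from any of the listed packages: the class name TfIdfCosineSketch, its method names, and the choice of tf = count / document length and idf = ln(N / df) are all assumptions made for illustration, and the k-means step is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not taken from any listed resource.
final class TfIdfCosineSketch {

    /** Builds one sparse TF-IDF vector (term -> weight) per tokenized document. */
    static List<Map<String, Double>> tfIdf(List<List<String>> docs) {
        // Document frequency: the number of documents that contain each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> vec = new HashMap<>();
            for (String term : doc) {
                vec.merge(term, 1.0, Double::sum); // raw term count
            }
            for (Map.Entry<String, Double> e : vec.entrySet()) {
                double tf = e.getValue() / doc.size();
                // Assumed idf variant: a term present in every document gets weight 0.
                double idf = Math.log((double) docs.size() / df.get(e.getKey()));
                e.setValue(tf * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    /** Cosine of the angle between two sparse vectors; 0 if either has zero length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("text", "clustering", "with", "java"),
                List.of("text", "similarity", "with", "java"),
                List.of("image", "compression", "codec"));
        List<Map<String, Double>> vectors = tfIdf(docs);
        System.out.println("doc0 vs doc1: " + cosine(vectors.get(0), vectors.get(1)));
        System.out.println("doc0 vs doc2: " + cosine(vectors.get(0), vectors.get(2)));
    }
}
```

A k-means step would then assign each vector to the centroid it is most similar to under this cosine measure and recompute the centroids until the assignments stop changing. Tokenization is assumed to be done beforehand; Chinese text would first need a word segmenter.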