Resource search results
DM4
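- Execution flow: 1. The user supplies the parameters: the value of K and the paths to the training and test data. 2. The training and test set files are read with the ArffFileReader class and organized into InstanceSet data structures. 3. Using the similarity measure above, the K most similar training-set Instances are found for each Instance in the test set, the class is decided by voting, and the result is stored in the Instance member variable targetGuess. 4. The classification results are evaluated, including classification accuracy and the Precision and Recall for instances of each class; Con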
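The voting and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration, not the DM4 code: the class `KnnSketch` and its methods are hypothetical, plain arrays stand in for `InstanceSet`, and Euclidean distance is an assumed stand-in for the similarity measure, which the description does not specify.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KnnSketch {
    // Euclidean distance; an ASSUMED stand-in for the unspecified similarity measure.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the majority label among the k training rows closest to the query.
    // On a tie, the label that reached the top vote count first (nearest first) wins.
    static String classify(double[][] trainX, String[] trainY, double[] query, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < trainX.length; i++) {
            order.add(i);
        }
        // Sort training indices by distance to the query, nearest first.
        order.sort(Comparator.comparingDouble(i -> distance(trainX[i], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int n = 0; n < Math.min(k, order.size()); n++) {
            String label = trainY[order.get(n)];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    // Precision = TP / (TP + FP) and recall = TP / (TP + FN) for one class label.
    // Returns {precision, recall}; a zero denominator yields 0.0.
    static double[] precisionRecall(String[] actual, String[] guessed, String label) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            boolean isActual = actual[i].equals(label);
            boolean isGuess = guessed[i].equals(label);
            if (isActual && isGuess) {
                tp++;
            } else if (isGuess) {
                fp++;
            } else if (isActual) {
                fn++;
            }
        }
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        return new double[] {precision, recall};
    }
}
```

In this sketch the value returned by `classify` plays the role of `targetGuess`; comparing it with the true label over the whole test set gives the accuracy, and `precisionRecall` is called once per class.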
sckr2013_final
- Word similarity computation based on Word2vec, with complete Java code. The model that Word2vec trains from the corpus is too large, so it is not included.
semantic-similarity
- Computes semantic similarity for language: given two words, it returns a similarity value in the range 1 to 5. Implemented in Java.
DocDistance
- A text similarity system implemented in Java, using the vector space model and the cosine similarity formula. In testing it can compute the similarity of two texts, with some effectiveness.
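For reference, cosine similarity over a vector space model is cos(A, B) = (A · B) / (|A| |B|), where A and B are the term vectors of the two texts. The sketch below is not the DocDistance code: the class `CosineSketch` is hypothetical, and it assumes raw term-frequency weights and a simple tokenizer that lowercases and splits on non-word characters, which only suits space-delimited text (Chinese text would need a word segmenter instead).

```java
import java.util.HashMap;
import java.util.Map;

class CosineSketch {
    // Builds a term-frequency vector. ASSUMPTION: lowercase the text and split
    // on runs of non-word characters; no stop-word removal or TF-IDF weighting.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine of the angle between the two term-frequency vectors.
    // Returns 0.0 if either text has no tokens.
    static double cosine(String a, String b) {
        Map<String, Integer> va = termFrequencies(a);
        Map<String, Integer> vb = termFrequencies(b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            int x = e.getValue();
            normA += (double) x * x;
            Integer y = vb.get(e.getKey());
            if (y != null) {
                dot += (double) x * y; // only shared terms contribute to the dot product
            }
        }
        for (int y : vb.values()) {
            normB += (double) y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

With term-frequency weights the result lies between 0 (no shared terms) and 1 (identical term distributions), so word order is ignored entirely.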