Resource search results
bb
- Used at Taobao for synonym analysis of reviews, based on the similarity of each word's left and right contexts.
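A minimal sketch of the left/right-context idea, not the code in this resource. It assumes the reviews are already tokenized, uses a context window of one word on each side, and scores two words by the cosine of their context-count vectors. The function names and the window size are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Map each word to counts of its left ("L:") and right ("R:") neighbours."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for offset in range(1, window + 1):
                if i - offset >= 0:
                    vectors[word]["L:" + tokens[i - offset]] += 1
                if i + offset < len(tokens):
                    vectors[word]["R:" + tokens[i + offset]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_similarity(vectors, w1, w2):
    """Similarity of two words based on their shared left/right contexts."""
    return cosine(vectors.get(w1, Counter()), vectors.get(w2, Counter()))
```

Words that appear in the same positions relative to the same neighbours score close to 1, and words with no shared context score 0.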
FENLEI
- Used at Taobao for synonym analysis of reviews, based on the similarity of each word's left and right contexts.
semantic-similarity
- Computes the semantic similarity of two words and returns a similarity score between 1 and 5. Implemented in Java.
ddtw
- Computes the distance between two vectors using DTW, for comparing the similarity of syllables or words.
similarity
- Finds the longest common substring of strings s1 and s2 as a measure of document similarity that reflects word order.
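A minimal sketch of a longest-common-substring search using dynamic programming, not the code in this resource. Dividing the match length by the longer input to get a score in [0, 1] is an assumption made here for illustration; the listing does not say how the similarity is normalized.

```python
def longest_common_substring(s1, s2):
    """Return the longest contiguous run shared by s1 and s2."""
    best_len, best_end = 0, 0
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                # Extend the common run that ended at the previous positions.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s1[best_end - best_len:best_end]


def substring_similarity(s1, s2):
    """Assumed normalization: match length over the longer input length."""
    if not s1 or not s2:
        return 0.0
    return len(longest_common_substring(s1, s2)) / max(len(s1), len(s2))
```

Because the match must be contiguous, reordering words shortens it, which is how the measure reflects word order. Passing lists of tokens instead of strings compares at word level rather than character level.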
1
- Detects similarity between Chinese articles: the articles are first segmented into words, features are then extracted, and the angle between the feature vectors is computed to test whether the articles are similar.
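A minimal sketch of the vector-angle test, not the code in this resource. It assumes the articles have already been segmented into word lists, uses raw term frequencies as the feature vectors, and uses a 30-degree threshold. The listing does not specify the features or the threshold, so both are illustrative assumptions.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a)
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def angle_degrees(tokens_a, tokens_b):
    """Angle between the two vectors, from 0 (same direction) to 90 (no overlap)."""
    cos = min(1.0, max(-1.0, cosine_similarity(tokens_a, tokens_b)))
    return math.degrees(math.acos(cos))


def is_similar(tokens_a, tokens_b, max_angle=30.0):
    """Assumed decision rule: similar when the angle is at most max_angle."""
    return angle_degrees(tokens_a, tokens_b) <= max_angle
```

A smaller angle means the two articles use the same words in similar proportions. Term frequencies are never negative, so the angle stays between 0 and 90 degrees.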
testsurf
- Classifies multiple groups of images with the SURF method: a vocabulary is trained over the groups, each image is represented with that vocabulary and compared against the images of the other groups, and the per-image similarities are accumulated to judge how similar the groups are to one another.
DTW
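- Dynamic Time Warping (DTW) has been around for some time (it was proposed by the Japanese scholar Itakura). Its purpose is simple: it measures the similarity of two time series of different lengths. It is widely used, mainly in template matching, for example in isolated-word speech recognition (deciding whether two speech segments represent the same word), gesture recognition, data mining, and information retrieval.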
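A minimal sketch of the classic DTW recurrence for two one-dimensional sequences, not the code in this resource. It assumes the absolute difference as the local cost and applies no warping-window constraint.

```python
def dtw_distance(a, b):
    """Cumulative cost of the cheapest monotonic alignment between a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to b[j-1] again
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

Since one element may be matched to several elements of the other sequence, the two inputs do not need to have the same length. For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 because the repeated 2 is absorbed by the warping.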
English
- Contains the original English documents together with the documents produced by each text preprocessing step (special-symbol removal, tokenization, stemming, similarity computation, and so on), 500 English documents in total.
Chinese
- 500 Chinese documents collected with a crawler for text preprocessing, including the word segmentation output, the special-symbol removal output, and the final similarity computation.
EnglishChuLi
- A text preprocessing program written in Python, with implementation code for every step: punctuation removal, stop-word removal, similarity computation, PCA dimensionality reduction, clustering, and visualization. It was run in PyCharm with a Python 3 development environment.
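A minimal sketch of the first two steps, punctuation removal and stop-word removal, using only the standard library. It is not the code in this resource. The stop-word list is a small hypothetical example, and splitting on whitespace assumes English input.

```python
import string

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}

# Translation table that turns every ASCII punctuation mark into a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    tokens = text.lower().translate(_PUNCT_TABLE).split()
    return [t for t in tokens if t not in stop_words]
```

The resulting token lists can then be passed to the later steps such as similarity computation. Replacing punctuation with spaces also splits contractions, so "don't" becomes "don" and "t".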
Python Chinese text preprocessing
- Includes punctuation removal, word segmentation, stop-word removal, similarity computation, and text clustering.