Search resource list
simhash-f624c65.tar
- simhash.c /* Bibliography * Mark Manasse * Microsoft Research Silicon Valley * Finding similar things quickly in large collections * http://research.microsoft.com/research/sv/PageTurner/similarity.htm * * Andrei Z. Broder * On
simhash-java-master
- Hasher-master java program
simhash-java-master
- A simple implementation of the simhash algorithm in Java.
simHash
- A C# implementation of Google's simhash algorithm. In extensive testing, simhash works quite well for comparing larger texts (for example, more than 500 characters): texts whose distance is less than 3 are essentially all similar, and the false-positive rate is fairly low.
simhash
- Deduplicates and filters text collected by a web crawler, removing repeated text while preserving the diversity of the samples.
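
Several entries above mention comparing simhash fingerprints by distance, with a distance of less than 3 treated as "similar". The following is a minimal Python sketch of that idea and is not taken from any of the listed packages. The whitespace tokenization, the use of MD5 as the per-token hash, the unweighted tokens, and the names `simhash`, `hamming_distance`, and `is_near_duplicate` are all assumptions made for illustration.

```python
import hashlib

BITS = 64  # fingerprint width (assumed; 64 bits is a common choice)


def simhash(text):
    """Return a 64-bit simhash fingerprint of whitespace-separated tokens."""
    counts = [0] * BITS
    for token in text.split():
        # Stable 64-bit token hash; MD5 is an arbitrary choice for this sketch.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are positive.
    fingerprint = 0
    for i in range(BITS):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, threshold=3):
    """Treat texts as similar when the distance is below the threshold.

    The default of 3 follows the "distance less than 3" remark in the
    C# entry above; a suitable value depends on the corpus.
    """
    return hamming_distance(simhash(text_a), simhash(text_b)) < threshold
```

For crawler deduplication, a sketch like this would typically keep the first document seen and drop later documents for which `is_near_duplicate` returns true. Short texts tend to produce unstable fingerprints, which is consistent with the note above that simhash works better on longer texts.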