Resource search results
simhash-f624c65.tar
- simhash.c /* Bibliography * Mark Manasse * Microsoft Research Silicon Valley * Finding similar things quickly in large collections * http://research.microsoft.com/research/sv/PageTurner/similarity.htm * * Andrei Z. Broder * On
deduplication
- A C implementation of the simhash algorithm, used to detect duplicate articles.
simHash
- A C# implementation of the simhash algorithm used by Google. Extensive testing shows that simhash works quite well on relatively long texts, for example 500 characters or more: pairs whose fingerprints are at a distance of less than 3 are almost always similar, and the error rate is fairly low.
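
The comparison described in the last entry (fingerprint each text, then treat a distance of less than 3 as "similar") can be sketched roughly as follows. This is an illustrative Python sketch, not code taken from any of the listed packages. The 64-bit fingerprint width, the 4-character shingles, the use of MD5 as the feature hash, and the function names `simhash`, `hamming_distance`, and `is_near_duplicate` are all assumptions made for the example.

```python
import hashlib


def simhash(text: str, bits: int = 64, shingle: int = 4) -> int:
    """Return a `bits`-wide simhash fingerprint of `text` (bits <= 128)."""
    # Features: overlapping character n-grams. This is an assumption; the
    # listed projects may tokenize by words or weight features differently.
    n = max(1, len(text) - shingle + 1)
    features = [text[i:i + shingle] for i in range(n)]

    # One signed counter per fingerprint bit.
    counts = [0] * bits
    for feat in features:
        digest = hashlib.md5(feat.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for b in range(bits):
            # A set bit votes +1 for that position, a clear bit votes -1.
            counts[b] += 1 if (h >> b) & 1 else -1

    # The fingerprint keeps a 1 wherever the votes are positive.
    fingerprint = 0
    for b in range(bits):
        if counts[b] > 0:
            fingerprint |= 1 << b
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, max_distance: int = 3) -> bool:
    """True when the fingerprints differ in fewer than `max_distance` bits."""
    return hamming_distance(simhash(a), simhash(b)) < max_distance
```

Calling `is_near_duplicate(doc_a, doc_b)` on two long texts applies the "distance less than 3" rule from the description. The threshold is a parameter because a suitable value depends on text length and on how features are chosen.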