Search results: resource list
cofinew
- The COFI tree is used for mining frequent patterns from large text data sets.
Text-mining
- A collection of ten or more papers on text mining, for example "A Survey of Web Content Mining" and "Research on Web Content Mining Techniques". Topics: text mining, data mining, web mining.
ARFFInputformat
- A custom file-reading input format class for Hadoop. Very helpful for handling the special format of the training and test text used by data mining classification algorithms.
util
- Many useful text-processing utilities, applicable to both NLP and data mining.
NaiveBayes
- A text classification program based on the naive Bayes algorithm. A good learning reference for data mining beginners.
IDF
- IDF reflects how important a word is to a document within a document collection, and is often used as a weighting factor in text data mining and information extraction. In a given document, term frequency (TF) is the frequency with which a given term appears in that document. Inverse document frequency (IDF) is a measure of a term's general importance.
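As a rough illustration of the two quantities described in this entry (not the code of the listed resource), the sketch below computes TF as a term's count divided by the document length and IDF as log(N / df), where N is the number of documents and df is the number of documents containing the term. This unsmoothed form is only one common variant, and the function names and the tiny sample corpus are hypothetical.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """TF: count of each term divided by the number of tokens in the document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(documents):
    """IDF: log(N / df), with df = number of documents containing the term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Use a set so each document counts a term at most once.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(documents):
    """Per-document weights: TF multiplied by IDF for every term in the document."""
    idf = inverse_document_frequency(documents)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in documents
    ]


# Hypothetical, already tokenized sample corpus.
docs = [
    ["data", "mining", "text"],
    ["text", "classification"],
    ["text", "mining", "web"],
]
idf = inverse_document_frequency(docs)
weights = tf_idf(docs)
```

In this sample, a term that appears in every document ("text") gets an IDF of zero and so contributes no weight, while a term confined to one document ("web") gets the largest IDF.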
source-archive (3)
- A toolkit for generating dummy XML documents of user-specified size and structural randomness. xml-generator is a Python-based toolkit for generating well-formed XML sample documents. Its primary purpose is to generate documents for perf