Search results: resource list
kmeans
- The k-means algorithm is a classic text-clustering algorithm and one of the top ten classic data-mining algorithms. This is a Java implementation of k-means.
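For orientation, here is a minimal Python sketch of Lloyd's k-means (the listed resource is in Java, and its code is not shown here). It assumes points are equal-length numeric tuples and, as a simplification, uses the first `k` points as the initial centroids. For text clustering, the points would typically be document vectors such as TF-IDF weights.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]

def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; initial centroids are the first k points (a simplifying assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: neither assignments nor centroids changed
        labels, centroids = new_labels, new_centroids
    return centroids, labels
```

Random or k-means++ initialization is the usual choice in practice; the deterministic start here just keeps the sketch reproducible.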
three_gram_train
- A MATLAB program that builds a third-order (trigram) statistical language model directly from text documents.
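A trigram model estimates P(w3 | w1, w2) from counts. The sketch below is in Python rather than the resource's MATLAB; it assumes pre-tokenized sentences, pads each with the hypothetical boundary markers `<s>` and `</s>`, and uses the unsmoothed maximum-likelihood estimate c(w1 w2 w3) / c(w1 w2).

```python
from collections import Counter
from typing import List, Tuple

def train_trigram(sentences: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count trigrams and their two-word histories; each sentence is padded as <s> <s> ... </s>."""
    tri, bi = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        for i in range(len(padded) - 2):
            w1, w2, w3 = padded[i], padded[i + 1], padded[i + 2]
            tri[(w1, w2, w3)] += 1
            bi[(w1, w2)] += 1  # history count, used as the denominator
    return tri, bi

def trigram_prob(tri: Counter, bi: Counter, w1: str, w2: str, w3: str) -> float:
    """Maximum-likelihood P(w3 | w1, w2); returns 0.0 for an unseen history."""
    history = bi[(w1, w2)]
    return tri[(w1, w2, w3)] / history if history else 0.0
```

A real model would add smoothing or backoff (for example add-k or Kneser-Ney) so that unseen trigrams do not get zero probability.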
R4
- R4 short-text dataset, drawn from the datasets used in many published papers. English text, already stemmed, with stop words removed and otherwise preprocessed.
data_mining
- An R program written for a text-mining assignment; it computes statistics over news categories and then classifies the articles.
extractWiki
- Extracts the body text of Wikipedia pages from enwiki-latest-pages-articles.xml.
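Because the English Wikipedia dump is very large, it is usually parsed as a stream. The Python sketch below (not the resource's code) uses the standard library's `xml.etree.ElementTree.iterparse` to yield `(title, wikitext)` pairs per `<page>`. It strips the XML namespace rather than hard-coding a dump schema version, and it returns raw wikitext; removing wiki markup to get plain body text would be a further step.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(source) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) for every <page> in a MediaWiki XML dump (path or binary file)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the finished page's children to keep memory bounded

if __name__ == "__main__":
    # Tiny in-memory sample in the dump's general shape; the real input would be the .xml file path.
    sample = (b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
              b"<page><title>A</title><revision><text>Hello</text></revision></page>"
              b"</mediawiki>")
    for title, text in iter_pages(io.BytesIO(sample)):
        print(title, text)
```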
LSASummarization-a-paper
- lsasummarization applies latent semantic analysis (LSA) to text summarization. The accompanying paper is included.
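The core idea of LSA summarization can be shown on a term-by-sentence matrix A: its leading right singular vector weights sentences by how strongly they express the dominant latent topic. One common LSA approach picks the top-weighted sentence for each of several leading singular vectors; this pure-Python sketch (not the resource's implementation) keeps only the first vector, computed by power iteration on AᵀA, and assumes a naive lowercase letter-only tokenizer.

```python
import math
import re
from typing import List

def lsa_top_sentence(sentences: List[str], iters: int = 100) -> int:
    """Index of the sentence with the largest weight in the first latent topic."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    # Term-sentence matrix A: rows are terms, columns are sentences, cells are raw counts.
    A = [[toks.count(term) for toks in tokenized] for term in vocab]
    n = len(sentences)
    # G = A^T A (n x n); its top eigenvector is A's first right singular vector.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no shared terms at all; keep the current vector
        v = [x / norm for x in w]
    return max(range(n), key=lambda j: abs(v[j]))
```

A production version would use a library SVD and TF-IDF weighting instead of raw counts.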
NaiveBayes-master
- Classifies text (training and learning) using the naive Bayes algorithm.
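As an illustration of naive Bayes text classification (a generic sketch, not this resource's code), the Python below trains a multinomial model with add-one (Laplace) smoothing on pre-tokenized documents and scores classes in log space to avoid underflow. Words never seen in training are simply ignored, which is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

def train_nb(docs: List[Tuple[List[str], str]]):
    """Collect class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab, len(docs)

def predict_nb(model, tokens: List[str]) -> Optional[str]:
    """Return argmax_c [log P(c) + sum_i log P(w_i | c)] with Laplace-smoothed P(w | c)."""
    class_docs, word_counts, vocab, n_docs = model
    best_label, best_score = None, float("-inf")
    for label, count in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for w in tokens:
            if w in vocab:  # skip out-of-vocabulary words
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```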
DeepLearning-master
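- The concept of deep learning grew out of research on artificial neural networks; a multilayer perceptron with several hidden layers is one kind of deep learning architecture. Deep learning combines low-level features into more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of the data.[1] The concept of deep learning was proposed by Hinton et al. in 2006. Their unsupervised greedy layer-wise training algorithm, based on deep belief networks (DBNs), brought hope of solving the optimization problems associated with deep architectures, and multilayer autoencoder architectures were proposed afterwards. In addition, the convolutional neural network proposed by LeCun et al. was the first truly multilayer structure-learning algorithm; it exploits relative spatial relationships to reduce the number of parameters and improve training performance.[1] Deep learning is a machine…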
Hadoop
- Developed with Hadoop. Counts how often keywords occur in the input files and ranks the different texts by keyword frequency from high to low. Users must define the keywords and input files themselves.
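The keyword-count-and-rank job maps naturally onto MapReduce. The sketch below simulates the map, shuffle, and reduce phases locally in plain Python to show the data flow; it is not Hadoop code. In actual Hadoop the two phases would be written against Hadoop's Java `Mapper`/`Reducer` classes or run as Hadoop Streaming scripts. Keywords are assumed to be lowercase, and the function names are illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Key = Tuple[str, str]  # (keyword, document id)

def map_phase(doc_id: str, text: str, keywords: Set[str]) -> Iterator[Tuple[Key, int]]:
    """Mapper: emit ((keyword, doc_id), 1) for every keyword occurrence."""
    for word in re.findall(r"\w+", text.lower()):
        if word in keywords:
            yield (word, doc_id), 1

def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Shuffle + reducer: sum the counts emitted for each key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def rank_docs(docs: Dict[str, str], keywords: Set[str]) -> Dict[str, List[Tuple[str, int]]]:
    """For each keyword, list (doc_id, frequency) sorted from highest to lowest frequency."""
    pairs = (p for doc_id, text in docs.items() for p in map_phase(doc_id, text, keywords))
    ranking: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for (word, doc_id), n in reduce_phase(pairs).items():
        ranking[word].append((doc_id, n))
    # Sort by descending frequency; ties are broken by document id for stable output.
    return {w: sorted(lst, key=lambda x: (-x[1], x[0])) for w, lst in ranking.items()}
```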
Enhancedtextmining
- An enhanced text-mining pipeline, including word segmentation, classification and clustering, and evaluation of the segmentation results.
Bias_algorithm_java
- A Java implementation of the Bayes algorithm, with improvements built on the Bayesian approach to raise text-classification efficiency.
InfoRetri
- Text classification based on naive Bayes, including stop-word removal, word segmentation, feature extraction, and classification.
pyspark_process
- Text classification algorithms implemented with PySpark, using a TF-IDF representation.
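To make the TF-IDF representation concrete, here is a plain-Python sketch (not PySpark) using raw term counts as TF and idf(t) = log(N / df(t)), where N is the number of documents and df(t) the number containing term t. In PySpark one would normally use `HashingTF` and `IDF` from `pyspark.ml.feature`; Spark's IDF uses the smoothed variant log((N + 1) / (df + 1)), so its values differ from this sketch.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors: weight(t, d) = count(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency per term
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors
```

A term that appears in every document gets weight 0, which is what lets TF-IDF down-weight uninformative words before classification.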
datamining
- Lecture slides (PowerPoint exported to PDF) from the University of Southampton, UK. They introduce data-mining techniques and applications, including decision trees, recommender systems, text clustering, search engines, and market-basket analysis.
kNN
- A kNN algorithm written in Python, with simple routines for generating a dataset, a simple classifier, and text conversion.
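The kNN classifier itself fits in a few lines. This is an independent Python sketch, not the resource's code: it assumes numeric feature tuples, Euclidean distance, and a majority vote among the `k` nearest training points. With `Counter.most_common`, ties go to the label that appears first among the sorted neighbors.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def knn_classify(query: Sequence[float], data: List[Tuple[Sequence[float], str]], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training points (Euclidean distance)."""
    neighbors = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Usage: with training points `((1.0, 1.1), "A")`, `((1.0, 1.0), "A")`, `((0.0, 0.0), "B")`, `((0.0, 0.1), "B")`, the query `(0, 0)` with `k=3` gets two "B" neighbors and one "A", so it is labeled "B".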
text-mining
- Text mining: feeds a term-document matrix into algorithms to produce a tag cloud and a term network.
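A term-document matrix holds one row per term and one column per document; row sums give the frequencies a tag cloud would scale words by, and term co-occurrence gives the edges of a term network. The Python sketch below is illustrative only: it assumes pre-tokenized documents and defines an edge weight as the number of documents in which both terms appear, which is one common choice among several.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

def term_doc_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Rows are terms (sorted), columns are documents, cells are raw counts."""
    vocab = sorted({t for d in docs for t in d})
    counts = [Counter(d) for d in docs]
    return vocab, [[c[t] for c in counts] for t in vocab]

def term_network(docs: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Undirected edges (a, b) with a < b, weighted by the number of documents containing both."""
    edges: Counter = Counter()
    for d in docs:
        for a, b in combinations(sorted(set(d)), 2):
            edges[(a, b)] += 1
    return dict(edges)
```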
text-mining
- Text mining using R.
bayes1
- Naive Bayes is a machine-learning algorithm built mainly on Bayes' theorem from probability theory, and it works well for text analysis.
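For reference, this is how Bayes' theorem becomes a classifier for a document d = w_1 … w_n and class c. The "naive" step is the assumption that words are conditionally independent given the class, and P(d) can be dropped because it is the same for every class:

```latex
P(c \mid d) = \frac{P(c)\,P(d \mid c)}{P(d)} \;\propto\; P(c)\prod_{i=1}^{n} P(w_i \mid c),
\qquad
\hat{c} = \arg\max_{c}\Big[\log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c)\Big]
```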
ICTCLAS_api
- Performs word segmentation on a given text, distinguishing words by part of speech.
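To illustrate what word segmentation does for Chinese text, here is a sketch of forward maximum matching, a simple dictionary-based method. It is not ICTCLAS's algorithm and does no part-of-speech tagging. The dictionary and the `max_len` limit are hypothetical inputs. At each position it takes the longest dictionary word and falls back to a single character.

```python
from typing import List, Set

def fmm_segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: greedily take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # single characters are always accepted
                words.append(piece)
                i += size
                break
    return words
```

Usage: with the dictionary `{"文本", "挖掘", "文本挖掘", "算法"}`, the text "文本挖掘算法" segments into `["文本挖掘", "算法"]`.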
SVM
- Example code for time-series regression forecasting with information granulation and support vector machines (SVM); the code is easy to modify.