Resource list
EM
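- For data drawn from a Gaussian mixture distribution, this uses the expectation-maximization (EM) algorithm: it repeatedly re-estimates the mean and variance of each Gaussian component so that the likelihood function increases toward a maximum. It works well on data that follow a known family of probability distributions. The code calls mvnrnd() with chosen parameters to generate a data set that follows a Gaussian mixture distribution.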
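
The EM loop described above can be sketched roughly as follows. This is a minimal illustration and not the package's code: it assumes one-dimensional data and two components, and it uses Python's `random.gauss` as a stand-in for `mvnrnd()`. The function names, the initialization, and the fixed iteration count are all assumptions made for the example.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    lo, hi = min(data), max(data)
    # Assumed initialization: means spread evenly over the data range,
    # every component starts with the overall variance and equal weight.
    means = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing component
            weights[j] = nj / len(data)
    return weights, means, variances

# Synthetic mixture: two Gaussians centred at 0 and 6 (values chosen for the demo).
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(300)]
        + [random.gauss(6.0, 1.0) for _ in range(300)])
weights, means, variances = em_gmm_1d(data)
print(sorted(means))
```

With components this well separated, the estimated means should land close to the two generating centres; overlapping components or a poor initialization can leave EM at a local maximum.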
multiverso-master
- Multiverso is a parameter-server-based framework for training machine learning models on big data across many machines. It is currently a standard C++ library and provides a series of friendly programming interfaces. With such easy-to-use APIs, m
lightlda-master
- LightLDA is a distributed system for large-scale topic modeling. It implements a distributed sampler that enables very large data sizes and models. LightLDA improves sampling throughput and convergence speed via a fast O(1) Metropolis-Hastings algori
distributed_word_embedding-master
- The Distributed Word Embedding tool is a parallelization of the Word2Vec algorithm on top of our DMTK parameter server. It provides an efficient word embedding solution that scales to industry-size data.
distributed_skipgram_mixture-master
- The Distributed Multisense Word Embedding (DMWE) tool is a parallelization of the Skip-Gram Mixture [1] algorithm on top of the DMTK parameter server. It provides an efficient multi-sense word embedding solution that scales to industry-size data.
fnlp-master
- FNLP mainly
naivebayes
- Naive Bayes algorithm: finds the condition most likely to have caused a given result or phenomenon.
maxminjulei
- An improved max-min distance clustering algorithm. It follows the textbook steps exactly, runs as-is, and is simple.
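
For readers unfamiliar with the method, here is a rough sketch of the standard max-min distance procedure. It is not the package's code and does not include whatever improvement the package makes. The function name, the threshold ratio `theta`, and the sample points are assumptions for illustration.

```python
import math

def max_min_clustering(points, theta=0.5):
    """Standard max-min distance clustering (illustrative sketch).

    theta is an assumed ratio: a point becomes a new center only if its
    distance to the nearest existing center exceeds theta * d12, where d12
    is the distance between the first two centers.
    """
    # Step 1: take the first sample as the first center.
    centers = [points[0]]
    # Step 2: the sample farthest from it becomes the second center.
    second = max(points, key=lambda p: math.dist(p, centers[0]))
    d12 = math.dist(second, centers[0])
    if d12 == 0:
        # All points coincide, so there is a single cluster.
        return centers, [0] * len(points)
    centers.append(second)
    # Step 3: repeatedly pick the point whose nearest-center distance is largest.
    while True:
        candidate = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        gap = min(math.dist(candidate, c) for c in centers)
        if gap <= theta * d12:
            break  # no remaining point is far enough from all centers
        centers.append(candidate)
    # Step 4: assign every point to its nearest center.
    labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
              for p in points]
    return centers, labels

points = [(0, 0), (1, 0), (0, 1),
          (10, 0), (11, 0), (10, 1),
          (5, 8), (5, 9), (6, 8)]
centers, labels = max_min_clustering(points)
print(centers)  # [(0, 0), (11, 0), (5, 9)]
print(labels)   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The result depends on which sample comes first and on `theta`: a smaller ratio yields more centers.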
f24
- Uses a 24-point fast Fourier transform (FFT) to convert between the time domain and the frequency domain.
K-Nearest-Neighbor
- The classic KNN (k-nearest neighbor) algorithm from data mining. It runs as soon as it is imported.
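
As a reminder of what KNN does, a minimal sketch follows. It is not the package's code. It assumes Euclidean distance and a simple majority vote; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_tuple, label) pairs. Euclidean distance is
    an assumption here; other metrics are equally valid for KNN.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy training set with two well-separated classes (hypothetical data).
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (5.0, 5.1)))  # b
```

Sorting the whole training set costs O(n log n) per query, which is fine for small data; larger sets usually call for a spatial index.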
keyword_find
- Converts PDF files to TXT and runs a keyword extraction algorithm module by module.
pyspark_process
- A text classification algorithm implemented with pyspark, using a TF-IDF representation.
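
To illustrate the TF-IDF representation mentioned above, here is a small pure-Python sketch. It deliberately does not use pyspark and is not the package's code. It assumes raw term counts for TF and the smoothed form log((N + 1) / (df + 1)) for IDF, which is one common choice; the function name and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    TF is the raw count of the term in the document. IDF is the smoothed
    log((N + 1) / (df + 1)), where N is the number of documents and df is
    the number of documents containing the term (an assumed variant).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    idf = {term: math.log((n + 1) / (count + 1)) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()}
            for doc in docs]

# Toy corpus, already tokenized (hypothetical data).
docs = [["spark", "text", "classification"],
        ["spark", "tf", "idf"],
        ["text", "text", "mining"]]
vectors = tf_idf(docs)

# A term found in one document outweighs a term shared by two documents.
print(vectors[0]["classification"] > vectors[0]["spark"])  # True
```

Under this variant a term that appears in every document gets weight zero, so it contributes nothing to a downstream classifier.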