mahout

Multiple models in Myrrix

痴心易碎 submitted on 2020-01-15 12:21:46
Question: I have a CSV file like this:
typeA,typeB
typeA,typeC
typeA,typeC
typeA,typeB
Here, typeA, typeB and typeC are 3 different types of entities. Consider types B and C to be two different types of items, and consider type A to be the users. I can build a model by feeding this CSV file into Myrrix. Now, suppose I have another CSV file like this:
typeB,typeD
typeB,typeD
typeB,typeD
typeB,typeD
This file has only two types, B (the "B" items from the former CSV file appear here as users) and D. Here,

A simple example of user-based recommendation with Mahout (1)

放肆的年华 submitted on 2020-01-08 11:49:21
Mahout is a machine learning tool that packages a large number of machine learning algorithms. Machine learning algorithms implemented in Mahout, by category:
Classification: Logistic Regression, Bayesian, SVM (support vector machine), Perceptron, Neural Network, Random Forests, Restricted Boltzmann Machines
Clustering: Canopy Clustering, K-means Clustering, Fuzzy K-means, Expectation Maximization (EM clustering), Mean Shift Clustering, Hierarchical Clustering, Dirichlet Process Clustering, Latent Dirichlet Allocation (LDA clustering), Spectral Clustering
Association rule mining: Parallel FP Growth Algorithm
Regression: Locally Weighted Linear

Remove character from integer

元气小坏坏 submitted on 2020-01-05 06:44:25
Question: FileDataModel accepts data in the format userId,itemId,pref (long,long,Double). At the moment some of my itemIds have an 'x' at the end of the number. How do I edit those itemIds so that the 'x' is removed? Is it possible to do this with a simple try/catch statement?
DataModel model = null;
try {
    model = new FileDataModel(new File("book_data/BX-Book-Ratings.csv"));
} catch (NumberFormatException e) {
    REMOVE X
}
CODE: DataModel model = new FileDataModel(new File("book_data/BX
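
A minimal sketch of one way to handle the trailing 'x', under stated assumptions. The NumberFormatException in the question is raised while the file is being parsed, so a catch block around the constructor is too late to repair a single field; a simpler route is to clean the file first and load the cleaned copy. The names `ItemIdCleaner` and `cleanLine` are hypothetical, and the sketch assumes plain comma-separated userId,itemId,pref rows with no quoting, which may not match the real ratings file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: strips a trailing 'x' from item IDs.
final class ItemIdCleaner {

    // Remove one trailing 'x' or 'X' from the item ID (second field) of a row.
    static String cleanLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps empty trailing fields
        if (fields.length < 2) {
            return line; // not a userId,itemId[,pref] row; leave it untouched
        }
        String itemId = fields[1].trim();
        if (itemId.endsWith("x") || itemId.endsWith("X")) {
            itemId = itemId.substring(0, itemId.length() - 1);
        }
        fields[1] = itemId;
        return String.join(",", fields);
    }

    // Usage: java ItemIdCleaner <input.csv> <output.csv>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println(cleanLine("1,123x,5.0"));
            return;
        }
        List<String> cleaned = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            cleaned.add(cleanLine(line));
        }
        Files.write(Paths.get(args[1]), cleaned, StandardCharsets.UTF_8);
    }
}
```

The cleaned output file can then be passed to FileDataModel in place of the original. Dropping the 'x' can make two distinct IDs collide (for example 123x and 123), so it may be worth checking for duplicates before relying on the result.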

Mahout precomputed Item-item similarity - slow recommendation

筅森魡賤 submitted on 2020-01-03 03:45:23
Question: I am having performance issues with precomputed item-item similarities in Mahout. I have 4 million users and roughly the same number of items, with around 100M user-item preferences. I want to do content-based recommendation based on the cosine similarity of the TF-IDF vectors of the documents. Since computing this on the fly is slow, I precomputed the pairwise similarity of the top 50 most similar documents as follows: I used seq2sparse to produce TF-IDF vectors. I used mahout rowId to
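
For reference, the cosine similarity mentioned in this question can be sketched as follows. This is only an illustration of the measure, not the Mahout job the asker ran: the class name `CosineSketch` is hypothetical, and representing a sparse TF-IDF vector as a map from term index to weight is an assumption made for the example.

```java
import java.util.Map;

// Hypothetical illustration of cosine similarity between sparse TF-IDF vectors.
final class CosineSketch {

    // Dot product of two sparse vectors (term index -> weight).
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller map and look terms up in the larger one.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    // cos(a, b) = a.b / (|a| |b|); defined as 0 when either vector is all zeros.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double normA = Math.sqrt(dot(a, a));
        double normB = Math.sqrt(dot(b, b));
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot(a, b) / (normA * normB);
    }
}
```

Two documents with proportional term weights score close to 1, and documents with no terms in common score 0, which is why only the top few neighbours per document need to be stored.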

Comparison of excellent data mining tools

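烈酒焚心 submitted on 2019-12-31 16:49:35
https://www.cnblogs.com/Yuanjing-Liu/p/9391964.html
Contents: 1. Comparison of data mining tools 2. Rapid Miner 3. Orange 4. Weka 4.1 Introduction 4.2 Preparation 4.3 Main features and usage 4.4 Pros and cons 4.5 Development resources 5. KNIME 5.1 Introduction 5.2 Main features and usage 5.3 Pros and cons 5.4 Development resources 6. Apache Mahout 6.1 Overview 6.2 Main features 6.3 Installing and configuring Mahout 6.4 Verifying Mahout with a simple example 6.5 Pros and cons References
1. Comparison of data mining tools. Data source: Top 15 Best Free Data Mining Tools: The Most Comprehensive List (Software Testing Help)
2. Rapid Miner
3. Orange
4. Weka
4.1 Introduction: Weka's full name is the Waikato Environment for Knowledge Analysis. The weka is also a bird found in New Zealand, and Weka's main developers come from New Zealand. As an open data mining platform, Weka brings together a large number of machine learning algorithms for data mining tasks, including data preprocessing, classification, regression, clustering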

Vectorization in Apache Mahout

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-30 05:14:05
Question: I am new to Mahout. I need to convert a text file to a vector for classification at a later stage. Could anybody shed some light on the questions below? How do I convert a text file to a vector in Mahout? The file format is like "username|comment about item|rating". The data will be a few TB. So which algorithm implementation can I use for classification with the vector I am supposed to create? Thanks, Arun
Answer 1: You can check these 2 examples that also somewhat do/explain how to

Java's Mahout equivalent in Python

本秂侑毒 submitted on 2019-12-29 11:35:09
Question: The goal of the Java-based Mahout is to build scalable machine learning libraries. Are there any equivalent libraries in Python?
Answer 1: scikits learn is highly recommended: http://scikit-learn.sourceforge.net/
Answer 2: Spark MLlib is recommended. It is a scalable machine learning library, can read data from HDFS, and of course runs on top of Spark. You can access it via PySpark (see the Programming Guide's Python examples).
Answer 3: Orange is supposedly pretty decent, from what I've heard, but I've never used it

Run Mahout RowSimilarity recommender on MongoDB data

痴心易碎 submitted on 2019-12-25 04:56:30
Question: I have managed to run Mahout rowsimilarity on flat files in the format below:
item-id tag1 tag-2 tag3
This has to be run via the CLI, and the output is again flat files. I want to change this so that it reads data from MongoDB (I am open to using other DBs too) and then writes the output to a DB from which our system can pick it up. I've researched this for the past few days and found the following:
- Will have to write Scala code implementing RowSimilarity
- Pass it an IndexedDataSet object to process the data
- Convert

How to start Mahout Spark Shell?

馋奶兔 submitted on 2019-12-25 04:25:36
Question: I am trying to use Mahout for Spark and followed the instructions at https://mahout.apache.org/users/sparkbindings/play-with-shell.html. I have successfully installed oracle-7-java and maven 3.3.9. Instead of spark 1.1.0 I downloaded spark 2.0.1 and ran build/sbt assembly, which took care of scala (installed 2.11.8) and other dependencies. I have also installed git and checked out the latest version of mahout into the /home/mehrab/mahout-src directory. Spark is running well and fine without any error

Maven BUILD FAILURE when installing Mahout on Ubuntu

北慕城南 submitted on 2019-12-24 22:32:11
Question: I am trying to build Mahout on Ubuntu 12.04, running in a virtual machine on a Windows 7 host. Maven does not seem to like this, and I don't really understand how to fix the problem. This is the result after a good long run of build tests:
Results :
Failed tests:
SearchSanityTest.testRemoval:166->Assert.assertEquals:494->Assert.failNotEquals:743->Assert.fail:88 Previous second neighbor should be first expected:<0.0> but was:<15.74860724515773>
Tests run: 834, Failures: 1, Errors: