mahout

How to install and launch Mahout for Spark?

人走茶凉 submitted on 2019-12-11 06:07:14
Question: I am interested in learning machine learning algorithms for big data, and for that purpose I want to learn how to code in Mahout for Spark. I posted my original question here, but nobody answered, so I am modifying it now. If anyone knows the detailed procedure for installing the latest Spark on Ubuntu 14.04 and integrating Mahout with it, I would be really grateful. Thanks in advance.

Answer 1: Currently Mahout uses Spark 1.6.2 and Scala 2.10.4. You can try to build your own version

Apache Mahout + Euclidean Distance: Unexpected Results

醉酒当歌 submitted on 2019-12-11 05:59:02
Question: I'm using Mahout's EuclideanDistanceSimilarity class to rank the similarity of several users given the following data set of user preferences. Preferences are currently integers from 1 to 5 inclusive. However, I have control over the scale, so that can change if it would help.

User Preferences:

User  Item 1  Item 2  Item 3  Item 4  Item 5  Item 6
1     2       4       3       5       1       2
2     5       1       5       1       5       1
3     1       5       1       5       1       5
4     2       4       3       5       1       2
5     3       3       4       5       2       2

I'm getting unexpected results when I run the following test code, which
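As a reference point for the table above, here is a minimal plain-Java sketch that computes a distance-based similarity between two preference rows. It assumes the common textbook mapping 1 / (1 + d), where d is the Euclidean distance; Mahout's EuclideanDistanceSimilarity may normalise differently, so the absolute values are not expected to match Mahout's output. The class and method names are hypothetical.

```java
final class EuclideanSketch {
    // Preference rows from the table above: users 1..5 over items 1..6.
    static final double[][] PREFS = {
        {2, 4, 3, 5, 1, 2},
        {5, 1, 5, 1, 5, 1},
        {1, 5, 1, 5, 1, 5},
        {2, 4, 3, 5, 1, 2},
        {3, 3, 4, 5, 2, 2},
    };

    // Plain Euclidean distance between two equally long rows.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Assumed mapping from distance to similarity in (0, 1]; not Mahout's exact formula.
    static double similarity(double[] a, double[] b) {
        return 1.0 / (1.0 + distance(a, b));
    }

    public static void main(String[] args) {
        for (int u = 1; u < PREFS.length; u++) {
            System.out.printf("user 1 vs user %d: %.4f%n",
                    u + 1, similarity(PREFS[0], PREFS[u]));
        }
    }
}
```

Users 1 and 4 have identical rows, so any distance-based similarity should rank them as the closest pair. That gives a quick sanity check against whatever the library returns.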

Apache Mahout not giving any recommendation

谁说我不能喝 submitted on 2019-12-11 05:25:20
Question: I am trying to use Mahout for recommendations but am getting none.

My dataset:

    0,102,5.0
    1,101,5.0
    1,102,5.0

Code:

    DataModel datamodel = new FileDataModel(new File("dataset.csv"));
    // Create the UserSimilarity object.
    UserSimilarity usersimilarity = new PearsonCorrelationSimilarity(datamodel);
    // Create the UserNeighborhood object.
    UserNeighborhood userneighborhood = new ThresholdUserNeighborhood(0.1, usersimilarity, datamodel);
    // Create the UserBasedRecommender.
    UserBasedRecommender recommender = new
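One possible explanation, offered as an assumption rather than a confirmed diagnosis: in this dataset users 0 and 1 share only item 102, and a Pearson correlation over a single co-rated item is undefined because both variances are zero. The plain-Java sketch below reproduces that arithmetic without Mahout; the class and method names are hypothetical, and Mahout's own handling of this case may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

final class PearsonSketch {
    // Pearson correlation over the items both users have rated.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // item not rated by both users
            }
            double x = e.getValue();
            double y = other;
            n++;
            sumA += x;
            sumB += y;
            sumAA += x * x;
            sumBB += y * y;
            sumAB += x * y;
        }
        if (n == 0) {
            return Double.NaN;
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        return cov / Math.sqrt(varA * varB); // 0/0 yields NaN when a variance is zero
    }

    // Similarity of users 0 and 1 from the three-line dataset in the question.
    static double sampleSimilarity() {
        Map<Long, Double> user0 = new HashMap<>();
        user0.put(102L, 5.0);
        Map<Long, Double> user1 = new HashMap<>();
        user1.put(101L, 5.0);
        user1.put(102L, 5.0);
        return pearson(user0, user1);
    }

    public static void main(String[] args) {
        System.out.println("similarity(0, 1) = " + sampleSimilarity());
    }
}
```

If the similarity is NaN, a threshold-based neighborhood has nothing to compare against 0.1, so no neighbors and no recommendations would follow. A larger dataset with more overlapping items avoids the degenerate case.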

User matching with current data

北城以北 submitted on 2019-12-11 02:36:12
Question: I have a database full of two different types of users (Mentors and Mentees), and I want the second group (Mentees) to be able to "search" for people in the first group (Mentors) who match their profile. Mentors and Mentees can both change items in their profile at any time. Currently, I am using Apache Mahout for the user matching (recommender.mostSimilarIDs()). The problem I'm running into is that I have to reload the user data every single time anyone searches. By

Not executing my Hadoop mapper class while parsing XML in Hadoop using XMLInputFormat

▼魔方 西西 submitted on 2019-12-10 21:03:19
Question: I am new to Hadoop, using Hadoop 2.6.0, and trying to parse a complex XML file. After searching for a while I learned that XML parsing needs a custom InputFormat, namely Mahout's XMLInputFormat. I also took help from this example. But when I run my code after passing the XMLInputFormat class, it does not call my own Mapper class, and the output file is empty if I use the XMLInputFormat given in the example. Surprisingly, if I do not pass my XMLInputFormat

How to implement the SlopeOne recommender in Mahout 0.9?

旧街凉风 submitted on 2019-12-10 17:37:07
Question: I'm new to Mahout and am trying to work through 'Mahout in Action,' which uses the 0.5 release. One of the early examples calls for using the slope-one recommender. Is this recommender still included in Mahout 0.9? I've looked through the documentation and couldn't find it. Perhaps it has changed names? Thanks for your help!

Answer 1: There is no SlopeOneRecommender present in Mahout 0.9. It was removed in an earlier version of Mahout. SlopeOne Recommender was removed from Mahout 0.8 onwards
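Since the class is no longer shipped, one way to keep following the book's example is to implement the idea by hand. The following is a minimal plain-Java sketch of weighted Slope One, not a reconstruction of Mahout's removed SlopeOneRecommender; the class name, method names, and sample ratings are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class SlopeOneSketch {
    // ratings: user -> (item -> rating). Predicts the user's rating for target.
    static double predict(Map<String, Map<String, Double>> ratings, String user, String target) {
        Map<String, Double> own = ratings.get(user);
        if (own == null) {
            return Double.NaN;
        }
        double numerator = 0.0;
        int totalCount = 0;
        for (Map.Entry<String, Double> rated : own.entrySet()) {
            String other = rated.getKey();
            if (other.equals(target)) {
                continue;
            }
            // Sum of (target - other) over all users who rated both items.
            double diffSum = 0.0;
            int count = 0;
            for (Map<String, Double> prefs : ratings.values()) {
                Double t = prefs.get(target);
                Double o = prefs.get(other);
                if (t != null && o != null) {
                    diffSum += t - o;
                    count++;
                }
            }
            if (count == 0) {
                continue;
            }
            // (average deviation + own rating) weighted by count.
            numerator += diffSum + rated.getValue() * count;
            totalCount += count;
        }
        return totalCount == 0 ? Double.NaN : numerator / totalCount;
    }

    static void rate(Map<String, Map<String, Double>> ratings, String user, String item, double value) {
        ratings.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
    }

    // Small made-up data set used only for illustration.
    static Map<String, Map<String, Double>> sample() {
        Map<String, Map<String, Double>> r = new HashMap<>();
        rate(r, "John", "A", 5.0);
        rate(r, "John", "B", 3.0);
        rate(r, "John", "C", 2.0);
        rate(r, "Mark", "A", 3.0);
        rate(r, "Mark", "B", 4.0);
        rate(r, "Lucy", "B", 2.0);
        rate(r, "Lucy", "C", 5.0);
        return r;
    }

    public static void main(String[] args) {
        System.out.printf("Lucy -> A: %.4f%n", predict(sample(), "Lucy", "A"));
    }
}
```

In the sample data, item A is on average 0.5 above B (two users) and 3 above C (one user), so Lucy's predicted rating for A is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3 = 13/3, roughly 4.33.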

How to read a CSV file from Hdfs?

有些话、适合烂在心里 submitted on 2019-12-10 10:59:16
Question: I have my data in a CSV file stored in HDFS, and I want to read it. Can anyone help me with the code? I'm new to Hadoop. Thanks in advance.

Answer 1: The classes required for this are FileSystem, FSDataInputStream and Path. The client should look something like this:

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf

Getting an IOException when running a sample code in “Mahout in Action” on mahout-0.6

≯℡__Kan透↙ submitted on 2019-12-10 04:26:34
Question: I'm learning Mahout and reading "Mahout in Action". When I tried to run the chapter 7 sample code SimpleKMeansClustering.java, an exception popped up:

    Exception in thread "main" java.io.IOException: wrong value class: 0.0: null is not class org.apache.mahout.clustering.WeightedPropertyVectorWritable
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1874)
        at SimpleKMeansClustering.main(SimpleKMeansClustering.java:95)

I ran this code successfully on mahout-0.5, but on mahout-0.6 I

Utilizing multiple, weighted data models for a Mahout recommender

扶醉桌前 submitted on 2019-12-09 19:24:30
Question: I have a boolean preference recommender based on user similarity. My data set essentially contains relations where ItemId are articles the user has decided to read. I'd like to add a second data model containing where ItemId is a subscription to a particular topic. The only way I can imagine doing this is by merging the two together, offsetting the subscription IDs so that they don't collide with the article IDs. For weighting I considered dropping the boolean preference setup and introducing
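The ID-offset merge described in the question can be sketched in plain Java as follows. This only illustrates the merging step and does not address weighting; the offset value, class name, and method names are assumptions chosen for the example, and the offset must exceed every real article ID for the scheme to be safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MergedPrefsSketch {
    // Assumed to be larger than any article ID in the data.
    static final long SUBSCRIPTION_OFFSET = 1_000_000_000L;

    // Both inputs map user ID -> set of item IDs (boolean preferences).
    static Map<Long, Set<Long>> merge(Map<Long, Set<Long>> articles,
                                      Map<Long, Set<Long>> subscriptions) {
        Map<Long, Set<Long>> merged = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : articles.entrySet()) {
            merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
        }
        for (Map.Entry<Long, Set<Long>> e : subscriptions.entrySet()) {
            Set<Long> items = merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
            for (long topicId : e.getValue()) {
                items.add(topicId + SUBSCRIPTION_OFFSET); // shift into its own ID range
            }
        }
        return merged;
    }

    // Lets the caller filter subscription pseudo-items out of recommendations.
    static boolean isSubscription(long mergedItemId) {
        return mergedItemId >= SUBSCRIPTION_OFFSET;
    }

    static long originalTopicId(long mergedItemId) {
        return mergedItemId - SUBSCRIPTION_OFFSET;
    }
}
```

Keeping the two ID ranges disjoint means article 10 and topic 10 stay distinct items after the merge, and `isSubscription` gives a cheap way to drop topics from the final recommendation list if only articles should be suggested.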

Converting CSV to SequenceFile

丶灬走出姿态 submitted on 2019-12-09 12:57:54
Question: I have a CSV file which I would like to convert to a SequenceFile, which I would ultimately use to create NamedVectors for a clustering job. I've been using the seqdirectory command to try to make a SequenceFile, and then fed that output into seq2sparse with the -nv option to create NamedVectors. This seems to produce one big vector as output, but I ultimately want each line of my CSV to become a NamedVector. Where am I going wrong?

Answer 1: seqdirectory command takes every file
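To get one vector per CSV line, the file has to be split into rows before anything is written. The plain-Java sketch below shows only that row-splitting step and assumes the first column holds the vector's name and the remaining columns are numeric. The class and field names are hypothetical. In a real job each row would then be wrapped in a Mahout NamedVector and written to a SequenceFile, which is omitted here because it needs Hadoop and Mahout on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvRowsSketch {
    // One parsed CSV line: a name plus its numeric values.
    static final class NamedRow {
        final String name;
        final double[] values;

        NamedRow(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    // Assumes: no header line, comma separator, first field is the name.
    static List<NamedRow> parse(List<String> lines) {
        List<NamedRow> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            double[] values = new double[fields.length - 1];
            for (int i = 1; i < fields.length; i++) {
                values[i - 1] = Double.parseDouble(fields[i].trim());
            }
            rows.add(new NamedRow(fields[0].trim(), values));
        }
        return rows;
    }
}
```

Each NamedRow corresponds to one line of the input, which is the granularity the question asks for.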