RHadoop - wordcount using rmr

面向向阳花 2021-01-17 04:30

I am trying to run a simple rmr job using the RHadoop package, but it is not working. Here is my R script:

print("Initializing variable.....")
Sys.setenv(HADOOP_H         


        
1 answer
  • 2021-01-17 05:25

    Firstly, you'll have to set the HADOOP_STREAMING environment variable in your code.
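
    Before loading rmr2, it can help to confirm that both variables actually point at existing files. The following is a minimal base R sketch; `missing_hadoop_vars` is a hypothetical helper written for illustration, not part of rmr2 or rhdfs.

    ```r
    # Hypothetical helper (not part of rmr2): return the names of required
    # environment variables that are unset, empty, or point at a missing file.
    missing_hadoop_vars <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
      values <- Sys.getenv(vars, unset = "")
      bad <- !nzchar(values) | !file.exists(values)
      vars[bad]
    }

    missing <- missing_hadoop_vars()
    if (length(missing) > 0) {
      message("Set these before calling mapreduce(): ",
              paste(missing, collapse = ", "))
    }
    ```

    If the message lists HADOOP_STREAMING, the jar path is probably wrong for your Hadoop version; the exact file name differs between installations.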

    Try the code below. Note that it assumes you have copied your text file to the HDFS folder example/wordcount/data.

    R Code:

    Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
    Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")
    
    # load libraries
    library(rmr2)
    library(rhdfs)
    
    # initiate rhdfs package
    hdfs.init()
    
    map <- function(k,lines) {
      # split on runs of whitespace so repeated spaces do not yield empty words
      words.list <- strsplit(lines, '\\s+')
      words <- unlist(words.list)
      return( keyval(words, 1) )
    }
    
    reduce <- function(word, counts) {
      keyval(word, sum(counts))
    }
    
    wordcount <- function (input, output=NULL) {
      mapreduce(input=input, output=output, input.format="text", map=map, reduce=reduce)
    }
    
    ## read text files from folder example/wordcount/data
    hdfs.root <- 'example/wordcount'
    hdfs.data <- file.path(hdfs.root, 'data')
    
    ## save result in folder example/wordcount/out
    hdfs.out <- file.path(hdfs.root, 'out')
    
    ## Submit job
    out <- wordcount(hdfs.data, hdfs.out) 
    
    ## Fetch results from HDFS
    results <- from.dfs(out)
    results.df <- as.data.frame(results, stringsAsFactors=FALSE)
    colnames(results.df) <- c('word', 'count')
    
    head(results.df)
    

    Output:

    word count
      AS    16
      As     5
      B.     1
      BE    13
      BY    23
      By     7
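
    To see what the map and reduce steps compute without a cluster, the same logic can be imitated in base R on an in-memory vector. This is only a sketch of the idea, not how rmr2 runs the job: the sample lines are made up, `tapply` stands in for the grouping that Hadoop does between map and reduce, and the split uses runs of whitespace with empty strings dropped.

    ```r
    # Made-up sample text standing in for the lines a mapper would read from HDFS.
    lines <- c("As you like it", "AS IS  as is", "As by BY By")

    # Map step: split every line on runs of whitespace, one word per element.
    words <- unlist(strsplit(lines, "\\s+"))
    words <- words[nzchar(words)]  # drop empty strings caused by leading whitespace

    # Reduce step: group identical words and sum a 1 for each occurrence,
    # mirroring keyval(words, 1) followed by sum(counts) for each key.
    counts <- tapply(rep(1, length(words)), words, sum)

    local.df <- data.frame(word = names(counts), count = as.vector(counts),
                           stringsAsFactors = FALSE)
    print(local.df)
    ```

    As in the output above, words are compared exactly, so "As" and "AS" are counted as different keys. Wrapping the words in `tolower()` before counting would merge them, if that is what you want.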
    

    For reference, here is another example of running an R word count MapReduce program.

    Hope this helps.
