Extracting rows containing a specific value using MapReduce and Hadoop

Happy的楠姐 2021-01-24 18:30

I'm new to developing MapReduce functions. Suppose I have a CSV file containing four columns of data.

For example:

101,87,65,67
102,43,45,

1 answer
  •  鱼传尺愫
    2021-01-24 19:31

    The WordCount example closely resembles what you are trying to achieve. Instead of emitting a count for each word, add a condition that checks whether the tokenized string contains the required value, and write to the context only in that case. This works because the Mapper receives each line of the CSV separately.
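To make the filtering condition concrete, here is a minimal plain-Java sketch of what the body of `map()` might do. It deliberately avoids the Hadoop classes so it can be run on its own; the class name `RowFilterSketch`, the helpers `containsValue` and `mapAll`, and the list standing in for the context are all assumptions made for illustration, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the Mapper logic; no Hadoop dependency (illustrative only).
final class RowFilterSketch {

    // Hypothetical helper: true if any comma-separated field equals the wanted value.
    static boolean containsValue(String line, String wanted) {
        // The -1 limit keeps trailing empty fields, e.g. the last column of "102,43,45,".
        for (String field : line.split(",", -1)) {
            if (field.trim().equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Simulates map() being called once per input line.
    // The returned list plays the role of the context.
    static List<String> mapAll(List<String> lines, String wanted) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            if (containsValue(line, wanted)) {
                emitted.add(line); // in a real Mapper this would be a context.write(...) call
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("101,87,65,67", "102,43,45,");
        System.out.println(mapAll(lines, "67")); // only the first row contains 67
    }
}
```

Comparing whole fields rather than searching the raw line avoids false matches such as the value 4 "matching" 43 or 45.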

    The Reducer then receives the values as a list, already grouped by key. In the Reducer, instead of using IntWritable as the output value type, you can use NullWritable, so your job outputs only the keys. You also do not need the loop over the values in the Reducer, since you only want to output the keys.
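The keys-only reduce step can be sketched the same way. This is again plain Java rather than real Hadoop code: `KeysOnlyReduceSketch`, `shuffle`, and `reduceAll` are hypothetical names, and a `TreeMap` is assumed as a rough stand-in for the framework's sort-and-group phase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the shuffle and reduce phases (illustrative only).
final class KeysOnlyReduceSketch {

    // Groups mapper output by key, sorted by key, with a WordCount-style value of 1.
    static Map<String, List<Integer>> shuffle(List<String> mapperKeys) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String key : mapperKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // One simulated reduce() call per key; the values are deliberately never iterated.
    static List<String> reduceAll(Map<String, List<Integer>> grouped) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            // In a real Reducer the value written alongside the key would be NullWritable.
            output.add(entry.getKey());
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> mapped = Arrays.asList("101,87,65,67", "103,1,2,67", "101,87,65,67");
        System.out.println(reduceAll(shuffle(mapped)));
    }
}
```

One consequence worth knowing: because identical keys are grouped before the reduce step, a row that appears twice in the input is written only once.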

    I am deliberately not giving you a complete solution, since you would learn little from copying one. Work your way up from these recommendations.

    EDIT: since you modified your question to ask about the Reducer, here are some tips on how you can achieve what you want.

    One way to achieve the desired result: in the Mapper, after splitting (or tokenizing) the line, write column 3 to the context as the key and column 0 as the value. Since you do not need any kind of aggregation, your Reducer can simply write out the keys and values produced by the Mappers (yes, your Reducer code will end up as a single line). You can check one of my previous answers; the figure there explains quite well what the Map and Reduce phases are doing.
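As a rough illustration of that key/value layout, the sketch below maps each line to the pair (column 3, column 0) and passes the pairs through a simulated reduce step unchanged. It is plain Java, not Hadoop code: `ColumnKeySketch`, `mapLine`, and `run` are hypothetical names, and skipping rows whose column 3 is missing or empty is an assumption that the original question does not settle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for "column 3 as key, column 0 as value" (illustrative only).
final class ColumnKeySketch {

    // Returns {key, value} = {column 3, column 0}, or null when column 3 is missing or empty.
    static String[] mapLine(String line) {
        String[] cols = line.split(",", -1); // -1 keeps trailing empty fields
        if (cols.length < 4 || cols[3].trim().isEmpty()) {
            return null; // assumption: incomplete rows are skipped
        }
        return new String[] { cols[3].trim(), cols[0].trim() };
    }

    // Simulates map, shuffle, and a pass-through reduce; output lines are "key<TAB>value".
    static List<String> run(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] kv = mapLine(line);
            if (kv != null) {
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            for (String value : entry.getValue()) {
                output.add(entry.getKey() + "\t" + value); // no aggregation, just pass through
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("101,87,65,67", "102,43,45,", "103,10,20,67");
        for (String row : run(lines)) {
            System.out.println(row);
        }
    }
}
```

In this simulation the values for a key come out in input order; a real job makes no such promise unless you arrange for it.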
