Custom Map Reduce Program on Hive, what's the Rule? How about input and output?

难免孤独 2021-01-31 12:32

I have been stuck for a few days: I want to write a custom map-reduce program based on my Hive query, but I found few examples after googling, and I'm still confused about

1 answer
  •  攒了一身酷
    2021-01-31 12:44

    There are basically two ways to add custom mappers/reducers to Hive queries.

    1. using transform

    SELECT TRANSFORM(stuff1, stuff2) FROM table1 USING 'script' AS thing1, thing2

    where stuff1 and stuff2 are fields in table1, and script is any executable that accepts the format I describe later. thing1 and thing2 are the outputs from script.
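
    As a rough illustration of what such a script could look like, here is a minimal Python sketch. The file name script.py and the transformation itself (upper-casing stuff1, taking the length of stuff2) are made up for this example; only the tab-separated stdin/stdout contract comes from the answer.

```python
#!/usr/bin/env python3
"""Hypothetical transform script: reads (stuff1, stuff2), writes (thing1, thing2)."""
import sys


def transform_line(line):
    # Hive sends one row per line, with columns separated by tabs.
    stuff1, stuff2 = line.rstrip("\n").split("\t")
    thing1 = stuff1.upper()      # illustrative transformation only
    thing2 = str(len(stuff2))    # output columns are written as text
    return thing1 + "\t" + thing2


def main():
    for line in sys.stdin:
        print(transform_line(line))


if __name__ == "__main__":
    main()
```

    Assuming the file is saved as script.py, it would typically be shipped to the cluster with ADD FILE script.py; and then referenced in the query as USING 'python3 script.py'.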

    2. using map and reduce
    FROM (
        FROM table
        MAP table.f1, table.f2
        USING 'map_script'
        AS mp1, mp2
        CLUSTER BY mp1) map_output
      INSERT OVERWRITE TABLE someothertable
        REDUCE map_output.mp1, map_output.mp2
        USING 'reduce_script'
        AS reducef1, reducef2;
    

    This is slightly more complicated but gives you more control. There are two parts. In the first, the mapper script receives rows from table and maps them to the fields mp1 and mp2. These are then passed to reduce_script, which receives its input sorted on the key specified in CLUSTER BY mp1. Note that a single reducer may handle more than one key. The output of the reduce script goes into the table someothertable.
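
    Because one reducer can see several keys, a reduce script has to notice where one key ends and the next begins. Here is a minimal Python sketch of that idea. It assumes, purely for illustration, that mp2 holds integers and that the reducer should sum them per mp1; the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical reduce script: sums mp2 for each mp1 key."""
import sys
from itertools import groupby


def parse(lines):
    # Each input line is "mp1<TAB>mp2".
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        yield key, value


def reduce_rows(lines):
    # groupby only merges adjacent rows, so it relies on rows with the
    # same key arriving together, which CLUSTER BY mp1 provides.
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)  # assumes integer mp2
        yield key + "\t" + str(total)


def main():
    for out in reduce_rows(sys.stdin):
        print(out)


if __name__ == "__main__":
    main()
```

    Each output line has two tab-separated columns, which line up with reducef1 and reducef2 in the query above.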

    All of these scripts follow a simple pattern: they read from stdin line by line, with fields separated by tabs ('\t'), and write their results to stdout in the same format (fields separated by '\t').
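
    The read/write pattern can be sketched in Python as a pass-through script. The helper names are hypothetical. One detail not mentioned above: by default Hive passes NULL values to the script as the literal string \N, so the sketch converts those to None and back.

```python
#!/usr/bin/env python3
"""Hypothetical pass-through script showing the stdin/stdout row format."""
import sys

NULL = r"\N"  # how Hive writes NULL by default when streaming rows to a script


def parse_row(line):
    # Split one tab-separated input line into fields, mapping \N to None.
    fields = line.rstrip("\n").split("\t")
    return [None if f == NULL else f for f in fields]


def format_row(fields):
    # Join fields back into one tab-separated output line.
    return "\t".join(NULL if f is None else str(f) for f in fields)


def main():
    for line in sys.stdin:
        row = parse_row(line)
        # A real mapper or reducer would change `row` here.
        print(format_row(row))


if __name__ == "__main__":
    main()
```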

    Check out these blog posts; they have some nice examples.

    http://dev.bizo.com/2009/07/custom-map-scripts-and-hive.html

    http://dev.bizo.com/2009/10/reduce-scripts-in-hive.html
