How to load files on a Hadoop cluster using Apache Pig?

北海茫月 2021-01-03 08:48

I have a Pig script and need to load files from the local Hadoop cluster. I can list the files using the hadoop command: hadoop fs -ls /repo/mydata, but when I tried to load file

3 answers
  • 2021-01-03 09:17

    Get rid of the spaces on either side of "=": in=LOAD '/repo/mydata/2012/02' USING PigStorage() AS (event:chararray, user:chararray);

  • 2021-01-03 09:29

    I faced the same issue. Please find my suggestions below:

    1. To start working with Pig in local mode, type: [root@localhost training]# pig -x local

    2. Now type the LOAD statement, as in the example below: grunt> a= LOAD '/home/training/pig/TempFile.txt' using PigStorage(',') as (c1:chararray,c2:chararray,c3:chararray);
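
    The same two steps can also be put in a script file. This is only a minimal sketch: it assumes the comma-separated file from the example exists at that path with three fields, and the file name load_local.pig is hypothetical.

    ```pig
    -- load_local.pig (hypothetical file name); run with: pig -x local load_local.pig
    -- In local mode the path refers to the local file system, not HDFS.
    a = LOAD '/home/training/pig/TempFile.txt' USING PigStorage(',')
        AS (c1:chararray, c2:chararray, c3:chararray);

    -- Print the schema to confirm the three fields were declared as expected.
    DESCRIBE a;

    -- Look at a few rows only, instead of dumping the whole file.
    first_rows = LIMIT a 5;
    DUMP first_rows;
    ```

    If the LOAD path is wrong, the error typically shows up only when DUMP runs, because Pig does not read the input until a DUMP or STORE is reached.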

  • 2021-01-03 09:31

    My suggestion:

    1. Create a folder in HDFS: hadoop fs -mkdir /pigdata

    2. Copy the file into the created HDFS folder: hadoop fs -put /opt/pig/tutorial/data/excite-small.log /pigdata

    (or do it from the Grunt shell: grunt> copyFromLocal /opt/pig/tutorial/data/excite-small.log /pigdata)

    3. Execute the Pig Latin script:

         grunt> set debug on
      
         grunt> set job.name 'first-p2-job'
      
         grunt> log = LOAD 'hdfs://hostname:54310/pigdata/excite-small.log' AS 
                    (user:chararray, time:long, query:chararray); 
         grunt> grpd = GROUP log BY user; 
         grunt> cntd = FOREACH grpd GENERATE group, COUNT(log); 
         grunt> STORE cntd INTO 'output';
      
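    4. The output is stored as part files in a directory named output. Because 'output' is a relative path, it resolves against the current HDFS working directory (by default the user's home directory), so it ends up at hdfs://hostname:54310/pigdata/output only if the working directory is /pigdata.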
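
    The Grunt session above could also be saved as a script. The sketch below is one way to do that, not the only one: the file name count_by_user.pig and the parameter names input and output are hypothetical, and the paths passed on the command line are the ones assumed in this answer.

    ```pig
    -- count_by_user.pig (hypothetical file name)
    -- Run against the cluster, for example:
    --   pig -param input=/pigdata/excite-small.log -param output=/pigdata/output count_by_user.pig

    set job.name 'first-p2-job';

    -- No USING clause: Pig falls back to PigStorage with tab as the delimiter.
    log = LOAD '$input' AS (user:chararray, time:long, query:chararray);

    -- One group per distinct user.
    grpd = GROUP log BY user;

    -- COUNT(log) is the number of tuples in each group's bag.
    cntd = FOREACH grpd GENERATE group AS user, COUNT(log) AS queries;

    -- STORE fails if the output directory already exists.
    STORE cntd INTO '$output';
    ```

    Passing an absolute output path keeps the result location independent of the current working directory.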
