How to load files on a Hadoop cluster using Apache Pig?


I have a Pig script and need to load files from the local Hadoop cluster. I can list the files using the hadoop command `hadoop fs -ls /repo/mydata`, but when I tried to load file


3 Answers

抹茶落季

2021-01-03 09:17

Get rid of the spaces on either side of "=": `in=LOAD '/repo/mydata/2012/02' USING PigStorage() AS (event:chararray, user:chararray)`

别跟我提以往

2021-01-03 09:29
I faced the same issue. My suggestions:
1. To start Pig in local mode, type: `[root@localhost training]# pig -x local`
2. Then enter the LOAD statement, as in this example: `grunt> a = LOAD '/home/training/pig/TempFile.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray, c3:chararray);`
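
For anyone who wants to try the two steps above without typing into the grunt shell, here is a minimal sketch. The file names `sample.csv` and `load_local.pig` and the sample rows are made up for illustration. Keep in mind that with `-x local` Pig reads paths from the local filesystem, not from HDFS.

```shell
#!/bin/sh
# Sketch only: sample.csv, load_local.pig and the two rows are hypothetical.

# Two rows with three comma-separated columns, matching the (c1, c2, c3) schema.
printf 'a,b,c\nd,e,f\n' > sample.csv

# Put the LOAD statement in a script file so it can run non-interactively.
# The quoted 'EOF' keeps the shell from touching the script text.
cat > load_local.pig <<'EOF'
a = LOAD 'sample.csv' USING PigStorage(',') AS (c1:chararray, c2:chararray, c3:chararray);
DUMP a;
EOF

# Run in local mode if Pig is installed; otherwise just show the command.
if command -v pig >/dev/null 2>&1; then
    pig -x local load_local.pig
else
    echo "pig not found; would run: pig -x local load_local.pig"
fi
```

If the data lives in HDFS (as in the question), start Pig without `-x local` so that paths are resolved against the cluster instead.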
天涯浪人

2021-01-03 09:31
My suggestion:
1. Create a folder in HDFS: `hadoop fs -mkdir /pigdata`
2. Copy the file into the new HDFS folder: `hadoop fs -put /opt/pig/tutorial/data/excite-small.log /pigdata`
(or do it from the grunt shell: `grunt> copyFromLocal /opt/pig/tutorial/data/excite-small.log /pigdata`)
3. Execute the Pig Latin script:
```
   grunt> set debug on
   grunt> set job.name 'first-p2-job'
   grunt> log = LOAD 'hdfs://hostname:54310/pigdata/excite-small.log' AS
              (user:chararray, time:long, query:chararray);
   grunt> grpd = GROUP log BY user;
   grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
   grunt> STORE cntd INTO 'output';
```
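4. The result is written to an HDFS directory named `output`. Because that path is relative, Pig resolves it against the current HDFS working directory (by default your HDFS home directory, e.g. /user/<username>/output). To get it in hdfs://hostname:54310/pigdata/output, pass that full path to STORE.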
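
The steps above can also be run as one batch job instead of an interactive grunt session. This is a rough sketch, not a tested deployment: the script name `count_by_user.pig` is hypothetical, `hadoop fs -mkdir -p` assumes a Hadoop 2 or later client, and the STORE path is written as an absolute path so the output location does not depend on the HDFS working directory.

```shell
#!/bin/sh
# Sketch only: count_by_user.pig is a hypothetical file name.
HDFS_DIR=/pigdata
LOCAL_LOG=/opt/pig/tutorial/data/excite-small.log
SCRIPT=count_by_user.pig

# Write the Pig Latin script; $HDFS_DIR is expanded by the shell here.
cat > "$SCRIPT" <<EOF
-- No USING clause, so PigStorage's default tab delimiter applies.
log  = LOAD '$HDFS_DIR/excite-small.log' AS (user:chararray, time:long, query:chararray);
grpd = GROUP log BY user;
cntd = FOREACH grpd GENERATE group, COUNT(log);
-- Absolute path, independent of the HDFS working directory.
STORE cntd INTO '$HDFS_DIR/output';
EOF

# Only touch the cluster when both clients are installed.
if command -v hadoop >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
    hadoop fs -mkdir -p "$HDFS_DIR"            # step 1: create the folder
    hadoop fs -put "$LOCAL_LOG" "$HDFS_DIR"    # step 2: upload the log
    pig -x mapreduce "$SCRIPT"                 # step 3: run the script
    hadoop fs -cat "$HDFS_DIR/output/part-*"   # step 4: inspect the result
else
    echo "hadoop/pig not found; wrote $SCRIPT only"
fi
```

Note that STORE fails if the output directory already exists, so remove it (or pick a new name) before rerunning.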