Pig ORDER command fails

问题

I am trying to analyze an apache log and the goal is the find out all user agents and their percentage in usage. The following program works fine to the line when result contains each useragent, count and percentage. The program fails at last line when tries to order according to most used. Could someone help?

logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen, user, time, method, uri, protocol, statusCode, responseSize, referer, userAgent);

uarows = FOREACH logs GENERATE userAgent;
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
dump total; 

gpuarows = GROUP uarows BY userAgent;
result = FOREACH gpuarows {
       subtotal = COUNT(uarows);
       GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL, 100*(double)subtotal/(double)total.count AS percentage;
       };
orderresult = ORDER result BY SUB_TOTAL DESC;
dump orderresult;

what's weird is that 'dump result' works just fine, so it's the ORDER line makes trouble

errors:

013-04-13 11:33:09,976 [Thread-48] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-04-13 11:33:09,976 [Thread-48] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-04-13 11:33:09,995 [Thread-48] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
    ... 6 more
2013-04-13 11:33:10,276 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005
2013-04-13 11:33:10,276 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases orderresult
2013-04-13 11:33:10,276 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: orderresult[16,14] C:  R: 
2013-04-13 11:33:15,286 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-04-13 11:33:15,286 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0005 has failed! Stop running all dependent jobs
2013-04-13 11:33:15,287 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-04-13 11:33:15,287 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-04-13 11:33:15,288 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
1.0.4   0.11.0  dliu    2013-04-13 11:32:27 2013-04-13 11:33:15 GROUP_BY,ORDER_BY

Some jobs have failed! Stop running all dependent jobs

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime  MinMapTIme  AvgMapTime  MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime    Alias   Feature Outputs
job_local_0002  1   1   n/a n/a n/a n/a n/a n/a 1-18,logs,total,uarows  MULTI_QUERY,COMBINER    
job_local_0003  1   1   n/a n/a n/a n/a n/a n/a gpuarows,result GROUP_BY,COMBINER   
job_local_0004  1   1   n/a n/a n/a n/a n/a n/a orderresult SAMPLER 

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_local_0005  orderresult ORDER_BY    Message: Job failed! Error - NA file:/tmp/temp265162785/tmp896004388,

Input(s):
Successfully read 0 records from: "file:///home/dliu/ApacheLogAnalysisWithPig/access.log"

Output(s):
Failed to produce result in "file:/tmp/temp265162785/tmp896004388"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local_0002  ->  job_local_0003,
job_local_0003  ->  job_local_0004,
job_local_0004  ->  job_local_0005,
job_local_0005


2013-04-13 11:33:15,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2013-04-13 11:33:15,297 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias orderresult
Details at logfile: /home/dliu/ApacheLogAnalysisWithPig/pig_1365823931459.log

回答1:

Make sure two things:

1) Run pig in local mode: pig -x local 2) Set either PIG_HOME or PIG_INSTALL environment variable to point to pig installation directory

回答2:

Please check that you don't have already file /tmp/temp265162785/tmp896004388 You can use the same file\directory for different tasks.

来源：https://stackoverflow.com/questions/15983956/pig-order-command-fails

标签

apache-pig

latin