Question:
I have the following tuple in H1 and want to STRSPLIT its $0 into a tuple, but I always get an error message:
    DUMP H1:
    (item32;item31;,1)

    m = FOREACH H1 GENERATE STRSPLIT($0, ";", 50);

    ERROR 1000: Error during parsing. Lexical error at line 1, column 40. Encountered: <EOF> after : "\";"
Does anyone know what's wrong with it? Thanks!
Answer 1:
There is an escaping problem in the Pig parsing routines when they encounter this semicolon.
You can use a Unicode escape sequence for the semicolon: \u003B. However, the backslash must itself be escaped and the literal put in a single-quoted string, giving '\\u003B'. Alternatively, you can rewrite the command over multiple lines, as per Neil's answer. In all cases, the delimiter must be a single-quoted string.
    H1 = LOAD 'h1.txt' as (splitme:chararray, name);

    A1 = FOREACH H1 GENERATE STRSPLIT(splitme,'\\u003B'); -- OK
    B1 = FOREACH H1 GENERATE STRSPLIT(splitme,';');       -- ERROR
    C1 = FOREACH H1 GENERATE STRSPLIT(splitme,':');       -- OK
    D1 = FOREACH H1 {                                     -- OK
        splitup = STRSPLIT( splitme, ';' );
        GENERATE splitup;
    }

    A2 = FOREACH H1 GENERATE STRSPLIT(splitme,"\\u003B"); -- ERROR
    B2 = FOREACH H1 GENERATE STRSPLIT(splitme,";");       -- ERROR
    C2 = FOREACH H1 GENERATE STRSPLIT(splitme,":");       -- ERROR
    D2 = FOREACH H1 {                                     -- ERROR
        splitup = STRSPLIT( splitme, ";" );
        GENERATE splitup;
    }

    Dump H1;
    (item32;item31;,1)
    Dump A1;
    ((item32,item31))
    Dump C1;
    ((item32;item31;))
    Dump D1;
    ((item32,item31))
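Why the escaped form behaves like a plain semicolon can be checked outside Pig. STRSPLIT is documented as splitting a string around matches of a regular expression, and the rough sketch below assumes it hands the pattern and the limit to Java's String.split (that is an assumption about Pig's internals, not something established in this thread). The class name SemicolonSplitSketch and the helper split are made up for illustration. Once the string literal has been unescaped, the regex engine receives the six characters \u003B, which java.util.regex reads as the semicolon character, so no literal ; ever appears between the quotes.

```java
import java.util.Arrays;

class SemicolonSplitSketch {
    // Hypothetical stand-in for STRSPLIT: assumes the UDF simply delegates
    // to String.split(regex, limit).
    static String[] split(String source, String regex, int limit) {
        return source.split(regex, limit);
    }

    public static void main(String[] args) {
        String row = "item32;item31;";

        // The Java literal "\\u003B" is the six-character regex made of a
        // backslash followed by u003B; the regex engine treats it as ';'.
        String escaped = "\\u003B";

        // Limit 0: split as often as possible, drop trailing empty strings.
        System.out.println(Arrays.toString(split(row, escaped, 0)));

        // Limit 50 (the value used in the question): trailing empty string kept.
        System.out.println(Arrays.toString(split(row, escaped, 50)));

        // A delimiter that never occurs leaves the input in one piece.
        System.out.println(Arrays.toString(split(row, ":", 0)));
    }
}
```

Run as is, this prints [item32, item31], then [item32, item31, ], then [item32;item31;]. The first and third lines agree with Dump A1 and Dump C1 above. The second suggests that the limit of 50 from the question would keep the trailing empty field, provided Pig passes the limit through unchanged.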
Answer 2:
STRSPLIT on a semicolon is tricky. I got it to work by putting it inside a nested FOREACH block.
    raw = LOAD 'cname.txt' as (name,cname_string:chararray);
    xx = FOREACH raw {
        cname_split = STRSPLIT(cname_string,';');
        GENERATE cname_split;
    }
Funnily enough, this is how I originally implemented my STRSPLIT() command. Only after trying to get it to split on a semicolon did I run into the same issue.