Pig problem with splitting a string (STRSPLIT)

Anonymous (unverified), submitted 2019-12-03 03:05:02

Question:

I have the following tuple in H1 and want to STRSPLIT its $0 into a tuple, but I always get an error message:

    DUMP H1:
    (item32;item31;,1)

    m = FOREACH H1 GENERATE STRSPLIT($0, ";", 50);
    ERROR 1000: Error during parsing. Lexical error at line 1, column 40.  Encountered: <EOF> after : "\";"

Does anyone know what's wrong with it? Thanks!

Answer 1:

There is an escaping problem in the Pig parsing routines when they encounter this semicolon.

You can use a Unicode escape sequence for the semicolon: \u003B. However, the backslash must itself be escaped, so the argument is written as '\\u003B'. Alternatively, you can rewrite the command over multiple lines by putting the STRSPLIT call inside a nested FOREACH block, as per Neil's answer. In all cases, the delimiter must be a single-quoted string; the double-quoted variants fail.

    H1 = LOAD 'h1.txt' as (splitme:chararray, name);

    A1 = FOREACH H1 GENERATE STRSPLIT(splitme,'\\u003B'); -- OK
    B1 = FOREACH H1 GENERATE STRSPLIT(splitme,';');       -- ERROR
    C1 = FOREACH H1 GENERATE STRSPLIT(splitme,':');       -- OK
    D1 = FOREACH H1 {                                     -- OK
        splitup = STRSPLIT( splitme, ';' );
        GENERATE splitup;
    }

    A2 = FOREACH H1 GENERATE STRSPLIT(splitme,"\\u003B"); -- ERROR
    B2 = FOREACH H1 GENERATE STRSPLIT(splitme,";");       -- ERROR
    C2 = FOREACH H1 GENERATE STRSPLIT(splitme,":");       -- ERROR
    D2 = FOREACH H1 {                                     -- ERROR
        splitup = STRSPLIT( splitme, ";" );
        GENERATE splitup;
    }

    Dump H1;
    (item32;item31;,1)

    Dump A1;
    ((item32,item31))

    Dump C1;
    ((item32;item31;))

    Dump D1;
    ((item32,item31))
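Applied to the statement from the question, the single-quoted, escaped form might look like the sketch below. It assumes H1 is loaded as in the snippet above (the file name h1.txt and the field names come from this answer, not from the question); the alias m and the limit of 50 are taken from the original statement. This has not been checked against every Pig version.

```pig
-- Assumption: same input file and schema as in the answer above.
H1 = LOAD 'h1.txt' AS (splitme:chararray, name);

-- Single-quoted, backslash-escaped Unicode sequence standing in for ';'.
-- The third argument (50) is the split limit used in the question.
m = FOREACH H1 GENERATE STRSPLIT($0, '\\u003B', 50);

DUMP m;
```

STRSPLIT treats its second argument as a regular expression, so the escape should reach the regex engine as \u003B, which matches a literal semicolon without the Pig parser ever seeing a bare ';' inside the one-line statement.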


Answer 2:

STRSPLIT on a semicolon is tricky. I got it to work by putting it inside a nested block.

    raw = LOAD 'cname.txt' as (name,cname_string:chararray);

    xx = FOREACH raw {
      cname_split = STRSPLIT(cname_string,';');
      GENERATE cname_split;
    }

Funny enough, this is how I originally implemented my STRSPLIT() command. Only after trying to get it to split on a semicolon did I run into the same issue.


