Question
I am confused about how to read a file containing complex types (i.e. tuples and bags) in Pig and how to define a schema for it.
More precisely, I do not understand how {, (, and a delimiter (e.g. |) in the file map to the schema when the file is read.
For example, I cannot figure out what the content of 'complex_7.txt' must be for the following Pig line to work.
(I am reverse engineering this: I have the example, and I am trying to write a text file that this schema can be applied to.)
a = LOAD '/user/maria_dev/complex_7.txt' AS (f1:int,f2:int,B:bag{T:tuple(t1:int,t2:int)});
dump a;
This command should produce the following output from complex_7.txt. My question is: how must the data be stored in this file (i.e. complex_7.txt)?
10,1,{(2,3),(4,6)}
11,2,{(2,3),(4,6),(8,9)}
12,3,{(1,3)}
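One way to think about going from the schema to a file is to serialize each row by hand. The following is only a sketch, not a confirmed answer: it assumes that fields are separated by a tab (the default PigStorage delimiter), that a bag is written as {(a,b),(c,d)}, and that a tuple is written as (a,b). The helper names format_tuple, format_bag, format_row, and write_rows are hypothetical and exist only for illustration.

```python
# Sketch: build the text lines that complex_7.txt is assumed to contain.
# Assumptions (not confirmed by the example above):
#   - top-level fields are tab-separated (default PigStorage delimiter)
#   - a bag is written as {(a,b),(c,d)} and a tuple as (a,b)

def format_tuple(values):
    """Render one tuple, e.g. (2, 3) -> "(2,3)"."""
    return "(" + ",".join(str(v) for v in values) + ")"

def format_bag(tuples):
    """Render a bag of tuples, e.g. [(2, 3), (4, 6)] -> "{(2,3),(4,6)}"."""
    return "{" + ",".join(format_tuple(t) for t in tuples) + "}"

def format_row(f1, f2, bag, sep="\t"):
    """Join the two int fields and the bag with the top-level delimiter."""
    return sep.join([str(f1), str(f2), format_bag(bag)])

def write_rows(path, rows, sep="\t"):
    """Write one formatted row per line to the given path."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write(format_row(*row, sep=sep) + "\n")

# The three rows from the expected dump output.
rows = [
    (10, 1, [(2, 3), (4, 6)]),
    (11, 2, [(2, 3), (4, 6), (8, 9)]),
    (12, 3, [(1, 3)]),
]

lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

Calling write_rows("complex_7.txt", rows) would write these lines to a local file, which could then be copied to HDFS. Passing a different sep changes only the top-level delimiter; the commas inside the bag stay the same.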
That said, how can I specify multiple PigStorage delimiters when reading a file that has a complex schema?
For example, if I have the following text file (say complex_8.txt), how can I read it?
1|2|{(2,3),(5,6)}
(I am not sure what the actual file must look like, but I may have a case where I need more than one PigStorage separator.)
I assume that reading the above data requires two delimiters: one for '|' and one for ',' (inside the tuple).
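To make the two levels of separators concrete, here is a small sketch that parses one such line by hand. It assumes that '|' is the only top-level field delimiter (what would be passed to PigStorage, e.g. PigStorage('|')) and that the commas inside the braces belong to the bag and tuple notation rather than being a second field delimiter. The names parse_bag and parse_line are hypothetical, and the regular expression only handles flat tuples of integers.

```python
import re

# Matches the inside of one flat tuple, e.g. "2,3" in "(2,3)".
TUPLE_RE = re.compile(r"\(([^()]*)\)")

def parse_bag(text):
    """Turn "{(2,3),(5,6)}" into [(2, 3), (5, 6)] (flat int tuples only)."""
    return [
        tuple(int(v) for v in inner.split(","))
        for inner in TUPLE_RE.findall(text)
    ]

def parse_line(line, sep="|"):
    """Split on the top-level delimiter first, then parse the bag field."""
    # maxsplit=2 keeps the whole bag text in the third piece.
    f1, f2, bag_text = line.rstrip("\n").split(sep, 2)
    return int(f1), int(f2), parse_bag(bag_text)

record = parse_line("1|2|{(2,3),(5,6)}")
print(record)  # (1, 2, [(2, 3), (5, 6)])
```

The point of the sketch is the order of operations: the outer delimiter separates f1, f2, and B, and the commas are only interpreted once the bag field has been isolated.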
UPDATE:
This answer helped me solve this part (i.e. having multiple delimiters):
apache pig load data with multiple delimiters
So the schema of this file must be in this format:
a = LOAD '/user/maria_dev/complex_7.txt' AS (f1:int,f2:int,B:bag{T:tuple(t1:int,t2:int)});
So I think what I am actually asking is: how can I translate {, ), and a delimiter from a file to a schema, and from a schema to a file?
Source: https://stackoverflow.com/questions/60942521/rule-of-thumb-for-reading-from-a-file-and-defining-schema-for-complex-data-struc