Rule of thumb for reading from a file and defining schema for complex data structure

Question

I am confused about how to read a file with complex types (i.e. tuples and bags) in Pig and how to define a schema for it.

To be more precise, how should I translate {, (, and a delimiter (e.g. |) when reading a file?

For example, I cannot figure out what 'complex_7.txt' must contain for the following line in Pig to work:

(I am reverse engineering this: I have the example below, and I am trying to write the text file that this schema can be applied to.)

a = LOAD '/user/maria_dev/complex_7.txt'  AS (f1:int,f2:int,B:bag{T:tuple(t1:int,t2:int)});

dump a;

This command should produce the following output from complex_7.txt. How must the data be stored in that file (i.e. complex_7.txt) for this to happen? This is my question.
10,1,{(2,3),(4,6)}
11,2,{(2,3),(4,6),(8,9)}
12,3,{(1,3)}
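
To make the schema-to-file direction concrete, here is a minimal Python sketch of how I would guess the rows are laid out as text. It assumes the LOAD statement without a USING clause falls back to PigStorage with its default tab delimiter, and that a bag is written with braces around comma-separated tuples in parentheses. The helper names format_bag and format_row are hypothetical and only illustrate the layout; they are not part of Pig.

```python
# Sketch: render rows matching the schema
# (f1:int, f2:int, B:bag{T:tuple(t1:int, t2:int)}) as text lines.
# Assumption: the file is read by PigStorage with its default tab delimiter.

def format_bag(tuples):
    """Write a bag as {(a,b),(c,d)}: braces for the bag, parentheses per tuple."""
    return "{" + ",".join("(%d,%d)" % (t1, t2) for t1, t2 in tuples) + "}"

def format_row(f1, f2, bag, delimiter="\t"):
    """Join the three top-level fields with the field delimiter."""
    return delimiter.join([str(f1), str(f2), format_bag(bag)])

# The three records from the expected dump output above.
rows = [
    (10, 1, [(2, 3), (4, 6)]),
    (11, 2, [(2, 3), (4, 6), (8, 9)]),
    (12, 3, [(1, 3)]),
]

lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

Under these assumptions, each printed line has a tab between f1, f2, and the bag, while the commas appear only inside the bag.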

Having said that, how can I specify multiple PigStorage delimiters when reading a file that has a complex schema?

For example, if I have the following text file (say complex_8.txt), how can I read it?

 1|2|{(2,3),(5,6)}, # I am not sure what the actual file must look like, but I may have a case where I need more than one PigStorage separator.

I assume that reading the above data requires two delimiters: one for '|' and one for ',' (inside the tuple).
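
One possibility worth checking is that a single field delimiter is enough here, e.g. PigStorage('|'), with the commas being part of the bag and tuple notation rather than a second delimiter. The Python sketch below models that reading of the line. It is only an illustration of the assumption, not how Pig is implemented, and parse_line and TUPLE_PATTERN are hypothetical names.

```python
import re

# Matches one two-integer tuple such as (2,3) inside the bag text.
TUPLE_PATTERN = re.compile(r"\((-?\d+),(-?\d+)\)")

def parse_line(line, delimiter="|"):
    """Split on the field delimiter only, then read the tuples inside the bag."""
    # At most two splits, so everything after the second delimiter is the bag.
    f1, f2, bag_text = line.strip().split(delimiter, 2)
    bag = [(int(a), int(b)) for a, b in TUPLE_PATTERN.findall(bag_text)]
    return int(f1), int(f2), bag

record = parse_line("1|2|{(2,3),(5,6)}")
print(record)
```

If this assumption holds, the '|' separates only the three top-level fields, and the schema's bag and tuple declarations account for the braces, parentheses, and inner commas.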

UPDATE:

This answer helped me solve this part (i.e. having multiple delimiters):

apache pig load data with multiple delimiters

So the schema of this file must be in this format:

a = LOAD '/user/maria_dev/complex_7.txt' AS (f1:int,f2:int,B:bag{T:tuple(t1:int,t2:int)});

So I think what I am actually asking is: how can I translate {, (, and a delimiter from a file to a schema, and from a schema to a file?

Source: https://stackoverflow.com/questions/60942521/rule-of-thumb-for-reading-from-a-file-and-defining-schema-for-complex-data-struc

Tags

HDFS

apache-pig

hadoop2