Pig - Remove embedded newlines and commas in gzip files
Question

I have a gzip file with fields separated by commas. I am currently using PigStorage to load the file, as shown below:

    A = LOAD 'myfile.gz' USING PigStorage(',') AS (id, date, text);

The data in the gzip file contains embedded newlines and commas. These characters occur in all three fields (id, date and text), and they always appear inside double quotes. I would like to replace or remove these characters using Pig before doing any further processing. I think I