Question
I want to remove any non-printable newline characters from the column data.
I have enclosed all the columns in double quotes so that newline characters inside a column can be deleted easily while the record delimiter at the end of each line is left alone.
Say I have 4 columns separated by commas and enclosed in quotes in a text file. I'm trying to remove \n and \r characters only if they appear between the double quotes.
I am currently using tr, but it deleted every line break and turned the file into a single line without any record separator.
tr -d '\n\r' < in.txt > out.txt
Sample data:
"1","test\n
Sample","data","col4"\n
"2\n
","Test","Sample","data" \n
"3","Sam\n
ple","te\n
st","data"\n
Expected Output:
"1","testSample","data","col4"\n
"2","Test","Sample","data" \n
"3","Sample","test","data"\n
Any suggestions? Thanks in advance.
Answer 1:
With GNU sed:
sed ':a;N;$!ba;s/\("[^\n\r]*\)[\n\r\]*\([^\n\r]*\"\)/\1\2/g' file
See this post for the newline replacement without the enclosing ".
Answer 2:
Could you please try the following awk solution and let me know if it helps.
awk '{gsub(/\r/,"");printf("%s%s",$0,$0~/,$/?"":RS)}' Input_file
Output will be as follows.
"1","test","Sample","data"\n
"2","Test" \n
"3","Sample"
Explanation: printf prints each line using two %s conversions (%s is the format specifier for strings in printf). The first %s prints the current line. The second checks whether the line ends with a comma (,): if it does, nothing is printed, so the next line is joined onto it; otherwise a newline (RS) is printed. The gsub(/\r/,"") before printf is there in case you want to remove carriage returns and get the expected output you showed.
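To make the join-on-trailing-comma behavior concrete, here is a small sketch that wraps the same awk program in a shell function and feeds it a made-up record. The function name join_on_trailing_comma and the input are assumptions for illustration only, not part of the original answer.

```shell
# Hypothetical wrapper around the awk program from this answer.
join_on_trailing_comma() {
  awk '{
    gsub(/\r/, "")                               # strip carriage returns
    printf("%s%s", $0, ($0 ~ /,$/) ? "" : RS)    # no newline after a trailing comma
  }'
}

# Made-up input: one record split after a comma, with CRLF line endings.
printf '"1","test",\r\n"Sample","data"\r\n' | join_on_trailing_comma
```

Note that this only joins lines that end in a comma, so a line break in the middle of a quoted value is left in place.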
EDIT: Your post title mentions removing carriage returns, so if that is all you want, you could try the following. Please state your problem clearly, though.
tr -d '\r' < Input_file > temp_file && mv temp_file Input_file
The above removes the carriage return characters from Input_file and saves the result back into the same Input_file.
Answer 3:
Here's a possible solution:
perl -pe 'if (tr/"// % 2) { chomp; $_ .= <>; redo; }'
If the current line has unbalanced quotes (i.e. an odd number of "), it must end in the middle of a field, so we chomp off the newline, append the next input line, and restart the loop.
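The same quote-counting idea can also be sketched in awk, with carriage returns stripped as well (chomp removes only the trailing newline by default). This is a minimal sketch and not the answer's own method: the function name join_quoted_fields is hypothetical, and it assumes that every field is quoted, that any quote inside a field is doubled as "" so the parity check still holds, and that dropping every \r in the file is acceptable.

```shell
# Hypothetical helper: join physical lines until the double quotes balance.
join_quoted_fields() {
  awk '
    {
      gsub(/\r/, "")              # drop carriage returns (assumed unwanted everywhere)
      buf = buf $0                # append this line to the pending record
      n = gsub(/"/, "&", buf)     # count quotes; "&" puts each match back unchanged
      if (n % 2 == 0) {           # balanced quotes: the record is complete
        print buf
        buf = ""
      }
    }
    END { if (buf != "") print buf }   # flush an unterminated last record
  '
}

# Usage: join_quoted_fields < in.txt > out.txt
```

Unlike a loop that keeps reading until the quotes balance, the END block here simply prints whatever is pending if the input stops in the middle of a quoted field.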
Source: https://stackoverflow.com/questions/46378760/unable-to-remove-carriage-returns-and-line-feeds-in-columns-enclosed-in-double-q