Unable to remove carriage returns and line feeds in columns enclosed in double quotes [duplicate]

时光毁灭记忆、已成空白 posted on 2019-12-25 11:25:08

Question


I want to remove the non-printable newline characters embedded in the column data.

I have enclosed all the columns in double quotes so that line breaks inside a column can be told apart from the record delimiter at the end of each line.

Say I have 4 columns separated by commas and enclosed in double quotes in a text file. I'm trying to remove \n and \r characters only when they appear between the double quotes.

I currently use tr, but it deletes every line break and turns the file into one long line with no record separator:

tr -d '\n\r' < in.txt > out.txt

Sample data:

"1","test\n

Sample","data","col4"\n

"2\n

","Test","Sample","data" \n

"3","Sam\n

ple","te\n

st","data"\n

Expected Output:

"1","testSample","data","col4"\n

"2","Test","Sample","data" \n

"3","Sample","test","data"\n

Any suggestions? Thanks in advance.


Answer 1:


With GNU sed

sed ':a;N;$!ba;s/\("[^\n\r]*\)[\n\r]*\([^\n\r]*"\)/\1\2/g' file

See this post for the newline replacement without the enclosing ".




Answer 2:


Could you try the following awk solution and let me know whether it helps?

awk '{gsub(/\r/,"");printf("%s%s",$0,$0~/,$/?"":RS)}' Input_file

The output will be as follows.

"1","test","Sample","data"\n
"2","Test" \n
"3","Sample"

Explanation: printf prints each line through two %s placeholders (%s is the printf format for strings). The first %s prints the current line. The second prints nothing if the line ends with a comma (,) and prints a newline (RS) otherwise. The gsub(/\r/,"") call before printf removes the carriage returns, which is needed to get the expected output you showed.
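
To make the trailing-comma rule concrete, here is a minimal sketch that wraps the same awk program in a shell function. The function name join_on_trailing_comma is hypothetical, and the two-line test input mentioned below is an assumption chosen to match the output shown above, not data from the question.

```shell
# Hypothetical wrapper around the awk command from this answer.
join_on_trailing_comma() {
    awk '{
        gsub(/\r/, "")    # strip carriage returns from the current line
        # print the line, then a newline only if the line does not end with a comma
        printf("%s%s", $0, ($0 ~ /,$/ ? "" : RS))
    }' "$@"
}
```

Feeding it the two lines "1","test", and "Sample","data" (each ending in \r\n) gives the single line "1","test","Sample","data". A line that breaks in the middle of a quoted field does not end with a comma, so this rule leaves it untouched.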

EDIT: Since your post title mentions removing carriage returns, you could try the following if that is all you need. Please state your problem more clearly, though.

tr -d '\r' < Input_file > temp_file && mv temp_file Input_file

The above removes the carriage return characters from Input_file and saves the result back to the same Input_file.




Answer 3:


Here's a possible solution:

perl -pe 'if (tr/"// % 2) { chomp; $_ .= <>; redo; }'

If the current line has unbalanced quotes (i.e. an odd number of "), it must end in the middle of a field, so we chomp out the newline, append the next input line, and restart the loop.
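
The same quote-parity idea can be sketched in awk for systems without perl. This is only an illustration under stated assumptions: the function name join_quoted_lines is hypothetical, every double quote in the file is assumed to be a field delimiter (or a doubled "" inside a field), and carriage returns are dropped everywhere, so the output uses plain \n line endings.

```shell
# Hypothetical helper: keep appending physical lines until the quotes balance.
join_quoted_lines() {
    awk '
    {
        gsub(/\r/, "")                # drop carriage returns from this line
        buf = buf $0                  # append the line to the pending record
        tmp = buf
        # gsub returns how many quotes it removed from the scratch copy
        if (gsub(/"/, "", tmp) % 2 == 0) {
            print buf                 # even count: the record is complete
            buf = ""
        }
    }
    END { if (buf != "") print buf }  # flush a record left unbalanced at EOF
    ' "$@"
}
```

It could be run as join_quoted_lines in.txt > out.txt. On input shaped like the sample in the question, the three broken records come out as one line each.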



Source: https://stackoverflow.com/questions/46378760/unable-to-remove-carriage-returns-and-line-feeds-in-columns-enclosed-in-double-q
