Remove double quotes and comma from a numeric value of a .CSV file

Posted by 喜欢而已 on 2019-12-06 09:11:49

Question


I have a .CSV file in which a few records contain numbers that are enclosed in double quotes and have commas inside the quotes (such as "455,365.44"). I need to remove the quotes and commas from these numeric values ("455,365.44" should look like 455365.44 after processing) so that I can use them in further processing of the file.

Here is an example of the file:

column 1, column 2, column 3, column 4, column 5, column 6, column 7
12,"455,365.44","string with quotes, and with a comma in between","4,432",6787,890,88
432,"222,267.87","another, string with quotes, and with two comma in between","1,890",88,12,455
11,"4,324,653.22","simple string",77,777,333,22

and I need the result to look like:

column 1, column 2, column 3, column 4, column 5, column 6, column 7
12,455365.44,"string with quotes, and with a comma in between",4432,6787,890,88
432,222267.87,"another, string with quotes, and with two comma in between",1890,88,12,455
11,4324653.22,"simple string",77,777,333,22

P.S.: Only the numeric values should be converted like this; the string values should remain the same.

Please help...


Answer 1:


To remove the quotes (replace each quoted number with the same number without the quotes):

s/"(\d[\d.,]*)"/\1/g

See on rubular
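
As a quick check, here is a minimal Python sketch of the same substitution. Python and the name strip_number_quotes are my own choices for illustration, not part of the answer. Run on its own, the substitution leaves the thousands separators in place, so the number spills across several columns, which is why the commas need handling as well:

```python
import re

# Same pattern as above: a quote, a digit, then digits/dots/commas, a quote.
QUOTED_NUMBER = re.compile(r'"(\d[\d.,]*)"')

def strip_number_quotes(line: str) -> str:
    """Drop the quotes around quoted numbers; quoted strings are untouched."""
    return QUOTED_NUMBER.sub(r'\1', line)

print(strip_number_quotes('11,"4,324,653.22","simple string",77'))
# 11,4,324,653.22,"simple string",77
```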

For the commas, I could only think of a lookahead and a lookbehind, if that's supported by your regex implementation (replace a comma with nothing if what comes before and after it is a number within quotes):

s/(?<="[\d,]+),(?=[\d,.]+")//g

You would have to execute this before removing the quotes.
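
Whether the lookbehind is accepted depends on the engine, because [\d,]+ makes it variable-length. As one data point, a small Python sketch (the helper name compiles is hypothetical) suggests that Python's built-in re module refuses to compile it:

```python
import re

# The lookbehind pattern from above, written as a Python raw string.
LOOKBEHIND_PATTERN = r'(?<="[\d,]+),(?=[\d,.]+")'

def compiles(pattern: str) -> bool:
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # Raised here because re only allows fixed-width lookbehind.
        return False
    return True

print(compiles(LOOKBEHIND_PATTERN))  # False
```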

It might also work without lookbehind:

s/,(?=[\d,.]*\d")//g

See on rubular

In a shell script you might want to use perl, e.g. execute:

cat test.csv | perl -p -e 's/,(?=[\d,.]*\d")//g; s/"(\d[\d,.]*)"/\1/g'
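
If perl is not at hand, the same two substitutions can be sketched in Python. This is only an illustration, assuming the two lookahead-based regexes carry over unchanged to Python's re module; clean_line is a made-up name:

```python
import re

# Step 1: commas followed by the rest of a number and a closing quote.
COMMA_IN_QUOTED_NUMBER = re.compile(r',(?=[\d,.]*\d")')
# Step 2: a number wrapped in double quotes.
QUOTED_NUMBER = re.compile(r'"(\d[\d,.]*)"')

def clean_line(line: str) -> str:
    """Remove commas inside quoted numbers, then drop the quotes."""
    line = COMMA_IN_QUOTED_NUMBER.sub('', line)
    return QUOTED_NUMBER.sub(r'\1', line)

sample = [
    '12,"455,365.44","string with quotes, and with a comma in between","4,432",6787,890,88',
    '11,"4,324,653.22","simple string",77,777,333,22',
]
for row in sample:
    print(clean_line(row))
# 12,455365.44,"string with quotes, and with a comma in between",4432,6787,890,88
# 11,4324653.22,"simple string",77,777,333,22
```

The order matters: once the quotes are gone, there is nothing left to tell a thousands separator from a field separator.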

Explanation of the regex:

First, execute:

s/,(?=[\d,.]*\d")//g 

This removes every comma that is followed by the rest of a number ([\d,.]*\d) and a closing quote, so only commas inside quoted numbers are removed.

Next, execute:

s/"(\d[\d,.]*)"/\1/g

This replaces every number that is within quotes with the same value without the quotes.
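
One caveat: both regexes work on the raw line, so a quoted string that happens to end in something like ,5" would lose that comma too. If that matters, a different approach (not the one from this answer) is to let a CSV parser split the fields first and only touch fields that consist entirely of digits, commas and an optional decimal part. A rough Python sketch, with clean_fields and the NUMERIC pattern being my own assumptions about what counts as a number:

```python
import csv
import re

# Assumed definition of "numeric": digits and commas, optional decimal part.
NUMERIC = re.compile(r'\d[\d,]*(?:\.\d+)?')

def clean_fields(row):
    """Strip commas from fields that are entirely numeric; keep the rest."""
    return [f.replace(',', '') if NUMERIC.fullmatch(f) else f for f in row]

lines = ['1,"item 3,5","1,890"', '11,"4,324,653.22","simple string",77']
for row in csv.reader(lines):
    print(clean_fields(row))
# ['1', 'item 3,5', '1890']
# ['11', '4324653.22', 'simple string', '77']
```

Writing the rows back with csv.writer should re-quote only the fields that still need it, so a field like "simple string" would come out without its quotes; that is still valid CSV, but not byte-for-byte the output asked for in the question.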



Source: https://stackoverflow.com/questions/18624784/remove-double-quotes-and-comma-from-a-numeric-value-of-a-csv-file
