Say I have the following csv file:
id,message,time
123,"Sorry, This message
has commas and newlines",2016-03-28T20:26:39
456,"It makes the problem non
As chepner said, you are encouraged to use a programming language which is able to parse csv.
Here is an example in Python:
import csv

# newline='' lets the csv module handle newlines inside quoted fields (Python 3)
with open('a.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, quotechar='"')
    for row in reader:
        print(row[-1])  # row[-1] gives the last column
awk -F, '!/This/{print $NF}' file
time
2016-03-28T20:26:39
2016-03-28T20:26:41
Another awk alternative, using FS:
$ awk -F'"' '!(NF%2){getline remainder;$0=$0 OFS remainder}
NR>1{sub(/,/,"",$NF); print $NF}' file
2016-03-28T20:26:39
2016-03-28T20:26:41
As said here
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file.csv \
| awk -F, '{print $NF}'
To remove only the newlines that are inside double-quoted strings, and leave those outside them alone, use GNU awk (needed for RT):
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file
This works by splitting the file along " characters and removing newlines in every other block.
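For readers who find the gawk one-liner dense, the same idea can be sketched in Python. This is only an illustration of the split-on-quotes technique, not a replacement for a real CSV parser; the function name join_quoted_newlines is made up for this example, and the sample text is the header plus the first row from the question.

```python
def join_quoted_newlines(text):
    """Remove newlines that fall inside double-quoted fields."""
    blocks = text.split('"')
    # Blocks at odd indexes sit between an opening and a closing quote,
    # which matches the even-numbered records (NR % 2 == 0) in the gawk version.
    for i in range(1, len(blocks), 2):
        blocks[i] = blocks[i].replace("\n", "")
    return '"'.join(blocks)


# Header and first row from the question.
sample = (
    'id,message,time\n'
    '123,"Sorry, This message\nhas commas and newlines",2016-03-28T20:26:39\n'
)

flattened = join_quoted_newlines(sample)
# Once every record is on one line, a naive split on commas finds the last column.
last_columns = [line.split(",")[-1] for line in flattened.splitlines()]
print(last_columns)
```

Like the awk version, this relies on the last column never containing a quoted comma; if it might, a proper CSV parser is the safer route.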
Output
time
2016-03-28T20:26:39
2016-03-28T20:26:41
Then use awk to split the columns and display the last column
CSV is a format which needs a proper parser (i.e. which can't be parsed with regular expressions alone). If you have Python installed, use the csv module instead of plain BASH.
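As a rough sketch of the csv-module route, the column can also be selected by its header name instead of by position. The helper name column is hypothetical, and the message text in the second sample row is invented for illustration because the question's sample is cut off there.

```python
import csv
import io

# Sample modelled on the question; the second row's message is made up.
sample = (
    'id,message,time\n'
    '123,"Sorry, This message\nhas commas and newlines",2016-03-28T20:26:39\n'
    '456,"Another, multi-line\nmessage",2016-03-28T20:26:41\n'
)


def column(stream, name):
    """Return every value of the named column from a CSV stream."""
    reader = csv.DictReader(stream)  # the first row supplies the field names
    return [row[name] for row in reader]


times = column(io.StringIO(sample), 'time')
print(times)
```

When reading from a real file, pass open('a.csv', newline='') instead of the StringIO object so that the csv module sees the embedded newlines unchanged.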
If not, consider csvkit which has a lot of powerful tools to process CSV files from the command line.
sed -e 's/,/\n/g' file.csv | grep -E '^201[0-9]-'