Source file delimiter issue

匆匆过客 提交于 2019-12-11 08:19:29

问题


I am facing one issue with my source file. Consider I have following data in file-

"dfjsdlfkj,fsdkfj,werkj",234234,234234,,"dfsd,etwetr"

here, the delimiter is comma, but some fields have comma as a part of data. Such fields are enclosed in double quotes. I want to extract few columns from the file.

If I use cut -d "," -f 1,3 then I am getting output like-

"dfjsdlfkj,werkj"

回答1:


I suggest you to use a csv parser. For example, python has one as a built-in module, so you only have to import it:

import sys 
import csv 

with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    csvwriter = csv.writer(sys.stdout)
    for row in csvreader:
        csvwriter.writerow([row[e] for e in (0,2)])

Assuming your example line is in an input file named infile, run the script as:

python3 script.py infile

That yields:

"dfjsdlfkj,fsdkfj,werkj",234234



回答2:


You could try the following:

awk -f getFields.awk input.txt

where input.txt is your input file, and getFields.awk is:

{
    split("",a)
    splitLine()
    print a[1],a[3]
}

function splitLine(s,indq,t,r,len) {
# Assumptions: 
#  * spaces before or after commas are ignored
#  * spaces at beginning or end of line is ignored

# definition of a quoted parameter:
# - starts with: (^ and $ are regexp characters)
#   a) ^"
#   b) ,"
# - ends with:
#   a) "$
#   b) ",

    s=$0; k=1
    s=removeBlanks(s)
    while (s) {
        if (substr(s,1,1)=="\"")
            indq=2
        else {
            sub(/[[:blank:]]*,[[:blank:]]*"/,",\"",s)
            indq=index(s,",\"")
            if (indq) {
                t=substr(s,1,indq-1)
                splitCommaString(t)
                indq=indq+2
            }
        }
        if (indq) {
            s=substr(s,indq)
            sub(/"[[:blank:]]*,/,"\",",s)
            len=index(s,"\",")  #find closing quote
            if (!len) {
                if (match(s,/"$/)) {
                    len=RSTART-1
                }
                else 
                    len=length(s)
                r=substr(s,1,len)
                s=""
            } else {
                r=substr(s,1,len-1)
                s=substr(s,len+2)
            }
            a[k++]=r
        } else  {
            splitCommaString(s)
            s=""
        }
    }
    k=k-1
}

function splitCommaString(t,b,i) {
    n=split(t,b,",")
    for (i=1; i<=n; i++)
        a[k++]=removeBlanks(b[i])
}       

function removeBlanks(r) {
    sub(/^[[:blank:]]*/,"",r)
    sub(/[[:blank:]]*$/,"",r)
    return r
}


来源:https://stackoverflow.com/questions/20220854/source-file-delimiter-issue

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!