Use Multiple Character Delimiter in Python Pandas read_csv

小蘑菇 2020-11-28 15:59

It appears that the pandas read_csv function only allows single character delimiters/separators. Is there some way to allow for a string of characters to be

4 Answers
  • 2020-11-28 16:15

    As Padraic Cunningham writes in the comment above, it's unclear why you want this. The Wiki entry for the CSV Spec states about delimiters:

    ... separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),

    It's unsurprising that neither the csv module nor pandas supports what you're asking.

    However, if you really want to do this, you're pretty much down to using Python's string manipulation. The following example shows how to turn the dataframe into a "csv" string with $$ separating rows and %% separating columns.

    '$$'.join('%%'.join(str(r) for r in rec) for rec in df.to_records())
    

    Of course, you don't have to turn it into a string like this prior to writing it into a file.
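
    As a minimal sketch of that last point, the rows can be streamed to a file handle one at a time instead of being joined into a single string first. The helper name write_delimited and the sample dataframe are hypothetical, and unlike to_records() above, this version leaves out the index:

```python
import io

import pandas as pd


def write_delimited(df, fh, col_sep="%%", row_sep="$$"):
    """Write df to the open text handle fh, one row at a time (hypothetical helper)."""
    # itertuples(index=False, name=None) yields plain tuples without the index
    for i, rec in enumerate(df.itertuples(index=False, name=None)):
        if i:
            fh.write(row_sep)  # separator goes between rows, not after the last one
        fh.write(col_sep.join(str(value) for value in rec))


# Illustrative data; an in-memory buffer stands in for a real file
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
buf = io.StringIO()
write_delimited(df, buf)
text = buf.getvalue()
print(text)
```

    To write to disk, pass a handle from open(path, "w") in place of the StringIO buffer.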

  • 2020-11-28 16:18

    This is not the most Pythonic approach, but it works: split each row on the delimiter yourself and rejoin the pieces that fall inside a quoted field:

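    import re
    
    def row_reader(row, fd):
        # Split on the delimiter fd, then rejoin pieces that belong to one quoted field.
        arr = []
        in_arr = row.rstrip('\n').split(fd)  # split the row itself, not the str type
        i = 0
        while i < len(in_arr):
            # A piece that opens a quote without closing it continues in the next pieces.
            if re.match('^".*', in_arr[i]) and not re.match('.*"$', in_arr[i]):
                flag = True
                buf = ''
                while flag and i < len(in_arr):
                    buf += in_arr[i]
                    if re.match('.*"$', in_arr[i]):
                        flag = False
                    i += 1
                    buf += fd if flag else ''
                arr.append(buf)
            else:
                arr.append(in_arr[i])
                i += 1
        return arr
    
    # file_name is the path to the delimited input file
    with open(file_name, 'r') as infile:
        for row in infile:
            for field in row_reader(row, '%%'):
                print(field)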
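
    A different technique, sketched here only as an alternative, is to swap the multi-character delimiter for a single character that is assumed never to occur in the data (the ASCII unit separator \x1f in this sketch) and let the standard csv module handle the quoting. The function name read_rows and the sample lines are made up for illustration. Unlike row_reader above, csv strips the surrounding quotes from quoted fields:

```python
import csv

UNIT_SEP = "\x1f"  # assumption: this character never appears in the data


def read_rows(lines, delimiter="%%"):
    """Yield each row as a list of fields, honoring double-quoted fields."""
    swapped = (line.replace(delimiter, UNIT_SEP) for line in lines)
    for row in csv.reader(swapped, delimiter=UNIT_SEP):
        # Restore delimiters that were protected inside quoted fields
        yield [field.replace(UNIT_SEP, delimiter) for field in row]


# Illustrative input; any iterable of lines works, including an open file
sample = ['x%%"a%%b"%%y\n', "1%%2%%3\n"]
rows = list(read_rows(sample))
for row in rows:
    print(row)
```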
    
  • 2020-11-28 16:26

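    Pandas now supports multi-character delimiters. A separator longer than one character is interpreted as a regular expression, so special characters must be escaped:

    import pandas as pd
    pd.read_csv(csv_file, sep=r"\*\|\*", engine="python")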
    
  • 2020-11-28 16:27

    You can also use read_table instead of read_csv. Suppose file.csv contains:

    1*|*2*|*3*|*4*|*5
    12*|*12*|*13*|*14*|*15
    21*|*22*|*23*|*24*|*25
    

    So, we could read this with:

    import pandas as pd
    pd.read_table('file.csv', header=None, sep=r'\*\|\*', engine='python')
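
    Escaping the delimiter by hand is easy to get wrong, so one option is to let re.escape build the pattern. The sketch below is self-contained: it reads the sample rows from an in-memory string instead of file.csv, and the variable names are illustrative only:

```python
import io
import re

import pandas as pd

# Sample rows from the answer above, held in memory instead of file.csv
raw = "1*|*2*|*3*|*4*|*5\n12*|*12*|*13*|*14*|*15\n21*|*22*|*23*|*24*|*25\n"

delimiter = "*|*"
pattern = re.escape(delimiter)  # escapes * and | so they match literally

# A regex separator requires the Python parsing engine
df = pd.read_csv(io.StringIO(raw), sep=pattern, header=None, engine="python")
print(df)
```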
    