Pandas: how to read a CSV with multiple lines in the same cell?

再見小時候 2021-01-21 09:28

I have a CSV that I am not able to read using read_csv. Opening the CSV with Sublime Text shows something like:

col1,col2,c         


        
1 Answer
  • 2021-01-21 10:19

    It looks like you'll have to preprocess the data manually:

    with open('data.csv', 'r') as f:
        lines = f.read().splitlines()
    processed = []
    buffer = ''
    for line in lines:
        buffer += line  # Append the current line to the buffer
        c = buffer.count(',')  # Commas accumulated for the current row
        if c == 2:  # 2 commas means a complete 3-column row
            processed.append(buffer)
            buffer = ''
        elif c > 2:
            # A necessary newline is missing; this should never happen
            raise ValueError('too many fields: ' + buffer)
    

    This assumes that the only problem in your data is unwanted newlines and that no field contains a comma. With 3 columns, lines are joined until the accumulated text contains exactly 2 commas; e.g. a line holding only the first 2 elements must be followed by a line that completes the row. If joining a line pushes the count past 2 commas, i.e. a necessary newline is missing, then an error is raised. You can accommodate this case if necessary with a minor modification.
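
    As a rough sketch of the same idea wrapped in a reusable function: the helper name merge_broken_rows, the n_cols parameter, and the sample data below are all made up for illustration and are not part of pandas. It assumes that no field contains a comma and that the number of columns is known in advance.

```python
import csv
import io


def merge_broken_rows(lines, n_cols=3):
    """Join physical lines until each logical row has n_cols fields.

    Hypothetical helper: assumes fields never contain commas and that
    stray newlines inside cells are the only defect in the data.
    """
    expected_commas = n_cols - 1
    rows, buffer = [], ''
    for line in lines:
        buffer += line
        commas = buffer.count(',')
        if commas == expected_commas:
            # The buffer now holds one complete row.
            rows.append(buffer)
            buffer = ''
        elif commas > expected_commas:
            raise ValueError(f'too many fields in row: {buffer!r}')
    if buffer:
        # Leftover text that never reached the expected comma count.
        raise ValueError(f'incomplete final row: {buffer!r}')
    return rows


# Made-up sample: the second data cell is split across two lines.
raw = 'a,b,c\n1,hel\nlo,3\n4,world,6\n'
fixed = merge_broken_rows(raw.splitlines())
parsed = list(csv.reader(io.StringIO('\n'.join(fixed))))
print(parsed)
```

    The repaired text can then be handed to pandas, for example with pd.read_csv(io.StringIO('\n'.join(fixed))). Note that comma counting alone cannot detect a newline inside the last column, because the row already looks complete before the break.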

    Actually, it might be more efficient to remove newlines instead, but it shouldn't matter unless you have a lot of data.
