Is it possible to use read_csv to read only specific lines?

天涯浪人 2021-01-02 14:00

I have a csv file that looks like this:

TEST  
2012-05-01 00:00:00.203 ON 1  
2012-05-01 00:00:11.203 OFF 0  
2012-05-01 00:00:22.203 ON 1  
2012-05-01 00:00         


        
4 answers
  • 2021-01-02 14:33

    When you read rows with csv.reader and can be sure that the first element of each row is a string, you can skip the marker rows with:

    if not row[0].startswith('TEST'):
        process(row)
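
As a fuller sketch of this idea, the snippet below feeds a small in-memory sample through `csv.reader` and keeps only the rows whose first field does not start with `TEST`. The single-space delimiter, the sample rows, and the helper name `keep_data_rows` are assumptions made for illustration; none of them come from the original post.

```python
import csv
import io

# Sample data modelled on the question; the single-space delimiter is an assumption.
SAMPLE = (
    "TEST\n"
    "2012-05-01 00:00:00.203 ON 1\n"
    "2012-05-01 00:00:11.203 OFF 0\n"
    "TEST\n"
    "2012-05-01 00:00:22.203 ON 1\n"
)


def keep_data_rows(lines, marker="TEST"):
    """Return parsed rows, skipping blank rows and rows whose first field starts with marker."""
    rows = []
    for row in csv.reader(lines, delimiter=" "):
        # csv.reader yields [] for blank lines, so check `row` before indexing it.
        if row and not row[0].startswith(marker):
            rows.append(row)
    return rows


rows = keep_data_rows(io.StringIO(SAMPLE))
print(len(rows))  # number of data rows kept
```

The `row and ...` guard matters in practice: without it, an empty line in the file would raise an `IndexError` on `row[0]`.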
    
  • 2021-01-02 14:34

    Another option, since I just ran into this problem also:

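    import pandas as pd
    import subprocess

    filename = 'file.csv'  # placeholder path; the original snippet left it undefined
    # grep -n prefixes every matching line with its 1-based line number and a colon
    grep = subprocess.check_output(['grep', '-n', '^TEST', filename], text=True).splitlines()
    bad_lines = [int(s[:s.index(':')]) - 1 for s in grep]  # convert to 0-based row numbers
    df = pd.read_csv(filename, skiprows=bad_lines)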
    

    It's less portable than @eumiro's answer (read: it probably doesn't work on Windows) and requires reading the file twice, but has the advantage that you don't have to store the entire file contents in memory.

    You could of course do the same thing as the grep in Python, but it'd probably be slower.
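
For reference, the grep step could be done in pure Python roughly as follows. This is a sketch, not a benchmark: the helper name `find_marker_rows` and the demo file contents are made up for illustration, and the `TEST` prefix is taken from the question's sample. Like the grep version it means the file is read twice overall, but only the matching row numbers are held in memory.

```python
import os
import tempfile


def find_marker_rows(path, marker="TEST"):
    """Return 0-based numbers of the lines in `path` that start with `marker`."""
    with open(path) as f:
        # Iterating over the file object reads one line at a time.
        return [i for i, line in enumerate(f) if line.startswith(marker)]


# Demo on a temporary file with made-up contents modelled on the question.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(
        "TEST\n"
        "2012-05-01 00:00:00.203 ON 1\n"
        "2012-05-01 00:00:11.203 OFF 0\n"
        "TEST\n"
        "2012-05-01 00:00:22.203 ON 1\n"
    )

bad_lines = find_marker_rows(tmp.name)
print(bad_lines)  # [0, 3]
os.remove(tmp.name)
```

The resulting list can be passed to `pd.read_csv` as `skiprows=bad_lines`, exactly as the grep-based snippet does.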

  • 2021-01-02 14:41

    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html?highlight=read_csv#pandas.io.parsers.read_csv

    skiprows : list-like or integer
        Row numbers to skip (0-indexed) or number of rows to skip (int)

    Pass [0, 6] to skip rows with "TEST".
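
To illustrate the two forms of `skiprows` quoted above, here is a small sketch on an in-memory sample. The sample is shorter than the file in the question, so the `TEST` lines sit at rows 0 and 3 here rather than 0 and 6; the whitespace separator and `header=None` are assumptions about the file layout.

```python
import io

import pandas as pd

# Made-up sample modelled on the question's file.
SAMPLE = (
    "TEST\n"
    "2012-05-01 00:00:00.203 ON 1\n"
    "2012-05-01 00:00:11.203 OFF 0\n"
    "TEST\n"
    "2012-05-01 00:00:22.203 ON 1\n"
)

# A list skips exactly those 0-indexed lines; here lines 0 and 3 hold "TEST".
df_list = pd.read_csv(io.StringIO(SAMPLE), skiprows=[0, 3], header=None, sep=r"\s+")

# An integer skips that many lines from the top, so the second "TEST" line
# is still parsed as a (mostly empty) data row.
df_int = pd.read_csv(io.StringIO(SAMPLE), skiprows=1, header=None, sep=r"\s+")

print(df_list)
print(df_int)
```

The list form is the one that fits this question, since the unwanted lines are scattered through the file rather than grouped at the top.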

  • 2021-01-02 14:46
    from io import StringIO  # cStringIO exists only in Python 2
    import pandas
    
    # Copy every line that does not start with 'TEST' into an in-memory buffer
    s = StringIO()
    with open('file.csv') as f:
        for line in f:
            if not line.startswith('TEST'):
                s.write(line)
    s.seek(0) # "rewind" to the beginning of the StringIO object
    
    df = pandas.read_csv(s) # with further parameters…
    