Python: skip comment lines marked with # in csv.DictReader

旧时难觅i 2020-12-01 01:35

Processing CSV files with csv.DictReader is great, but I have CSV files that contain comment lines (indicated by a hash at the start of the line), for example:

# step s         


        
4 answers
  • 2020-12-01 02:08

    Just posting a bugfix for @sigvaldm's solution.

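    import csv

    def decomment(csvfile):
        for row in csvfile:
            # Look only at the text before the first '#'
            raw = row.split('#')[0].strip()
            # Yield the original line so '#' inside quoted fields survives
            if raw:
                yield row

    with open('dummy.csv', newline='') as csvfile:
        reader = csv.reader(decomment(csvfile))
        for row in reader:
            print(row)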
    

    A CSV line may legitimately contain "#" characters inside quoted strings. The previous solution yielded the truncated text, so it cut off any field containing a '#' character; this version yields the original line instead.
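To make the difference concrete, here is a minimal sketch of the bugfixed generator run against a line with a quoted '#'. The sample data and the use of `io.StringIO` in place of a real file are assumptions for illustration only.

```python
import csv
import io

def decomment(csvfile):
    """Yield, unmodified, every line that has content before its first '#'."""
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield row

# Hypothetical sample: a comment line, a header, and a row whose
# quoted field starts with '#'.
sample = io.StringIO('# comment\nname,tag\nalice,"#python"\n')

rows = list(csv.reader(decomment(sample)))
print(rows)
```

The trade-off is that a trailing comment on a data line is no longer stripped: because the whole line is passed through, `a,b,c # comment` would produce a last field of `c # comment`.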

  • 2020-12-01 02:12

    Good question, and a good example of how Python's CSV library lacks important functionality, such as handling basic comments (not uncommon at the top of CSV files). While Dan Stowell's solution works for the specific case of the OP, it is limited in that # must appear as the first symbol. A more generic solution would be:

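    import csv

    def decomment(csvfile):
        for row in csvfile:
            # Keep only the text before the first '#', without surrounding whitespace
            raw = row.split('#')[0].strip()
            # Skip lines that are empty once the comment is removed
            if raw:
                yield raw

    with open('dummy.csv') as csvfile:
        reader = csv.reader(decomment(csvfile))
        for row in reader:
            print(row)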
    

    As an example, the following dummy.csv file:

    # comment
     # comment
    a,b,c # comment
    1,2,3
    10,20,30
    # comment
    

    returns

    ['a', 'b', 'c']
    ['1', '2', '3']
    ['10', '20', '30']
    

    Of course, this works just as well with csv.DictReader().
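As a rough sketch of the csv.DictReader case, the same generator can be passed straight to the reader. The in-memory `io.StringIO` sample below stands in for dummy.csv and is an assumption for illustration.

```python
import csv
import io

def decomment(csvfile):
    """Strip everything from the first '#' onward and skip empty results."""
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

# In-memory stand-in for a file like dummy.csv.
sample = io.StringIO(
    '# comment\n'
    'a,b,c # comment\n'
    '1,2,3\n'
    '10,20,30\n'
)

# The first surviving line ('a,b,c') becomes the field names.
records = [dict(r) for r in csv.DictReader(decomment(sample))]
print(records)
```

Note that this variant splits on every '#', so it is only suitable when no field value contains that character.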

  • 2020-12-01 02:23

    Another way to read a CSV file is to use pandas.

    Here's some sample code:

    import pandas as pd

    df = pd.read_csv('test.csv',
                     sep=',',     # field separator
                     comment='#', # ignore everything from '#' to the end of the line
                     index_col=0, # number or label of index column
                     skipinitialspace=True,
                     skip_blank_lines=True,
                     # pandas >= 1.3; replaces error_bad_lines=False, warn_bad_lines=True
                     on_bad_lines='warn'
                     ).sort_index()
    print(df)
    df.fillna('no value', inplace=True) # replace NaN with 'no value'
    print(df)
    

    For this CSV file:

    a,b,c,d,e
    1,,16,,55#,,65##77
    8,77,77,,16#86,18#
    #This is a comment
    13,19,25,28,82
    

    we will get this output:

           b   c     d   e
    a                     
    1    NaN  16   NaN  55
    8   77.0  77   NaN  16
    13  19.0  25  28.0  82
               b   c         d   e
    a                             
    1   no value  16  no value  55
    8         77  77  no value  16
    13        19  25        28  82
    
  • 2020-12-01 02:29

    Actually this works nicely with filter:

    import csv

    with open('samples.csv') as fp:
        # Drop every line whose first character is '#'
        rdr = csv.DictReader(filter(lambda row: row[0] != '#', fp))
        for row in rdr:
            print(row)
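The filter above only matches a '#' in the very first column. If comment lines may be indented, a slightly more tolerant predicate could look like the sketch below; the helper name `is_data_line` and the sample data are hypothetical, introduced only for illustration.

```python
import csv
import io

def is_data_line(line):
    """Return True for lines that are neither blank nor comments."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith('#')

# In-memory stand-in for a file with plain, indented, and blank lines.
sample = io.StringIO('# comment\n  # indented comment\n\nx,y\n1,2\n')

records = [dict(r) for r in csv.DictReader(filter(is_data_line, sample))]
print(records)
```

Like the original filter, this inspects only the start of each line, so '#' characters inside field values are left alone.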
    