Processing CSV files with csv.DictReader is great - but I have CSV files with comment lines in (indicated by a hash at the start of a line), for example:
# step s
Just posting the bugfix from @sigvaldm's solution.
def decomment(csvfile):
for row in csvfile:
raw = row.split('#')[0].strip()
if raw: yield row
with open('dummy.csv') as csvfile:
reader = csv.reader(decomment(csvfile))
for row in reader:
print(row)
A CSV line can contain "#" characters in quoted strings and is perfectly valid. The previous solution was cutting off strings containing '#' characters.
Good question, and a good example of how Python's CSV library lacks important functionality, such as handling basic comments (not uncommon at the top of CSV files). While Dan Stowell's solution works for the specific case of the OP, it is limited in that #
must appear as the first symbol. A more generic solution would be:
def decomment(csvfile):
for row in csvfile:
raw = row.split('#')[0].strip()
if raw: yield raw
with open('dummy.csv') as csvfile:
reader = csv.reader(decomment(csvfile))
for row in reader:
print(row)
As an example, the following dummy.csv
file:
# comment
# comment
a,b,c # comment
1,2,3
10,20,30
# comment
returns
['a', 'b', 'c']
['1', '2', '3']
['10', '20', '30']
Of course, this works just as well with csv.DictReader()
.
Another way to read a CSV file is using pandas
Here's a sample code:
df = pd.read_csv('test.csv',
sep=',', # field separator
comment='#', # comment
index_col=0, # number or label of index column
skipinitialspace=True,
skip_blank_lines=True,
error_bad_lines=False,
warn_bad_lines=True
).sort_index()
print(df)
df.fillna('no value', inplace=True) # replace NaN with 'no value'
print(df)
For this csv file:
a,b,c,d,e
1,,16,,55#,,65##77
8,77,77,,16#86,18#
#This is a comment
13,19,25,28,82
we will get this output:
b c d e
a
1 NaN 16 NaN 55
8 77.0 77 NaN 16
13 19.0 25 28.0 82
b c d e
a
1 no value 16 no value 55
8 77 77 no value 16
13 19 25 28 82
Actually this works nicely with filter
:
import csv
fp = open('samples.csv')
rdr = csv.DictReader(filter(lambda row: row[0]!='#', fp))
for row in rdr:
print(row)
fp.close()