Apparently some csv output implementation somewhere truncates field separators from the right on the last row and only the last row in the file when the fields are null.
If you use for row in reader:
, it will just stop the loop after the last item has been read.
If you have an expectation of a fixed number of columns in each row, then you should be defensive against:
(1) ANY row being shorter -- e.g. a writer (SQL Server / Query Analyzer IIRC) may omit trailing NULLs at random; users may fiddle with the file using a text editor, including leaving blank lines.
(2) ANY row being longer -- e.g. commas not quoted properly.
You don't need any fancy tricks. Just an old-fashioned if-test in your row-reading loop:
for row in csv.reader(...):
ncols = len(row)
if ncols != expected_cols:
appropriate_action()
Could you not just catch the error when the csv reader reads the last line in a
try: ... do your stuff here... except: StopIteration
condition ?
See the following python code on stackoverflow for an example of how to use the try: catch: Python CSV DictReader/Writer issues
I am aware it is an old question, but I came up with a different answer than the ones presented. The reader
object already increments the line_num attribute as you iterate through it. Then I get the total number of lines at first using row_count
, then I compare it with the line_num
.
import csv
def row_count(filename):
with open(filename) as in_file:
return sum(1 for _ in in_file)
in_filename = 'somefile.csv'
reader = csv.reader(open(in_filename), delimiter='|')
last_line_number = row_count(in_filename)
for row in reader:
if last_line_number == reader.line_num:
print "It is the last line: %s" % row
Basically you only know you've run out after you've run out. So you could wrap the reader
iterator, e.g. as follows:
def isLast(itr):
old = itr.next()
for new in itr:
yield False, old
old = new
yield True, old
and change your code to:
for line_num, (is_last, row) in enumerate(isLast(reader)):
if not is_last: assert len(row) == len(header)
etc.
if you want to get exactly the last row try this code:
with open("\\".join([myPath,files]), 'r') as f:
print f.readlines()[-1] #or your own manipulations
If you want to continue working with values from row do the following:
f.readlines()[-1].split(",")[0] #this would let you get columns by their index