How do I count how many records are in a CSV file in Python?

无人共我 2020-11-29 16:43

I'm using Python (Django framework) to read a CSV file. I pull just 2 lines out of this CSV, as you can see. What I have been trying to do is store in a variable the total n

16 answers
  • 2020-11-29 17:16

    2018-10-29 EDIT

    Thank you for the comments.

    I benchmarked several ways of getting the number of lines in a CSV file. The fastest method is below.

    with open(filename) as f:
        n = sum(1 for line in f)
    

    Here is the code tested.

    import timeit
    import csv
    import pandas as pd
    
    filename = './sample_submission.csv'
    
    def talktime(filename, funcname, func):
        # Average the wall-clock time of func(filename) over 100 runs
        print(f"# {funcname}")
        t = timeit.timeit(lambda: func(filename), number=100) / 100
        print('Elapsed time : ', t)
        print('n = ', func(filename))
        print('\n')
    
    def sum1forline(filename):
        with open(filename) as f:
            return sum(1 for line in f)
    talktime(filename, 'sum1forline', sum1forline)
    
    def lenopenreadlines(filename):
        with open(filename) as f:
            return len(f.readlines())
    talktime(filename, 'lenopenreadlines', lenopenreadlines)
    
    def lenpd(filename):
        # read_csv treats the first line as the header, so add 1 to count it
        return len(pd.read_csv(filename)) + 1
    talktime(filename, 'lenpd', lenpd)
    
    def csvreaderfor(filename):
        cnt = 0
        with open(filename) as f:
            cr = csv.reader(f)
            for row in cr:
                cnt += 1
        return cnt
    talktime(filename, 'csvreaderfor', csvreaderfor)
    
    def openenum(filename):
        cnt = 0
        with open(filename) as f:
            for i, line in enumerate(f,1):
                cnt += 1
        return cnt
    talktime(filename, 'openenum', openenum)
    

    The results were:

    # sum1forline
    Elapsed time :  0.6327946722068599
    n =  2528244
    
    
    # lenopenreadlines
    Elapsed time :  0.655304473598555
    n =  2528244
    
    
    # lenpd
    Elapsed time :  0.7561274056295324
    n =  2528244
    
    
    # csvreaderfor
    Elapsed time :  1.5571560935772661
    n =  2528244
    
    
    # openenum
    Elapsed time :  0.773000013928679
    n =  2528244
    

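    In conclusion, sum(1 for line in f) was the fastest, but len(f.readlines()) was only marginally slower. One practical difference is that f.readlines() builds a list of every line in memory, while the generator expression reads one line at a time.

    sample_submission.csv is 30.2 MB and has 31 million characters.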
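    As a sketch, the fastest approach can be wrapped in a reusable helper. The name count_lines and the skip_header flag are hypothetical, added here for illustration; like every method benchmarked above, it counts physical lines, so the header is included unless skip_header is set.

```python
import os
import tempfile


def count_lines(path: str, skip_header: bool = False) -> int:
    """Count newline-delimited lines in a text file by streaming it.

    Iterating over the file object yields one line at a time, so the
    whole file never has to fit in memory.
    """
    with open(path) as f:
        n = sum(1 for _ in f)
    # Optionally exclude a single header row from the total.
    if skip_header and n > 0:
        n -= 1
    return n


if __name__ == "__main__":
    # Hypothetical sample file, written to a temporary directory for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        with open(path, "w") as f:
            f.write("id,name\n1,alice\n2,bob\n")
        print(count_lines(path))                    # 3
        print(count_lines(path, skip_header=True))  # 2
```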

  • 2020-11-29 17:18

    Several of the suggestions above count the number of LINES in the CSV file, but some CSV files contain quoted strings that themselves contain newline characters. MS CSV files usually delimit records with \r\n but use a bare \n within quoted strings.

    For a file like this, counting lines of text (as delimited by newline) in the file will give too large a result. So for an accurate count you need to use csv.reader to read the records.
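    A minimal sketch of the difference, using a hypothetical in-memory sample in which one quoted field contains a newline. io.StringIO stands in for a real file here; with a real file, open it with newline="" as the csv module documentation recommends. The helper names are illustrative only.

```python
import csv
import io

# Hypothetical sample data: the second record has a quoted field that
# contains an embedded newline, so it spans two physical lines.
SAMPLE = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'


def count_physical_lines(text: str) -> int:
    # Counts every line break, including the one inside the quoted field,
    # which is roughly what the line-counting answers measure.
    return len(text.splitlines())


def count_records(stream) -> int:
    # csv.reader understands quoting, so a quoted field that spans
    # several lines is still returned as part of a single row.
    return sum(1 for _ in csv.reader(stream))


if __name__ == "__main__":
    print(count_physical_lines(SAMPLE))                    # 4
    print(count_records(io.StringIO(SAMPLE, newline="")))  # 3
```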

  • 2020-11-29 17:19

    You can do it with a bit of code like this:

    with open("Task1.csv") as file:
        numline = len(file.readlines())
    print(numline)
    

    I hope this helps everyone.

  • 2020-11-29 17:26

    Convert the file object to a list to get a more workable object.

    You can then count, skip, and mutate to your heart's content:

    lines = list(fileObject)  # reads the file once; calling list(fileObject) again would return []
    
    len(lines)  # number of lines in the file
    
    lines[10:]  # skip the first 10 lines
    
  • 2020-11-29 17:28

    First, open the file with open. Read-only mode "r" is enough; "r+" also requests write access, which fails on read-only files:

    input_file = open("nameOfFile.csv", "r", newline="")
    

    Then pass it to csv.reader:

    reader_file = csv.reader(input_file)
    

    Finally, get the number of rows with len:

    value = len(list(reader_file))
    

    The complete code is:

    import csv

    input_file = open("nameOfFile.csv", "r", newline="")
    reader_file = csv.reader(input_file)
    value = len(list(reader_file))
    

    If you want to read the CSV file again, call input_file.seek(0) first: converting reader_file to a list reads the whole file and leaves the file position at the end.
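    A sketch of that rewind step, assuming a hypothetical helper named count_then_reread and a small temporary file created only for the demo. The file is opened read-only with newline="", as the csv module documentation recommends:

```python
import csv
import os
import tempfile


def count_then_reread(path: str):
    """Count CSV rows, then rewind the file so it can be read again."""
    with open(path, newline="") as input_file:
        reader_file = csv.reader(input_file)
        value = len(list(reader_file))  # reads the whole file; position is now at EOF
        leftover = list(reader_file)    # nothing left to read without rewinding
        input_file.seek(0)              # rewind to the start of the file
        rows = list(csv.reader(input_file))  # a fresh reader sees every row again
    return value, leftover, rows


if __name__ == "__main__":
    # Hypothetical sample file created only for this demo.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nameOfFile.csv")
        with open(path, "w", newline="") as f:
            f.write("a,b\n1,2\n3,4\n")
        value, leftover, rows = count_then_reread(path)
        print(value, len(leftover), len(rows))  # 3 0 3
```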

  • 2020-11-29 17:28

    This works for CSV and other text files on Unix-based OSes. Note that wc -l counts newline characters, so a final line without a trailing newline is not counted:

    import os
    
    numOfLines = int(os.popen('wc -l < file.csv').read())
    

    If the CSV file has a header row, subtract one from numOfLines:

    numOfLines = numOfLines - 1
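    The same idea without os.popen, sketched with subprocess.run so the file name is passed as an argument rather than through a shell. This assumes a Unix-like system with wc on the PATH; the function name wc_line_count and the has_header flag are hypothetical:

```python
import os
import subprocess
import tempfile


def wc_line_count(path: str, has_header: bool = True) -> int:
    """Count lines with the Unix `wc -l` command, minus an optional header row."""
    # Pass the path as a separate argument, so no shell quoting is needed.
    result = subprocess.run(
        ["wc", "-l", path], capture_output=True, text=True, check=True
    )
    # The output looks like "3 /path/to/file.csv"; the first field is the count.
    n = int(result.stdout.split()[0])
    return n - 1 if has_header and n > 0 else n


if __name__ == "__main__":
    # Hypothetical sample file, created in a temporary directory for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.csv")
        with open(path, "w") as f:
            f.write("id,name\n1,alice\n2,bob\n")
        print(wc_line_count(path))  # 2
```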
    