Read specific columns from a csv file with csv module?

闹比i 2020-11-22 10:06

I'm trying to parse a CSV file and extract the data from only specific columns.

Example csv:

ID | N         


        
12 Answers
  • 2020-11-22 10:44

    If you need to process the columns separately, I like to destructure the columns with the zip(*iterable) pattern (effectively "unzip"). So for your example:

    ids, names, zips, phones = zip(*(
      (row[1], row[2], row[6], row[7])
      for row in reader
    ))
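
A minimal, self-contained sketch of the same pattern using the csv module. The sample data and the column positions (0 and 2) are assumptions for illustration, not the asker's actual file:

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "ID,Name,Zip\n"
    "10,Ann,12345\n"
    "11,Bob,67890\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

# Pick columns 0 and 2 from each row, then transpose rows into columns.
ids, zips = zip(*((row[0], row[2]) for row in reader))

print(ids)   # ('10', '11')
print(zips)  # ('12345', '67890')
```

Note that each result is a tuple of strings, and that the unpacking raises ValueError if the file has no data rows, so an empty input may need a separate check.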
    
  • 2020-11-22 10:44
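    sample.csv:

    a, 1, +
    b, 2, -
    c, 3, *
    d, 4, /

    import pandas as pd

    column_names = ["Letter", "Number", "Symbol"]
    # The file has no header row, so supply the column names;
    # skipinitialspace drops the space that follows each comma.
    df = pd.read_csv("sample.csv", names=column_names, skipinitialspace=True)
    print(df)

    Output:

      Letter  Number Symbol
    0      a       1      +
    1      b       2      -
    2      c       3      *
    3      d       4      /

    # Pull a single column out as a plain Python list.
    letters = df.Letter.to_list()
    print(letters)

    Output:

    ['a', 'b', 'c', 'd']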
    
  • 2020-11-22 10:47

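    To fetch the column names, use readline() rather than readlines(): it reads only the header line, so you avoid looping over the file or loading all of it into a list.

    # Open in text mode; in Python 3, 'rb' yields bytes and split(',') fails.
    with open(csv_file, 'r', newline='') as csvfile:

        # read only the header line
        line = csvfile.readline()

        # naive split: assumes no quoted commas in the header
        column_names = line.rstrip('\r\n').split(',')

        # get number of columns
        num_columns = len(column_names)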
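
Splitting on ',' by hand breaks when a field contains a quoted comma. As a hedged alternative, csv.DictReader reads the header for you and lets you select columns by name. The sample data and the names in `wanted` below are hypothetical:

```python
import csv
import io

# Hypothetical sample data; note the quoted comma in the second row.
sample = io.StringIO('ID,Name,Zip\n10,"Smith, Ann",12345\n11,Bob,67890\n')

wanted = ["ID", "Zip"]  # columns to keep (assumed names)

reader = csv.DictReader(sample)
header = reader.fieldnames  # accessing this reads only the header line

# Keep just the wanted fields from every data row.
rows = [[row[name] for name in wanted] for row in reader]

print(header)  # ['ID', 'Name', 'Zip']
print(rows)    # [['10', '12345'], ['11', '67890']]
```

With a real file, replace the StringIO object with open(csv_file, newline=''), which is the mode the csv module documentation recommends.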
    
  • 2020-11-22 10:48

    Use pandas:

    import pandas as pd
    my_csv = pd.read_csv(filename)
    column = my_csv.column_name
    # you can also use my_csv['column_name']
    

    Discard unneeded columns at parse time:

    my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
    

    P.S. I'm just aggregating what others have said in a simple manner. Actual answers are taken from here and here.

  • 2020-11-22 10:53

    With pandas you can use read_csv with the usecols parameter:

    df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
    

    Example:

    import pandas as pd
    import io
    
    s = '''
    total_bill,tip,sex,smoker,day,time,size
    16.99,1.01,Female,No,Sun,Dinner,2
    10.34,1.66,Male,No,Sun,Dinner,3
    21.01,3.5,Male,No,Sun,Dinner,3
    '''
    
    df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
    print(df)
    
       total_bill  day  size
    0       16.99  Sun     2
    1       10.34  Sun     3
    2       21.01  Sun     3
    
  • 2020-11-22 10:55

    Thanks to the way you can index and subset a pandas dataframe, a very easy way to extract a single column from a csv file into a variable is:

    myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']
    

    A few things to consider:

    The snippet above produces a pandas Series rather than a DataFrame. The suggestion from ayhan with usecols will also be faster if speed is an issue: testing the two approaches with %timeit on a 2122 KB CSV file gave 22.8 ms for the usecols approach and 53 ms for my suggested approach.

    And don't forget: import pandas as pd
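
To make the Series versus DataFrame difference concrete, here is a small sketch comparing the two approaches. The in-memory data is a hypothetical stand-in for a file on disk, and the timings quoted above are not reproduced here:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a real file.
data = "total_bill,tip,day\n16.99,1.01,Sun\n10.34,1.66,Sun\n"

# Approach 1: parse every column, then select one -> Series
series = pd.read_csv(io.StringIO(data))["day"]

# Approach 2: parse only the wanted column -> one-column DataFrame
frame = pd.read_csv(io.StringIO(data), usecols=["day"])

# Collapse the one-column DataFrame into a Series if that is what you need.
squeezed = frame.squeeze("columns")

print(type(series).__name__)  # Series
print(type(frame).__name__)   # DataFrame
print(squeezed.tolist())      # ['Sun', 'Sun']
```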
