Read specific columns from a csv file with csv module?

闹比i 2020-11-22 10:06

I'm trying to parse through a csv file and extract the data from only specific columns.

Example csv:

ID | N         


        
12 Answers
  • 2020-11-22 10:57
    import csv
    from collections import defaultdict
    
    columns = defaultdict(list) # each value in each column is appended to a list
    
    with open('file.txt') as f:
        reader = csv.DictReader(f) # read rows into a dictionary format
        for row in reader: # read a row as {column1: value1, column2: value2,...}
            for (k,v) in row.items(): # go over each column name and value 
                columns[k].append(v) # append the value into the appropriate list
                                     # based on column name k
    
    print(columns['name'])
    print(columns['phone'])
    print(columns['street'])
    

    With a file like

    name,phone,street
    Bob,0893,32 Silly
    James,000,400 McHilly
    Smithers,4442,23 Looped St.
    

    Will output

    >>> 
    ['Bob', 'James', 'Smithers']
    ['0893', '000', '4442']
    ['32 Silly', '400 McHilly', '23 Looped St.']
    

    Or alternatively if you want numerical indexing for the columns:

    with open('file.txt') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row (reader.next() only works in Python 2)
        for row in reader:
            for (i,v) in enumerate(row):
                columns[i].append(v)
    print(columns[0])
    
    >>> 
    ['Bob', 'James', 'Smithers']
    

    To change the delimiter, add delimiter=" " to the appropriate instantiation, i.e. reader = csv.reader(f, delimiter=" ")
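
As a minimal sketch of combining the two ideas above (a custom delimiter and keeping only some columns), the snippet below reads space-delimited text with csv.DictReader and collects just the wanted columns. The sample data, the `wanted` list, and the use of io.StringIO in place of a real file are assumptions made for illustration.

```python
import csv
import io

# Hypothetical space-delimited sample; with a real file you would use
# open(path, newline="") instead of io.StringIO.
sample = "name phone street\nBob 0893 Silly\nJames 000 McHilly\n"

wanted = ["name", "phone"]               # columns to keep (assumed names)
columns = {name: [] for name in wanted}  # one list per wanted column

reader = csv.DictReader(io.StringIO(sample), delimiter=" ")
for row in reader:
    for name in wanted:
        columns[name].append(row[name])  # ignore every other column

print(columns["name"])
print(columns["phone"])
```

Values stay as strings, so a phone number such as 0893 keeps its leading zero.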

  • 2020-11-22 11:03
    import pandas as pd 
    csv_file = pd.read_csv("file.csv") 
    column_val_list = csv_file["column_name"].to_numpy()  # public API; the private _ndarray_values attribute is gone in recent pandas
    
  • 2020-11-22 11:06

    The only way you would be getting the last column from this code is if you don't include your print statement in your for loop.

    This is most likely the end of your code:

    for row in reader:
        content = list(row[i] for i in included_cols)
    print(content)  # outside the loop: runs once, after the last row
    

    You want it to be this:

    for row in reader:
        content = list(row[i] for i in included_cols)
        print(content)  # inside the loop: runs once per row
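
A self-contained sketch of the corrected loop, under the assumption that `included_cols` holds zero-based column positions. The sample rows and the `selected_rows` list are hypothetical and only there so the snippet runs on its own.

```python
import csv
import io

# Hypothetical sample data standing in for the asker's file.
sample = "ID,Name,City,Zip\n10,Adam,Mobile,36601\n11,Carl,Selma,36701\n"

included_cols = [1, 3]   # zero-based positions of the columns to keep
selected_rows = []

reader = csv.reader(io.StringIO(sample))
header = next(reader)    # consume the header row
for row in reader:
    content = [row[i] for i in included_cols]
    selected_rows.append(content)
    print(content)       # inside the loop, so every row is printed
```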
    

    Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

    Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

    import pandas as pd
    df = pd.read_csv(csv_file)
    saved_column = df.column_name #you can also use df['column_name']
    

    So if you wanted to save all of the info in your Names column into a variable, this is all you need to do:

    names = df.Names
    

    It's a great module and I suggest you look into it. If your print statement was already inside the for loop and it was still printing only the last column, which shouldn't happen, let me know and I'll revisit my assumption. Your posted code has a lot of indentation errors, so it was hard to know what was supposed to be where. Hope this was helpful!
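
If only a few columns are needed, pandas can skip the rest while parsing. The sketch below assumes pandas is installed and uses the usecols and dtype parameters of read_csv. The sample text is hypothetical and io.StringIO stands in for a file path.

```python
import io

import pandas as pd

# Hypothetical sample data; pd.read_csv also accepts a file path here.
sample = (
    "name,phone,street\n"
    "Bob,0893,32 Silly\n"
    "James,000,400 McHilly\n"
    "Smithers,4442,23 Looped St.\n"
)

# usecols limits parsing to the listed columns; dtype=str keeps values as text
# so that leading zeros in the phone numbers are not lost.
df = pd.read_csv(io.StringIO(sample), usecols=["name", "phone"], dtype=str)

names = df["name"].tolist()
phones = df["phone"].tolist()
print(names)
print(phones)
```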

  • 2020-11-22 11:06

    Context: For this type of work, consider the Python petl library. It can save you a lot of work and potential frustration compared with doing things 'manually' with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.). That is fine, but if you plan to work with a lot of data from various strange sources in your career, learning something like petl is one of the best investments you can make. Getting started should take only about 30 minutes once you've run pip install petl. The documentation is excellent.

    Answer: Let's say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following.

    from petl import fromcsv, look, cut, tocsv 
    
    #Load the table
    table1 = fromcsv('table1.csv')
    # Keep only the wanted columns
    table2 = cut(table1, 'Song_Name','Artist_ID')
    # Have a quick look to make sure things are ok. Prints a nicely formatted table to your console
    print(look(table2))
    # Save to new file
    tocsv(table2, 'new.csv')
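
For comparison, and not using petl, the same cut-and-save step can be sketched with the standard csv module. The column names Song_Name and Artist_ID come from the example above. The Year column, the sample rows, and the in-memory io.StringIO objects used instead of real files are assumptions.

```python
import csv
import io

# Hypothetical input; a real script would open 'table1.csv' and 'new.csv'
# with open(..., newline="") instead of using in-memory buffers.
source = io.StringIO("Song_Name,Artist_ID,Year\nSong A,1,1999\nSong B,2,2004\n")
target = io.StringIO()

keep = ["Song_Name", "Artist_ID"]

reader = csv.DictReader(source)
# extrasaction="ignore" makes the writer drop keys that are not in fieldnames.
writer = csv.DictWriter(target, fieldnames=keep, extrasaction="ignore")
writer.writeheader()
for row in reader:
    writer.writerow(row)

result = target.getvalue()
print(result)
```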
    
  • 2020-11-22 11:07

    I think there is an easier way

    import pandas as pd
    
    dataset = pd.read_csv('table1.csv')
    ftCol = dataset.iloc[:, 0].values
    

    Here, in iloc[:, 0], the : selects all rows and 0 is the position of the column, so in the example below the ID column is selected. Note that for a pipe-delimited file like this one, read_csv would also need sep='|'.

    ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
    10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
    
  • 2020-11-22 11:09

    You can use numpy.loadtxt(filename). For example, if this is your .csv data file:

    ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
    10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
    10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
    10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
    10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
    

    And you want the Name column:

    import numpy as np 
    b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))
    
    >>> b
    array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
          dtype='|S7')
    

    Alternatively, you can use genfromtxt and select the column by name:

    b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
    >>> b['Name']
    array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
          dtype='|S7')
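
The values above still carry the spaces that surround each | separator. A minimal sketch of stripping them, assuming NumPy is installed, is shown below. The shortened sample table and the use of io.StringIO instead of a file path are assumptions made for illustration.

```python
import io

import numpy as np

# Hypothetical, shortened version of the pipe-delimited table above.
sample = "ID | Name | City |\n10 | Adam | Mo.. |\n10 | Carl | Mo.. |\n"

# Read only column 1 (Name) as text, skipping the header row.
raw = np.loadtxt(
    io.StringIO(sample), dtype=str, delimiter="|", skiprows=1, usecols=(1,)
)

# np.char.strip removes leading and trailing whitespace element-wise.
names = np.char.strip(raw)
print(names.tolist())
```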
    