Set value for particular cell in pandas DataFrame using index

野趣味 2020-11-22 05:45

I've created a Pandas DataFrame

df = DataFrame(index=['A','B','C'], columns=['x','y'])

and got this

    x    y
A  NaN         


        
20 answers
  • 2020-11-22 06:32

    df.loc['C', 'x'] = 10

    This sets the value in row 'C', column 'x'. Labels are case-sensitive, so use the uppercase 'C' from the question's index.
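
To illustrate why the exact label matters, here is a minimal sketch that rebuilds the DataFrame from the question and assigns through .loc twice. It assumes only the index and column names given in the question; the lowercase 'c' row is there purely to show what happens with an unknown label.

```python
import pandas as pd

# Same layout as in the question: rows A, B, C and columns x, y, all NaN.
df = pd.DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])

# Existing label: the cell is updated in place and the shape is unchanged.
df.loc['C', 'x'] = 10
shape_after_update = df.shape

# Unknown label (lowercase 'c'): .loc enlarges the frame with a new row.
df.loc['c', 'x'] = 10
shape_after_enlarge = df.shape

print(shape_after_update, shape_after_enlarge)
```

If the extra row is not wanted, checking `'c' in df.index` before assigning is one simple guard.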

  • 2020-11-22 06:32

    In addition to the answers above, here is a benchmark comparing different ways to add rows of data to an existing dataframe. It shows that using .at or set_value is the most efficient approach for large dataframes (at least under these test conditions).

    • Create new dataframe for each row and...
      • ... append it (13.0 s)
      • ... concatenate it (13.1 s)
    • Store all new rows in another container first, convert to new dataframe once and append...
      • container = lists of lists (2.0 s)
      • container = dictionary of lists (1.9 s)
    • Preallocate whole dataframe, iterate over new rows and all columns and fill using
      • ... at (0.6 s)
      • ... set_value (0.4 s)

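    The test used an existing dataframe of 100,000 rows and 1,000 columns filled with random NumPy values, to which 100 new rows were added.

    The code is shown below. It relies on DataFrame.append and DataFrame.set_value, which were removed in pandas 2.0 and 1.0 respectively, so it only runs on older pandas versions: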

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    Created on Wed Nov 21 16:38:46 2018
    
    @author: gebbissimo
    """
    
    import pandas as pd
    import numpy as np
    import time
    
    NUM_ROWS = 100000
    NUM_COLS = 1000
    data = np.random.rand(NUM_ROWS,NUM_COLS)
    df = pd.DataFrame(data)
    
    NUM_ROWS_NEW = 100
    data_tot = np.random.rand(NUM_ROWS + NUM_ROWS_NEW,NUM_COLS)
    df_tot = pd.DataFrame(data_tot)
    
    DATA_NEW = np.random.rand(1,NUM_COLS)
    
    
    #%% FUNCTIONS
    
    # create and append
    def create_and_append(df):
        for i in range(NUM_ROWS_NEW):
            df_new = pd.DataFrame(DATA_NEW)
            df = df.append(df_new)
        return df
    
    # create and concatenate
    def create_and_concat(df):
        for i in range(NUM_ROWS_NEW):
            df_new = pd.DataFrame(DATA_NEW)
            df = pd.concat((df, df_new))
        return df
    
    
    # store new rows in a list of lists, convert to a dataframe once and append
    def store_as_list(df):
        lst = [[] for i in range(NUM_ROWS_NEW)]
        for i in range(NUM_ROWS_NEW):
            for j in range(NUM_COLS):
                lst[i].append(DATA_NEW[0,j])
        df_new = pd.DataFrame(lst)
        df_tot = df.append(df_new)
        return df_tot
    
    # store new rows in a dict of lists, convert to a dataframe once and append
    def store_as_dict(df):
        dct = {}
        for j in range(NUM_COLS):
            dct[j] = []
            for i in range(NUM_ROWS_NEW):
                dct[j].append(DATA_NEW[0,j])
        df_new = pd.DataFrame(dct)
        df_tot = df.append(df_new)
        return df_tot
    
    
    
    
    # preallocate and fill using .at
    def fill_using_at(df):
        for i in range(NUM_ROWS_NEW):
            for j in range(NUM_COLS):
                #print("i,j={},{}".format(i,j))
                df.at[NUM_ROWS+i,j] = DATA_NEW[0,j]
        return df
    
    
    # preallocate and fill using set_value
    def fill_using_set(df):
        for i in range(NUM_ROWS_NEW):
            for j in range(NUM_COLS):
                #print("i,j={},{}".format(i,j))
                df.set_value(NUM_ROWS+i,j,DATA_NEW[0,j])
        return df
    
    
    #%% TESTS
    t0 = time.time()    
    create_and_append(df)
    t1 = time.time()
    print('Needed {} seconds'.format(t1-t0))
    
    t0 = time.time()    
    create_and_concat(df)
    t1 = time.time()
    print('Needed {} seconds'.format(t1-t0))
    
    t0 = time.time()    
    store_as_list(df)
    t1 = time.time()
    print('Needed {} seconds'.format(t1-t0))
    
    t0 = time.time()    
    store_as_dict(df)
    t1 = time.time()
    print('Needed {} seconds'.format(t1-t0))
    
    t0 = time.time()    
    fill_using_at(df_tot)
    t1 = time.time()
    print('Needed {} seconds'.format(t1-t0))
    
    t0 = time.time()    
    fill_using_set(df_tot)
    t1 = time.time()
    print('Needed {} seconds'.format(t1-t0))
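
For readers on current pandas, the two faster strategies from the list above can be sketched without append or set_value. This is an illustrative sketch rather than a re-run of the benchmark: the sizes are deliberately tiny, pd.concat stands in for the removed DataFrame.append, and no timings are claimed.

```python
import numpy as np
import pandas as pd

# Much smaller than the benchmark above, so the sketch runs instantly.
NUM_ROWS, NUM_COLS, NUM_ROWS_NEW = 1000, 10, 5

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((NUM_ROWS, NUM_COLS)))
new_row = rng.random(NUM_COLS)

# Strategy 1: collect the new rows in a list of lists, build one
# DataFrame from them and concatenate a single time.
rows = [new_row.tolist() for _ in range(NUM_ROWS_NEW)]
df_concat = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

# Strategy 2: preallocate the full frame (new rows start as NaN),
# then fill it cell by cell with .at.
df_filled = df.reindex(range(NUM_ROWS + NUM_ROWS_NEW))
for i in range(NUM_ROWS_NEW):
    for j in range(NUM_COLS):
        df_filled.at[NUM_ROWS + i, j] = new_row[j]

print(df_concat.shape, df_filled.shape)
```

Both strategies end with the same data; which one is faster on a given pandas version would have to be measured again.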
    
  • 2020-11-22 06:33

    Update: The .set_value method has been deprecated (it was removed in pandas 1.0). .at and .iat are good replacements, although pandas provides little documentation for them.

    The fastest way to do this is using set_value. This method is about 100 times faster than the .ix method. For example:

    df.set_value('C', 'x', 10)
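
As a sketch of the replacements mentioned in the update, the same assignment can be written with .at (label-based) or .iat (position-based). The DataFrame below is the one from the question; the value 20 and the second assignment are only there for illustration.

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])

# Label-based scalar access: row 'C', column 'x'.
df.at['C', 'x'] = 10

# Position-based scalar access: first row, second column, i.e. ('A', 'y').
df.iat[0, 1] = 20

print(df)
```

Both accessors address exactly one cell; for slices, lists or boolean masks, .loc and .iloc are the matching tools.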

  • 2020-11-22 06:33

    This is the only thing that worked for me!

    df.loc['C', 'x'] = 10
    

    Learn more about .loc here.

  • 2020-11-22 06:33

    Here is a summary of the valid solutions provided by all users, for data frames indexed by integers and by strings.

    df.iloc, df.loc and df.at all work for both types of data frame. df.iloc accepts only integer row/column positions, while df.loc and df.at set values by label, that is, by column names and/or index labels (which may themselves be integers).

    When the specified label does not exist, both df.loc and df.at append the new rows/columns to the existing data frame, whereas df.iloc raises "IndexError: positional indexers are out-of-bounds". Note that df.at is documented for single-cell (scalar) access: the boolean-mask, list and slice keys passed to df.at below worked in the versions tested, but newer pandas releases may reject them, in which case df.loc should be used. A working example tested in Python 2.7 and 3.7 is as follows:

    import numpy as np, pandas as pd
    
    df1 = pd.DataFrame(index=np.arange(3), columns=['x','y','z'])
    df1['x'] = ['A','B','C']
    df1.at[2,'y'] = 400
    
    # the specified rows/columns do not exist, so new rows/columns are appended to the existing data frame
    df1.at['D','w'] = 9000
    df1.loc['E','q'] = 499
    
    # using df[<some_column_name>] == <condition> to retrieve target rows
    df1.at[df1['x']=='B', 'y'] = 10000
    df1.loc[df1['x']=='B', ['z','w']] = 10000
    
    # using a list of index positions/labels to set values
    df1.iloc[[1,2,4], 2] = 9999
    df1.loc[[0,'D','E'],'w'] = 7500
    df1.at[[0,2,"D"],'x'] = 10
    df1.at[:, ['y', 'w']] = 8000
    
    df1
    >>> df1
         x     y     z     w      q
    0   10  8000   NaN  8000    NaN
    1    B  8000  9999  8000    NaN
    2   10  8000  9999  8000    NaN
    D   10  8000   NaN  8000    NaN
    E  NaN  8000  9999  8000  499.0
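
The difference between enlarging with .loc and the out-of-bounds error from .iloc can be reproduced with a smaller sketch. It sticks to .loc for masks and missing labels, on the assumption that this is the form least likely to change between pandas versions; the values 'new' and 7500 are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(index=np.arange(3), columns=['x', 'y', 'z'])
df1['x'] = ['A', 'B', 'C']

# Boolean mask with .loc: set 'y' wherever 'x' equals 'B'.
df1.loc[df1['x'] == 'B', 'y'] = 10000

# Missing row label: .loc appends a new row 'D'.
df1.loc['D', 'x'] = 'new'

# Missing column label: .loc appends a new column 'w'.
df1.loc[0, 'w'] = 7500

# .iloc is purely positional and cannot enlarge the frame.
raised = False
try:
    df1.iloc[10, 0] = 1
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

print(df1)
```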
    
  • 2020-11-22 06:36

    One way to set values by condition is to first get the index of all rows that satisfy the condition, and then reuse those row labels in any of several ways.

    conditional_index = df.loc[ df['col name'] <condition> ].index
    

    Example conditions:

    ==5, >10 , =="Any string", >= DateTime
    

    Then you can use these row indexes in a variety of ways, for example:

    1. Replace the value of one column for conditional_index
    df.loc[conditional_index, ['col name']] = <new value>
    
    2. Replace the value of multiple columns for conditional_index
    df.loc[conditional_index, [col1, col2]] = <new value>
    
    3. One benefit of saving conditional_index is that you can assign the value of one column to other columns with the same row index
    df.loc[conditional_index, [col1, col2]] = df.loc[conditional_index, 'col name']
    

    This is all possible because .index returns an array of index labels that .loc can use for direct addressing, so the condition does not have to be evaluated again for each assignment.
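
The pattern above can be made concrete with a small runnable sketch. The column names 'a', 'b', 'c' and the condition `== 5` are made up for illustration and stand in for the placeholders used in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 5, 12, 5],
    'b': [0, 0, 0, 0],
    'c': [10, 20, 30, 40],
})

# Compute the matching row labels once and reuse them below.
conditional_index = df.loc[df['a'] == 5].index

# 1. Replace the value of one column for the matching rows.
df.loc[conditional_index, 'b'] = 99

# 2. Replace the value of several columns at once.
df.loc[conditional_index, ['b', 'c']] = -1

# 3. Copy one column into another for the same rows.
df.loc[conditional_index, 'b'] = df.loc[conditional_index, 'a']

print(df)
```

Rows that do not match the condition keep their original values throughout.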
