I've created a Pandas DataFrame
df = DataFrame(index=['A','B','C'], columns=['x','y'])
and got this
     x    y
A  NaN  NaN
B  NaN  NaN
C  NaN  NaN
df.loc['C','x']=10
This sets the value at row 'C', column 'x'.
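As a minimal sketch of that assignment, assuming the empty frame from the question (the printed output depends on your pandas version, so it is not shown):

```python
import pandas as pd

# Rebuild the empty frame from the question: every cell starts as NaN.
df = pd.DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])

# Label-based assignment: row 'C', column 'x'.
df.loc['C', 'x'] = 10

# Labels are case-sensitive: a label that does not exist yet, such as
# lowercase 'c', would add a new row instead of updating row 'C'.
print(df)
```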
In addition to the answers above, here is a benchmark comparing different ways to add rows of data to an already existing dataframe. It shows that using .at or .set_value is the most efficient way for large dataframes (at least for these test conditions).
For the test, an existing dataframe of 100,000 rows and 1,000 columns filled with random numpy values was used. To this dataframe, 100 new rows were added.
The code is below:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Nov 21 16:38:46 2018
@author: gebbissimo
"""
import pandas as pd
import numpy as np
import time
NUM_ROWS = 100000
NUM_COLS = 1000
data = np.random.rand(NUM_ROWS,NUM_COLS)
df = pd.DataFrame(data)
NUM_ROWS_NEW = 100
data_tot = np.random.rand(NUM_ROWS + NUM_ROWS_NEW,NUM_COLS)
df_tot = pd.DataFrame(data_tot)
DATA_NEW = np.random.rand(1,NUM_COLS)
#%% FUNCTIONS
# create and append
# (DataFrame.append was removed in pandas 2.0, so this needs an older pandas)
def create_and_append(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = df.append(df_new)
    return df
# create and concatenate
def create_and_concat(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = pd.concat((df, df_new))
    return df
# store as list and append
def store_as_list(df):
    lst = [[] for i in range(NUM_ROWS_NEW)]
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            lst[i].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(lst)
    df_tot = df.append(df_new)
    return df_tot
# store as dict and append
def store_as_dict(df):
    dct = {}
    for j in range(NUM_COLS):
        dct[j] = []
        for i in range(NUM_ROWS_NEW):
            dct[j].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(dct)
    df_tot = df.append(df_new)
    return df_tot
# preallocate and fill using .at
def fill_using_at(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.at[NUM_ROWS+i,j] = DATA_NEW[0,j]
    return df
# preallocate and fill using .set_value
# (DataFrame.set_value was removed in pandas 1.0, so this needs an older pandas)
def fill_using_set(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.set_value(NUM_ROWS+i,j,DATA_NEW[0,j])
    return df
#%% TESTS
t0 = time.time()
create_and_append(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
create_and_concat(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
store_as_list(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
store_as_dict(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
fill_using_at(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
t0 = time.time()
fill_using_set(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))
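On pandas versions where DataFrame.append is no longer available, one possible variant of the "store as list" approach is to collect the new rows first and call pd.concat once. This is only a sketch with deliberately small, made-up sizes (num_rows, num_cols and num_rows_new are illustrative names, not the benchmark's constants), and it was not timed as part of the benchmark above:

```python
import numpy as np
import pandas as pd

# Deliberately small sizes so the sketch runs quickly; the benchmark above
# uses far larger frames.
num_rows, num_cols, num_rows_new = 1000, 10, 100

df = pd.DataFrame(np.random.rand(num_rows, num_cols))
new_row = np.random.rand(1, num_cols)

# Collect the new rows in a plain Python list first ...
rows = [new_row[0] for _ in range(num_rows_new)]

# ... then build one frame from them and concatenate a single time.
df_new = pd.DataFrame(rows, columns=df.columns)
df_tot = pd.concat([df, df_new], ignore_index=True)

print(df_tot.shape)
```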
Update: The .set_value method is going to be deprecated. .iat/.at are good replacements; unfortunately, pandas provides little documentation for them.
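A short sketch of the two accessors, reusing the small frame from the question as an assumed example: .at addresses a single cell by labels, .iat by integer positions.

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])

# .at addresses one cell by row label and column label.
df.at['C', 'x'] = 10

# .iat addresses one cell by integer position: row 0, column 1 is ('A', 'y').
df.iat[0, 1] = 20

print(df.at['C', 'x'], df.iat[0, 1])
```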
The fastest way to do this is using set_value. This method is ~100 times faster than the .ix method. For example:
df.set_value('C', 'x', 10)
This is the only thing that worked for me!
df.loc['C', 'x'] = 10
Learn more about .loc in the pandas documentation.
Here is a summary of the valid solutions provided by all users, for data frames indexed by integer and string.
df.iloc, df.loc and df.at work for both types of data frame. df.iloc only accepts integer row/column positions, while df.loc and df.at set values using column names and/or index labels.
When the specified index does not exist, both df.loc and df.at append the new rows/columns to the existing data frame, whereas df.iloc raises "IndexError: positional indexers are out-of-bounds". A working example, tested in Python 2.7 and 3.7, follows:
import numpy as np, pandas as pd
df1 = pd.DataFrame(index=np.arange(3), columns=['x','y','z'])
df1['x'] = ['A','B','C']
df1.at[2,'y'] = 400
# the specified rows/columns do not exist, so new rows/columns are appended to the data frame
df1.at['D','w'] = 9000
df1.loc['E','q'] = 499
# use df[<some_column_name>] == <condition> to select the target rows
# (.at is documented for a single row/column label pair, so non-scalar selections use .loc here)
df1.loc[df1['x']=='B', 'y'] = 10000
df1.loc[df1['x']=='B', ['z','w']] = 10000
# use a list of indices to set values
df1.iloc[[1,2,4], 2] = 9999
df1.loc[[0,'D','E'],'w'] = 7500
df1.loc[[0,2,"D"],'x'] = 10
df1.loc[:, ['y', 'w']] = 8000
>>> df1
     x     y     z     w      q
0   10  8000   NaN  8000    NaN
1    B  8000  9999  8000    NaN
2   10  8000  9999  8000    NaN
D   10  8000   NaN  8000    NaN
E  NaN  8000  9999  8000  499.0
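The difference in behaviour for missing indices can be checked in isolation. The sketch below uses a small made-up frame; the exact error message may differ from the one quoted above depending on the indexer and pandas version, so only the exception type is caught:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=np.arange(3), columns=['x', 'y'])

# .loc and .at with a row label that does not exist yet enlarge the frame.
df.loc['D', 'x'] = 1
df.at['E', 'y'] = 2

# .iloc with a position past the end raises IndexError instead of enlarging.
try:
    df.iloc[10, 0] = 1
    raised = False
except IndexError:
    raised = True

print(df.shape, raised)
```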
One way to use an index with a condition is to first get the index of all the rows that satisfy your condition, and then use those row indexes in any of several ways.
conditional_index = df.loc[df['col name'] <condition>].index
Example conditions look like
==5, >10, =="Any string", >= DateTime
Then you can use these row indexes in a variety of ways, such as
df.loc[conditional_index, ['col name']] = <new value>
df.loc[conditional_index, ['col1','col2']] = <new value>
df.loc[conditional_index, ['col1','col2']] = df.loc[conditional_index,'col name']
This is all possible because .index returns an array of index labels that .loc can use for direct addressing, so the condition does not have to be evaluated again for each assignment.
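A concrete version of that pattern, with made-up column names ('score', 'flag', 'bonus') and a made-up condition purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'score': [3, 12, 7, 15],
    'flag': ['no', 'no', 'no', 'no'],
    'bonus': [0, 0, 0, 0],
})

# Evaluate the condition once and keep the matching row labels.
conditional_index = df.loc[df['score'] > 10].index

# Reuse the stored labels for several assignments.
df.loc[conditional_index, 'flag'] = 'yes'
df.loc[conditional_index, ['bonus']] = 5

print(df)
```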