Set value for particular cell in pandas DataFrame using index

后端未结

关注

 20  1612

I\'ve created a Pandas DataFrame

df = DataFrame(index=[\'A\',\'B\',\'C\'], columns=[\'x\',\'y\'])

and got this

    x    y
A  NaN

相关标签:

20条回答

轻奢々

2020-11-22 06:32

df.loc['c','x']=10 This will change the value of cth row and xth column.

0 讨论(0)
发布评论:

提交评论
- 加载中...

半阙折子戏

2020-11-22 06:32

In addition to the answers above, here is a benchmark comparing different ways to add rows of data to an already existing dataframe. It shows that using at or set-value is the most efficient way for large dataframes (at least for these test conditions).

Create new dataframe for each row and...
- ... append it (13.0 s)
- ... concatenate it (13.1 s)
Store all new rows in another container first, convert to new dataframe once and append...
- container = lists of lists (2.0 s)
- container = dictionary of lists (1.9 s)
Preallocate whole dataframe, iterate over new rows and all columns and fill using
- ... at (0.6 s)
- ... set_value (0.4 s)

For the test, an existing dataframe comprising 100,000 rows and 1,000 columns and random numpy values was used. To this dataframe, 100 new rows were added.

Code see below:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Nov 21 16:38:46 2018

@author: gebbissimo
"""

import pandas as pd
import numpy as np
import time

NUM_ROWS = 100000
NUM_COLS = 1000
data = np.random.rand(NUM_ROWS,NUM_COLS)
df = pd.DataFrame(data)

NUM_ROWS_NEW = 100
data_tot = np.random.rand(NUM_ROWS + NUM_ROWS_NEW,NUM_COLS)
df_tot = pd.DataFrame(data_tot)

DATA_NEW = np.random.rand(1,NUM_COLS)


#%% FUNCTIONS

# create and append
def create_and_append(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = df.append(df_new)
    return df

# create and concatenate
def create_and_concat(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = pd.concat((df, df_new))
    return df


# store as dict and 
def store_as_list(df):
    lst = [[] for i in range(NUM_ROWS_NEW)]
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            lst[i].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(lst)
    df_tot = df.append(df_new)
    return df_tot

# store as dict and 
def store_as_dict(df):
    dct = {}
    for j in range(NUM_COLS):
        dct[j] = []
        for i in range(NUM_ROWS_NEW):
            dct[j].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(dct)
    df_tot = df.append(df_new)
    return df_tot




# preallocate and fill using .at
def fill_using_at(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.at[NUM_ROWS+i,j] = DATA_NEW[0,j]
    return df


# preallocate and fill using .at
def fill_using_set(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.set_value(NUM_ROWS+i,j,DATA_NEW[0,j])
    return df


#%% TESTS
t0 = time.time()    
create_and_append(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()    
create_and_concat(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()    
store_as_list(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()    
store_as_dict(df)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()    
fill_using_at(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

t0 = time.time()    
fill_using_set(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1-t0))

0 讨论(0)

情歌与酒

2020-11-22 06:33

Update: The .set_value method is going to be deprecated. .iat/.at are good replacements, unfortunately pandas provides little documentation

The fastest way to do this is using set_value. This method is ~100 times faster than .ix method. For example:

df.set_value('C', 'x', 10)

0 讨论(0)
发布评论:

提交评论
- 加载中...
余生分开走

2020-11-22 06:33
This is the only thing that worked for me!
```
df.loc['C', 'x'] = 10
```
Learn more about .loc here.
0 讨论(0)
发布评论:

提交评论
- 加载中...

野性不改

2020-11-22 06:33

Here is a summary of the valid solutions provided by all users, for data frames indexed by integer and string.

df.iloc, df.loc and df.at work for both type of data frames, df.iloc only works with row/column integer indices, df.loc and df.at supports for setting values using column names and / or integer indices.

When the specified index does not exist, both df.loc and df.at would append the newly inserted rows/columns to the existing data frame, but df.iloc would raise "IndexError: positional indexers are out-of-bounds". A working example tested in Python 2.7 and 3.7 is as follows:

import numpy as np, pandas as pd

df1 = pd.DataFrame(index=np.arange(3), columns=['x','y','z'])
df1['x'] = ['A','B','C']
df1.at[2,'y'] = 400

# rows/columns specified does not exist, appends new rows/columns to existing data frame
df1.at['D','w'] = 9000
df1.loc['E','q'] = 499

# using df[<some_column_name>] == <condition> to retrieve target rows
df1.at[df1['x']=='B', 'y'] = 10000
df1.loc[df1['x']=='B', ['z','w']] = 10000

# using a list of index to setup values
df1.iloc[[1,2,4], 2] = 9999
df1.loc[[0,'D','E'],'w'] = 7500
df1.at[[0,2,"D"],'x'] = 10
df1.at[:, ['y', 'w']] = 8000

df1
>>> df1
     x     y     z     w      q
0   10  8000   NaN  8000    NaN
1    B  8000  9999  8000    NaN
2   10  8000  9999  8000    NaN
D   10  8000   NaN  8000    NaN
E  NaN  8000  9999  8000  499.0

0 讨论(0)

天涯浪人

2020-11-22 06:36
One way to use index with condition is first get the index of all the rows that satisfy your condition and then simply use those row indexes in a multiple of ways
```
conditional_index = df.loc[ df['col name'] <condition> ].index
```
Example condition is like
```
==5, >10 , =="Any string", >= DateTime
```
Then you can use these row indexes in variety of ways like
1. Replace value of one column for conditional_index
```
df.loc[conditional_index , [col name]]= <new value>
```
1. Replace value of multiple column for conditional_index
```
df.loc[conditional_index, [col1,col2]]= <new value>
```
1. One benefit with saving the conditional_index is that you can assign value of one column to another column with same row index
```
df.loc[conditional_index, [col1,col2]]= df.loc[conditional_index,'col name']
```
This is all possible because .index returns a array of index which .loc can use with direct addressing so it avoids traversals again and again.
0 讨论(0)
发布评论:

提交评论
- 加载中...