How to reindex a pandas dataframe within a function?

Posted by 為{幸葍}努か on 2021-02-05 07:55:30

Question


I'm trying to add column headers with empty values to my dataframe (just like this answer), but within a function that is already modifying it, like so:

import numpy as np
import pandas as pd

mydf = pd.DataFrame()

def myfunc(df):
  df['newcol1'] = np.nan  # this works

  list_of_newcols = ['newcol2', 'newcol3']
  df = df.reindex(columns=df.columns.tolist() + list_of_newcols)  # this does not
  return
myfunc(mydf)

If I run the lines individually in an IPython console, the columns are added. But when I run it as a script, newcol1 is added while newcol2 and newcol3 are not. Setting copy=False doesn't help either. What am I doing wrong here?


Answer 1:


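df.reindex() returns a new DataFrame (the pandas documentation notes that the original object can come back only when the new index is equivalent to the current one and copy=False, and adding columns never satisfies that). Inside the function, df = df.reindex(...) only rebinds the local name df to that new object; the caller's mydf still refers to the original DataFrame. By contrast, df['newcol1'] = np.nan works because it modifies the object that both names share. So you need to return the new object from your function and reassign it: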

def myfunc(df):
  df['newcol1'] = np.nan  # mutates the caller's DataFrame in place

  list_of_newcols = ['newcol2', 'newcol3']
  df = df.reindex(columns=df.columns.tolist() + list_of_newcols)  # builds a new DataFrame
  return df

mydf = myfunc(mydf)
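The difference between rebinding a name and mutating the object can be checked with a small script. This is a sketch: the function names add_cols_rebind, add_cols_return and add_cols_inplace are hypothetical, and the in-place loop is an alternative not taken from the answers here.

```python
import numpy as np
import pandas as pd

NEW_COLS = ["newcol2", "newcol3"]  # column names borrowed from the question


def add_cols_rebind(df):
    # Rebinds the local name only; the caller's DataFrame is untouched.
    df = df.reindex(columns=df.columns.tolist() + NEW_COLS)


def add_cols_return(df):
    # Returns the new DataFrame; the caller must reassign it.
    return df.reindex(columns=df.columns.tolist() + NEW_COLS)


def add_cols_inplace(df):
    # Hypothetical alternative: mutates the shared object column by column,
    # the same way df['newcol1'] = np.nan does in the question.
    for col in NEW_COLS:
        df[col] = np.nan


a = pd.DataFrame({"x": [1, 2]})
add_cols_rebind(a)
print(a.columns.tolist())  # ['x']

b = add_cols_return(pd.DataFrame({"x": [1, 2]}))
print(b.columns.tolist())  # ['x', 'newcol2', 'newcol3']

c = pd.DataFrame({"x": [1, 2]})
add_cols_inplace(c)
print(c.columns.tolist())  # ['x', 'newcol2', 'newcol3']
```

Returning the result keeps the function free of side effects; the in-place loop avoids the reassignment but changes the caller's object.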



Answer 2:


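Not sure whether this mistake is in your actual code or only crept in while typing it here, but tolist() is a method, so you must call it with parentheses. Without them, df.columns.tolist is the method object itself, and adding a list to it raises a TypeError.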

df = df.reindex(columns=df.columns.tolist() + list_of_newcols)
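A quick way to see the difference is a minimal sketch that tries both forms on a throwaway DataFrame (the sample data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
list_of_newcols = ["newcol2", "newcol3"]

try:
    # Missing (): this is a bound method plus a list, which Python rejects.
    df.columns.tolist + list_of_newcols
except TypeError as exc:
    print("TypeError:", exc)

# With the call, tolist() returns a plain list that can be concatenated.
cols = df.columns.tolist() + list_of_newcols
print(cols)  # ['A', 'newcol2', 'newcol3']
```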



Answer 3:


You don't need to set the new columns to NaN first and then list their labels again. You can reindex with an arbitrary list of column labels; NaN is the default fill value wherever no data exists.

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

df = df.reindex(columns=['A', 'B', 'C'])

print(df)

   A   B   C
0  1 NaN NaN
1  2 NaN NaN
2  3 NaN NaN
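Applied to the question, the same idea fits in one line inside the function, as long as the caller reassigns the result. This is a sketch with a hypothetical helper name, add_empty_columns. Keep in mind that reindex keeps only the labels you pass, so the existing columns must be listed too or they are dropped.

```python
import pandas as pd


def add_empty_columns(df, new_cols):
    # Keep the existing columns and append the new ones; reindex fills them with NaN.
    return df.reindex(columns=[*df.columns, *new_cols])


mydf = pd.DataFrame({"A": [1, 2, 3]})
mydf = add_empty_columns(mydf, ["newcol1", "newcol2", "newcol3"])
print(mydf.columns.tolist())  # ['A', 'newcol1', 'newcol2', 'newcol3']

# Omitting an existing label drops that column.
print(mydf.reindex(columns=["B"]).columns.tolist())  # ['B']
```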


Source: https://stackoverflow.com/questions/54220501/how-to-reindex-a-pandas-dataframe-within-a-function
