Overriding a pandas DataFrame column with dictionary values, where the dictionary keys match a non-index column?

夕颜 2021-01-23 04:33

I have a DataFrame df, and a dict d, like so:

>>> df
   a   b
0  5  10
1  6  11
2  7  12
3  8  13
4  9  14
>>> d = {6

1 answer
  •  逝去的感伤
    2021-01-23 05:09

    Assuming it would be OK to propagate the new values to all rows where column a matches (in the event there were duplicates in column a) then:

    for a_val, b_val in d.items():  # use d.iteritems() on Python 2
        df['b'][df.a==a_val] = b_val
    

    or to avoid chaining assignment operations:

    for a_val, b_val in d.items():  # use d.iteritems() on Python 2
        df.loc[df.a==a_val, 'b'] = b_val
    

    Note that loc requires Pandas 0.11 or newer. For older versions, you may be able to use .ix to prevent the chained assignment (.ix has since been removed from pandas, as of version 1.0).
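
    For readers who want to run the loc-based loop end to end, here is a minimal sketch. The DataFrame is the one from the question; the contents of d are hypothetical, because the dictionary in the question is cut off.

```python
import pandas as pd

# DataFrame from the question.
df = pd.DataFrame({"a": [5, 6, 7, 8, 9], "b": [10, 11, 12, 13, 14]})

# Hypothetical mapping: the question's dictionary is truncated,
# so these keys and values are made up for illustration.
d = {6: 100, 8: 200}

# For each key, overwrite column b on every row where column a matches.
for a_val, b_val in d.items():
    df.loc[df["a"] == a_val, "b"] = b_val

result_b = df["b"].tolist()
```

    Rows whose a value is not a key in d keep their original b value.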

    @Jeff pointed to this link which discusses a phenomenon that I had already mentioned in this comment. Note that this is not an issue of correctness, since reversing the order of access has a predictable effect. You can see this easily, e.g. below:

    In [102]: id(df[df.a==5]['b'])
    Out[102]: 113795992
    
    In [103]: id(df['b'][df.a==5])
    Out[103]: 113725760
    

    If you get the column first and then assign based on indexes into that column, the changes affect that column. And since the column is part of the DataFrame, the changes affect the DataFrame. If you index a set of rows first, you are no longer talking about the same DataFrame, so getting the column from the filtered object won't give you a view of the original column.
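
    The rows-first case can be checked directly. The sketch below assumes a reasonably recent pandas and reuses the question's DataFrame; the value 99 is arbitrary. It only demonstrates the rows-first order and the single loc call, since whether the column-first chained form writes through to the DataFrame depends on the pandas version.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"a": [5, 6, 7, 8, 9], "b": [10, 11, 12, 13, 14]})

with warnings.catch_warnings():
    # pandas emits a chained-assignment warning here; silence it for the demo.
    warnings.simplefilter("ignore")
    # Rows first: the boolean filter builds a new object, so this write
    # lands on a temporary and never reaches df.
    df[df.a == 5]["b"] = 99

rows_first_b = df["b"].tolist()  # df is unchanged at this point

# A single .loc call selects rows and column together and writes into df.
df.loc[df.a == 5, "b"] = 99
loc_b = df["b"].tolist()
```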

    @Jeff suggests that this makes it "incorrect" whereas my view is that this is the obvious and expected behavior. In the special case when you have a mixed data type column and there is some type promotion/demotion going on that would prevent Pandas from writing a value into the column, then you might have a correctness issue with this. But given that loc is not available until Pandas 0.11, I think it's still fair to point out how to do it with chained assignment, rather than pretending like loc is the only thing that could possibly ever be the correct choice.

    If anyone can provide more definitive reasons to think it is "incorrect" (as opposed to just not preferring this stylistically), please contribute and I will try to make a more thorough write-up about the various pitfalls.
