问题:

I'm trying to multiply two existing columns in a pandas Dataframe (orders_df) - Prices (stock close price) and Amount (stock quantities) and add the calculation to a new column called 'Value'. For some reason when I run this code, all the rows under the 'Value' column are positive numbers, while some of the rows should be negative. Under the Action column in the DataFrame there are seven rows with the 'Sell' string and seven with the 'Buy' string.

for i in orders_df.Action:  if i  == 'Sell':   orders_df['Value'] = orders_df.Prices*orders_df.Amount  elif i == 'Buy':   orders_df['Value'] = -orders_df.Prices*orders_df.Amount)

Please let me know what i'm doing wrong !

回答1:

If we're willing to sacrifice the succinctness of Hayden's solution, one could also do something like this:

In [22]: orders_df['C'] = orders_df.Action.apply(                lambda x: (1 if x == 'Sell' else -1))  In [23]: orders_df   # New column C represents the sign of the transaction Out[23]:    Prices  Amount Action  C 0       3      57   Sell  1 1      89      42   Sell  1 2      45      70    Buy -1 3       6      43   Sell  1 4      60      47   Sell  1 5      19      16    Buy -1 6      56      89   Sell  1 7       3      28    Buy -1 8      56      69   Sell  1 9      90      49    Buy -1

Now we have eliminated the need for the if statement. Using DataFrame.apply(), we also do away with the for loop. As Hayden noted, vectorized operations are always faster.

In [24]: orders_df['Value'] = orders_df.Prices * orders_df.Amount * orders_df.C  In [25]: orders_df   # The resulting dataframe Out[25]:    Prices  Amount Action  C  Value 0       3      57   Sell  1    171 1      89      42   Sell  1   3738 2      45      70    Buy -1  -3150 3       6      43   Sell  1    258 4      60      47   Sell  1   2820 5      19      16    Buy -1   -304 6      56      89   Sell  1   4984 7       3      28    Buy -1    -84 8      56      69   Sell  1   3864 9      90      49    Buy -1  -4410

This solution takes two lines of code instead of one, but is a bit easier to read. I suspect that the computational costs are similar as well.

回答2:

I think an elegant solution is to use the where method (also see the API docs):

In [37]: values = df.Prices * df.Amount  In [38]: df['Values'] = values.where(df.Action == 'Sell', other=-values)  In [39]: df Out[39]:     Prices  Amount Action  Values 0       3      57   Sell     171 1      89      42   Sell    3738 2      45      70    Buy   -3150 3       6      43   Sell     258 4      60      47   Sell    2820 5      19      16    Buy    -304 6      56      89   Sell    4984 7       3      28    Buy     -84 8      56      69   Sell    3864 9      90      49    Buy   -4410

Further more this should be the fastest solution.

回答3:

You can use the DataFrame apply method:

order_df['Value'] = order_df.apply(lambda row: (row['Prices']*row['Amount']                                                if row['Action']=='Sell'                                                else -row['Prices']*row['Amount']),                                    axis=1)

It is usually faster to use these methods rather than over for loops.

回答4:

Since this question came up again, I think a good clean approach is using assign.

The code is quite expressive and self-describing:

df = df.assign(Value = lambda x: x.Prices * x.Amount * x.Action.replace({'Buy' : 1, 'Sell' : -1}))

回答5:

For me, this is the clearest and most intuitive:

values = [] for action in ['Sell','Buy']:     amounts = orders_df['Amounts'][orders_df['Action'==action]].values     if action == 'Sell':         prices = orders_df['Prices'][orders_df['Action'==action]].values     else:         prices = -1*orders_df['Prices'][orders_df['Action'==action]].values     values += list(amounts*prices)   orders_df['Values'] = values

The .values method returns a numpy array allowing you to easily multiply element-wise and then you can cumulatively generate a list by 'adding' to it.

转载请标明出处:I want to multiply two columns in a pandas DataFrame and add the result into a new column

文章来源: I want to multiply two columns in a pandas DataFrame and add the result into a new column

标签

pan