Question:
I'm trying to multiply two existing columns in a pandas DataFrame (orders_df), Prices (stock close price) and Amount (stock quantities), and add the result to a new column called 'Value'. For some reason, when I run this code, all the rows under the 'Value' column are positive numbers, while some of the rows should be negative. Under the Action column in the DataFrame there are seven rows with the 'Sell' string and seven with the 'Buy' string.
    for i in orders_df.Action:
        if i == 'Sell':
            orders_df['Value'] = orders_df.Prices*orders_df.Amount
        elif i == 'Buy':
            orders_df['Value'] = -orders_df.Prices*orders_df.Amount
Please let me know what I'm doing wrong!
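For reference, here is a minimal sketch of what the loop above does, run on a small made-up frame (the three rows are hypothetical, not the asker's data). Each pass assigns the entire 'Value' column rather than one row, so only the last row's Action decides the sign of every row.

```python
import pandas as pd

# Hypothetical sample data; note that the last Action is 'Sell'.
orders_df = pd.DataFrame({
    "Prices": [3, 45, 6],
    "Amount": [57, 70, 43],
    "Action": ["Sell", "Buy", "Sell"],
})

for i in orders_df.Action:
    if i == "Sell":
        # This assigns the WHOLE column, not just the current row.
        orders_df["Value"] = orders_df.Prices * orders_df.Amount
    elif i == "Buy":
        # So does this; it overwrites whatever the previous pass wrote.
        orders_df["Value"] = -orders_df.Prices * orders_df.Amount

# Only the final iteration ('Sell') survives, so every row is positive,
# including the 'Buy' row in the middle.
print(orders_df["Value"].tolist())  # [171, 3150, 258]
```

If the last row had been a 'Buy', every value would have come out negative instead.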
Answer 1:
If we're willing to sacrifice the succinctness of Hayden's solution, one could also do something like this:
    In [22]: orders_df['C'] = orders_df.Action.apply(
                 lambda x: (1 if x == 'Sell' else -1))

    In [23]: orders_df   # New column C represents the sign of the transaction
    Out[23]:
       Prices  Amount Action   C
    0       3      57   Sell   1
    1      89      42   Sell   1
    2      45      70    Buy  -1
    3       6      43   Sell   1
    4      60      47   Sell   1
    5      19      16    Buy  -1
    6      56      89   Sell   1
    7       3      28    Buy  -1
    8      56      69   Sell   1
    9      90      49    Buy  -1
Now we have eliminated the need for the if statement. By calling apply() on the Action column (a Series), we also do away with the for loop. As Hayden noted, vectorized operations are generally faster than explicit Python loops.
    In [24]: orders_df['Value'] = orders_df.Prices * orders_df.Amount * orders_df.C

    In [25]: orders_df   # The resulting dataframe
    Out[25]:
       Prices  Amount Action   C  Value
    0       3      57   Sell   1    171
    1      89      42   Sell   1   3738
    2      45      70    Buy  -1  -3150
    3       6      43   Sell   1    258
    4      60      47   Sell   1   2820
    5      19      16    Buy  -1   -304
    6      56      89   Sell   1   4984
    7       3      28    Buy  -1    -84
    8      56      69   Sell   1   3864
    9      90      49    Buy  -1  -4410
This solution takes two lines of code instead of one, but is a bit easier to read. I suspect that the computational costs are similar as well.
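One possible variation on the sign-column idea, sketched on the ten sample rows shown in this answer: Series.map with a dict can stand in for the lambda. The variable name sign is only illustrative, and the sketch assumes Action holds nothing but 'Sell' and 'Buy' (any other value would map to NaN).

```python
import pandas as pd

# The ten sample rows from the answer above.
orders_df = pd.DataFrame({
    "Prices": [3, 89, 45, 6, 60, 19, 56, 3, 56, 90],
    "Amount": [57, 42, 70, 43, 47, 16, 89, 28, 69, 49],
    "Action": ["Sell", "Sell", "Buy", "Sell", "Sell",
               "Buy", "Sell", "Buy", "Sell", "Buy"],
})

# Look up +1 / -1 for each row's Action.
sign = orders_df["Action"].map({"Sell": 1, "Buy": -1})

# Element-wise product of the three aligned Series.
orders_df["Value"] = orders_df["Prices"] * orders_df["Amount"] * sign

print(orders_df)
```

This avoids storing the helper column C in the frame, at the cost of one extra local variable.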
Answer 2:
I think an elegant solution is to use the where method (also see the API docs):
    In [37]: values = df.Prices * df.Amount

    In [38]: df['Values'] = values.where(df.Action == 'Sell', other=-values)

    In [39]: df
    Out[39]:
       Prices  Amount Action  Values
    0       3      57   Sell     171
    1      89      42   Sell    3738
    2      45      70    Buy   -3150
    3       6      43   Sell     258
    4      60      47   Sell    2820
    5      19      16    Buy    -304
    6      56      89   Sell    4984
    7       3      28    Buy     -84
    8      56      69   Sell    3864
    9      90      49    Buy   -4410
Furthermore, this should be the fastest solution.
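The speed claim is easy to check on your own machine. The sketch below times the where approach against a row-wise apply on a synthetic frame. The data, the frame size and the helper names with_where and with_apply are all assumptions made for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (hypothetical), large enough for the difference to show.
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    "Prices": rng.integers(1, 100, n),
    "Amount": rng.integers(1, 100, n),
    "Action": rng.choice(["Sell", "Buy"], n),
})


def with_where(frame):
    """Vectorized: keep the product for 'Sell' rows, negate it elsewhere."""
    values = frame.Prices * frame.Amount
    return values.where(frame.Action == "Sell", other=-values)


def with_apply(frame):
    """Row-wise: call a Python lambda once per row."""
    return frame.apply(
        lambda row: row["Prices"] * row["Amount"]
        if row["Action"] == "Sell"
        else -row["Prices"] * row["Amount"],
        axis=1,
    )


# Both approaches should agree before timing means anything.
same = with_where(df).tolist() == with_apply(df).tolist()
print("results agree:", same)

t_where = timeit.timeit(lambda: with_where(df), number=3)
t_apply = timeit.timeit(lambda: with_apply(df), number=3)
print(f"where: {t_where:.4f}s  apply(axis=1): {t_apply:.4f}s")
```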
Answer 3:
You can use the DataFrame apply method:
    orders_df['Value'] = orders_df.apply(
        lambda row: (row['Prices']*row['Amount']
                     if row['Action'] == 'Sell'
                     else -row['Prices']*row['Amount']),
        axis=1)
It is usually faster to use these methods than to iterate with for loops.
Answer 4:
Since this question came up again, I think a good clean approach is using assign.
The code is quite expressive and self-describing:
    df = df.assign(
        Value=lambda x: x.Prices * x.Amount * x.Action.replace({'Sell': 1, 'Buy': -1}))
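A small self-contained sketch of the assign approach, using the question's sign convention (Sell positive, Buy negative) and three made-up rows. It uses Series.map in place of replace, which is a substitution on my part and not the answer's code. The name SIGN is illustrative. The sketch also shows that assign returns a new frame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Prices": [3, 45, 6],
    "Amount": [57, 70, 43],
    "Action": ["Sell", "Buy", "Sell"],
})

# Sign convention taken from the question: sells add value, buys subtract it.
SIGN = {"Sell": 1, "Buy": -1}

# assign() builds a new DataFrame with the extra column.
result = df.assign(
    Value=lambda x: x.Prices * x.Amount * x.Action.map(SIGN)
)

print(result["Value"].tolist())  # [171, -3150, 258]
print("Value" in df.columns)     # False: the original frame is unchanged
```

Because nothing is mutated in place, this form fits naturally into a method chain.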
Answer 5:
For me, this is the clearest and most intuitive:
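    import pandas as pd

    values = []
    index = []
    for action in ['Sell', 'Buy']:
        mask = orders_df['Action'] == action   # boolean mask for this action
        amounts = orders_df['Amount'][mask].values
        if action == 'Sell':
            prices = orders_df['Prices'][mask].values
        else:
            prices = -1 * orders_df['Prices'][mask].values
        values += list(amounts * prices)
        index += list(orders_df[mask].index)   # remember which rows these are
    # Align on the original row labels; a bare list would be assigned in
    # Sell-then-Buy order rather than in row order.
    orders_df['Values'] = pd.Series(values, index=index)

The .values attribute returns a NumPy array, which lets you multiply element-wise; you can then build up a list by 'adding' to it. Because the loop collects all 'Sell' rows before the 'Buy' rows, the row labels are collected as well, so that each result lands on its original row.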