Question
I would like to perform the following operation on a pandas or PySpark DataFrame, but I haven't found a solution yet.
I want to subtract the values of consecutive columns: each column minus the column before it.
The operation I am describing can be seen in the image below.
Note that the first column of the output DataFrame will have no values, because the first column of the input has no previous column to be subtracted from it.
Answer 1:
DataFrame.diff has an axis parameter, so you can do this in one step:
In [63]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(3, 4), index=['row1', 'row2', 'row3'], columns=['A', 'B', 'C', 'D'])
df
Out[63]:
             A         B         C         D
row1  0.146855  0.250781  0.766990  0.756016
row2  0.528201  0.446637  0.576045  0.576907
row3  0.308577  0.592271  0.553752  0.512420
In [64]:
df.diff(axis=1)
Out[64]:
       A         B         C         D
row1 NaN  0.103926  0.516209 -0.010975
row2 NaN -0.081564  0.129408  0.000862
row3 NaN  0.283694 -0.038520 -0.041331
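To make the result easier to check by hand, here is a minimal sketch with fixed numbers instead of random ones. The values and the names by_diff, by_shift and without_first are made up for illustration. It assumes an all-numeric frame and compares diff(axis=1) with subtracting a copy of the frame shifted one column to the right.

```python
import pandas as pd

# Small hand-checkable frame (illustrative values, not from the question)
df = pd.DataFrame(
    {"A": [1.0, 4.0], "B": [3.0, 4.5], "C": [6.0, 2.5]},
    index=["row1", "row2"],
)

# Each column minus the column to its left; the first column becomes NaN
by_diff = df.diff(axis=1)

# The same computation written out: shift(axis=1) moves every column one position right
by_shift = df - df.shift(axis=1)

# Optionally drop the first column, which is all NaN by construction
without_first = by_diff.iloc[:, 1:]

print(by_diff)
```

If the leading NaN column is not wanted in the output, the iloc slice above is one way to drop it.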
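Answer 2:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(3, 4), index=['row1', 'row2', 'row3'], columns=['A', 'B', 'C', 'D'])
df.T.diff().T  # transpose, take row-wise differences, then transpose back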
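As a rough sanity check, the sketch below compares the transpose approach with diff(axis=1) on the same frame. It assumes every column is numeric (transposing a frame with mixed dtypes may change the dtypes), and the seeded generator and the names via_axis, via_transpose and same are illustrative choices only.

```python
import numpy as np
import pandas as pd

# Seeded generator so the frame is reproducible (an illustrative choice)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((3, 4)),
    index=["row1", "row2", "row3"],
    columns=["A", "B", "C", "D"],
)

via_axis = df.diff(axis=1)       # difference along the columns directly
via_transpose = df.T.diff().T    # transpose, row-wise diff, transpose back

# Compare the underlying values, treating NaN positions as equal
same = np.allclose(via_axis.to_numpy(), via_transpose.to_numpy(), equal_nan=True)
print(same)
```

On an all-numeric frame like this one the two forms should agree, so diff(axis=1) is usually the simpler choice.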
Source: https://stackoverflow.com/questions/38321427/subtract-consecutive-columns-in-a-pandas-or-pyspark-dataframe