Question
Is there a way to compute xy_mean with the pandas library, the same way y_mean is computed below? I found that the pandas version, Y_mean = pd.Series(PC_list).rolling(number).mean().dropna().to_numpy(), is much faster than the numpy version, ym = (np.convolve(PC_list, np.ones(shape=(number)), mode='valid')/number)[:-1].
The equation for xy_mean is ((index of value)*value + (index of value)*value + ...)/number, where the index is the value's 1-based position within the window, so the weights depend on the value of number. With number = 3, the first calculation for the example below would be (457.334015*1 + 424.440002*2 + 394.795990*3)/number, the next would be (424.440002*1 + 394.795990*2 + 408.903992*3)/number, and so on. If number = 4, then the first calculation would be (457.334015*1 + 424.440002*2 + 394.795990*3 + 408.903992*4)/number. The calculations continue until the end of the PC_list array.
variables:
number = 3
PC_list= np.array([457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000])
Vanilla python version:
y_mean = sum(PC_list[i:i+number])/number
xy_mean = sum([x * (i + 1) for i, x in enumerate(PC_list[i:i+number])])/number
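The two expressions above depend on a window start i that the snippet never defines. A minimal sketch of the surrounding loop follows; the helper name window_means is hypothetical, and the short input list is just the first four values of PC_list. Unlike the NumPy one-liners below, it keeps the last full window rather than dropping it with [:-1].

```python
def window_means(values, number):
    """Return (y_means, xy_means) for every full window of length `number`."""
    y_means, xy_means = [], []
    for i in range(len(values) - number + 1):
        window = values[i:i + number]
        y_means.append(sum(window) / number)
        # Weights restart at 1 inside each window: 1, 2, ..., number.
        xy_means.append(sum(x * (j + 1) for j, x in enumerate(window)) / number)
    return y_means, xy_means


values = [457.334015, 424.440002, 394.795990, 408.903992]
y_means, xy_means = window_means(values, 3)
print(y_means)
print(xy_means)
```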
Numpy versions:
y_mean = (np.convolve(PC_list, np.ones(shape=(number)), mode='valid')/number)[:-1]
xy_mean = (np.convolve(PC_list, np.arange(number, 0, -1), mode='valid'))[:-1]
Pandas version:
Y_mean = pd.Series(PC_list).rolling(number).mean().dropna().to_numpy()
xy_mean = ?
Answer 1:
You would need to define a custom function for that, and pass it to rolling.apply:
>>> multiplier = np.arange(1, number + 1)  # weights 1..number, matching the convolve kernel
>>> def xymean(series):
...     return series.mul(multiplier).sum()
>>> pd.Series(PC_list).rolling(number).apply(xymean).dropna().to_numpy()[:-1]
array([2490.601989, 2440.743958, 2409.067016, 2413.002044, 2510.497985,
2543.348939, 2516.922974, 2459.627961, 2418.983948, 2335.007966,
2280.283019, 2288.94702 , 2300.19998 , 2279.389953, 2212.294951,
2080.693968, 1978.774017, 1960.123047, 1989.229066, 2061.27304 ,
2137.145019, 2167.67804 , 2175.047058, 2221.807067, 2290.639036,
2361.986998, 2376.473021])
>>> (np.convolve(PC_list, np.arange(number, 0, -1), mode='valid'))[:-1]
array([2490.601989, 2440.743958, 2409.067016, 2413.002044, 2510.497985,
2543.348939, 2516.922974, 2459.627961, 2418.983948, 2335.007966,
2280.283019, 2288.94702 , 2300.19998 , 2279.389953, 2212.294951,
2080.693968, 1978.774017, 1960.123047, 1989.229066, 2061.27304 ,
2137.145019, 2167.67804 , 2175.047058, 2221.807067, 2290.639036,
2361.986998, 2376.473021])
However, this will be a little slower, because rolling.apply calls the Python function once per window. Furthermore, it seems that your numpy version computes xy_sum rather than xy_mean. To make it compute the mean, you would need:
>>> (np.convolve(PC_list, np.arange(number, 0, -1), mode='valid')/number)[:-1]
array([830.200663 , 813.58131933, 803.02233867, 804.33401467,
836.83266167, 847.78297967, 838.97432467, 819.875987 ,
806.32798267, 778.33598867, 760.09433967, 762.98234 ,
766.73332667, 759.796651 , 737.43165033, 693.564656 ,
659.591339 , 653.374349 , 663.07635533, 687.09101333,
712.381673 , 722.55934667, 725.015686 , 740.60235567,
763.54634533, 787.32899933, 792.15767367])
>>> def xymean(series):
...     return series.mul(multiplier).mean()
>>> pd.Series(PC_list).rolling(number).apply(xymean).dropna().to_numpy()[:-1]
array([830.200663 , 813.58131933, 803.02233867, 804.33401467,
836.83266167, 847.78297967, 838.97432467, 819.875987 ,
806.32798267, 778.33598867, 760.09433967, 762.98234 ,
766.73332667, 759.796651 , 737.43165033, 693.564656 ,
659.591339 , 653.374349 , 663.07635533, 687.09101333,
712.381673 , 722.55934667, 725.015686 , 740.60235567,
763.54634533, 787.32899933, 792.15767367])
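The per-window overhead of apply mentioned above can often be reduced. The sketch below shows two options that are not part of the original answer: passing raw=True so each window arrives as a NumPy array instead of a Series, and a NumPy-only variant built on sliding_window_view (which requires NumPy 1.20 or later). The names weights, xy_mean_pd and xy_mean_np are illustrative, the input is a shortened PC_list, and whether either option is actually faster on your data is something to measure rather than assume.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

number = 3
PC_list = np.array([457.334015, 424.440002, 394.795990, 408.903992,
                    398.821014, 402.152008, 435.790985, 423.204987])

# Weights 1..number, applied in window order.
weights = np.arange(1, number + 1)

# raw=True hands each window to the function as a NumPy array
# instead of building a Series for it.
xy_mean_pd = (
    pd.Series(PC_list)
    .rolling(number)
    .apply(lambda w: np.dot(w, weights) / number, raw=True)
    .dropna()
    .to_numpy()[:-1]
)

# NumPy-only alternative: an (n_windows, number) view, then one matrix product.
xy_mean_np = (sliding_window_view(PC_list, number) @ weights / number)[:-1]

print(xy_mean_pd)
print(xy_mean_np)
```

Both results keep the [:-1] slice so that they line up with the arrays shown in the answer.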
Source: https://stackoverflow.com/questions/65866920/implementing-pandas-function-to-numpy-functions