1. Generate standard normal pseudo-random numbers with 9 rows and 4 columns, then add column names and a date-based row index.
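import numpy as np
import pandas as pd
a = np.random.standard_normal((9, 4))  # 9 rows x 4 columns of standard normal pseudo-random numbers
a = a.round(6)  # round() returns a new array, so the result must be assigned
# DataFrame parameters: data, index, columns, dtype, copy
df = pd.DataFrame(a)
df.columns = ['No1', 'No2', 'No3', 'No4']
# Month-end dates; pandas 2.2 and later prefer the alias 'ME' over 'M'
dates = pd.date_range('2015-1-1', periods=9, freq='M')
df.index = dates
print(df)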
Output:
                 No1       No2       No3       No4
2015-01-31 -0.991982 -0.179645 -0.045374  0.956476
2015-02-28  1.831624  2.043415 -0.986586  1.073712
2015-03-31  1.508740  2.262742  0.022831  0.543690
2015-04-30 -0.101480  1.321918  0.144027 -0.055291
2015-05-31 -0.445274 -0.242240 -0.188466  0.803470
2015-06-30  0.170965  0.156727  1.015219 -0.430204
2015-07-31  0.282361 -0.199567 -0.942330 -0.619416
2015-08-31  0.916740  1.006132  0.845850 -0.272923
2015-09-30  2.415978  0.162904 -1.095476  0.971750
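Because no seed is set above, every run prints different numbers. As a minimal sketch, assuming NumPy's `default_rng` generator is an acceptable substitute for `np.random.standard_normal`, the same kind of frame can be built reproducibly and in a single `pd.DataFrame` call by passing `index` and `columns` directly. The seed 42 and the names `rng` and `data` are arbitrary choices for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded Generator replaces the unseeded np.random call
rng = np.random.default_rng(42)  # 42 is an arbitrary seed
data = rng.standard_normal((9, 4)).round(6)

# Passing an offset object avoids depending on the 'M' / 'ME' alias
dates = pd.date_range('2015-1-1', periods=9, freq=pd.offsets.MonthEnd())

# index and columns are set in the constructor instead of afterwards
df = pd.DataFrame(data, index=dates, columns=['No1', 'No2', 'No3', 'No4'])
print(df)
```

Rerunning this block prints the same table each time, which makes the later outputs easier to compare.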
2. Basic built-in analysis methods
print(df.sum())
print(df.mean())
print(df.cumsum())
print(df.describe())
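# Most NumPy universal functions can also be applied to a DataFrame object
print(np.sqrt(df))  # negative entries become NaN
# pandas can handle incomplete data sets: NaN values are ignored and only the remaining values are used
print(np.sqrt(df).sum())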
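To make the NaN behaviour concrete, here is a small sketch on a hand-made frame (the values and the names `small` and `roots` are illustrative, not taken from the data above). The square root of a negative number is NaN, `sum()` skips NaN by default, and `skipna=False` switches that off.

```python
import warnings

import numpy as np
import pandas as pd

small = pd.DataFrame({'x': [4.0, -1.0, 9.0], 'y': [1.0, 16.0, 25.0]})

with warnings.catch_warnings():
    # The square root of a negative value may emit a RuntimeWarning
    warnings.simplefilter('ignore', RuntimeWarning)
    roots = np.sqrt(small)

print(roots)
print(roots.sum())              # the NaN in column x is skipped
print(roots.sum(skipna=False))  # the NaN propagates into the result for x
```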
3. Plotting
import matplotlib.pyplot as plt
df.cumsum().plot(lw=2.0)
plt.show()
The main DataFrame methods can also be used on Series objects.
# Plot the cumulative sum of a single column, which is a Series
df['No1'].cumsum().plot(style='r', lw=2.)
plt.xlabel('date')
plt.ylabel('value')
plt.show()
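`DataFrame.plot` returns a Matplotlib `Axes` object, so the labels can also be set on that object and the figure saved to a file instead of shown. The sketch below assumes a non-interactive run (hence the `Agg` backend) and uses a seeded stand-in frame and a temporary output file; the names `demo`, `ax`, and `out_path` are illustrative.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: no display available, render off-screen
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for a stand-in frame
demo = pd.DataFrame(
    rng.standard_normal((9, 4)),
    index=pd.date_range('2015-1-1', periods=9, freq=pd.offsets.MonthEnd()),
    columns=['No1', 'No2', 'No3', 'No4'],
)

ax = demo.cumsum().plot(lw=2.0)  # one line per column
ax.set_xlabel('date')
ax.set_ylabel('value')

# Write the figure to a temporary PNG file
out_path = os.path.join(tempfile.gettempdir(), 'cumsum_demo.png')
ax.get_figure().savefig(out_path)
```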
4. GroupBy operations
Grouping in pandas works much like GROUP BY in SQL.
# To group the data, first add a column of quarter labels
df['Quarter'] = ['Q1','Q1','Q1','Q2','Q2','Q2','Q3','Q3','Q3']
print(df)
groups = df.groupby('Quarter')
print(type(groups))
print(groups.mean())
print(groups.max())
print(groups.size())
Output (from a separate run, so the random values differ from those in section 1):
                 No1       No2       No3       No4 Quarter
2015-01-31  1.192926  1.497219 -0.008608  0.118276      Q1
2015-02-28 -0.572158  0.152690  1.065210  1.664274      Q1
2015-03-31  0.589818 -2.149275 -0.191612 -0.358717      Q1
2015-04-30  0.178717 -0.025317  0.355695  0.695708      Q2
2015-05-31 -0.470604 -0.324468  0.443879  0.948667      Q2
2015-06-30  0.186913 -0.636450  0.474471 -0.329598      Q2
2015-07-31 -0.166176  0.987839 -1.176883  0.615347      Q3
2015-08-31  3.501118 -2.028843 -0.533715 -0.510510      Q3
2015-09-30 -1.682162 -1.335962  0.355475  0.767243      Q3
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
              No1       No2       No3       No4
Quarter
Q1       0.403529 -0.166455  0.288330  0.474611
Q2      -0.034991 -0.328745  0.424682  0.438259
Q3       0.550927 -0.792322 -0.451708  0.290693
              No1       No2       No3       No4
Quarter
Q1       1.192926  1.497219  1.065210  1.664274
Q2       0.186913 -0.025317  0.474471  0.948667
Q3       3.501118  0.987839  0.355475  0.767243
Quarter
Q1    3
Q2    3
Q3    3
dtype: int64
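Typing the quarter labels by hand works for nine rows but does not scale. As a sketch, the labels can be derived from the `DatetimeIndex` through its `quarter` attribute, and `agg` can compute several statistics in one call. The frame below is a small deterministic stand-in (the values 0 to 8 in a single column `No1`), not the random data above; the names `demo` and `stats` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2015-1-1', periods=9, freq=pd.offsets.MonthEnd())
demo = pd.DataFrame({'No1': np.arange(9.0)}, index=dates)

# Derive 'Q1'..'Q3' from the index instead of typing the list by hand
demo['Quarter'] = [f'Q{q}' for q in demo.index.quarter]

# Several aggregations per group in one call
stats = demo.groupby('Quarter')['No1'].agg(['mean', 'max', 'count'])
print(stats)
```

Each quarter holds three consecutive values, so the group means come out as 1.0, 4.0, and 7.0.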
Grouping can also be performed on multiple columns.
# To demonstrate, add another column that marks each month as odd or even
df['Odd_Even'] = ['Odd','Even','Odd','Even','Odd','Even','Odd','Even','Odd']
groups = df.groupby(['Quarter', 'Odd_Even'])
print(groups.size())
print(groups.mean())
Output:
Quarter  Odd_Even
Q1       Even        1
         Odd         2
Q2       Even        2
         Odd         1
Q3       Even        1
         Odd         2
dtype: int64
                       No1       No2       No3       No4
Quarter Odd_Even
Q1      Even     -0.572158  0.152690  1.065210  1.664274
        Odd       0.891372 -0.326028 -0.100110 -0.120221
Q2      Even      0.182815 -0.330884  0.415083  0.183055
        Odd      -0.470604 -0.324468  0.443879  0.948667
Q3      Even      3.501118 -2.028843 -0.533715 -0.510510
        Odd      -0.924169 -0.174062 -0.410704  0.691295
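The odd/even labels can likewise be computed from the month number rather than typed out. This is a minimal sketch on a deterministic stand-in frame; `np.where` is used as one possible way to map the parity to labels, and the names `demo` and `sizes` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2015-1-1', periods=9, freq=pd.offsets.MonthEnd())
demo = pd.DataFrame({'No1': np.arange(9.0)}, index=dates)

demo['Quarter'] = [f'Q{q}' for q in demo.index.quarter]
# Months 1, 3, 5, ... are labelled 'Odd', the rest 'Even'
demo['Odd_Even'] = np.where(demo.index.month % 2 == 1, 'Odd', 'Even')

sizes = demo.groupby(['Quarter', 'Odd_Even']).size()
print(sizes)
# A single group is addressed by a tuple of key values
print(sizes.loc[('Q1', 'Odd')])
```

The group sizes depend only on the dates, so they match the `groups.size()` output above.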
Source: CSDN
Author: suiyl2009
Link: https://blog.csdn.net/suiyl2009/article/details/104319842