python-pandas-numpy DAY_16(1)

Posted by 懵懂的女人 on 2020-01-24 18:55:50

What I studied
Basic pandas operations
Key points
1. Changing a DataFrame's index and assigning values at the new labels

import numpy as np
import pandas as pd

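dates = pd.date_range("20200124", periods=6)  # 6 consecutive days starting 2020-01-24
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates,
                  columns=['A', 'B', 'C', 'D'])
# keep the first 4 rows and add a new column "E"; its cells start as NaN
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
print(df)
print(df1)
df1.loc[dates[1:3], "E"] = 2  # set E to 2 on the 2nd and 3rd dates
print(df1)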

Output:
             A   B   C   D
2020-01-24   0   1   2   3
2020-01-25   4   5   6   7
2020-01-26   8   9  10  11
2020-01-27  12  13  14  15
2020-01-28  16  17  18  19
2020-01-29  20  21  22  23
               A     B     C     D   E
2020-01-24   0.0   1.0   2.0   3.0 NaN
2020-01-25   4.0   5.0   6.0   7.0 NaN
2020-01-26   8.0   9.0  10.0  11.0 NaN
2020-01-27  12.0  13.0  14.0  15.0 NaN
               A     B     C     D    E
2020-01-24   0.0   1.0   2.0   3.0  NaN
2020-01-25   4.0   5.0   6.0   7.0  2.0
2020-01-26   8.0   9.0  10.0  11.0  2.0
2020-01-27  12.0  13.0  14.0  15.0  NaN
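The new column "E" starts out as NaN because reindex has no value for labels that did not exist before. The sketch below, which assumes the same `df` as above, shows that the `fill_value` argument of `reindex` supplies a default for those new labels instead.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20200124", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates,
                  columns=["A", "B", "C", "D"])
new_cols = list(df.columns) + ["E"]

# Without fill_value, every cell under the new label "E" is NaN
df1 = df.reindex(index=dates[0:4], columns=new_cols)
df1.loc[dates[1:3], "E"] = 2  # label-based assignment, as in the example above

# With fill_value, new labels get a default value instead of NaN
df_zero = df.reindex(index=dates[0:4], columns=new_cols, fill_value=0)

print(df1["E"].isna().tolist())  # [True, False, False, True]
print(df_zero["E"].tolist())
```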

2. Checking a DataFrame for missing data, then dropping it or filling in a default value

print(pd.isnull(df1).any())

A    False
B    False
C    False
D    False
E     True
dtype: bool

print(pd.isnull(df1))

                A      B      C      D      E
2020-01-24  False  False  False  False   True
2020-01-25  False  False  False  False  False
2020-01-26  False  False  False  False  False
2020-01-27  False  False  False  False   True
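The boolean mask can also be summed to count missing values. This is a minimal sketch on a small hypothetical frame shaped like columns A and E of `df1`; `isna()` is the method form of `pd.isnull()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like columns A and E of df1 above
df1 = pd.DataFrame({"A": [0.0, 4.0, 8.0, 12.0],
                    "E": [np.nan, 2.0, 2.0, np.nan]})

# df1.isna() returns the same boolean mask as pd.isnull(df1)
mask = df1.isna()

# Summing the mask counts missing values per column (True counts as 1)
print(mask.sum().tolist())        # [0, 2]

# any(axis=1) flags the rows that contain at least one NaN
print(mask.any(axis=1).tolist())  # [True, False, False, True]
```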

df2 = df1.dropna()
print(df2)
              A    B     C     D    E
2020-01-25  4.0  5.0   6.0   7.0  2.0
2020-01-26  8.0  9.0  10.0  11.0  2.0

df3 = df1.fillna(value=5.0)
print(df3)
               A     B     C     D    E
2020-01-24   0.0   1.0   2.0   3.0  5.0
2020-01-25   4.0   5.0   6.0   7.0  2.0
2020-01-26   8.0   9.0  10.0  11.0  2.0
2020-01-27  12.0  13.0  14.0  15.0  5.0
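`dropna` and `fillna` accept a few options that control what is dropped or filled. The sketch below uses the same kind of hypothetical two-column frame. It drops columns instead of rows with `axis=1`, drops only rows that are entirely NaN with `how="all"`, and fills a per-column default with a dict.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with NaN only in column E
df1 = pd.DataFrame({"A": [0.0, 4.0, 8.0, 12.0],
                    "E": [np.nan, 2.0, 2.0, np.nan]})

# axis=1 drops columns (instead of rows) that contain any NaN
no_nan_cols = df1.dropna(axis=1)

# how="all" drops a row only when every value in it is NaN
rows_kept = df1.dropna(how="all")

# A dict passed to fillna gives each column its own default value
filled = df1.fillna({"E": 0.0})

print(list(no_nan_cols.columns))  # ['A']
print(len(rows_kept))             # 4
print(filled["E"].tolist())       # [0.0, 2.0, 2.0, 0.0]
```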

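Note: NaN (missing) values do not take part in calculations, so any arithmetic involving NaN produces NaN. Below, k is shifted down by two positions, which makes its first two values NaN and moves the NaN originally at position 3 to the last row. df.sub(k, axis=0) subtracts k from every column row by row, and each row where k is NaN comes out all NaN.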

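k = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)  # create a Series, then shift its values down 2 positions
print(df.sub(k, axis=0))  # subtract k from each column, aligned on the row index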

               A     B     C     D
2020-01-24   NaN   NaN   NaN   NaN
2020-01-25   NaN   NaN   NaN   NaN
2020-01-26   7.0   8.0   9.0  10.0
2020-01-27   9.0  10.0  11.0  12.0
2020-01-28  11.0  12.0  13.0  14.0
2020-01-29   NaN   NaN   NaN   NaN
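Element-wise arithmetic propagates NaN, but pandas reductions such as `sum` and `mean` skip NaN by default (`skipna=True`). This sketch rebuilds the same shifted Series `k` to show the contrast.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20200124", periods=6)
# shift(2) moves every value down two rows and fills the first two with NaN
k = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
print(k.tolist())                 # [nan, nan, 1.0, 3.0, 5.0, nan]

# Element-wise arithmetic propagates NaN: NaN + 1 is still NaN
print((k + 1).isna().sum())       # 3

# Reductions skip NaN by default (skipna=True)
print(k.sum())                    # 9.0
print(k.mean())                   # 3.0, i.e. (1 + 3 + 5) / 3
print(k.count())                  # 3 non-missing values

# skipna=False makes the reduction propagate NaN instead
print(k.sum(skipna=False))        # nan
```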

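3. Using apply
apply() calls a function on each column (axis=0, the default) or on each row (axis=1) and collects the results. Here the function computes max - min.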

print(df.apply(lambda x: x.max() - x.min(), axis=1))  # axis=1: once per row
2020-01-24    3
2020-01-25    3
2020-01-26    3
2020-01-27    3
2020-01-28    3
2020-01-29    3

print(df.apply(lambda x: x.max() - x.min(), axis=0))  # axis=0: once per column
A    20
B    20
C    20
D    20
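A named function works the same way as the lambda. For simple reductions like this one, the built-in `max`/`min` methods give the same result without calling Python code once per row. The sketch assumes the same integer `df`, and `value_range` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20200124", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates,
                  columns=["A", "B", "C", "D"])

def value_range(x):
    """Hypothetical helper: max - min of one row or column (x is a Series)."""
    return x.max() - x.min()

per_row = df.apply(value_range, axis=1)   # one result per row
per_col = df.apply(value_range, axis=0)   # one result per column

# Vectorized equivalents using built-in reductions
per_row_fast = df.max(axis=1) - df.min(axis=1)
per_col_fast = df.max() - df.min()

print((per_row == per_row_fast).all())    # True
print(per_col.tolist())                   # [20, 20, 20, 20]
```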

4. Concatenating DataFrames

k1 = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))  # random values, different on every run
print(k1)
          A         B         C         D
0  0.992862 -1.055113 -0.974825 -0.277429
1 -2.192020  0.010336 -0.359606 -0.396194
2  0.127347  1.635541  0.229802  0.451141
3 -0.027055 -0.416263 -0.076526  0.130186
4  1.430352  0.115233 -0.906570 -0.137904
5  1.248494 -0.443819  2.787698  0.276130
6  0.055125  0.546643  0.741180  0.915405
7 -0.778035  0.413182  1.730721  1.425731
8  0.750959 -1.330331  0.137999 -1.399205
9 -0.946425 -0.964356 -0.821974 -0.261646


k2 = pd.concat([k1.iloc[:3], k1.iloc[3:7], k1.iloc[7:]])  # split into 3 row slices, then stack them back together
print(k2)
          A         B         C         D
0  0.992862 -1.055113 -0.974825 -0.277429
1 -2.192020  0.010336 -0.359606 -0.396194
2  0.127347  1.635541  0.229802  0.451141
3 -0.027055 -0.416263 -0.076526  0.130186
4  1.430352  0.115233 -0.906570 -0.137904
5  1.248494 -0.443819  2.787698  0.276130
6  0.055125  0.546643  0.741180  0.915405
7 -0.778035  0.413182  1.730721  1.425731
8  0.750959 -1.330331  0.137999 -1.399205
9 -0.946425 -0.964356 -0.821974 -0.261646


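Comparing the two printouts, the DataFrame is unchanged after being split into three pieces and concatenated back together.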
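The round trip can be checked programmatically instead of by eye. This sketch uses deterministic `np.arange` data in place of random numbers so the result is reproducible. It also shows the `keys=` argument, which records which piece each row came from, and `axis=1`, which joins frames side by side.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random k1 above
k1 = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list("ABCD"))

# Each slice keeps its original index labels, so stacking restores k1
pieces = [k1.iloc[:3], k1.iloc[3:7], k1.iloc[7:]]
k2 = pd.concat(pieces)
print(k2.equals(k1))                      # True

# keys= adds an outer index level naming the source piece of each row
labeled = pd.concat(pieces, keys=["p1", "p2", "p3"])
print(labeled.loc["p2"].index.tolist())   # [3, 4, 5, 6]

# axis=1 places frames side by side, aligning on the row index
wide = pd.concat([k1[["A", "B"]], k1[["C", "D"]]], axis=1)
print((wide == k1).all().all())           # True
```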
5. Merging DataFrames

b1 = pd.DataFrame({'key': ["foo", "foo"], "level": [1, 2]})
b2 = pd.DataFrame({'key': ["foo", "foo"], "levels": [3, 2]})
print(b1)
print(b2)
b3 = pd.merge(b1, b2, on="key")  # join on "key"; "foo" appears twice in each frame, so every pair matches (2 x 2 = 4 rows)
print(b3)


   key  level
0  foo      1
1  foo      2
   key  levels
0  foo       3
1  foo       2
   key  level  levels
0  foo      1       3
1  foo      1       2
2  foo      2       3
3  foo      2       2
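When the key values only partly overlap, the `how` parameter of `pd.merge` decides which keys survive. The frames below are hypothetical and only illustrate inner, outer, and left joins.

```python
import pandas as pd

# Hypothetical frames whose keys only partly overlap
left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "baz"], "rval": [4, 5]})

inner = pd.merge(left, right, on="key")               # default how="inner": keys in both
outer = pd.merge(left, right, on="key", how="outer")  # keys from either side
lefty = pd.merge(left, right, on="key", how="left")   # every key from left

print(len(inner))                       # 1 (only "foo" is in both)
print(sorted(outer["key"]))             # ['bar', 'baz', 'foo']
print(lefty["rval"].isna().tolist())    # [False, True]: "bar" has no match
```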

7. Appending data

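dates = pd.date_range("20200124", periods=6)
k1 = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))  # random values, different on every run
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates,
                  columns=['A', 'B', 'C', 'D'])
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0;
# pd.concat stacks the rows the same way
print(pd.concat([df, k1], ignore_index=True))  # ignore_index=True renumbers the rows 0..15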

            A          B          C          D
0    0.000000   1.000000   2.000000   3.000000
1    4.000000   5.000000   6.000000   7.000000
2    8.000000   9.000000  10.000000  11.000000
3   12.000000  13.000000  14.000000  15.000000
4   16.000000  17.000000  18.000000  19.000000
5   20.000000  21.000000  22.000000  23.000000
6    0.513074  -0.019724   0.266624  -0.660174
7   -1.095335  -1.778028   1.012710  -2.805666
8    0.068861  -0.530661   0.377946  -0.380027
9   -1.195802  -0.502530  -0.270067  -0.329765
10  -0.746135   0.053221   0.813126  -0.003984
11   0.101050   1.130641  -0.540327  -1.511843
12  -0.543655  -1.849691   0.970787  -0.710726
13   0.657702   1.031124  -0.391400   1.630099
14   0.917138   2.269298  -1.821373  -0.996168
15   0.069104   0.228138  -0.272084   0.776543

print(pd.concat([df, k1]))  # without ignore_index=True, each row keeps its own original index label
                             A          B          C          D
2020-01-24 00:00:00   0.000000   1.000000   2.000000   3.000000
2020-01-25 00:00:00   4.000000   5.000000   6.000000   7.000000
2020-01-26 00:00:00   8.000000   9.000000  10.000000  11.000000
2020-01-27 00:00:00  12.000000  13.000000  14.000000  15.000000
2020-01-28 00:00:00  16.000000  17.000000  18.000000  19.000000
2020-01-29 00:00:00  20.000000  21.000000  22.000000  23.000000
0                     1.169509  -0.639238   0.802631   1.222766
1                     0.256757   0.059154   1.374081  -1.899945
2                    -1.640777  -0.563769   0.889861   1.162652
3                    -1.501253  -0.111475  -0.816034   0.550832
4                     0.671567   0.769691  -1.315246   0.464230
5                     1.468028   0.996928  -0.340389  -0.340204
6                     0.995253   2.082980  -2.283605  -0.264507
7                    -1.037400  -0.376944   0.458426  -0.650974
8                    -2.138595   2.230048   1.843457  -0.175943
9                     1.313964  -0.154494  -0.960270  -0.792406
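To add a single row, a sketch under the assumption that the new row is labeled 2020-01-30 (a hypothetical date) shows two approaches. One concatenates a one-row DataFrame. The other assigns to a label that does not exist yet, which pandas calls setting with enlargement.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20200124", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates,
                  columns=["A", "B", "C", "D"])
new_label = pd.Timestamp("2020-01-30")  # hypothetical label for the new row

# Option 1: build a one-row DataFrame and concatenate it
new_row = pd.DataFrame([[24, 25, 26, 27]], columns=df.columns, index=[new_label])
grown = pd.concat([df, new_row])

# Option 2: assign to a label that is not in the index yet
df2 = df.copy()
df2.loc[new_label] = [24, 25, 26, 27]

print(len(grown), len(df2))  # 7 7
```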

8. Grouping by column values

print(df.groupby("A").sum())  # group by column A, then sum each group
     B   C   D
A             
0    1   2   3
4    5   6   7
8    9  10  11
12  13  14  15
16  17  18  19
20  21  22  23
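Every value in column A is unique here, so each group holds a single row and the sums are simply the original rows. Grouping becomes more useful when keys repeat. This sketch uses a small hypothetical table to show that, along with `agg` applying several reductions at once.

```python
import pandas as pd

# Hypothetical data where the grouping key repeats
scores = pd.DataFrame({"team": ["x", "y", "x", "y", "x"],
                       "points": [1, 2, 3, 4, 5]})

# Rows sharing a "team" value are collected and summed together
totals = scores.groupby("team")["points"].sum()
print(totals.tolist())            # [9, 6] for teams x, y

# agg applies several reductions at once, one output column per function
summary = scores.groupby("team")["points"].agg(["sum", "mean", "count"])
print(summary.loc["x", "mean"])   # 3.0
```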


print(df.groupby(["A", "B"]).sum())  # group by A and B together; multiple keys must be passed as a list, hence the square brackets

        C   D
A  B         
0  1    2   3
4  5    6   7
8  9   10  11
12 13  14  15
16 17  18  19
20 21  22  23
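Grouping by a list of keys gives the result a two-level MultiIndex, which is why A and B are printed on the left. This is a minimal sketch, assuming the same `df`. It shows how to keep the keys as ordinary columns instead, either with `as_index=False` or with `reset_index()`.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20200124", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates,
                  columns=["A", "B", "C", "D"])

# Grouping by a list of keys produces a MultiIndex on the result
grouped = df.groupby(["A", "B"]).sum()
print(list(grouped.index.names))           # ['A', 'B']

# as_index=False keeps the keys as ordinary columns
flat = df.groupby(["A", "B"], as_index=False).sum()
print(list(flat.columns))                  # ['A', 'B', 'C', 'D']

# reset_index() turns an existing MultiIndex result into columns the same way
print(list(grouped.reset_index().columns)) # ['A', 'B', 'C', 'D']
```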