Pandas time series data (26)

Submitted by 馋奶兔 on 2019-12-11 00:27:01

Pandas time series data: datetime


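Time series are an important form of structured data in finance, economics, neuroscience, and physics. Observations in a field are organized by time and analyzed accordingly; the main goal of time series analysis is to forecast the future from existing historical data. Most economic data is published as time series. Depending on how often observations are made, the time axis can be expressed in years, quarters, months, or any other unit of time. The most basic time series type in pandas is a Series whose index elements are timestamps (Timestamp). Python and pandas provide many built-in tools and modules for creating time series data.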

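  • The datetime module. In Python's standard datetime module, 1) the date class represents calendar dates, 2) the time class represents a time of day (hours, minutes, seconds), and 3) the datetime class represents a date together with a time of day.
import datetime
cur = datetime.datetime(2018, 12, 30, 15, 30, 59)
print(cur, type(cur))
d = datetime.date(2018, 12, 30)
print(d)
# now() is a class method, so no instance has to be built first
t = datetime.datetime.now()
print(t)

Output:

2018-12-30 15:30:59 <class 'datetime.datetime'>
2018-12-30
2018-12-16 15:35:42.757826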

4) The timedelta class of the datetime module represents a time interval (the difference between two times).

import datetime
cur0 = datetime.datetime(2018, 12, 30, 15, 30, 59)
print(cur0)
# Adding a timedelta shifts the datetime by that interval
cur1 = cur0 + datetime.timedelta(days=1)
print(cur1)
cur2 = cur0 + datetime.timedelta(minutes=10)
print(cur2)
cur3 = cur0 + datetime.timedelta(minutes=29, seconds=1)
print(cur3)

Output:

2018-12-30 15:30:59 #cur0
2018-12-31 15:30:59 #cur1
2018-12-30 15:40:59 #cur2
2018-12-30 16:00:00 #cur3
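
The reverse operation also works: subtracting one datetime from another yields a timedelta. The following is a minimal sketch using the same values as above; the names gap and half_hour are introduced here for illustration only.

```python
import datetime

cur0 = datetime.datetime(2018, 12, 30, 15, 30, 59)
cur3 = cur0 + datetime.timedelta(minutes=29, seconds=1)

# Subtracting two datetime objects gives a timedelta
gap = cur3 - cur0
print(gap)                  # 0:29:01
print(gap.total_seconds())  # 1741.0

# timedelta objects can be compared and scaled
half_hour = datetime.timedelta(minutes=30)
print(gap < half_hour)      # True
print(gap * 2)              # 0:58:02
```
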
  • Creating time series data from datetime values, that is, using the times created with datetime as the index.
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
b = datetime(2018, 12, 16, 17, 30, 55)
vi = np.random.randn(60)  # 60 random observations
ind = []
for x in range(60):
    # one timestamp per minute, starting at b
    bi = b + timedelta(minutes=x)
    ind.append(bi)
ts = pd.Series(vi, index=ind)
print(ts[:5])

Output:

2018-12-16 17:30:55   -1.469098
2018-12-16 17:31:55   -0.583046
2018-12-16 17:32:55   -0.775167
2018-12-16 17:33:55   -0.740570
2018-12-16 17:34:55   -0.287118
dtype: float64

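The first column of the result is the time, spaced one minute apart, and the second column holds the data values. The statement ts = pd.Series(vi, index = ind) builds the Series with the list of datetime objects as its index.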
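
Once the index holds times, values can be selected by time rather than by position. The sketch below rebuilds the same index but, as an assumption made for reproducibility, fills the Series with the numbers 0 to 59 instead of random values; the name window is illustrative.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

b = datetime(2018, 12, 16, 17, 30, 55)
ind = [b + timedelta(minutes=x) for x in range(60)]
ts = pd.Series(np.arange(60.0), index=ind)  # deterministic values, not randn

# A list of datetime objects is converted to a DatetimeIndex
print(type(ts.index).__name__)  # DatetimeIndex

# Look up one observation by its exact timestamp
print(ts.loc[datetime(2018, 12, 16, 17, 32, 55)])

# Slice with time strings; both endpoints are included
window = ts.loc["2018-12-16 17:35":"2018-12-16 17:37"]
print(window)
```

The slice is written at minute resolution, so it covers every entry from 17:35:00 up to the end of 17:37.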

 

Pandas time series data: creating a Timestamp

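In pandas, a point in time is represented by the Timestamp class, available as pandas.Timestamp (older releases also exposed it as pandas.tslib.Timestamp, a module that has since been removed). This section covers its basic usage.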

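from pandas import Timestamp  # pandas.tslib is not available in current releases
cur0 = Timestamp("2018-12-26 17:30:36")
print(cur0)
# A time given without a date is combined with the current date
cur0 = Timestamp("17:30:36")
print(cur0)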

Output:

2018-12-26 17:30:36
2018-12-16 17:30:36

The pandas Timedelta class expresses a time interval.

from pandas import Timestamp
cur0 = Timestamp("2018-12-26 17:30:36")
print(cur0)
cur0 = Timestamp("17:30:36")
print(cur0)
import pandas as pd
from datetime import datetime
# A Timedelta can be added to a Timestamp ...
cur1 = cur0 + pd.Timedelta(days=1)
print(cur1)
# ... and to a standard library datetime
cur2 = datetime(2018, 12, 16, 17, 30, 36) + pd.Timedelta(days=1)
print(cur2)

Output:

2018-12-26 17:30:36 # cur0
2018-12-16 17:30:36 # cur0
2018-12-17 17:30:36 # cur1
2018-12-17 17:30:36 # cur2
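
A Timestamp behaves like a standard datetime and adds a few conveniences. The following short sketch illustrates this; the names ts0 and delta are introduced here for illustration only.

```python
from datetime import datetime

import pandas as pd

ts0 = pd.Timestamp("2018-12-26 17:30:36")

# Timestamp is a subclass of datetime.datetime
print(isinstance(ts0, datetime))  # True

# Calendar fields are available as attributes
print(ts0.year, ts0.month, ts0.day, ts0.hour)  # 2018 12 26 17
print(ts0.day_name())                          # Wednesday

# Subtracting two Timestamps yields a pandas Timedelta
delta = ts0 - pd.Timestamp("2018-12-16 17:30:36")
print(delta.days)  # 10
```
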

Building time series data with the pandas Timedelta:

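from datetime import datetime  # needed for datetime(...) below
import numpy as np
import pandas as pd
b = datetime(2018, 12, 16, 17, 30, 55)
vi = np.random.randn(60)
ind = []
for x in range(60):
    # pd.Timedelta works here just like datetime.timedelta
    bi = b + pd.Timedelta(minutes=x)
    ind.append(bi)
ts = pd.Series(vi, index=ind)
print(ts[:5])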

Output:

2018-12-16 17:30:55   -0.816316
2018-12-16 17:31:55   -0.914680
2018-12-16 17:32:55   -0.304760
2018-12-16 17:33:55   -1.339267
2018-12-16 17:34:55    1.578459
dtype: float64
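
The loop is fine for a first look, but pandas can produce the same index in a single call with date_range, which the next section covers. This is a small sketch comparing the two approaches; ind_loop and ind_range are illustrative names.

```python
from datetime import datetime

import pandas as pd

b = datetime(2018, 12, 16, 17, 30, 55)

# Index built one element at a time, as in the loop above
ind_loop = pd.DatetimeIndex([b + pd.Timedelta(minutes=x) for x in range(60)])

# The same 60 timestamps from one date_range call
ind_range = pd.date_range(start=b, periods=60, freq="min")

# Element-wise comparison of the two indexes
print(len(ind_loop), len(ind_range))
print((ind_loop == ind_range).all())  # True
```
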

Pandas time series data: the date_range function

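The pandas date_range function generates a collection of times, that is, a sequence of timestamps. It is a little like the built-in range function, except that its arguments are times rather than integers.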

  • freq sets the interval between consecutive timestamps.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16', '2019-01-01', freq = "2D")
print(cur0)
cur1 = pd.date_range('12/16/2018', '2019-01-01', freq = "W")
print(cur1)
cur2 = pd.date_range('2018-12-16 17:30:30', '2019-01-01', freq = "6H")
print(cur2)
cur3 = pd.date_range('2018-12-16', '2019-08-01', freq = "M")
print(cur3)
cur4 = pd.date_range('2010-12-16', '2019-01-01', freq = "Y")
print(cur4)
cur5 = pd.date_range('2010', '2019', freq = "AS")
print(cur5)

Output:

DatetimeIndex(['2018-12-16', '2018-12-18', '2018-12-20', '2018-12-22',
               '2018-12-24', '2018-12-26', '2018-12-28', '2018-12-30',
               '2019-01-01'],
              dtype='datetime64[ns]', freq='2D')
DatetimeIndex(['2018-12-16', '2018-12-23', '2018-12-30'], dtype='datetime64[ns]', freq='W-SUN')
DatetimeIndex(['2018-12-16 17:30:30', '2018-12-16 23:30:30',
               '2018-12-17 05:30:30', '2018-12-17 11:30:30',
               '2018-12-17 17:30:30', '2018-12-17 23:30:30',
               '2018-12-18 05:30:30', '2018-12-18 11:30:30',
               '2018-12-18 17:30:30', '2018-12-18 23:30:30'],
              dtype='datetime64[ns]', freq='6H')
DatetimeIndex(['2018-12-31', '2019-01-31', '2019-02-28', '2019-03-31',
               '2019-04-30', '2019-05-31', '2019-06-30', '2019-07-31'],
              dtype='datetime64[ns]', freq='M')
DatetimeIndex(['2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31',
               '2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31',
               '2018-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')
DatetimeIndex(['2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
               '2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01',
               '2018-01-01', '2019-01-01'],
              dtype='datetime64[ns]', freq='AS-JAN')

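freq="2D" means an interval of two days, freq='6H' an interval of six hours, and freq='M' a monthly interval (month end). The table below lists commonly used values of the freq parameter of date_range. Note that recent pandas releases (2.2 and later) deprecate some of these spellings, for example M, H, T and S in favor of ME, h, min and s.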

Alias      Description
B          business day frequency
C          custom business day frequency
D          calendar day frequency
W          weekly frequency
M          month end frequency
SM         semi-month end frequency (15th and end of month)
BM         business month end frequency
CBM        custom business month end frequency
MS         month start frequency
SMS        semi-month start frequency (1st and 15th)
BMS        business month start frequency
CBMS       custom business month start frequency
Q          quarter end frequency
BQ         business quarter end frequency
QS         quarter start frequency
BQS        business quarter start frequency
A, Y       year end frequency
BA, BY     business year end frequency
AS, YS     year start frequency
BAS, BYS   business year start frequency
BH         business hour frequency
H          hourly frequency
T, min     minutely frequency
S          secondly frequency
L, ms      milliseconds
U, us      microseconds
N          nanoseconds

In the table, T stands for minutes and B for business days. Next, date_range is used to build a time series.

import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16', '2019-02-05', freq = "B")
# print(cur0, len(cur0))
vi = np.random.randn(len(cur0))  # one random value per business day
ts = pd.Series(vi, index = cur0)
print(ts[:14])

Output:

2018-12-17    0.128278
2018-12-18   -0.128049
2018-12-19    0.872805
2018-12-20   -0.809540
2018-12-21   -0.104894
2018-12-24    0.720047
2018-12-25    0.965698
2018-12-26    0.926640
2018-12-27   -1.505794
2018-12-28    0.246031
2018-12-31   -0.536505
2019-01-01    1.609414
2019-01-02    0.459005
2019-01-03    0.347774
Freq: B, dtype: float64

The first column shows that Saturdays and Sundays are absent: freq = "B" generates business days only.
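
The output above still contains 2018-12-25 and 2019-01-01, because "B" only skips weekends and knows nothing about public holidays. One way to exclude them is the CustomBusinessDay offset (the "C" entry in the alias table). The sketch below assumes a hypothetical two-day holiday list chosen purely for illustration.

```python
import pandas as pd

# Plain business days: Monday to Friday, public holidays included
plain = pd.date_range("2018-12-16", "2019-02-05", freq="B")

# Hypothetical holiday list, for illustration only
holidays = ["2018-12-25", "2019-01-01"]
custom = pd.date_range(
    "2018-12-16",
    "2019-02-05",
    freq=pd.offsets.CustomBusinessDay(holidays=holidays),
)

print(plain.dayofweek.max())                 # 4 (Monday=0, so Friday is the latest)
print(pd.Timestamp("2018-12-25") in plain)   # True
print(pd.Timestamp("2018-12-25") in custom)  # False
print(len(plain) - len(custom))              # 2
```
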

The following example generates dates that all fall on the same weekday, here Wednesday.

import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16', '2019-02-05', freq = "W-WED")
print(cur0)

Output:

DatetimeIndex(['2018-12-19', '2018-12-26', '2019-01-02', '2019-01-09',
               '2019-01-16', '2019-01-23', '2019-01-30'],
              dtype='datetime64[ns]', freq='W-WED')
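  • periods sets the number of timestamps to generate.
import numpy as np
import pandas as pd
# 5 timestamps, each 2 hours 20 minutes after the previous one
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='2h20min')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)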

Output:

2018-12-16 18:30:34   -0.289575
2018-12-16 20:50:34   -0.782106
2018-12-16 23:10:34    0.152276
2018-12-17 01:30:34   -0.661511
2018-12-17 03:50:34   -1.676650
Freq: 140T, dtype: float64
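
date_range accepts start, end, periods and freq, and needs three of the four to pin the sequence down. The following sketch shows the other combinations; the names fwd, back and even are illustrative.

```python
import pandas as pd

# start + periods + freq: count forward from the start
fwd = pd.date_range(start="2018-12-16", periods=5, freq="D")

# end + periods + freq: count backward from the end
back = pd.date_range(end="2018-12-16", periods=5, freq="D")

# start + end + periods: evenly spaced points, no freq needed
even = pd.date_range(start="2018-12-16", end="2018-12-20", periods=3)

print(fwd[0], fwd[-1])    # 2018-12-16 00:00:00 2018-12-20 00:00:00
print(back[0], back[-1])  # 2018-12-12 00:00:00 2018-12-16 00:00:00
print(even[1])            # 2018-12-18 00:00:00
```
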

 

Pandas time series data: date_range parameters in detail

  • freq = "T" generates the series at one-minute intervals; it is equivalent to "min".
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='T')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
# "min" is another spelling of the same frequency
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='min')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-16 18:30:34    0.489893
2018-12-16 18:31:34    0.000442
2018-12-16 18:32:34   -0.465273
2018-12-16 18:33:34   -0.173814
2018-12-16 18:34:34   -0.603672
Freq: T, dtype: float64
2018-12-16 18:30:34    0.690540
2018-12-16 18:31:34   -0.815213
2018-12-16 18:32:34    0.460163
2018-12-16 18:33:34    1.515437
2018-12-16 18:34:34   -0.832920
Freq: T, dtype: float64
  • freq = "S" generates the series at a frequency of seconds. Units can be combined, as in "3T10S" below.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='3T10S')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-16 18:30:34   -1.078270
2018-12-16 18:33:44   -0.120087
2018-12-16 18:36:54    1.863152
2018-12-16 18:40:04   -0.601866
2018-12-16 18:43:14    0.881057
Freq: 190S, dtype: float64

Here the interval is 3 minutes 10 seconds, which pandas reports as 190S.
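
An interval like this does not have to be written as a frequency string: date_range also accepts a Timedelta for freq. The sketch below uses the lowercase spellings "min" and "s" so that it is not tied to the older T and S aliases; by_alias and by_delta are illustrative names.

```python
import pandas as pd

start = "2018-12-16 18:30:34"

# Interval written as a frequency string
by_alias = pd.date_range(start, periods=5, freq="3min10s")

# The same interval written as a Timedelta
by_delta = pd.date_range(
    start, periods=5, freq=pd.Timedelta(minutes=3, seconds=10)
)

print((by_alias == by_delta).all())  # True
print(by_alias[1] - by_alias[0])     # 0 days 00:03:10
```
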

  • freq = "H" generates the series at an hourly frequency.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='2H')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-16 18:30:34   -0.182473
2018-12-16 20:30:34    1.037907
2018-12-16 22:30:34   -0.175579
2018-12-17 00:30:34   -0.586400
2018-12-17 02:30:34   -0.334369
Freq: 2H, dtype: float64

The result shows consecutive timestamps two hours apart.

  • freq = "B" generates the series at a business-day frequency.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods = 10, freq='B')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-17 18:30:34    0.011285
2018-12-18 18:30:34    0.972737
2018-12-19 18:30:34    0.109900
2018-12-20 18:30:34   -0.969465
2018-12-21 18:30:34   -0.885282
2018-12-24 18:30:34   -1.722596
2018-12-25 18:30:34    0.678189
2018-12-26 18:30:34    0.402022
2018-12-27 18:30:34   -0.740186
2018-12-28 18:30:34    1.302828
Freq: B, dtype: float64

December 22 and 23 are a Saturday and a Sunday, so they are missing from the result.

  • freq = "D" generates the series at a daily frequency.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods = 10, freq='2D')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-16 18:30:34    0.327716
2018-12-18 18:30:34    0.784813
2018-12-20 18:30:34    1.432993
2018-12-22 18:30:34    1.148707
2018-12-24 18:30:34    0.996547
2018-12-26 18:30:34   -0.210021
2018-12-28 18:30:34   -0.175977
2018-12-30 18:30:34    0.473569
2019-01-01 18:30:34    0.642001
2019-01-03 18:30:34    0.675140
Freq: 2D, dtype: float64

In the resulting series the date advances by two days from one entry to the next.

  • freq = "W" generates the series at a weekly frequency. By default the week is anchored on Sunday, so "W" is the same as "W-SUN".
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W-SUN')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-16 18:30:34    0.557365
2018-12-23 18:30:34   -0.306496
2018-12-30 18:30:34   -1.172465
2019-01-06 18:30:34    0.434073
2019-01-13 18:30:34    0.106500
2019-01-20 18:30:34    0.773861
2019-01-27 18:30:34   -0.236211
2019-02-03 18:30:34   -0.303260
2019-02-10 18:30:34    0.974439
2019-02-17 18:30:34   -0.356273
Freq: W-SUN, dtype: float64
2018-12-16 18:30:34    0.180012
2018-12-23 18:30:34   -0.977006
2018-12-30 18:30:34    0.095408
2019-01-06 18:30:34   -0.097709
2019-01-13 18:30:34   -0.401469
2019-01-20 18:30:34   -0.283461
2019-01-27 18:30:34   -1.138246
2019-02-03 18:30:34   -1.675089
2019-02-10 18:30:34    0.511324
2019-02-17 18:30:34    0.728807
Freq: W-SUN, dtype: float64

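The start, 2018-12-15, is a Saturday. The first generated entry is Sunday 2018-12-16, consecutive entries are 7 days apart, and there are 10 entries in total (periods = 10).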

import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
# Anchor the weekly frequency on Friday instead of Sunday
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W-Fri')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-16 18:30:34   -1.133046
2018-12-23 18:30:34   -1.083898
2018-12-30 18:30:34   -1.503690
2019-01-06 18:30:34   -0.866094
2019-01-13 18:30:34   -0.945356
2019-01-20 18:30:34    0.021928
2019-01-27 18:30:34   -0.591696
2019-02-03 18:30:34   -1.710630
2019-02-10 18:30:34    2.121283
2019-02-17 18:30:34    0.739256
Freq: W-SUN, dtype: float64
2018-12-21 18:30:34    2.082080
2018-12-28 18:30:34    1.368807
2019-01-04 18:30:34    0.599276
2019-01-11 18:30:34   -0.149521
2019-01-18 18:30:34    1.134686
2019-01-25 18:30:34   -0.582935
2019-02-01 18:30:34   -0.470655
2019-02-08 18:30:34    0.983203
2019-02-15 18:30:34   -0.067618
2019-02-22 18:30:34   -0.736081
Freq: W-FRI, dtype: float64

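The statement cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W-Fri') generates a series of 10 timestamps that all fall on a Friday, starting from 2018-12-15 (a Saturday). The first Friday after 2018-12-15 is 2018-12-21 and the second is 2018-12-28. A frequency such as "W-FRI" therefore produces one timestamp per week on the named weekday.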
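
The weekday of each generated timestamp can be checked directly on the index. This is a short sketch; fridays is an illustrative name.

```python
import pandas as pd

fridays = pd.date_range("2018-12-15 18:30:34", periods=3, freq="W-FRI")

# The Saturday start is rolled forward to the first Friday
print(fridays[0])                # 2018-12-21 18:30:34

# day_name() gives the weekday as text
print(list(fridays.day_name()))  # ['Friday', 'Friday', 'Friday']

# dayofweek counts from Monday=0, so Friday is 4
print(fridays.dayofweek)
```
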

  • freq = "M" generates the series at a monthly frequency, with each timestamp at the month end; freq = "MS" places each timestamp at the month start.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='M')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='MS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-31 18:30:34    2.844877
2019-01-31 18:30:34   -0.405763
2019-02-28 18:30:34    1.048116
2019-03-31 18:30:34   -0.353364
2019-04-30 18:30:34    1.146974
2019-05-31 18:30:34   -2.594504
2019-06-30 18:30:34    1.149964
2019-07-31 18:30:34    0.152655
2019-08-31 18:30:34    0.456799
2019-09-30 18:30:34    0.356193
Freq: M, dtype: float64
2019-01-01 18:30:34   -0.410882
2019-02-01 18:30:34   -1.349693
2019-03-01 18:30:34    0.363404
2019-04-01 18:30:34    0.352792
2019-05-01 18:30:34    0.334477
2019-06-01 18:30:34    0.181288
2019-07-01 18:30:34   -0.936703
2019-08-01 18:30:34   -0.512834
2019-09-01 18:30:34   -0.243987
2019-10-01 18:30:34    0.727383
Freq: MS, dtype: float64

The first month end after 2018-12-15 is 2018-12-31, and the first month start is 2019-01-01.
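
In these results every timestamp keeps the 18:30:34 time of day from the start value. If only the dates matter, date_range has a normalize parameter that resets the times to midnight. The following minimal sketch uses "MS"; kept and midnight are illustrative names.

```python
import pandas as pd

start = "2018-12-15 18:30:34"

# By default the time of day of the start value is kept
kept = pd.date_range(start, periods=3, freq="MS")
print(kept[0])      # 2019-01-01 18:30:34

# normalize=True resets every timestamp to midnight
midnight = pd.date_range(start, periods=3, freq="MS", normalize=True)
print(midnight[0])  # 2019-01-01 00:00:00
```
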

  • freq = "BM" generates the last business day of each month, which is not necessarily the last calendar day; freq = "BMS" generates the first business day of each month.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='BM')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='BMS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-31 18:30:34    0.338989
2019-01-31 18:30:34   -0.074689
2019-02-28 18:30:34   -1.309663
2019-03-29 18:30:34    0.139394
2019-04-30 18:30:34   -0.519024
2019-05-31 18:30:34    0.573932
2019-06-28 18:30:34    0.551329
2019-07-31 18:30:34   -0.849871
2019-08-30 18:30:34   -0.685058
2019-09-30 18:30:34   -0.160009
Freq: BM, dtype: float64
2019-01-01 18:30:34    0.499660
2019-02-01 18:30:34   -0.912324
2019-03-01 18:30:34    0.412629
2019-04-01 18:30:34    1.222422
2019-05-01 18:30:34   -0.618880
2019-06-03 18:30:34    0.132562
2019-07-01 18:30:34    0.721672
2019-08-01 18:30:34   -1.086498
2019-09-02 18:30:34   -1.670070
2019-10-01 18:30:34   -2.165835
Freq: BMS, dtype: float64

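Note that 2019-03-29 is not the last day of March: 2019-03-30 and 2019-03-31 fall on a weekend. Likewise, 2019-06-03 is not the first day of June but its first business day, because 2019-06-01 and 2019-06-02 fall on a weekend.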
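
The same two dates can be found without generating a whole range, by using the offset objects that correspond to these aliases. This sketch uses pd.offsets.BMonthEnd and pd.offsets.BMonthBegin; the variable names are illustrative.

```python
import pandas as pd

bm_end = pd.offsets.BMonthEnd()      # offset behind the "BM" alias
bm_begin = pd.offsets.BMonthBegin()  # offset behind the "BMS" alias

# Last business day of March 2019 (the 30th and 31st are a weekend)
last_bday = bm_end.rollforward(pd.Timestamp("2019-03-15"))
print(last_bday)   # 2019-03-29 00:00:00

# First business day of June 2019 (the 1st and 2nd are a weekend)
first_bday = bm_begin.rollforward(pd.Timestamp("2019-06-01"))
print(first_bday)  # 2019-06-03 00:00:00
```
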

  • freq = "Q" generates the series at a quarterly frequency (quarter end); freq = "QS" uses the quarter start.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='Q')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='QS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-31 18:30:34    0.364439
2019-03-31 18:30:34   -0.295537
2019-06-30 18:30:34    0.562707
2019-09-30 18:30:34   -0.226738
2019-12-31 18:30:34    0.623051
2020-03-31 18:30:34   -0.675792
2020-06-30 18:30:34   -0.848371
2020-09-30 18:30:34   -0.805518
2020-12-31 18:30:34   -0.061498
2021-03-31 18:30:34    0.291014
Freq: Q-DEC, dtype: float64
2019-01-01 18:30:34   -0.236873
2019-04-01 18:30:34   -1.399436
2019-07-01 18:30:34    1.011018
2019-10-01 18:30:34    1.254754
2020-01-01 18:30:34   -0.569184
2020-04-01 18:30:34   -1.480181
2020-07-01 18:30:34   -0.396710
2020-10-01 18:30:34    1.157218
2021-01-01 18:30:34   -0.119259
2021-04-01 18:30:34    0.773836
Freq: QS-JAN, dtype: float64

Q can also be combined with B, in the same way as M above.
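
As a sketch of that combination, the example below compares calendar quarter ends with business quarter ends. It passes the offset objects pd.offsets.QuarterEnd and pd.offsets.BQuarterEnd instead of the string aliases, a choice made here so the example does not depend on alias spelling; q_end and bq_end are illustrative names.

```python
import pandas as pd

# Calendar quarter ends
q_end = pd.date_range("2018-12-15", periods=4, freq=pd.offsets.QuarterEnd())

# Business quarter ends: the last weekday of each quarter
bq_end = pd.date_range("2018-12-15", periods=4, freq=pd.offsets.BQuarterEnd())

print(list(q_end.strftime("%Y-%m-%d")))
# ['2018-12-31', '2019-03-31', '2019-06-30', '2019-09-30']
print(list(bq_end.strftime("%Y-%m-%d")))
# ['2018-12-31', '2019-03-29', '2019-06-28', '2019-09-30']
```
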

  • freq = "A" generates the series at a yearly frequency (year end); freq = "AS" uses the year start.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='A')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='AS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)

Output:

2018-12-31 18:30:34   -0.058588
2019-12-31 18:30:34   -0.676757
2020-12-31 18:30:34   -0.368606
2021-12-31 18:30:34   -0.820318
2022-12-31 18:30:34    0.959945
2023-12-31 18:30:34   -0.144216
2024-12-31 18:30:34    0.827481
2025-12-31 18:30:34    1.812374
2026-12-31 18:30:34   -1.473202
2027-12-31 18:30:34   -1.633083
Freq: A-DEC, dtype: float64
2019-01-01 18:30:34   -0.037793
2020-01-01 18:30:34    1.067194
2021-01-01 18:30:34   -1.517820
2022-01-01 18:30:34   -0.101716
2023-01-01 18:30:34    0.413106
2024-01-01 18:30:34   -0.912453
2025-01-01 18:30:34    0.197084
2026-01-01 18:30:34   -0.513032
2027-01-01 18:30:34   -0.027010
2028-01-01 18:30:34   -0.263569
Freq: AS-JAN, dtype: float64
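
The suffixes in "A-DEC" and "AS-JAN" name the month the yearly frequency is anchored on. A different anchor month gives, for example, a fiscal year that ends in June. The sketch below expresses the anchor through the month argument of pd.offsets.YearEnd; the June fiscal year and the names dec and jun are assumptions made for illustration.

```python
import pandas as pd

# Year end anchored on December (what "A-DEC" describes)
dec = pd.date_range("2018-12-15", periods=3, freq=pd.offsets.YearEnd(month=12))

# Year end anchored on June, e.g. a fiscal year ending on June 30
jun = pd.date_range("2018-12-15", periods=3, freq=pd.offsets.YearEnd(month=6))

print(list(dec.strftime("%Y-%m-%d")))  # ['2018-12-31', '2019-12-31', '2020-12-31']
print(list(jun.strftime("%Y-%m-%d")))  # ['2019-06-30', '2020-06-30', '2021-06-30']
```
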

 
