Pandas time series data - datetime
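Time series data is an important form of structured data in finance, economics, neuroscience, and physics. Observations are organized by time and analyzed accordingly, and a main goal of time series analysis is to forecast the future from existing historical data. Most economic data is given as time series. Depending on the observation interval, the time in a series can be a year, a quarter, a month, or any other unit. The most basic time series type in pandas is a Series whose index elements are timestamps (Timestamp). Python and pandas provide many built-in tools and modules for creating time series data.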
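- The datetime module. In Python's standard datetime module, 1) the date class represents calendar dates, 2) the time class represents times of day (hours, minutes, seconds), and 3) the datetime class represents a date together with a time of day.
import datetime
cur = datetime.datetime(2018, 12, 30, 15, 30, 59)
print(cur, type(cur))
d = datetime.date(2018, 12, 30)
print(d)
# now() is a class method, so call it on the class itself
t = datetime.datetime.now()
print(t)
Program output:
2018-12-30 15:30:59 <class 'datetime.datetime'>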
2018-12-30
2018-12-16 15:35:42.757826
4) The timedelta class in the datetime module expresses a time interval (difference).
import datetime
cur0 = datetime.datetime(2018, 12, 30, 15, 30, 59)
print(cur0)
cur1 = cur0 + datetime.timedelta(days=1)
print(cur1)
cur2 = cur0 + datetime.timedelta(minutes=10)
print(cur2)
cur3 = cur0 + datetime.timedelta(minutes=29, seconds=1)
print(cur3)
Program output:
2018-12-30 15:30:59 #cur0
2018-12-31 15:30:59 #cur1
2018-12-30 15:40:59 #cur2
2018-12-30 16:00:00 #cur3
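The reverse operation also works: subtracting one datetime from another yields a timedelta. The following is a minimal sketch using only the standard library and assuming Python 3; the names start, end, and gap are illustrative and do not come from the original article.

```python
from datetime import datetime, timedelta

start = datetime(2018, 12, 30, 15, 30, 59)
end = datetime(2018, 12, 30, 16, 0, 0)

# Subtracting two datetime objects gives a timedelta
gap = end - start
print(gap)                  # 0:29:01
print(gap.total_seconds())  # 1741.0

# The same interval built directly from minutes and seconds
same_gap = timedelta(minutes=29, seconds=1)
print(gap == same_gap)      # True
```

This matches cur3 above: adding 29 minutes and 1 second to 15:30:59 lands exactly on 16:00:00.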
- Creating a time series from datetime values, that is, using the times created with datetime as the index.
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
b = datetime(2018, 12, 16, 17, 30, 55)
vi = np.random.randn(60)
ind = []
for x in range(60):
    # one timestamp per minute, starting at b
    bi = b + timedelta(minutes=x)
    ind.append(bi)
ts = pd.Series(vi, index=ind)
print(ts[:5])
Program output:
2018-12-16 17:30:55 -1.469098
2018-12-16 17:31:55 -0.583046
2018-12-16 17:32:55 -0.775167
2018-12-16 17:33:55 -0.740570
2018-12-16 17:34:55 -0.287118
dtype: float64
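The first column of the result is the time, at one-minute intervals, and the second column holds the data values. The statement ts = pd.Series(vi, index = ind) uses the list of datetime values as the index of the Series.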
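As a compact variant of the loop above, the index can be built with a list comprehension. The sketch below assumes Python 3 and a reasonably recent pandas. It also checks what pandas does with a list of datetime objects: it stores them as a DatetimeIndex.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

b = datetime(2018, 12, 16, 17, 30, 55)

# Same 60 one-minute steps as the loop, written as a list comprehension
ind = [b + timedelta(minutes=x) for x in range(60)]
ts = pd.Series(np.random.randn(60), index=ind)

# pandas converts the list of datetime objects into a DatetimeIndex
print(type(ts.index).__name__)  # DatetimeIndex
print(ts.index[0], ts.index[-1])
```

The values are random, so only the index is reproducible from run to run.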
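Pandas time series data - creating a Timestamp
pandas provides the Timestamp class (pandas.Timestamp; older releases also exposed it as pandas.tslib.Timestamp) for representing points in time. This section covers its basic usage.
from pandas import Timestamp
cur0 = Timestamp("2018-12-26 17:30:36")
print(cur0)
# a time-only string is combined with today's date
cur0 = Timestamp("17:30:36")
print(cur0)
Program output: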
2018-12-26 17:30:36
2018-12-16 17:30:36
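A Timestamp also exposes its individual fields. The sketch below is an illustration that is not part of the original article; it assumes pandas 0.23 or later, where day_name() is available.

```python
import pandas as pd

ts = pd.Timestamp("2018-12-26 17:30:36")

# Individual fields are available as attributes
print(ts.year, ts.month, ts.day)      # 2018 12 26
print(ts.hour, ts.minute, ts.second)  # 17 30 36
print(ts.day_name())                  # Wednesday

# Convert back to a standard-library datetime when needed
py_dt = ts.to_pydatetime()
print(type(py_dt).__name__)           # datetime
```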
Time intervals can be expressed with pandas' Timedelta class.
from datetime import datetime
import pandas as pd
from pandas import Timestamp
cur0 = Timestamp("2018-12-26 17:30:36")
print(cur0)
cur0 = Timestamp("17:30:36")
print(cur0)
cur1 = cur0 + pd.Timedelta(days=1)
print(cur1)
# Timedelta can also be added to a standard-library datetime
cur2 = datetime(2018, 12, 16, 17, 30, 36) + pd.Timedelta(days=1)
print(cur2)
Program output:
2018-12-26 17:30:36 # cur0
2018-12-16 17:30:36 # cur0
2018-12-17 17:30:36 # cur1
2018-12-17 17:30:36 # cur2
Building time series data with pandas' Timedelta:
from datetime import datetime
import numpy as np
import pandas as pd
b = datetime(2018, 12, 16, 17, 30, 55)
vi = np.random.randn(60)
ind = []
for x in range(60):
    # one timestamp per minute, starting at b
    bi = b + pd.Timedelta(minutes=x)
    ind.append(bi)
ts = pd.Series(vi, index=ind)
print(ts[:5])
Program output:
2018-12-16 17:30:55 -0.816316
2018-12-16 17:31:55 -0.914680
2018-12-16 17:32:55 -0.304760
2018-12-16 17:33:55 -1.339267
2018-12-16 17:34:55 1.578459
dtype: float64
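The loop can also be avoided entirely. As a sketch (assuming a recent pandas; the name offsets is illustrative), pd.to_timedelta turns an array of minute counts into a TimedeltaIndex, and adding it to a single Timestamp gives the whole index in one step.

```python
from datetime import datetime

import numpy as np
import pandas as pd

b = datetime(2018, 12, 16, 17, 30, 55)

# Offsets of 0, 1, ..., 59 minutes as a TimedeltaIndex
offsets = pd.to_timedelta(np.arange(60), unit="min")

# Timestamp + TimedeltaIndex gives a DatetimeIndex in one step
ind = pd.Timestamp(b) + offsets
ts = pd.Series(np.random.randn(60), index=ind)
print(ts.index[:3])
```

The resulting index holds the same 60 timestamps as the loop version.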
Pandas time series data - the date_range function
The pandas date_range function generates a set of times, that is, a sequence of timestamps. It is somewhat like the built-in range function, except that its arguments are times rather than integers.
- freq sets the interval between consecutive timestamps.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16', '2019-01-01', freq="2D")
print(cur0)
cur1 = pd.date_range('12/16/2018', '2019-01-01', freq="W")
print(cur1)
cur2 = pd.date_range('2018-12-16 17:30:30', '2019-01-01', freq="6H")
print(cur2)
cur3 = pd.date_range('2018-12-16', '2019-08-01', freq="M")
print(cur3)
cur4 = pd.date_range('2010-12-16', '2019-01-01', freq="Y")
print(cur4)
cur5 = pd.date_range('2010', '2019', freq="AS")
print(cur5)
Program output:
DatetimeIndex(['2018-12-16', '2018-12-18', '2018-12-20', '2018-12-22',
'2018-12-24', '2018-12-26', '2018-12-28', '2018-12-30',
'2019-01-01'],
dtype='datetime64[ns]', freq='2D')
DatetimeIndex(['2018-12-16', '2018-12-23', '2018-12-30'], dtype='datetime64[ns]', freq='W-SUN')
DatetimeIndex(['2018-12-16 17:30:30', '2018-12-16 23:30:30',
'2018-12-17 05:30:30', '2018-12-17 11:30:30',
'2018-12-17 17:30:30', '2018-12-17 23:30:30',
'2018-12-18 05:30:30', '2018-12-18 11:30:30',
'2018-12-18 17:30:30', '2018-12-18 23:30:30'],
dtype='datetime64[ns]', freq='6H')
DatetimeIndex(['2018-12-31', '2019-01-31', '2019-02-28', '2019-03-31',
'2019-04-30', '2019-05-31', '2019-06-30', '2019-07-31'],
dtype='datetime64[ns]', freq='M')
DatetimeIndex(['2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31',
'2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31',
'2018-12-31'],
dtype='datetime64[ns]', freq='A-DEC')
DatetimeIndex(['2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
'2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01',
'2018-01-01', '2019-01-01'],
dtype='datetime64[ns]', freq='AS-JAN')
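freq="2D" means an interval of two days, freq='6H' an interval of six hours, and freq='M' an interval of one month, anchored at month end. Commonly used values of the freq parameter are listed in the table below. Note that pandas 2.2 and later deprecate several of these spellings in favor of new ones, for example ME for M, h for H, min for T, s for S, and YE for A and Y.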
Alias | Description |
---|---|
B | business day frequency |
C | custom business day frequency |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
SM | semi-month end frequency (15th and end of month) |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
SMS | semi-month start frequency (1st and 15th) |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter end frequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A, Y | year end frequency |
BA, BY | business year end frequency |
AS, YS | year start frequency |
BAS, BYS | business year start frequency |
BH | business hour frequency |
H | hourly frequency |
T, min | minutely frequency |
S | secondly frequency |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |
In the table, T means minute and B means business day. Next, date_range can be used to create a time series.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16', '2019-02-05', freq="B")
# print(cur0, len(cur0))
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts[:14])
Program output:
2018-12-17 0.128278
2018-12-18 -0.128049
2018-12-19 0.872805
2018-12-20 -0.809540
2018-12-21 -0.104894
2018-12-24 0.720047
2018-12-25 0.965698
2018-12-26 0.926640
2018-12-27 -1.505794
2018-12-28 0.246031
2018-12-31 -0.536505
2019-01-01 1.609414
2019-01-02 0.459005
2019-01-03 0.347774
Freq: B, dtype: float64
The first column of the result shows that Saturdays and Sundays are absent: freq = "B" generates business days only.
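The claim about weekends can be checked programmatically. The sketch below assumes a recent pandas and uses the dayofweek attribute of the index, where Monday is 0 and Sunday is 6. It also shows that "B" skips weekends only: public holidays such as December 25 are still included, as the output above confirms.

```python
import pandas as pd

idx = pd.date_range("2018-12-16", "2019-02-05", freq="B")

# dayofweek: Monday is 0 and Sunday is 6, so weekends are 5 and 6
has_weekend = bool((idx.dayofweek >= 5).any())
print(has_weekend)  # False

# 2018-12-16 is a Sunday, so the first business day is the 17th
print(idx[0])  # 2018-12-17 00:00:00

# "B" knows nothing about public holidays
print(pd.Timestamp("2018-12-25") in idx)  # True
```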
The next example generates dates that all fall on the same day of the week, here Wednesday (freq = "W-WED").
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16', '2019-02-05', freq="W-WED")
print(cur0)
Program output:
DatetimeIndex(['2018-12-19', '2018-12-26', '2019-01-02', '2019-01-09',
'2019-01-16', '2019-01-23', '2019-01-30'],
dtype='datetime64[ns]', freq='W-WED')
- periods sets the number of timestamps to generate.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='2h20min')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-16 18:30:34 -0.289575
2018-12-16 20:50:34 -0.782106
2018-12-16 23:10:34 0.152276
2018-12-17 01:30:34 -0.661511
2018-12-17 03:50:34 -1.676650
Freq: 140T, dtype: float64
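periods can be combined with end instead of start, in which case pandas counts backward from the last timestamp. A minimal sketch, assuming a recent pandas; the frequency is written as "140min", which is the same interval as "2h20min".

```python
import pandas as pd

# start + periods: count forward from the first timestamp
forward = pd.date_range("2018-12-16 18:30:34", periods=5, freq="140min")

# end + periods: count backward from the last timestamp
backward = pd.date_range(end="2018-12-17 03:50:34", periods=5, freq="140min")

print(forward.equals(backward))  # True
```

Four steps of 140 minutes add up to 9 hours 20 minutes, which is why the fifth entry falls at 03:50:34 on the next day.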
Pandas time series data - date_range parameters in detail
- freq = "T" generates a time series at one-minute intervals; it is equivalent to "min".
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='T')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='min')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-16 18:30:34 0.489893
2018-12-16 18:31:34 0.000442
2018-12-16 18:32:34 -0.465273
2018-12-16 18:33:34 -0.173814
2018-12-16 18:34:34 -0.603672
Freq: T, dtype: float64
2018-12-16 18:30:34 0.690540
2018-12-16 18:31:34 -0.815213
2018-12-16 18:32:34 0.460163
2018-12-16 18:33:34 1.515437
2018-12-16 18:34:34 -0.832920
Freq: T, dtype: float64
- freq = "S" generates a time series at a frequency measured in seconds. Units can be combined, as in "3T10S" below.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='3T10S')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-16 18:30:34 -1.078270
2018-12-16 18:33:44 -0.120087
2018-12-16 18:36:54 1.863152
2018-12-16 18:40:04 -0.601866
2018-12-16 18:43:14 0.881057
Freq: 190S, dtype: float64
The interval here is 3 minutes 10 seconds, that is, 190 seconds, which is why the result reports Freq: 190S.
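A combined interval like this can also be given as a Timedelta object rather than an alias string, which sidesteps differences in alias spelling between pandas versions. The following is a sketch under that assumption; the name step is illustrative.

```python
import pandas as pd

# 3 minutes 10 seconds expressed as a Timedelta
step = pd.Timedelta(minutes=3, seconds=10)
print(step.total_seconds())  # 190.0

# A Timedelta can be passed as freq directly
idx = pd.date_range("2018-12-16 18:30:34", periods=5, freq=step)
print(idx[-1])  # 2018-12-16 18:43:14
```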
- freq = "H" generates a time series at an hourly frequency.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='2H')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-16 18:30:34 -0.182473
2018-12-16 20:30:34 1.037907
2018-12-16 22:30:34 -0.175579
2018-12-17 00:30:34 -0.586400
2018-12-17 02:30:34 -0.334369
Freq: 2H, dtype: float64
The result shows that consecutive timestamps are two hours apart.
- freq = "B" generates a time series at a business-day frequency.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=10, freq='B')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-17 18:30:34 0.011285
2018-12-18 18:30:34 0.972737
2018-12-19 18:30:34 0.109900
2018-12-20 18:30:34 -0.969465
2018-12-21 18:30:34 -0.885282
2018-12-24 18:30:34 -1.722596
2018-12-25 18:30:34 0.678189
2018-12-26 18:30:34 0.402022
2018-12-27 18:30:34 -0.740186
2018-12-28 18:30:34 1.302828
Freq: B, dtype: float64
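December 22 and 23 are a Saturday and a Sunday, so they are missing from the result.
- freq = "D" generates a time series at a daily frequency. The example uses "2D", an interval of two days.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-16 18:30:34', periods=10, freq='2D')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output: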
2018-12-16 18:30:34 0.327716
2018-12-18 18:30:34 0.784813
2018-12-20 18:30:34 1.432993
2018-12-22 18:30:34 1.148707
2018-12-24 18:30:34 0.996547
2018-12-26 18:30:34 -0.210021
2018-12-28 18:30:34 -0.175977
2018-12-30 18:30:34 0.473569
2019-01-01 18:30:34 0.642001
2019-01-03 18:30:34 0.675140
Freq: 2D, dtype: float64
In the result the day changes from entry to entry, with consecutive entries two days apart.
- freq = "W" generates a time series at a weekly frequency. By default the dates are anchored on Sunday, so "W" is the same as "W-SUN".
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W-SUN')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-16 18:30:34 0.557365
2018-12-23 18:30:34 -0.306496
2018-12-30 18:30:34 -1.172465
2019-01-06 18:30:34 0.434073
2019-01-13 18:30:34 0.106500
2019-01-20 18:30:34 0.773861
2019-01-27 18:30:34 -0.236211
2019-02-03 18:30:34 -0.303260
2019-02-10 18:30:34 0.974439
2019-02-17 18:30:34 -0.356273
Freq: W-SUN, dtype: float64
2018-12-16 18:30:34 0.180012
2018-12-23 18:30:34 -0.977006
2018-12-30 18:30:34 0.095408
2019-01-06 18:30:34 -0.097709
2019-01-13 18:30:34 -0.401469
2019-01-20 18:30:34 -0.283461
2019-01-27 18:30:34 -1.138246
2019-02-03 18:30:34 -1.675089
2019-02-10 18:30:34 0.511324
2019-02-17 18:30:34 0.728807
Freq: W-SUN, dtype: float64
The start is 2018-12-15 (a Saturday) and the first generated entry is 2018-12-16 (a Sunday). Consecutive entries are 7 days apart, and there are 10 entries in total (periods = 10).
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W-FRI')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-16 18:30:34 -1.133046
2018-12-23 18:30:34 -1.083898
2018-12-30 18:30:34 -1.503690
2019-01-06 18:30:34 -0.866094
2019-01-13 18:30:34 -0.945356
2019-01-20 18:30:34 0.021928
2019-01-27 18:30:34 -0.591696
2019-02-03 18:30:34 -1.710630
2019-02-10 18:30:34 2.121283
2019-02-17 18:30:34 0.739256
Freq: W-SUN, dtype: float64
2018-12-21 18:30:34 2.082080
2018-12-28 18:30:34 1.368807
2019-01-04 18:30:34 0.599276
2019-01-11 18:30:34 -0.149521
2019-01-18 18:30:34 1.134686
2019-01-25 18:30:34 -0.582935
2019-02-01 18:30:34 -0.470655
2019-02-08 18:30:34 0.983203
2019-02-15 18:30:34 -0.067618
2019-02-22 18:30:34 -0.736081
Freq: W-FRI, dtype: float64
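The statement cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W-FRI') starts from 2018-12-15 (a Saturday) and generates 10 timestamps that all fall on a Friday. The first Friday after 2018-12-15 is 2018-12-21 and the second is 2018-12-28. In other words, "W-FRI" generates a weekly series anchored on the given day of the week.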
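To make the anchoring rule concrete, the first Friday can be computed by hand with the standard library and compared with what date_range returns. This is an illustrative sketch, not the way pandas implements it internally; FRIDAY, days_ahead, and first_friday are hypothetical names.

```python
from datetime import datetime, timedelta

import pandas as pd

start = datetime(2018, 12, 15, 18, 30, 34)  # a Saturday

# weekday(): Monday is 0, so Friday is 4
FRIDAY = 4
days_ahead = (FRIDAY - start.weekday()) % 7
first_friday = start + timedelta(days=days_ahead)
print(first_friday)  # 2018-12-21 18:30:34

# The first entry of the anchored weekly range is that same Friday
idx = pd.date_range(start, periods=10, freq="W-FRI")
print(idx[0] == first_friday)  # True
```

The modulo keeps days_ahead at 0 when the start already falls on a Friday, in which case the start itself is the first entry.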
- freq = "M" generates a time series at a monthly frequency using the last day of each month, while freq = "MS" uses the first day of each month.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='M')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='MS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-31 18:30:34 2.844877
2019-01-31 18:30:34 -0.405763
2019-02-28 18:30:34 1.048116
2019-03-31 18:30:34 -0.353364
2019-04-30 18:30:34 1.146974
2019-05-31 18:30:34 -2.594504
2019-06-30 18:30:34 1.149964
2019-07-31 18:30:34 0.152655
2019-08-31 18:30:34 0.456799
2019-09-30 18:30:34 0.356193
Freq: M, dtype: float64
2019-01-01 18:30:34 -0.410882
2019-02-01 18:30:34 -1.349693
2019-03-01 18:30:34 0.363404
2019-04-01 18:30:34 0.352792
2019-05-01 18:30:34 0.334477
2019-06-01 18:30:34 0.181288
2019-07-01 18:30:34 -0.936703
2019-08-01 18:30:34 -0.512834
2019-09-01 18:30:34 -0.243987
2019-10-01 18:30:34 0.727383
Freq: MS, dtype: float64
The first month end after 2018-12-15 is 2018-12-31, and the first month start is 2019-01-01.
- freq = "BM" uses the last business day of each month, which is not necessarily the last calendar day. freq = "BMS" uses the first business day of each month.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='BM')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='BMS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-31 18:30:34 0.338989
2019-01-31 18:30:34 -0.074689
2019-02-28 18:30:34 -1.309663
2019-03-29 18:30:34 0.139394
2019-04-30 18:30:34 -0.519024
2019-05-31 18:30:34 0.573932
2019-06-28 18:30:34 0.551329
2019-07-31 18:30:34 -0.849871
2019-08-30 18:30:34 -0.685058
2019-09-30 18:30:34 -0.160009
Freq: BM, dtype: float64
2019-01-01 18:30:34 0.499660
2019-02-01 18:30:34 -0.912324
2019-03-01 18:30:34 0.412629
2019-04-01 18:30:34 1.222422
2019-05-01 18:30:34 -0.618880
2019-06-03 18:30:34 0.132562
2019-07-01 18:30:34 0.721672
2019-08-01 18:30:34 -1.086498
2019-09-02 18:30:34 -1.670070
2019-10-01 18:30:34 -2.165835
Freq: BMS, dtype: float64
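Note that 2019-03-29 is not the last day of March: 2019-03-30 and 2019-03-31 fall on a weekend. Likewise, 2019-06-03 is not the first day of June but is its first business day, because 2019-06-01 and 2019-06-02 fall on a weekend.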
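The months where the two rules disagree can be listed by generating both ranges side by side. The sketch below uses the offset objects pd.offsets.MonthEnd and pd.offsets.BMonthEnd instead of the "M" and "BM" strings, on the assumption that these objects behave the same across pandas versions; the name diffs is illustrative.

```python
import pandas as pd

start = "2018-12-15"
month_end = pd.date_range(start, periods=10, freq=pd.offsets.MonthEnd())
bmonth_end = pd.date_range(start, periods=10, freq=pd.offsets.BMonthEnd())

# Keep only the months whose last calendar day is not a business day
diffs = [(cal, biz) for cal, biz in zip(month_end, bmonth_end) if cal != biz]
for cal, biz in diffs:
    print(cal.date(), cal.day_name(), "->", biz.date())
# 2019-03-31 Sunday -> 2019-03-29
# 2019-06-30 Sunday -> 2019-06-28
# 2019-08-31 Saturday -> 2019-08-30
```

In every other month of the range the last calendar day is already a weekday, so both rules return the same date.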
- freq = "Q" generates a time series at a quarterly frequency using quarter ends, while freq = "QS" uses quarter starts.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='Q')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='QS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-31 18:30:34 0.364439
2019-03-31 18:30:34 -0.295537
2019-06-30 18:30:34 0.562707
2019-09-30 18:30:34 -0.226738
2019-12-31 18:30:34 0.623051
2020-03-31 18:30:34 -0.675792
2020-06-30 18:30:34 -0.848371
2020-09-30 18:30:34 -0.805518
2020-12-31 18:30:34 -0.061498
2021-03-31 18:30:34 0.291014
Freq: Q-DEC, dtype: float64
2019-01-01 18:30:34 -0.236873
2019-04-01 18:30:34 -1.399436
2019-07-01 18:30:34 1.011018
2019-10-01 18:30:34 1.254754
2020-01-01 18:30:34 -0.569184
2020-04-01 18:30:34 -1.480181
2020-07-01 18:30:34 -0.396710
2020-10-01 18:30:34 1.157218
2021-01-01 18:30:34 -0.119259
2021-04-01 18:30:34 0.773836
Freq: QS-JAN, dtype: float64
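Q can also be combined with B, just as M was above: "BQ" and "BQS" use the last and first business day of each quarter.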
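As a small illustration of the business-quarter variant, the sketch below uses the offset object pd.offsets.BQuarterEnd, which corresponds to the "BQ" row of the alias table. The object is used instead of the string on the assumption that it behaves the same across pandas versions.

```python
import pandas as pd

# Last business day of each quarter, starting from 2018-12-15
idx = pd.date_range("2018-12-15", periods=4, freq=pd.offsets.BQuarterEnd())
for stamp in idx:
    print(stamp.date(), stamp.day_name())
# 2018-12-31 Monday
# 2019-03-29 Friday
# 2019-06-28 Friday
# 2019-09-30 Monday
```

March and June 2019 both end on a Sunday, so their business quarter ends move back to the preceding Friday.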
- freq = "A" generates a time series at a yearly frequency using year ends, while freq = "AS" uses year starts.
import numpy as np
import pandas as pd
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='A')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='AS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Program output:
2018-12-31 18:30:34 -0.058588
2019-12-31 18:30:34 -0.676757
2020-12-31 18:30:34 -0.368606
2021-12-31 18:30:34 -0.820318
2022-12-31 18:30:34 0.959945
2023-12-31 18:30:34 -0.144216
2024-12-31 18:30:34 0.827481
2025-12-31 18:30:34 1.812374
2026-12-31 18:30:34 -1.473202
2027-12-31 18:30:34 -1.633083
Freq: A-DEC, dtype: float64
2019-01-01 18:30:34 -0.037793
2020-01-01 18:30:34 1.067194
2021-01-01 18:30:34 -1.517820
2022-01-01 18:30:34 -0.101716
2023-01-01 18:30:34 0.413106
2024-01-01 18:30:34 -0.912453
2025-01-01 18:30:34 0.197084
2026-01-01 18:30:34 -0.513032
2027-01-01 18:30:34 -0.027010
2028-01-01 18:30:34 -0.263569
Freq: AS-JAN, dtype: float64
Source: CSDN
Author: †徐先森®
Link: https://blog.csdn.net/qq_36622490/article/details/103479477