Pandas Inconsistent date-time format

Submitted by 本小妞迷上赌 on 2019-12-10 23:38:47

Question


I started using the pandas library about a fortnight ago and am still learning its features. I would appreciate help with the following problem.

I have a column with dates in mixed formats. These are the two formats present:

  1. mm/dd/yyyy
  2. dd/mm/yyyy
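To see why these two formats clash, here is a minimal sketch using only Python's standard library. The helper `try_parse` and the constant names are hypothetical, not part of pandas. The same string can be valid under both formats with different meanings, or valid under only one of them.

```python
from datetime import datetime

# The two formats listed above, written as strptime patterns.
MONTH_FIRST = "%m/%d/%Y"  # mm/dd/yyyy
DAY_FIRST = "%d/%m/%Y"    # dd/mm/yyyy


def try_parse(text, fmt):
    """Return a date if `text` is valid under `fmt`, otherwise None."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


# 6/5/2016 is valid both ways; 5/13/2016 and 13/4/2016 are valid only one way.
for text in ["6/5/2016", "5/13/2016", "13/4/2016"]:
    print(text, try_parse(text, MONTH_FIRST), try_parse(text, DAY_FIRST))
```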

An extract from the dataset:

Dates  
6/5/2016  
7/5/2016  
7/5/2016  
7/5/2016  
9/5/2016  
9/5/2016  
9/5/2016  
9/5/2016  
5/13/2016  
5/14/2016  
5/14/2016  

I am struggling to convert these to a common format. I tried pandas's 'to_datetime', but it does not work. I am also not sure how regular expressions would help in this case.
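On the regular-expression question: a regex cannot resolve an ambiguous date, but it can classify each row. A sketch, assuming every value looks like d/m/yyyy with one- or two-digit fields (a value that does not match would become NaN and make the `astype(int)` call fail). A field greater than 12 cannot be a month, so it settles the order; when both fields are 12 or less, the string alone cannot tell you.

```python
import pandas as pd

dates = pd.Series(["6/5/2016", "9/5/2016", "5/13/2016", "13/4/2016", "14/04/2016"])

# Capture the two leading numbers and the year as separate columns 0, 1, 2.
parts = dates.str.extract(r"^(\d{1,2})/(\d{1,2})/(\d{4})$").astype(int)
first, second = parts[0], parts[1]

# Default to "ambiguous", then mark rows where one field must be the day.
kind = pd.Series("ambiguous", index=dates.index)
kind.loc[first > 12] = "dd/mm/yyyy"
kind.loc[second > 12] = "mm/dd/yyyy"

print(pd.DataFrame({"raw": dates, "kind": kind}))
```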

Another piece of information: the dates are in sorted order. Can something be done using this information?

EDIT1:

I understand that it is impossible to distinguish between 6/4/2016 and 5/6/2016 if we look at them alone. However, I was hoping that, since the dates are in ascending order and the actual dataset spans more than a year, there would be a way to make sense of them. Is anyone aware of a function that can work out the format, given that the dates are in ascending order?

EDIT2: Here is a sample covering two months, April and May 2016. Please note that there is no pattern, so please do not suggest any solution based on patterns in the data below.

4/1/2016 4/1/2016 4/3/2016 4/3/2016 3/4/2016 4/4/2016 4/4/2016 4/5/2016 4/5/2016 4/7/2016 4/7/2016 4/8/2016 4/8/2016 4/14/2016 4/16/2016 6/4/2016 7/4/2016 8/4/2016 11/4/2016 11/4/2016 11/4/2016 11/4/2016 11/4/2016 12/4/2016 12/4/2016 12/4/2016 13/4/2016 13/4/2016 13/4/2016 13/4/2016 14/04/2016 15/4/2016 16/4/2016 16/4/2016 18/4/2016 18/4/2016 19/4/2016 19/4/2016 20/4/2016 20/4/2016 21/4/2016 21/4/2016 21/4/2016 22/4/2016 23/4/2016 23/4/2016 25/4/2016 25/4/2016 26/4/2016 26/4/2016 26/4/2016 26/4/2016 26/4/2016 26/4/2016 29/4/2016 29/4/2016 29/4/2016 30/4/2016 2/5/2016 2/5/2016 3/5/2016 3/5/2016 3/5/2016 3/5/2016 4/5/2016 5/4/2016 5/4/2016 5/4/2016 6/5/2016 6/5/2016 7/5/2016 7/5/2016 7/5/2016 9/5/2016 9/5/2016 9/5/2016 9/5/2016 10/5/2016 10/5/2016 11/5/2016 11/5/2016 12/5/2016 5/13/2016 5/14/2016 5/14/2016 5/15/2016 5/16/2016 5/16/2016 5/16/2016 5/16/2016 5/16/2016 5/16/2016 5/16/2016 5/17/2016 5/17/2016 5/18/2016 5/18/2016 5/19/2016 5/19/2016 5/20/2016 5/20/2016 5/20/2016 5/20/2016 5/20/2016 5/21/2016 5/23/2016 5/23/2016 5/23/2016 5/23/2016 5/23/2016 5/23/2016 5/24/2016 5/24/2016 5/25/2016 5/26/2016 5/26/2016 5/26/2016 5/27/2016 5/27/2016 5/27/2016 5/27/2016 5/27/2016 5/27/2016 5/27/2016 5/28/2016 5/30/2016 5/30/2016
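One way to use the ordering without relying on any pattern in the values is to assume neighbouring rows are close in time and, for each ambiguous string, take whichever reading lies nearest to the previously resolved date. A strict "must be ascending" rule would not survive this sample, because 4/16/2016 (necessarily 16 April) appears before 13/4/2016 (necessarily 13 April). The sketch below is a heuristic under that closeness assumption, not a known library function; `candidates` and `resolve_by_neighbour` are hypothetical names, and it is run on a subset of the EDIT2 values.

```python
from datetime import datetime

FORMATS = ("%m/%d/%Y", "%d/%m/%Y")  # mm/dd/yyyy and dd/mm/yyyy


def candidates(text):
    """All distinct dates `text` can denote under either format, oldest first."""
    found = set()
    for fmt in FORMATS:
        try:
            found.add(datetime.strptime(text.strip(), fmt).date())
        except ValueError:
            pass
    if not found:
        raise ValueError(f"unparseable date: {text!r}")
    return sorted(found)


def resolve_by_neighbour(texts):
    """Pick, for each row, the reading closest to the previous resolved date.

    Heuristic: assumes neighbouring rows are close in time. The starting
    reference is the first unambiguous row; ties go to the earlier reading.
    """
    options = [candidates(t) for t in texts]
    unambiguous = [opts[0] for opts in options if len(opts) == 1]
    if not unambiguous:
        raise ValueError("no unambiguous date to anchor on")
    previous = unambiguous[0]
    resolved = []
    for opts in options:
        chosen = min(opts, key=lambda d: abs((d - previous).days))
        resolved.append(chosen)
        previous = chosen
    return resolved


sample = ("4/1/2016 4/3/2016 3/4/2016 4/16/2016 6/4/2016 13/4/2016 "
          "30/4/2016 2/5/2016 4/5/2016 5/4/2016 6/5/2016 5/13/2016").split()
for raw, parsed in zip(sample, resolve_by_neighbour(sample)):
    print(raw, parsed)
```

Because each choice feeds the next, one bad guess can propagate; rows far from any unambiguous date deserve a manual check.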


Answer 1:


The real problem is that there are ambiguous dates in your dataset: do you parse a date as mm/dd/yyyy or dd/mm/yyyy if it could be either? I've been here, and we decided just to pick what the majority seemed to be; essentially the dataset was compromised, and we had to treat it as such.
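The "pick what the majority seemed to be" approach can be sketched with the `format=` and `errors="coerce"` arguments of `pd.to_datetime`: parse the column both ways, let only the rows that fit exactly one format vote, then parse everything with the winning format. `majority_format` is a hypothetical helper. Rows written in the losing format come back as NaT, which is the "compromised data" cost described above.

```python
import pandas as pd


def majority_format(dates):
    """Return the format matched by more of the unambiguous rows.

    Rows valid under both formats are ignored in the vote, since they say
    nothing about which convention the writer used.
    """
    month_first = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")
    day_first = pd.to_datetime(dates, format="%d/%m/%Y", errors="coerce")
    only_month_first = (month_first.notna() & day_first.isna()).sum()
    only_day_first = (day_first.notna() & month_first.isna()).sum()
    return "%m/%d/%Y" if only_month_first >= only_day_first else "%d/%m/%Y"


dates = pd.Series(["13/4/2016", "14/04/2016", "9/5/2016",
                   "5/13/2016", "5/14/2016", "5/14/2016"])
fmt = majority_format(dates)
parsed = pd.to_datetime(dates, format=fmt, errors="coerce")
print(fmt)
print(parsed)
```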


If it's a Series, then hitting it with pd.to_datetime seems to work:

In [11]: s = pd.Series(['6/5/2016', '7/5/2016', '7/5/2016', '7/5/2016', '9/5/2016', '9/5/2016', '9/5/2016', '9/5/2016', '5/13/2016', '5/14/2016', '5/14/2016'])

In [12]: pd.to_datetime(s)
Out[12]:
0    2016-06-05
1    2016-07-05
2    2016-07-05
3    2016-07-05
4    2016-09-05
5    2016-09-05
6    2016-09-05
7    2016-09-05
8    2016-05-13
9    2016-05-14
10   2016-05-14
dtype: datetime64[ns]
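Note that this call reads every ambiguous row month-first, so 6/5/2016 becomes 2016-06-05. Since the question says the column is sorted and 5/13/2016 follows those rows, they are more plausibly May dates. A quick sanity check, sketched here under the assumption that the rows are still in their original sorted order, is whether the parsed result is non-decreasing:

```python
import pandas as pd

s = pd.Series(['6/5/2016', '7/5/2016', '7/5/2016', '7/5/2016', '9/5/2016',
               '9/5/2016', '9/5/2016', '9/5/2016', '5/13/2016', '5/14/2016',
               '5/14/2016'])

# Parse month-first explicitly, matching the output shown above.
parsed = pd.to_datetime(s, format="%m/%d/%Y")

# A correctly parsed sorted column should never go backwards.
print(parsed.is_monotonic_increasing)  # False

# Rows where the parsed date is earlier than the row before it.
print(s[parsed.diff() < pd.Timedelta(0)])
```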

Note: if you had a consistent format, you could pass it in explicitly:

In [13]: pd.to_datetime(s, format="%m/%d/%Y")
Out[13]:
0    2016-06-05
1    2016-07-05
2    2016-07-05
3    2016-07-05
4    2016-09-05
5    2016-09-05
6    2016-09-05
7    2016-09-05
8    2016-05-13
9    2016-05-14
10   2016-05-14
dtype: datetime64[ns]
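With two formats in one column, one sketch is to parse with each format separately using `errors="coerce"` (non-matching rows become NaT instead of raising) and fill the gaps of one result from the other. This only gets the unambiguous rows right by construction; a string like 6/5/2016 simply takes whichever format you try first, and the day-first preference below is an arbitrary choice.

```python
import pandas as pd

s = pd.Series(["13/4/2016", "6/5/2016", "5/13/2016"])

# Try each format on its own; rows that don't fit become NaT.
day_first = pd.to_datetime(s, format="%d/%m/%Y", errors="coerce")
month_first = pd.to_datetime(s, format="%m/%d/%Y", errors="coerce")

# Prefer the day-first reading and fall back to month-first where it failed.
# Ambiguous rows such as 6/5/2016 silently take the day-first reading.
combined = day_first.fillna(month_first)
print(combined)
```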


Source: https://stackoverflow.com/questions/37538080/pandas-inconsistent-date-time-format
