问题
I started using pandas library about a fortnight back. Learning the new features. I would appreciate help on the following problem.
I have a column with dates in mixed format. These are the 2 formats present
mm/dd/yyyy
dd/mm/yyyy
An extract from the dataset :-
Dates
6/5/2016
7/5/2016
7/5/2016
7/5/2016
9/5/2016
9/5/2016
9/5/2016
9/5/2016
5/13/2016
5/14/2016
5/14/2016
I am struggling to convert these to a common format. I tried using pandas's 'to_datetime'. It does not work. I am also not sure how will using regular expressions help in this case.
Another piece of information. The dates are in sorted order. Can something be done using info.
EDIT1:
I understand that it is impossible to distinguish between 6/4/2016 and 5/6/2016, if we look at them alone. However, I was hoping that the fact the dates are in ascending order and the actual dataset is spread for over a year, there would be a method to make sense of it. Is anyone aware of a function which can make sense of the format given the fact that the dates are in ascending order ?
EDIT2: Sample of 2 months :- April and May 2016. Please note that there is no pattern. So please do not suggest any solution based on patterns of the data below.
4/1/2016 4/1/2016 4/3/2016 4/3/2016 3/4/2016 4/4/2016 4/4/2016 4/5/2016 4/5/2016 4/7/2016 4/7/2016 4/8/2016 4/8/2016 4/14/2016 4/16/2016 6/4/2016 7/4/2016 8/4/2016 11/4/2016 11/4/2016 11/4/2016 11/4/2016 11/4/2016 12/4/2016 12/4/2016 12/4/2016 13/4/2016 13/4/2016 13/4/2016 13/4/2016 14/04/2016 15/4/2016 16/4/2016 16/4/2016 18/4/2016 18/4/2016 19/4/2016 19/4/2016 20/4/2016 20/4/2016 21/4/2016 21/4/2016 21/4/2016 22/4/2016 23/4/2016 23/4/2016 25/4/2016 25/4/2016 26/4/2016 26/4/2016 26/4/2016 26/4/2016 26/4/2016 26/4/2016 29/4/2016 29/4/2016 29/4/2016 30/4/2016 2/5/2016 2/5/2016 3/5/2016 3/5/2016 3/5/2016 3/5/2016 4/5/2016 5/4/2016 5/4/2016 5/4/2016 6/5/2016 6/5/2016 7/5/2016 7/5/2016 7/5/2016 9/5/2016 9/5/2016 9/5/2016 9/5/2016 10/5/2016 10/5/2016 11/5/2016 11/5/2016 12/5/2016 5/13/2016 5/14/2016 5/14/2016 5/15/2016 5/16/2016 5/16/2016 5/16/2016 5/16/2016 5/16/2016 5/16/2016 5/16/2016 5/17/2016 5/17/2016 5/18/2016 5/18/2016 5/19/2016 5/19/2016 5/20/2016 5/20/2016 5/20/2016 5/20/2016 5/20/2016 5/21/2016 5/23/2016 5/23/2016 5/23/2016 5/23/2016 5/23/2016 5/23/2016 5/24/2016 5/24/2016 5/25/2016 5/26/2016 5/26/2016 5/26/2016 5/27/2016 5/27/2016 5/27/2016 5/27/2016 5/27/2016 5/27/2016 5/27/2016 5/28/2016 5/30/2016 5/30/2016
回答1:
The real problem is that there are ambiguous dates in your dataset (do you parse it as mm/dd/yyyy or dd/mm/yyyy if it could be either?? (I've been here, and we decided just to pick what the majority seemed to be; essentially the dataset was compromised... and we had to treat it as such).
If it's a Series then hitting it with pd.to_datetime
seems to work:
In [11]: s = pd.Series(['6/5/2016', '7/5/2016', '7/5/2016', '7/5/2016', '9/5/2016', '9/5/2016', '9/5/2016', '9/5/2016', '5/13/2016', '5/14/2016', '5/14/2016'])
In [12]: pd.to_datetime(s)
Out[12]:
0 2016-06-05
1 2016-07-05
2 2016-07-05
3 2016-07-05
4 2016-09-05
5 2016-09-05
6 2016-09-05
7 2016-09-05
8 2016-05-13
9 2016-05-14
10 2016-05-14
Name: 0, dtype: datetime64[ns]
Note: If you had a consistent format you can pass it in explicitly:
In [13]: pd.to_datetime(s, format="%m/%d/%Y")
Out[13]:
0 2016-06-05
1 2016-07-05
2 2016-07-05
3 2016-07-05
4 2016-09-05
5 2016-09-05
6 2016-09-05
7 2016-09-05
8 2016-05-13
9 2016-05-14
10 2016-05-14
Name: 0, dtype: datetime64[ns]
来源:https://stackoverflow.com/questions/37538080/pandas-inconsistent-date-time-format