How to plot Pandas datetime series in Seaborn distplot?

[愿得一人] 2021-02-20 10:45

I have a pandas dataframe with a datetime column. I would like to plot the distribution of the rows according to that date column, but I'm currently getting an unhelpful error.

2 Answers
  • 2021-02-20 11:37

    I came across this question while having the same problem myself. As mentioned in the comments, seaborn's distplot does not seem to support dates. Unfortunately, I could not find anything in the official documentation to confirm this.

    I found two ways to deal with this problem. Neither is perfect, but they are the best I found.

    Option 1: Convert dates to numbers

    Convert the dates to some numeric measure and work with that. distplot works with numbers, so if each date is represented by a number, we will be okay. The mapping between dates and numbers is similar to using a MinMax scaler. For example, we can set "2017-01-01" as 0 and "2020-06-06" as 1, and map all dates between them to values in the range [0, 1].

    Which unit to use depends on the range of your data: it could be days, months, years, and so on.
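
    As a minimal sketch of the [0, 1] mapping described above (the middle date and the variable names are made up for illustration, and the column is assumed to be parsed with pd.to_datetime):

```python
import pandas as pd

# Toy dates; the two endpoints match the example in the text above,
# the middle one is an arbitrary illustration.
s = pd.Series(pd.to_datetime(["2017-01-01", "2018-09-19", "2020-06-06"]))

# Min-max scaling: the earliest date maps to 0.0, the latest to 1.0.
span = s.max() - s.min()
scaled = (s - s.min()) / span  # Timedelta / Timedelta gives plain floats

print(scaled.tolist())
```

    Dividing one timedelta by another yields an ordinary float, so the result can be passed to any numeric plotting function.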

    I'll demonstrate this approach with this toy example.

    import pandas as pd
    import datetime as dt
    
    original_dates = ["2016-03-05", "2016-03-05", "2016-02-05", "2016-02-05", "2016-02-05", "2014-03-05"]
    dates_list = [dt.datetime.strptime(date, '%Y-%m-%d').date() for date in original_dates]
    
    df = pd.DataFrame({"Date":dates_list})
    

    The dataframe now looks like this:

             Date
    0  2016-03-05
    1  2016-03-05
    2  2016-02-05
    3  2016-02-05
    4  2016-02-05
    5  2014-03-05
    

    (This is not the best way to get dates into a dataframe, of course, but how you do it doesn't matter here.)

    Now I create a new column that holds the number of days between each date and the minimum date:

    df["NewDate"] = df["Date"] - dt.date(2014,3,5)
    df["NewDate"] = df["NewDate"].apply(lambda x: x.days)
    

    Result:

             Date  NewDate
    0  2016-03-05      731
    1  2016-03-05      731
    2  2016-02-05      702
    3  2016-02-05      702
    4  2016-02-05      702
    5  2014-03-05        0
    

    Notice that I hard-coded the minimum date. You can compute the minimum instead of hard-coding it; I just wanted to get through this part quickly.

    Now we can use distplot on our new column:
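
    One way to avoid the hard-coded date is sketched below. It assumes the column holds pandas datetimes (built with pd.to_datetime) rather than the datetime.date objects used above, which makes the .dt accessor available:

```python
import pandas as pd

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05",
                  "2016-02-05", "2016-02-05", "2014-03-05"]
df = pd.DataFrame({"Date": pd.to_datetime(original_dates)})

# Subtract the earliest date found in the column, then keep whole days.
df["NewDate"] = (df["Date"] - df["Date"].min()).dt.days

print(df)
```

    This should give the same NewDate values as the hard-coded version: 731, 731, 702, 702, 702, 0.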

    import seaborn as sns
    sns.set()
    ax = sns.distplot(df['NewDate'])
    

    Output: [figure: distplot of the NewDate column]

    As you can see, it shows days instead of dates. For my own problem it was fine to show it that way. If you want the axis to show dates, an extra step is needed; see the question "Show xticks which are function of x-axis, not directly the data it self. Example with dates (pandas, matplotlib)".

    As I said earlier, I scaled by the difference in days, but you can do the same with months or years, depending on the data.
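
    A possible way to do that extra step is a matplotlib tick formatter that converts each day offset back to a date. This is only a sketch: offset_to_label and MIN_DATE are names invented here, and a plain matplotlib histogram stands in for the seaborn plot.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

MIN_DATE = dt.date(2014, 3, 5)              # reference date used for NewDate
day_offsets = [731, 731, 702, 702, 702, 0]  # the NewDate values from above


def offset_to_label(x, pos=None):
    """Turn a day offset on the x-axis back into an ISO date string."""
    return (MIN_DATE + dt.timedelta(days=int(round(x)))).isoformat()


fig, ax = plt.subplots()
ax.hist(day_offsets, bins=10)
# The data stays numeric; only the tick labels are rewritten as dates.
ax.xaxis.set_major_formatter(FuncFormatter(offset_to_label))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

    Since seaborn's distplot returns a matplotlib Axes, the same set_major_formatter call should also work on the ax from the snippet above.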

    Option 2: Use a histogram directly, without seaborn's distplot

    The question "Can Pandas plot a histogram of dates?" has an answer showing how to plot a histogram of dates using pandas' groupby.

    It's not the same as distplot, but it can be a close-enough solution (distplot's histogram is ultimately based on matplotlib's hist).
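
    A rough sketch of the groupby idea, not the exact code from that answer. Grouping by calendar month is an assumption made here; pick whatever period suits your data:

```python
import pandas as pd

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05",
                  "2016-02-05", "2016-02-05", "2014-03-05"]
df = pd.DataFrame({"Date": pd.to_datetime(original_dates)})

# Count rows per calendar month; each group becomes one bar.
counts = df.groupby(df["Date"].dt.to_period("M")).size()
print(counts)

# With matplotlib installed, counts.plot(kind="bar") draws the bar chart.
```

    Months with no rows are simply absent from the result, so the bars are not spaced on a true time scale.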

  • 2021-02-20 11:44

    You could convert the dates to a Categorical type and plot the resulting codes (which are integers). Then label the x-ticks with the corresponding dates (the categories).

    import pandas as pd
    import seaborn as sns
    
    original_dates = [
        "2016-03-05", "2016-03-05", "2016-02-05",
        "2016-02-05", "2016-02-05", "2014-03-05"]
    dates_list = pd.to_datetime(original_dates)
    
    df = pd.DataFrame({"Date": dates_list})
    df['date-as-cat'] = df['Date'].astype('category')  # new 
    df['codes'] = df['date-as-cat'].cat.codes          # new 
    
    print(df)
    print(df.dtypes)
    
            Date date-as-cat  codes
    0 2016-03-05  2016-03-05      2
    1 2016-03-05  2016-03-05      2
    2 2016-02-05  2016-02-05      1
    3 2016-02-05  2016-02-05      1
    4 2016-02-05  2016-02-05      1
    5 2014-03-05  2014-03-05      0
    
    Date           datetime64[ns]
    date-as-cat          category
    codes                    int8
    dtype: object 
    

    The mapping between codes and dates can be obtained like this:

    x = df[['codes', 'date-as-cat']].drop_duplicates().sort_values('codes')
    print(x)
    
       codes date-as-cat
    5      0  2014-03-05
    2      1  2016-02-05
    0      2  2016-03-05
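
    The plotting step itself is not shown above. One way it might look, using plain matplotlib bars as a stand-in for the seaborn call (the variable names labels and counts are made up for this sketch):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05",
                  "2016-02-05", "2016-02-05", "2014-03-05"]
df = pd.DataFrame({"Date": pd.to_datetime(original_dates)})
df["date-as-cat"] = df["Date"].astype("category")
df["codes"] = df["date-as-cat"].cat.codes

# The categories are sorted, so the label at position i belongs to code i.
labels = df["date-as-cat"].cat.categories.strftime("%Y-%m-%d").tolist()

# Number of rows per code, in code order.
counts = df["codes"].value_counts().sort_index()

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)
ax.set_xticks(range(len(labels)))  # one tick per code
ax.set_xticklabels(labels)         # show the date instead of the code
```

    Keep in mind that the codes are evenly spaced, so the two-year gap between 2014-03-05 and 2016-02-05 is drawn no wider than the one-month gap after it.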
    