How to create a min-max plot by month with fill_between?

灰色年华 2021-01-07 04:56

I need to show month names as the x-ticks, but when I pass the month names as x, the figure is plotted incorrectly. I also need to overlay a scatter plot on the line graph.

1 answer
  • 2021-01-07 05:15

    Update Using Data from OP

    • The method in the original answer (further down) requires the x-axis to be in a datetime format.
    • The data in the question is grouped and plotted against a string that combines the month and day.
    • The x-axis therefore represents the 365 days of the year; leap days have been removed.
      • Place ticks at the appropriate location for the beginning of each month
      • Add a label to the tick
    import pandas as pd
    import matplotlib.pyplot as plt
    import calendar
    
    # load the data
    df = pd.read_csv('data/so_data/62929123/data.csv', parse_dates=['Date'])
    
    # remove leap day
    df = df[~((df.Date.dt.month == 2) & (df.Date.dt.day == 29))]
    
    # add a year column
    df['Year'] = df.Date.dt.year
    
    # add a zero-padded month-day column (e.g. '01-05') to use for groupby;
    # zero-padding keeps the groups sorted in calendar order, so the row index matches the day of the year
    df['Month-Day'] = df.Date.dt.strftime('%m-%d')
    
    # select 2015 data
    df_15 = df[df.Year == 2015].reset_index()
    
    # select data before 2015
    df_14 = df[df.Year < 2015].reset_index()
    
    # filter data to either max or min and groupby month-day
    max_14 = df_14[df_14.Element == 'TMAX'].groupby(['Month-Day']).agg({'Data_Value': 'max'}).reset_index().rename(columns={'Data_Value': 'Daily_Max'})
    min_14 = df_14[df_14.Element == 'TMIN'].groupby(['Month-Day']).agg({'Data_Value': 'min'}).reset_index().rename(columns={'Data_Value': 'Daily_Min'})
    max_15 = df_15[df_15.Element == 'TMAX'].groupby(['Month-Day']).agg({'Data_Value': 'max'}).reset_index().rename(columns={'Data_Value': 'Daily_Max'})
    min_15 = df_15[df_15.Element == 'TMIN'].groupby(['Month-Day']).agg({'Data_Value': 'min'}).reset_index().rename(columns={'Data_Value': 'Daily_Min'})
    
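    # select max values from 2015 that are greater than the recorded max
    # (compare the temperature columns only; both frames share the same 0-364 row index)
    higher_14 = max_15[max_15.Daily_Max > max_14.Daily_Max]
    
    # select min values from 2015 that are less than the recorded min
    lower_14 = min_15[min_15.Daily_Min < min_14.Daily_Min]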
    
    # plot the min and max lines; passing y explicitly lets the label be used in the legend
    ax = max_14.plot(y='Daily_Max', label='Max Recorded', color='tab:red')
    min_14.plot(y='Daily_Min', ax=ax, label='Min Recorded', color='tab:blue')
    
    # add the fill, between min and max
    plt.fill_between(max_14.index, max_14.Daily_Max, min_14.Daily_Min, alpha=0.10, color='tab:orange')
    
    # add points greater than max or less than min
    plt.scatter(higher_14.index, higher_14.Daily_Max, label='2015 Max > Record', alpha=0.75, color='tab:red')
    plt.scatter(lower_14.index, lower_14.Daily_Min, label='2015 Min < Record', alpha=0.75, color='tab:blue')
    
    # set plot xlim
    plt.xlim(-5, 370)
    
    # tick locations
    ticks=[-5, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365, 370]
    
    # tick labels
    labels = list(calendar.month_abbr)  # list of months
    labels.extend(['Jan', ''])
    
    # add the custom ticks and labels
    plt.xticks(ticks=ticks, labels=labels)
    
    # plot cosmetics
    plt.legend()
    plt.xlabel('Day of Year: 0-365 Displaying Start of Month')
    plt.ylabel('Temperature °C')
    plt.title('Daily Max and Min: 2009 - 2014\nRecorded Max < 2015 Temperatures < Recorded Min')
    plt.tight_layout()
    plt.show()
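
The hard-coded tick list in the code above can also be derived rather than typed by hand. The following is a minimal sketch using only the standard library; the names `days_in_month`, `month_starts` and `month_labels` are illustrative and not part of the original answer, and 2015 is used only because it is not a leap year.

```python
import calendar
from itertools import accumulate

# number of days in each month of a non-leap year
days_in_month = [calendar.monthrange(2015, m)[1] for m in range(1, 13)]

# cumulative offsets: 0 for Jan 1, 31 for Feb 1, ..., 365 for the following Jan 1
month_starts = [0] + list(accumulate(days_in_month))

# abbreviated month names, with 'Jan' repeated for the closing tick
month_labels = list(calendar.month_abbr)[1:] + ['Jan']

print(month_starts)
print(month_labels)
```

These lists can be passed as `plt.xticks(ticks=month_starts, labels=month_labels)`. The list in the answer additionally pads both ends with -5 and 370 (with empty labels) so that the ticks line up with `plt.xlim(-5, 370)`.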
    

    Original Answer

    • It was not originally clear that the x-axis values were not datetimes.
      • The dataset was not originally available.
    • The reproducible data, and the code that shapes it, is at the bottom of this answer, but it is not integral to adding months to the x-axis.
    • The plot uses the dataframes max_15 and min_15, which hold the maximum and minimum temperatures for Portland, OR in 2015.
      • The important detail is that date was converted to a datetime format with pd.to_datetime and then set as the index.
      • v is a column of floats
      • The MIN and MAX values are split into separate dataframes with boolean indexing (see Pandas: Boolean Indexing), as shown in the data cleaning below.
    • Reference Matplotlib: Date tick labels & Formatting date ticks using ConciseDateFormatter
      • matplotlib.dates.MonthLocator
      • matplotlib.dates.DateFormatter
      • matplotlib.axis.Axis.set_major_locator
      • matplotlib.axis.XAxis.set_major_formatter
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    
    # plot styling parameters
    # note: matplotlib 3.6 renamed this style to 'seaborn-v0_8'; use that name on newer versions
    plt.style.use('seaborn')
    plt.rcParams['figure.figsize'] = (16.0, 10.0)
    plt.rcParams["patch.force_edgecolor"] = True
    
    # locate the Month and format the label
    months = mdates.MonthLocator()  # every month
    months_fmt = mdates.DateFormatter('%b')
    
    # plot the data
    fig, ax = plt.subplots()
    ax.plot(max_15.index, 'rolling', data=max_15, label='max rolling mean')
    ax.scatter(x=max_15.index, y='v', data=max_15, alpha=0.75, label='MAX')
    
    ax.plot(min_15.index, 'rolling', data=min_15, label='min rolling mean')
    ax.scatter(x=min_15.index, y='v', data=min_15, alpha=0.75, label='MIN')
    ax.legend()
    
    # format the ticks
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_major_formatter(months_fmt)
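
Since the code above depends on max_15 and min_15 from the data section, here is a self-contained sketch of the same month formatting combined with fill_between, which the question title asks about. The data is synthetic (a made-up seasonal curve, not real temperatures), and the names `demo`, `low` and `high` are assumptions for illustration only.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic daily values for 2015, indexed by a DatetimeIndex
dates = pd.date_range('2015-01-01', '2015-12-31', freq='D')
seasonal = 15 - 10 * np.cos(2 * np.pi * np.arange(len(dates)) / len(dates))
demo = pd.DataFrame({'low': seasonal - 5, 'high': seasonal + 5}, index=dates)

fig, ax = plt.subplots()
ax.plot(demo.index, demo['high'], color='tab:red', label='high')
ax.plot(demo.index, demo['low'], color='tab:blue', label='low')

# shade the band between the two lines
ax.fill_between(demo.index, demo['low'], demo['high'], alpha=0.10, color='tab:orange')

# one tick per month, labelled with the abbreviated month name
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.legend()
```

Call `plt.show()` afterwards to display the figure. Because the x values are real datetimes, no manual tick positions are needed here; the locator finds the first day of each month on its own.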
    

    Reproducible Data

    • This part isn't important to formatting the x-axis
    • This is just cleaning the data in case anyone wants to experiment
    • See Weather Visualization for Portland, OR: 1940 - 2020
    import pandas as pd
    import numpy as np  # needed for np.nan below
    
    # download data into dataframe, it's in a wide format
    pdx_19 = pd.read_csv('http://www.weather.gov/source/pqr/climate/webdata/Portland_dailyclimatedata.csv', header=6)
    
    # clean and label data
    pdx_19.drop(columns=['AVG or Total'], inplace=True)
    pdx_19.columns = list(pdx_19.columns[:3]) + [f'v_{day}' for day in pdx_19.columns[3:]]
    pdx_19.rename(columns={'Unnamed: 2': 'TYPE'}, inplace=True)
    pdx_19 = pdx_19[pdx_19.TYPE.isin(['TX', 'TN', 'PR'])]
    
    # convert to long format
    pdx = pd.wide_to_long(pdx_19, stubnames='v', sep='_', i=['YR', 'MO', 'TYPE'], j='day').reset_index()
    
    # additional cleaning
    pdx.TYPE = pdx.TYPE.map({'TX': 'MAX', 'TN': 'MIN', 'PR': 'PRE'})
    pdx.rename(columns={'YR': 'year', 'MO': 'month'}, inplace=True)
    pdx = pdx[pdx.v != '-'].copy()
    pdx['date'] = pd.to_datetime(pdx[['year', 'month', 'day']])
    pdx.drop(columns=['year', 'month', 'day'], inplace=True)
    # assign the result back instead of using inplace on a column accessed as an attribute
    pdx['v'] = pdx['v'].replace({'M': np.nan, 'T': np.nan})
    pdx['v'] = pdx['v'].astype('float')
    
    # select on 2015
    pdx_2015 = pdx[pdx.date.dt.year == 2015].reset_index(drop=True).set_index('date')
    
    # select only MAX temps
    max_15 = pdx_2015[pdx_2015.TYPE == 'MAX'].copy()
    
    # select only MIN temps
    min_15 = pdx_2015[pdx_2015.TYPE == 'MIN'].copy()
    
    # calculate rolling mean
    max_15['rolling'] = max_15.v.rolling(7).mean()
    min_15['rolling'] = min_15.v.rolling(7).mean()
    

    max_15

               TYPE     v    rolling
    date                            
    2015-01-01  MAX  39.0        NaN
    2015-01-02  MAX  41.0        NaN
    2015-01-03  MAX  41.0        NaN
    2015-01-04  MAX  53.0        NaN
    2015-01-05  MAX  57.0        NaN
    2015-01-06  MAX  47.0        NaN
    2015-01-07  MAX  51.0  47.000000
    2015-01-08  MAX  45.0  47.857143
    2015-01-09  MAX  50.0  49.142857
    2015-01-10  MAX  42.0  49.285714
    

    min_15

               TYPE     v    rolling
    date                            
    2015-01-01  MIN  24.0        NaN
    2015-01-02  MIN  26.0        NaN
    2015-01-03  MIN  35.0        NaN
    2015-01-04  MIN  38.0        NaN
    2015-01-05  MIN  42.0        NaN
    2015-01-06  MIN  38.0        NaN
    2015-01-07  MIN  34.0  33.857143
    2015-01-08  MIN  35.0  35.428571
    2015-01-09  MIN  37.0  37.000000
    2015-01-10  MIN  36.0  37.142857
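
The NaN values in the rolling column follow from the window size: rolling(7) needs seven observations before it returns a mean. A small check, using the first ten MAX values copied from the max_15 table above (the variable names here are illustrative):

```python
import pandas as pd

# the first ten 2015 MAX values from the max_15 table
v = pd.Series([39.0, 41.0, 41.0, 53.0, 57.0, 47.0, 51.0, 45.0, 50.0, 42.0])

rolling = v.rolling(7).mean()

# the first six entries are NaN because fewer than 7 values are available
print(rolling.isna().sum())

# the seventh entry equals the plain mean of the first seven values
print(rolling.iloc[6], v.iloc[:7].mean())
```

If the leading gap in the plotted line is unwanted, `rolling(7, min_periods=1)` returns a mean over however many values are available so far, at the cost of noisier early values.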
    