Why isn't this code to plot a histogram on a continuous value Pandas column working?

余生分开走 2021-02-01 15:10

I am trying to create a histogram of the continuous-valued column Trip_distance in a large pandas DataFrame (1.4M rows). I wrote the following code:

fig =          


        
2 answers
  • 2021-02-01 15:55

    Here's another way to plot the data. It involves turning the datetime column into the index, which might help you with future slicing:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Convert the column to datetime
    trip_data['lpep_pickup_datetime'] = pd.to_datetime(trip_data['lpep_pickup_datetime'])
    # Use the datetime column as the index
    trip_data.index = trip_data['lpep_pickup_datetime']
    # Plot a histogram of the trip distances
    trip_data['Trip_distance'].plot(kind='hist')
    plt.show()
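
To illustrate the slicing this sets up, here is a minimal sketch on a tiny made-up frame (the four rows below are invented for illustration, not taken from your data). It assumes the index is sorted, which is when date-string slicing with .loc behaves predictably.

```python
import pandas as pd

# Tiny made-up stand-in for trip_data; the values are illustrative only.
trip_data = pd.DataFrame({
    "lpep_pickup_datetime": [
        "2015-09-01 00:05:00",
        "2015-09-01 13:30:00",
        "2015-09-02 08:15:00",
        "2015-09-03 21:45:00",
    ],
    "Trip_distance": [1.2, 3.4, 0.8, 5.6],
})

# Same two steps as in the answer: parse the strings, then use them as the index.
trip_data["lpep_pickup_datetime"] = pd.to_datetime(trip_data["lpep_pickup_datetime"])
trip_data.index = trip_data["lpep_pickup_datetime"]

distances = trip_data["Trip_distance"]

# A date string selects every row that falls on that day.
first_day = distances.loc["2015-09-01"]

# A slice of date strings is inclusive at both ends (through the end of Sept 2).
first_two_days = distances.loc["2015-09-01":"2015-09-02"]

print(first_day)
print(first_two_days)
```

Here first_day picks up the two trips on 2015-09-01, and first_two_days also includes the one on 2015-09-02, since the end of a date-string slice is inclusive.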
    
  • 2021-02-01 16:06

    EDIT:

    After your comments, it actually makes perfect sense that you don't get a bar for each distinct value. There are 1.4 million rows and only ten discrete buckets, so apparently each bucket holds 10% of the rows (to within what you can see in the plot).


    A quick rerun of your data:

    In [25]: df.hist(column='Trip_distance')
    

    This plots absolutely fine.

    The df.hist function takes an optional keyword argument bins (default 10), which buckets the data into discrete bins. With only 10 bins and a more or less homogeneous distribution of hundreds of thousands of rows, you might not be able to see any difference between the ten bins in your low-resolution plot. Try more bins:

    In [34]: df.hist(column='Trip_distance', bins=50)
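
To see what the bins argument does without a plot, here is a rough pure-Python sketch of equal-width binning, which is the idea behind bins=10 versus bins=50. The helper equal_width_counts and the uniform synthetic data are assumptions made up for illustration; they are not pandas internals or your Trip_distance values.

```python
import random


def equal_width_counts(values, bins):
    """Count values in `bins` equal-width buckets spanning [min, max].

    Hypothetical helper for illustration only; not part of pandas.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: put everything in the first bucket.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


random.seed(0)
# Synthetic, evenly spread stand-in for Trip_distance (an assumption, NOT real data).
distances = [random.uniform(0, 20) for _ in range(100_000)]

coarse = equal_width_counts(distances, bins=10)
fine = equal_width_counts(distances, bins=50)

# Share of rows per bucket for the 10-bin case.
shares = [c / len(distances) for c in coarse]
print([round(s, 3) for s in shares])
print(fine[:5])
```

With evenly spread data like this, every one of the 10 buckets ends up holding close to 10% of the rows, which is the flat picture described above; bins=50 splits the same range into narrower buckets.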
    
