Group DataFrame in 5-minute intervals

忘了有多久 2020-12-10 20:13

How do I get just the 5-minute data out of this CSV using Python/pandas? For every 5-minute interval, I'm trying to get the DATE, TIME, OPEN, HIGH, LOW, CLOSE, VOLUME for tha

3 answers
  • 2020-12-10 20:56

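    Another way is to group with pd.Grouper, which is meant for exactly this use case. (The older pd.TimeGrouper was deprecated in pandas 0.21 and has since been removed; pd.Grouper(freq=...) replaces it.)

    import pandas as pd
    
    # "filename.csv" is a placeholder for your file.
    df = pd.read_csv("filename.csv")
    # Combine the separate DATE and TIME columns into one timestamp and use it as the index.
    df["DATE"] = pd.to_datetime(df["DATE"].astype(str) + " " + df["TIME"].astype(str))
    df = df.drop(columns="TIME").set_index("DATE")
    
    df = df.groupby(pd.Grouper(freq="5min")).agg({
                                            "OPEN":  "first",  # first price in the interval
                                            "HIGH":  "max",    # highest price in the interval
                                            "LOW":   "min",    # lowest price in the interval
                                            "CLOSE": "last",   # last price in the interval
                                            "VOLUME": "sum"    # total volume traded
                                        })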
    

    This aggregation is probably what you want for stock data: it turns your 1-minute candles into 5-minute candles (first open, highest high, lowest low, last close, total volume).
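A minimal, self-contained sketch of the pd.Grouper aggregation above. It assumes the DataFrame already has a DatetimeIndex and uses the five 1-minute bars shown in the df.head() output of the second answer as sample data:

```python
import pandas as pd

# Five 1-minute bars taken from the df.head() output in the second answer.
df = pd.DataFrame(
    {
        "OPEN": [3046.0, 3047.0, 3047.5, 3047.5, 3048.0],
        "HIGH": [3048.5, 3048.0, 3048.0, 3047.5, 3048.0],
        "LOW": [3046.0, 3046.0, 3047.0, 3047.0, 3047.5],
        "CLOSE": [3047.5, 3047.0, 3047.5, 3047.5, 3048.0],
        "VOLUME": [505, 162, 98, 228, 136],
    },
    index=pd.date_range("1997-02-03 09:04", periods=5, freq="min"),
)

# Each 5-minute bin becomes one candle.
candles = df.groupby(pd.Grouper(freq="5min")).agg(
    {"OPEN": "first", "HIGH": "max", "LOW": "min", "CLOSE": "last", "VOLUME": "sum"}
)
print(candles)
```

The 09:00 bin holds only the 09:04 bar, while the 09:05 bin combines 09:05 through 09:08 (volume 162 + 98 + 228 + 136 = 624).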

  • 2020-12-10 20:59

    You can use df.resample to do aggregation based on a date/time variable. You'll need a datetime index and you can specify that while reading the csv file:

    # Note: combining columns via a nested list in parse_dates is deprecated since pandas 2.2.
    df = pd.read_csv("filename.csv", parse_dates = [["DATE", "TIME"]], index_col=0)
    

    This will result in a dataframe with an index where date and time are combined (source):

    df.head()
    Out[7]: 
                           OPEN    HIGH     LOW   CLOSE  VOLUME 
    DATE_TIME                                                   
    1997-02-03 09:04:00  3046.0  3048.5  3046.0  3047.5      505
    1997-02-03 09:05:00  3047.0  3048.0  3046.0  3047.0      162
    1997-02-03 09:06:00  3047.5  3048.0  3047.0  3047.5       98
    1997-02-03 09:07:00  3047.5  3047.5  3047.0  3047.5      228
    1997-02-03 09:08:00  3048.0  3048.0  3047.5  3048.0      136
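If your pandas version warns about (or no longer supports) combining columns in parse_dates, one alternative is to read the columns as text and build the timestamp yourself. This is a sketch under an assumption: the CSV below is a hypothetical stand-in for your file, with DATE as YYYY-MM-DD and TIME as HH:MM:SS.

```python
import io

import pandas as pd

# Hypothetical CSV with separate DATE and TIME columns (formats assumed).
csv_text = """DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOLUME
1997-02-03,09:04:00,3046.0,3048.5,3046.0,3047.5,505
1997-02-03,09:05:00,3047.0,3048.0,3046.0,3047.0,162
1997-02-03,09:06:00,3047.5,3048.0,3047.0,3047.5,98
"""

df = pd.read_csv(io.StringIO(csv_text))
# Join the two text columns and parse them into a single datetime column.
df["DATE_TIME"] = pd.to_datetime(df["DATE"] + " " + df["TIME"])
# Drop the original columns and use the combined timestamp as the index.
df = df.drop(columns=["DATE", "TIME"]).set_index("DATE_TIME")
print(df)
```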
    

    After that, you can use resample to get the sum, mean, etc. of those five-minute intervals.

    df.resample("5T").mean()
    Out[8]: 
                           OPEN    HIGH     LOW   CLOSE  VOLUME 
    DATE_TIME                                                   
    1997-02-03 09:00:00  3046.0  3048.5  3046.0  3047.5    505.0
    1997-02-03 09:05:00  3047.6  3047.9  3046.8  3047.3    159.6
    1997-02-03 09:10:00  3045.6  3045.9  3044.8  3045.0    110.2
    1997-02-03 09:15:00  3043.6  3044.0  3042.8  3043.2     69.2
    1997-02-03 09:20:00  3044.7  3045.2  3044.5  3045.0     65.8
    1997-02-03 09:25:00  3043.8  3044.0  3043.5  3043.7     59.0
    1997-02-03 09:30:00  3044.6  3045.0  3044.3  3044.6     56.0
    1997-02-03 09:35:00  3044.5  3044.5  3043.5  3044.5     44.0
    

    (T is used for minute frequency; pandas 2.2 and later prefer "min", e.g. "5min". See the pandas documentation on offset aliases for other units.)
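Note that resample emits a row for every 5-minute bin between the first and last timestamp, including bins with no data (for example, outside trading hours). A sketch that uses OHLC-style aggregation instead of mean and then drops the empty bins; the bars and the gap between 09:05 and 09:21 are hypothetical:

```python
import pandas as pd

# Hypothetical bars with no trades between 09:05 and 09:21.
df = pd.DataFrame(
    {
        "OPEN": [3046.0, 3047.0, 3044.0],
        "HIGH": [3048.5, 3048.0, 3045.0],
        "LOW": [3046.0, 3046.0, 3043.5],
        "CLOSE": [3047.5, 3047.0, 3044.5],
        "VOLUME": [505, 162, 60],
    },
    index=pd.to_datetime(["1997-02-03 09:04", "1997-02-03 09:05", "1997-02-03 09:21"]),
)

ohlc = {"OPEN": "first", "HIGH": "max", "LOW": "min", "CLOSE": "last", "VOLUME": "sum"}
# One row per 5-minute bin from 09:00 to 09:20, empty bins included.
candles = df.resample("5min").agg(ohlc)
print(candles)           # 09:10 and 09:15: NaN prices, VOLUME 0
# Empty bins have NaN prices, so dropna() keeps only bins that contained bars.
print(candles.dropna())
```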

  • 2020-12-10 20:59

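    A slight modification to Markus's answer: passing closed='right' and label='right' makes each 5-minute bin include its right edge and labels it with the interval's end time instead of its start.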

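    import pandas as pd
    
    # data_set: your 1-minute DataFrame with a DatetimeIndex and lowercase column names.
    # closed='right', label='right': each bin is (start, end] and is labeled by its end time.
    df_close_right = data_set.groupby(pd.Grouper(freq='5min', closed='right', label='right')).agg({
                                            "open":  "first",
                                            "high":  "max",
                                            "low":   "min",
                                            "close": "last",
                                            "volume": "sum"
                                        })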
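To see what closed and label change, here is a small sketch comparing the default 5-minute bins with the right-closed, right-labeled ones. The volumes are the five values from the df.head() output in the second answer:

```python
import pandas as pd

# 1-minute volumes at 09:04 through 09:08.
s = pd.Series(
    [505, 162, 98, 228, 136],
    index=pd.date_range("1997-02-03 09:04", periods=5, freq="min"),
)

# Default: bins are [start, end) and labeled by their start time.
left = s.groupby(pd.Grouper(freq="5min")).sum()
# closed="right", label="right": bins are (start, end] and labeled by their end time.
right = s.groupby(pd.Grouper(freq="5min", closed="right", label="right")).sum()
print(left)   # 09:00 -> 505, 09:05 -> 624
print(right)  # 09:05 -> 667, 09:10 -> 462
```

With the right-closed bins, the 09:05 bar falls into the bin ending at 09:05 rather than starting a new one.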
    