Group DataFrame in 5-minute intervals

前端未结

关注

 3  1915

How do I get just the 5 minute data using Python/pandas out of this csv? For every 5 minute interval I\'m trying to get the DATE, TIME,OPEN, HIGH, LOW, CLOSE, VOLUME for tha

相关标签:

3条回答

谎友^

2020-12-10 20:56

Another way using pandas is to use its TimeGrouper-function. Its purpose is just meant for use cases like yours.

import pandas as pd

df = pd.DataFrame("Your data provided above")
df["DATE"] = pd.to_datetime(df["DATE"])
df.set_index("DATE", inplace=True)

df = df.groupby(pd.TimeGrouper('5Min')).agg({
                                        "OPEN":  "first",
                                        "HIGH":  "max",
                                        "LOW":   "min",
                                        "CLOSE": "last",
                                        "VOLUME": "sum"
                                    })

The provided script uses an aggregation you might have in mind since you're dealing with stock-data. It aggregates in a way that you will end up with the 5-min candles resulting from your 1-min candles.

0 讨论(0)

梦谈多话

2020-12-10 20:59

You can use df.resample to do aggregation based on a date/time variable. You'll need a datetime index and you can specify that while reading the csv file:

df = pd.read_csv("filename.csv", parse_dates = [["DATE", "TIME"]], index_col=0)

This will result in a dataframe with an index where date and time are combined (source):

df.head()
Out[7]: 
                       OPEN    HIGH     LOW   CLOSE  VOLUME 
DATE_TIME                                                   
1997-02-03 09:04:00  3046.0  3048.5  3046.0  3047.5      505
1997-02-03 09:05:00  3047.0  3048.0  3046.0  3047.0      162
1997-02-03 09:06:00  3047.5  3048.0  3047.0  3047.5       98
1997-02-03 09:07:00  3047.5  3047.5  3047.0  3047.5      228
1997-02-03 09:08:00  3048.0  3048.0  3047.5  3048.0      136

After that you can use resample to get the sum, mean, etc. of those five minute intervals.

df.resample("5T").mean()
Out[8]: 
                       OPEN    HIGH     LOW   CLOSE  VOLUME 
DATE_TIME                                                   
1997-02-03 09:00:00  3046.0  3048.5  3046.0  3047.5    505.0
1997-02-03 09:05:00  3047.6  3047.9  3046.8  3047.3    159.6
1997-02-03 09:10:00  3045.6  3045.9  3044.8  3045.0    110.2
1997-02-03 09:15:00  3043.6  3044.0  3042.8  3043.2     69.2
1997-02-03 09:20:00  3044.7  3045.2  3044.5  3045.0     65.8
1997-02-03 09:25:00  3043.8  3044.0  3043.5  3043.7     59.0
1997-02-03 09:30:00  3044.6  3045.0  3044.3  3044.6     56.0
1997-02-03 09:35:00  3044.5  3044.5  3043.5  3044.5     44.0

(T is used for minute frequency. Here is a list of other units.)

0 讨论(0)

梦如初夏

2020-12-10 20:59

slight modification to Markus answer. It groups and assign it to last index

df_close_left = data_set.groupby(pd.Grouper(freq='5Min',closed='right',label='right')).agg({
                                        "open":  "first",
                                        "high":  "max",
                                        "low":   "min",
                                        "close": "last",
                                        "volume": "sum"

                                    })

0 讨论(0)