How to access pandas DataFrame datetime index using strings

轻奢々 2020-12-24 06:31

This is a very simple and practical question. I have the feeling that it must be a silly detail and that there should be similar questions, but I wasn't able to find them.

3 answers
  • 2020-12-24 07:21

    You can call to_pydatetime() on your index and compare the result with a datetime:

    y[y.index.to_pydatetime() == datetime.datetime(2008,1,1)]
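
    For anyone who wants to run this, here is a minimal sketch. The frame `y` below is a made-up stand-in (the original `y` is not shown in the thread), and note that the one-liner needs `import datetime`.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the `y` used above: three daily rows.
y = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# to_pydatetime() yields a NumPy object array of datetime.datetime values,
# so the comparison produces one boolean per row.
mask = y.index.to_pydatetime() == datetime.datetime(2008, 1, 1)

# Boolean-mask selection keeps only the rows where the mask is True.
row = y[mask]
print(row)
```

    The result is a one-row DataFrame rather than a Series, because a boolean mask always selects a subset of rows.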
    
  • 2020-12-24 07:28

    pandas looks at what's inside the [] and decides what to do with it. If it's a subset of column names, it returns a DataFrame with those columns. If it's a range of index values, it returns the matching rows. What it does not handle is a single index value.

    Solution

    Two workarounds:

    1. Turn the argument into something pandas interprets as a range.

    df['2008-01-01':'2008-01-01']
    

    2. Use the method designed to give you this result, loc[]:

    df.loc['2008-01-01']
    

    Link to the documentation
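
    The two workarounds do not return the same kind of object, which is worth knowing before picking one. A small sketch, assuming a frame `df` with the same columns and dates as in the question (the data here is made up for illustration):

```python
import pandas as pd

# Hypothetical data mirroring the columns mentioned in the thread.
df = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0], "CSNA3": [0.0, 1.0, 7.0], "VALE5": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# Workaround 1: a slice with identical endpoints keeps the result a DataFrame.
as_frame = df["2008-01-01":"2008-01-01"]

# Workaround 2: loc with a single label returns that row as a Series.
as_series = df.loc["2008-01-01"]

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, list(as_series.index))
```

    If downstream code expects a DataFrame, the slice form is the safer choice; if you want the row's values keyed by column name, loc is more direct.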

  • 2020-12-24 07:29

    Reversing your dataframe allows the indexing to work:

    Here is your .csv datafile:

    Date,PETR4,CSNA3,VALE5
    2008-01-01,0.0,0.0,0.0
    2008-01-02,1.0,1.0,1.0
    2008-01-03,7.0,7.0,7.0
    

    Use the following incantation to read it into a DataFrame:

    >>> a = pd.read_csv('your.csv', index_col=0, parse_dates=True, infer_datetime_format=True)
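
    If you do not want to create a file, the same data can be read from an in-memory string. This is only a sketch: `csv_text` is a name introduced here, and `infer_datetime_format` is left out because newer pandas releases deprecate it.

```python
import io

import pandas as pd

# The CSV contents from above, held in a string instead of 'your.csv'.
csv_text = """Date,PETR4,CSNA3,VALE5
2008-01-01,0.0,0.0,0.0
2008-01-02,1.0,1.0,1.0
2008-01-03,7.0,7.0,7.0
"""

# index_col=0 makes the Date column the index; parse_dates=True parses it.
a = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Confirm that the index really is a DatetimeIndex and not plain strings.
print(type(a.index).__name__)
print(a)
```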
    

    Then, try to index a row:

    >>> a['2008-01-01']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
        return self._getitem_column(key)
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
        return self._get_item_cache(key)
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
        values = self._data.get(item)
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
        loc = self.items.get_loc(item)
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
        return self._engine.get_loc(key)
      File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
      File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
      File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
      File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
    KeyError: '2008-01-01'
    

    You end up with a KeyError traceback.

    However, if you reverse it, like this:

    >>> b = a[::-1]
    

    Then, if you try the same index, you get the proper result:

    >>> b['2008-01-01']
                PETR4  CSNA3  VALE5
    Date                           
    2008-01-01      0      0      0
    

    I do NOT know why this is the case. Chances are, it has something to do with being a time series one way, but not the other? Someone more knowledgeable should answer that.
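
    One thing that does differ between the two frames is the sort order of the index, and that can be inspected directly. A rough sketch with made-up data in the same shape as `a` and `b`; it only shows how to check the ordering and deliberately does not repeat the `b['2008-01-01']` lookup, whose behavior I would not assume to be the same across pandas versions:

```python
import pandas as pd

# Hypothetical stand-in for `a`: a frame with an ascending daily index.
a = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# Reversing the rows also reverses the index order.
b = a[::-1]

print(a.index.is_monotonic_increasing)
print(b.index.is_monotonic_increasing)

# sort_index() restores ascending order if the reversed frame is not wanted.
restored = b.sort_index()
print(restored.index.is_monotonic_increasing)
```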

    Update: By RTFM, I discovered this page:

    https://pandas.pydata.org/pandas-docs/stable/timeseries.html

    If you find the section titled "Slice vs. Exact Match", there is a warning that explains this behavior. The problem seems to be that for a TimeSeries, an exact match is interpreted as a column name. For unsorted dataframes, this doesn't happen. See the warning box in the section referenced above. I still find this terribly confusing, but there you go...
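
    As far as I understand that section, what matters is how precise the string is compared with the index: a string as precise as the index is an exact match, a less precise one is a slice. A small sketch with made-up data and a sorted daily index, using loc so the row lookups do not depend on how [] is interpreted:

```python
import pandas as pd

# Hypothetical frame with a sorted, day-resolution DatetimeIndex.
df = pd.DataFrame(
    {"PETR4": [0.0, 1.0, 7.0]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)

# Plain [] with a full date is looked up among the columns and fails.
try:
    df["2008-01-01"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Day-precision string on a daily index: exact match, one row as a Series.
exact = df.loc["2008-01-01"]

# Month-precision string: less precise than the index, so it acts as a slice.
month = df.loc["2008-01"]

print(lookup_failed, type(exact).__name__, type(month).__name__, len(month))
```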

    Edit: Changed the printout of b, which was wrong in the original.

    Edit1: Updated with the explanation from the pandas documentation.
