How do I tell if a column in a pandas dataframe is of type datetime? How do I tell if a column is numerical?

前端 未结 5 1902
情书的邮戳
情书的邮戳 2020-12-30 18:58

I am trying to filter the columns in a pandas dataframe based on whether they are of type date or not. I can figure out which ones are, but then would have to parse that ou

相关标签:
5条回答
  • 2020-12-30 19:19

    bit uglier Numpy alternative:

    In [102]: df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
    Out[102]:
        date_col
    0 2017-02-01
    1 2017-03-01
    2 2017-04-01
    3 2017-05-01
    
    In [103]: df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]
    Out[103]:
       col1  col2
    0     1     2
    1     1     2
    2     1     2
    3     1     2
    
    0 讨论(0)
  • 2020-12-30 19:26

    I just encountered this issue and found that @charlie-haley's answer isn't quite general enough for my use case. In particular np.datetime64 doesn't seem to match datetime64[ns, UTC].

    df['date_col'] = pd.to_datetime(df['date_str'], utc=True)
    print(df.date_str.dtype)  # datetime64[ns, UTC]
    

    You could also extend the list of dtypes to include other types, but that doesn't seem like a good solution for future compatability, so I ended up using the is_datetime64_any_dtype function from the pandas api instead.

    In:

    from pandas.api.types import is_datetime64_any_dtype as is_datetime
    
    df[[column for column in df.columns if is_datetime(df[column])]]
    

    Out:

                       date_col
    0 2017-02-01 00:00:00+00:00
    1 2017-03-01 00:00:00+00:00
    2 2017-04-01 00:00:00+00:00
    3 2017-05-01 00:00:00+00:00
    
    0 讨论(0)
  • 2020-12-30 19:34

    Pandas has a cool function called select_dtypes, which can take either exclude or include (or both) as parameters. It filters the dataframe based on dtypes. So in this case, you would want to include columns of dtype np.datetime64. To filter by integers, you would use [np.int64, np.int32, np.int16, np.int], for float: [np.float32, np.float64, np.float16, np.float], to filter by numerical columns only: [np.number].

    df.select_dtypes(include=[np.datetime64])
    

    Out:

        date_col
    0   2017-02-01
    1   2017-03-01
    2   2017-04-01
    3   2017-05-01
    

    In:

    df.select_dtypes(include=[np.number])
    

    Out:

        col1    col2
    0   1       2
    1   1       2
    2   1       2
    3   1       2
    
    0 讨论(0)
  • 2020-12-30 19:38

    Recently I needed to check if any element of a column was a date or numeric

    My approach was, try to convert to type (datetime or numeric), then check if any element is null

    pd.to_datetime( data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce')
    

    output:

    0   2010-09-16
    1   2010-09-16
    2   2018-06-04
    3          NaT
    4          NaT
    5   2018-11-30
    

    Then use isnull() to check if the elements could be converted

    pd.to_datetime( data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce').isnull().any()
    

    This will return True because at last one element is null/NaT

    To check for numerics

    data_temp.eval('col_name').astype(str).str.isnumeric().all()
    

    This will return True if all elements on the column are numeric

    Both will return a numpy.bool_, but it can easily be converted to bool if needed

    type(pd.to_datetime( data_temp.eval(name), format='%d/%m/%Y', errors='coerce').isnull().any())
    

    output:

    numpy.bool_
    

    --

    type(bool(pd.to_datetime( data_temp.eval(name), format='%d/%m/%Y', errors='coerce').isnull().any()))
    

    output:

    bool
    
    0 讨论(0)
  • 2020-12-30 19:39

    This code automatically identify the date column and change datatype from object to 'datetime64[ns]'. Once you got date datatype you can easily perform other operations.

    for col in data.columns:
        if data[col].dtype == 'object':
            try:
                data[col] = pd.to_datetime(data[col])
            except ValueError:
                pass
    
    0 讨论(0)
提交回复
热议问题