Pythonic type hints with pandas?

名媛妹妹 2020-12-24 04:36

Let's take a simple function that takes a str and returns a DataFrame:

import pandas as pd
def csv_to_df(path):
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')


        
5 Answers
  • 2020-12-24 05:06

    Now there is a pip package that can help with this. https://github.com/CedricFR/dataenforce

    You can install it with pip install dataenforce and use very pythonic type hints like:

    from dataenforce import Dataset

    def preprocess(dataset: Dataset["id", "name", "location"]) -> Dataset["location", "count"]:
        pass
    
  • 2020-12-24 05:10

    Why not just use pd.DataFrame?

    import pandas as pd
    def csv_to_df(path: str) -> pd.DataFrame:
        return pd.read_csv(path, skiprows=1, sep='\t', comment='#')
    

    Result is the same:

    > help(csv_to_df)
    Help on function csv_to_df in module __main__:
    csv_to_df(path:str) -> pandas.core.frame.DataFrame
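
Because pd.DataFrame is an ordinary class, the annotation can also be inspected programmatically. A minimal sketch, assuming pandas is installed; the file name sample.tsv and its contents are made up for illustration:

```python
import os
import tempfile
import typing

import pandas as pd


def csv_to_df(path: str) -> pd.DataFrame:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')


# The hints are real objects: str and the DataFrame class itself.
hints = typing.get_type_hints(csv_to_df)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file: one line to skip, one comment, then data.
    sample_path = os.path.join(tmp, "sample.tsv")
    with open(sample_path, "w") as fh:
        fh.write("this first line is skipped\n")
        fh.write("# a comment line\n")
        fh.write("a\tb\n")
        fh.write("1\t2\n")
    df = csv_to_df(sample_path)

print(hints)
print(df)
```

A static checker such as mypy reads the same annotation, so no extra alias is needed for the return type.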
    
  • 2020-12-24 05:11

    I'm currently doing the following:

    from typing import TypeVar
    import pandas as pd
    PandasDataFrame = TypeVar('pandas.core.frame.DataFrame')
    def csv_to_df(path: str) -> PandasDataFrame:
        return pd.read_csv(path, skiprows=1, sep='\t', comment='#')
    

    Which gives:

    > help(csv_to_df)
    Help on function csv_to_df in module __main__:
    
    csv_to_df(path:str) -> ~pandas.core.frame.DataFrame
    

    Don't know how pythonic that is, but it's understandable enough as a type hint, I find.
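
One caveat with this approach: the string passed to TypeVar is only a label, so the hint works as documentation rather than as something a checker can verify against pandas. A minimal sketch that shows this at runtime without pandas installed; the function load is hypothetical:

```python
from typing import TypeVar, get_type_hints

# The argument is just a name; typing never imports or resolves it.
PandasDataFrame = TypeVar('pandas.core.frame.DataFrame')


def load() -> PandasDataFrame:  # hypothetical function, for illustration only
    return "not a DataFrame"    # nothing at runtime objects to this


label = PandasDataFrame.__name__   # the string given above
bound = PandasDataFrame.__bound__  # None: no class is attached to the TypeVar
hint = get_type_hints(load)["return"]

print(label, bound, hint is PandasDataFrame)
```

When pandas is importable, annotating with pd.DataFrame directly, or with a plain alias such as PandasDataFrame = pd.DataFrame, reads just as well and lets mypy check the return value.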

  • 2020-12-24 05:19

    This strays from the original question, but building on @dangom's answer using TypeVar and @Georgy's comment that type hints cannot specify datatypes for DataFrame columns, you could use a simple workaround like this to indicate the datatypes in a DataFrame:

    from typing import TypeVar
    import pandas as pd
    DataFrameStr = TypeVar("pandas.core.frame.DataFrame(str)")
    def csv_to_df(path: str) -> DataFrameStr:
        return pd.read_csv(path, skiprows=1, sep='\t', comment='#')
    
  • 2020-12-24 05:21

    Check out the answer given here which explains the usage of the package data-science-types.

    pip install data-science-types
    

    Demo

    # program.py
    
    import pandas as pd
    
    df: pd.DataFrame = pd.DataFrame({'col1': [1,2,3], 'col2': [4,5,6]}) # OK
    df1: pd.DataFrame = pd.Series([1,2,3]) # error: Incompatible types in assignment
    

    Run using mypy the same way:

    $ mypy program.py
