Pythonic type hints with pandas?

后端未结

关注

 5  1868

Let\'s take a simple function that takes a str and returns a dataframe:

import pandas as pd
def csv_to_df(path):
    return pd.read_csv(path, skiprows=1, sep


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  挽巷        
                
              
                            
                2020-12-24 05:06
              
            
            
                                                                       
Now there is a pip package that can help with this.
https://github.com/CedricFR/dataenforce
You can install it with pip install dataenforce and use very pythonic type hints like:
def preprocess(dataset: Dataset["id", "name", "location"]) -> Dataset["location", "count"]:
    pass

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  灰色年华        
                
              
                            
                2020-12-24 05:10
              
            
            
                                                                       
Why not just use pd.DataFrame?

import pandas as pd
def csv_to_df(path: str) -> pd.DataFrame:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')


Result is the same:

> help(csv_to_df)
Help on function csv_to_df in module __main__:
csv_to_df(path:str) -> pandas.core.frame.DataFrame

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  萌比男神i        
                
              
                            
                2020-12-24 05:11
              
            
            
                                                                       
I'm currently doing the following:

from typing import TypeVar
PandasDataFrame = TypeVar('pandas.core.frame.DataFrame')
def csv_to_df(path: str) -> PandasDataFrame:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')


Which gives:

> help(csv_to_df)
Help on function csv_to_df in module __main__:

csv_to_df(path:str) -> ~pandas.core.frame.DataFrame


Don't know how pythonic that is, but it's understandable enough as a type hint, I find.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  小鲜肉        
                
              
                            
                2020-12-24 05:19
              
            
            
                                                                       
This is straying from the original question but building off of @dangom's answer using TypeVar and @Georgy's comment that there is no way to specify datatypes for DataFrame columns in type hints, you could use a simple work-around like this to specify datatypes in a DataFrame:

from typing import TypeVar
DataFrameStr = TypeVar("pandas.core.frame.DataFrame(str)")
def csv_to_df(path: str) -> DataFrameStr:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  萌比男神i        
                
              
                            
                2020-12-24 05:21
              
            
            
                                                                       
Check out the answer given here which explains the usage of the package data-science-types.
pip install data-science-types

Demo
# program.py

import pandas as pd

df: pd.DataFrame = pd.DataFrame({'col1': [1,2,3], 'col2': [4,5,6]}) # OK
df1: pd.DataFrame = pd.Series([1,2,3]) # error: Incompatible types in assignment

Run using mypy the same way:
$ mypy program.py
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复