Using defined strings for regex searching with Python


I am looking to enhance the script I have below. I am wondering if it is possible to use defined strings such as 'G', 'SG', 'PF', 'PG', 'SF', 'F', 'UTIL', '


                      
1 Answer

        
                         				            
            
           
            
                              
                
              
              
                
情书的邮戳, answered 2021-01-27 20:54
              
            
            
                                                                       
You can update your regex so that a capitalised word only matches when it is not directly preceded by another capitalised word.

r"(?<![A-Z] )\b([A-Z]+) "


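Note the added negative lookbehind, (?<![A-Z] ). It rejects the match when the text immediately before the word is an upper-case letter followed by a space, that is, when the previous word ends in [A-Z].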

You can find a more in-depth explanation of the above regex here: https://regex101.com/r/j6RbSP/1
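To see what the lookbehind changes, here is a minimal sketch that runs the pattern with and without it on the first sample lineup from the data below. The variable names are only illustrative, and the version without the lookbehind is shown purely for comparison.

```python
import re

# First sample lineup string from the answer's data.
lineup = ("G CJ McCollum SG Donovan Mitchell PF Robert Covington "
          "PG Collin Sexton SF Bojan Bogdanovic F Larry Nance Jr. "
          "UTIL Trey Lyles C Maxi Kleber")

# Without the lookbehind, the initials "CJ" are picked up as a position code.
without_lookbehind = re.findall(r"\b([A-Z]+) ", lineup)

# With the lookbehind, "CJ" is skipped because it directly follows "G ".
with_lookbehind = re.findall(r"(?<![A-Z] )\b([A-Z]+) ", lineup)

print(without_lookbehind)
print(with_lookbehind)
```

The second list contains only the position codes, because "CJ" is preceded by an upper-case letter and a space.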

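You can now update your code to use the new regex pattern. Remember to put r in front of the pattern string (r"...") so that Python passes \b to the regex engine as a word boundary rather than as a backspace character.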
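The effect of the r prefix can be checked with a small sketch. The names plain and raw are just for illustration.

```python
import re

plain = "\bG"   # Python turns "\b" into a single backspace character
raw = r"\bG"    # backslash and "b" are kept, so the regex sees a word boundary

plain_matches = re.findall(plain, "G CJ McCollum")  # looks for backspace + "G"
raw_matches = re.findall(raw, "G CJ McCollum")      # looks for "G" at a word boundary

print(len(plain), len(raw))
print(plain_matches, raw_matches)
```

The plain string is one character shorter and finds nothing, since the text contains no backspace.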

import pandas as pd, numpy as np
import re

dk_cont_lineup_df = pd.DataFrame(data=np.array([['G CJ McCollum SG Donovan Mitchell PF Robert Covington PG Collin Sexton SF Bojan Bogdanovic F Larry Nance Jr. UTIL Trey Lyles C Maxi Kleber'],['UTIL Nikola Vucevic PF Kevin Love F Robert Covington SG Collin Sexton SF Bojan Bogdanovic G Coby White PG RJ Barrett C Larry Nance Jr.']]))
dk_cont_lineup_df.rename(columns={ dk_cont_lineup_df.columns[0]: 'Lineup' }, inplace = True)


def calc_col(col):
    '''Return the column labels for one lineup string.

    Finds the upper-case position codes that act as delimiters and
    numbers any code that occurs more than once, in order of appearance.
    E.g. codes found: ['W','W','W','D','D','G','C','C','UTIL']
    labels returned:  ['W1','W2','W3','D1','D2','G','C1','C2','UTIL']
    '''
    col_list = re.findall(r"(?<![A-Z] )\b([A-Z]+) ", col)
    seen = {}  # running count per position code
    col_list2 = []
    for i_pos in col_list:
        seen[i_pos] = seen.get(i_pos, 0) + 1
        # Number only the codes that repeat; unique codes keep their plain name.
        # Labels stay in the same order as the players they belong to.
        if col_list.count(i_pos) > 1:
            col_list2.append(i_pos + str(seen[i_pos]))
        else:
            col_list2.append(i_pos)
    return col_list2


# Replace each position code with a newline so the player names can be split apart.
extr_row = dk_cont_lineup_df['Lineup'].replace(to_replace=r"(?<![A-Z] )\b([A-Z]+) ", value="\n", regex=True)

frames = []
for i_pos in range(len(extr_row)):  # build one single-row frame per row of the original dataframe
    df_temp = pd.DataFrame(extr_row.values[i_pos].split("\n")[1:]).T
    df_temp.columns = calc_col(dk_cont_lineup_df['Lineup'].iloc[i_pos])
    frames.append(df_temp[sorted(df_temp)])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
df_final = pd.concat(frames, ignore_index=True)

print(df_final.to_string())


Produces the desired output:

                 C                  F             G                 PF              PG                 SF                 SG             UTIL
0      Maxi Kleber   Larry Nance Jr.   CJ McCollum   Robert Covington   Collin Sexton   Bojan Bogdanovic   Donovan Mitchell       Trey Lyles 
1  Larry Nance Jr.  Robert Covington    Coby White         Kevin Love      RJ Barrett   Bojan Bogdanovic      Collin Sexton   Nikola Vucevic 
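Since the question asks about using defined strings, one alternative is to build the pattern from an explicit list of position codes instead of matching any run of capitals. The following is a rough sketch, not part of the original script: POSITIONS, POSITION_RE and split_lineup are hypothetical names, and 'C' is added to the list only because it appears in the sample data (the list in the question is cut off).

```python
import re

# Position codes named in the question; 'C' is an assumption taken from the sample data.
POSITIONS = ['G', 'SG', 'PF', 'PG', 'SF', 'F', 'UTIL', 'C']

# Build an alternation such as \b(G|SG|PF|...) followed by a space.
# \b keeps "G" from matching inside "SG", and the space keeps "C" from matching "CJ".
POSITION_RE = re.compile(r"\b(" + "|".join(re.escape(p) for p in POSITIONS) + r") ")


def split_lineup(lineup):
    """Return (position, player) pairs in order of appearance."""
    # With a capture group, re.split keeps the codes:
    # ['', 'G', 'CJ McCollum ', 'SG', 'Donovan Mitchell ', ...]
    parts = POSITION_RE.split(lineup)
    codes, names = parts[1::2], parts[2::2]
    return [(code, name.strip()) for code, name in zip(codes, names)]


sample = ("G CJ McCollum SG Donovan Mitchell PF Robert Covington "
          "PG Collin Sexton SF Bojan Bogdanovic F Larry Nance Jr. "
          "UTIL Trey Lyles C Maxi Kleber")
for code, player in split_lineup(sample):
    print(code, player)
```

Because only the listed codes can match, initials such as "CJ" or "RJ" are ignored without needing the lookbehind. The trade-off is that a new position code has to be added to the list by hand.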

                                                                        
                                                        
            
            
              
                