Slice/split string Series at various positions

Asked by 前端未结

I'm looking to split a string Series at different points depending on the length of certain substrings:

In [47]: df = pd.DataFrame(['group9class1', 'grou


                      
3 Answers

Answered by 别跟我提以往, 2021-01-18 22:27

You can also use zip together with a list comprehension.

df['group'], df['class'] = zip(
    *[(string[:n], string[n:]) 
      for string, n in zip(df.group_class, split_locations)])

>>> df
      group_class    group    class
0    group9class1   group9   class1
1   group10class2  group10   class2
2  group11class20  group11  class20
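
For a self-contained run of the approach above, here is a minimal sketch. The question's definition of split_locations is cut off in this copy, so the version below is an assumption: it takes each split point to be the index where the substring "class" begins.

```python
import pandas as pd

df = pd.DataFrame(
    {"group_class": ["group9class1", "group10class2", "group11class20"]}
)

# Assumption: split where the substring "class" starts.
# This is not the question's own definition of split_locations.
split_locations = df["group_class"].str.find("class")

# Slice each string at its own position, then transpose the
# (left, right) pairs into two column-sized tuples.
pairs = [(s[:n], s[n:]) for s, n in zip(df["group_class"], split_locations)]
df["group"], df["class"] = zip(*pairs)

print(df)
```

Note that the new column has to be read back as df["class"], since class is a reserved word and attribute access would be a syntax error.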

                                                                        
                                                        
            
            
              
                
Answered by 轻奢々, 2021-01-18 22:33

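This works: selecting with double brackets ([[]]) returns a DataFrame rather than a Series, so apply with axis=1 receives each row as a Series whose name is that row's index value. You can use it to index into the split_locations Series: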

In [119]:    
df[['group_class']].apply(lambda x: pd.Series([x.str[split_locations[x.name]:].iloc[0], x.str[:split_locations[x.name]].iloc[0]]), axis=1)
Out[119]:
         0        1
0   class1   group9
1   class2  group10
2  class20  group11
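
To illustrate why the double brackets matter, the sketch below shows what a row-wise apply sees. The index labels 10 and 20 are made up for the illustration, so that they are easy to tell apart from positions.

```python
import pandas as pd

# Hypothetical index labels, chosen to differ from positions 0 and 1.
df = pd.DataFrame(
    {"group_class": ["group9class1", "group10class2"]}, index=[10, 20]
)

# df[["group_class"]] is a one-column DataFrame, so apply(axis=1) passes
# each row as a Series, and row.name is that row's index label.
labels = df[["group_class"]].apply(lambda row: row.name, axis=1)

print(labels.tolist())
```

With single brackets, df["group_class"] is a Series and apply passes plain strings, which carry no index label.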


Or, as @ajcr suggested, you can use str.extract:

In [106]:

df['group_class'].str.extract(r'(?P<group>group[0-9]+)(?P<class>class[0-9]+)')
Out[106]:
     group    class
0   group9   class1
1  group10   class2
2  group11  class20


EDIT

Regex explanation:

The regex came from @ajcr (thanks!). It uses str.extract to extract the capture groups, and each group becomes a new column.

?P<group> gives the capture group a name, which is used as the column label. If the name is omitted, the column is labeled with an integer instead.

The rest should be self-explanatory: group[0-9]+ matches the literal string group followed by one or more digits, where [0-9] is a character class covering the digits 0 to 9. For this data it is equivalent to group\d+, where \d matches a digit.

So it could be rewritten as:

df['group_class'].str.extract(r'(?P<group>group\d+)(?P<class>class\d+)')
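
As a small sketch of the point about column names, the snippet below runs the same pattern with and without named groups on sample data matching the output shown above.

```python
import pandas as pd

s = pd.Series(
    ["group9class1", "group10class2", "group11class20"], name="group_class"
)

# Named groups: the group names become the column labels.
named = s.str.extract(r"(?P<group>group\d+)(?P<class>class\d+)")

# Unnamed groups: the columns are numbered instead.
unnamed = s.str.extract(r"(group\d+)(class\d+)")

print(list(named.columns))
print(list(unnamed.columns))
```

Rows that do not match the pattern come back as NaN in every extracted column, so it can be worth checking for missing values after extracting.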

                                                                        
                                                        
            
            
              
                
Answered by 挽巷, 2021-01-18 22:40

Use a regular expression to split the string:

 import re

 regex = re.compile("(class)")
 s = "group1class23"
 # Insert a space before "class", then split on that space to separate the group and class parts.
 split_string = re.sub(regex, " \\1", s).split(" ")


This will return the list:

 ['group1', 'class23']


So to append two new columns to your DataFrame you can do:

new_cols = [re.sub(regex, " \\1", x).split(" ") for x in df.group_class]
df['group'], df['class'] = zip(*new_cols)


Which results in:

      group_class    group    class
0    group9class1   group9   class1
1   group10class2  group10   class2
2  group11class20  group11  class20
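
The substitution above can also be wrapped in a small helper, sketched below. The name split_before_class is hypothetical, and the sketch assumes the input contains no spaces and contains "class" exactly once.

```python
import re

CLASS_RE = re.compile(r"(class)")


def split_before_class(text):
    """Insert a space before 'class', then split on that space."""
    return CLASS_RE.sub(r" \1", text).split(" ")


print(split_before_class("group1class23"))
```

If the input already contains a space, or "class" appears more than once, the result has more than two parts, and unpacking it into two columns would fail.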

                                                                        
                                                        
            
            
              
                