Match Unicode emoji in Python regex


时光说笑 2021-01-18 02:49

I need to extract the text that sits between a number and an emoticon in a string.

Example text:

blah xzuyguhbc ibcbb bqw 2 extract1  ☺️ jbjhcb 6 extract2


      
      
        
3 answers

清酒与你 (original poster) 2021-01-18 03:31

Here's my stab at a solution. I'm not sure it will work in all circumstances. The trick is to convert every Unicode emoji into plain escaped text (the script below uses the unicode-escape codec for this). This could be done by following this post. You can then match the emoji just like any normal text. Note that it won't work if the literal strings \u or \U already appear in the text you are searching.
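As a rough illustration of the escaping step, here is a small Python 3 sketch (the answer's own script targets Python 2). It shows what the unicode-escape codec produces for part of the question's sample, and why a backslash already present in the input is a problem. The helper name to_escaped is made up for this example.

```python
def to_escaped(text):
    """Return text with every non-ASCII character written as a \\uXXXX or \\UXXXXXXXX escape."""
    # encode() yields ASCII-only bytes; decode() turns them back into a str.
    return text.encode("unicode-escape").decode("ascii")


sample = "2 extract1  \u263a\ufe0f jbjhcb"
print(to_escaped(sample))

# Caveat from the answer: a backslash already present in the input is doubled,
# so typed text such as "\users" still contains a backslash followed by "u"
# and is indistinguishable from an escaped emoji for a simple pattern.
print(to_escaped(r"path \users"))
```

The first call prints the sample with the emoji replaced by its two escaped code points. The second shows the doubled backslash that can trigger a false match.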

Example: copy your string into a file (let's call it emo), then run this in a terminal:

Chip chip@ 03:24:33@ ~: cat emo | python stackoverflow.py
blah xzuyguhbc ibcbb bqw 2 extract1  \u263a\ufe0f jbjhcb 6 extract2 \U0001f645 bjvcvvv\n
------------------------
[' extract1  ', ' extract2 ']


The stackoverflow.py file is:

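# Python 2 script: reads stdin (or files named on the command line).
import fileinput
import re

for line in fileinput.input():
    # Decode the raw bytes, then rewrite every non-ASCII character
    # as a \uXXXX or \UXXXXXXXX escape sequence.
    teststring = unicode(line, 'utf-8').encode('unicode-escape')
    print teststring
    print "------------------------"
    # Capture the text between "<whitespace><digit>" and the next escape.
    m = re.findall(r'(?<=\s\d)(.*?)(?=\\[uU])', teststring)
    print m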
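
The script above is written for Python 2 (it relies on unicode() and the print statement). A minimal Python 3 sketch of the same idea might look like the following. It is an adaptation, not the answer's original code, and the names PATTERN and extract_between are hypothetical.

```python
import re

# Lookbehind: one whitespace character followed by one digit.
# Lookahead: a literal backslash followed by "u" or "U", i.e. the start
# of an escape produced by the unicode-escape codec.
PATTERN = re.compile(r"(?<=\s\d)(.*?)(?=\\[uU])")


def extract_between(line):
    """Return the pieces of text found between ' <digit>' and the next escaped character."""
    # In Python 3, encode() returns bytes, so decode back to str before matching.
    escaped = line.encode("unicode-escape").decode("ascii")
    return PATTERN.findall(escaped)


text = "blah xzuyguhbc ibcbb bqw 2 extract1  \u263a\ufe0f jbjhcb 6 extract2 \U0001f645 bjvcvvv\n"
print(extract_between(text))
```

For the sample line this should print the same list as the terminal session above. Keep in mind that the pattern stops at any escaped non-ASCII character, not only at emoji, so accented letters or CJK text would also end a capture.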

    
             
                                                        
            
            
              
                