Failed to grab dates in a customized manner out of tabular content

醉酒成梦 2021-01-07 13:34
I've written a script in Python, in combination with Selenium, to parse some dates available within a table on a webpage. The table is located under the header NPL Vict

      
      
        
3 answers

小鲜肉 (original poster) 2021-01-07 14:26

I'm not using Selenium; the dates can be extracted with BeautifulSoup alone. Each date-time is encoded as a Unix timestamp in the class attribute of its date cell:

from bs4 import BeautifulSoup
import requests
import re
import datetime

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
r = requests.get('http://www.oddsportal.com/soccer/australia/npl-victoria/', headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

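# The date cell's class list includes an entry of the form t<digits>; the digits are a Unix timestamp.
for td in soup.select('table#tournamentTable td.datet'):
    for c in td['class']:
        m = re.match(r't(\d+)', c)
        if m:
            unix_timestamp = int(m[1])
            # Interpret the timestamp as UTC (same output as utcfromtimestamp, which is deprecated since Python 3.12).
            d = datetime.datetime.fromtimestamp(unix_timestamp, datetime.timezone.utc).strftime('%d %b %Y--%H:%M')
            print(d)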


Prints:

10 Aug 2018--09:30
10 Aug 2018--10:15
11 Aug 2018--05:00
11 Aug 2018--05:00
11 Aug 2018--09:00
12 Aug 2018--06:00
12 Aug 2018--06:00
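
The times printed above are UTC. As a minimal sketch of customizing the output, the class-to-timestamp step can be pulled into a small helper that returns a timezone-aware datetime, which can then be formatted or shifted as needed. The names `kickoff_from_classes`, `sample_classes`, and `LOCAL` are hypothetical, the sample class list is an assumption about what the page's markup looks like, and the fixed UTC+10 offset is chosen only for illustration.

```python
import re
from datetime import datetime, timedelta, timezone


def kickoff_from_classes(classes):
    """Return the first 't<digits>' class as an aware UTC datetime, else None."""
    for c in classes:
        m = re.match(r"t(\d+)", c)
        if m:
            return datetime.fromtimestamp(int(m[1]), timezone.utc)
    return None


# Hypothetical class list for one date cell; the real attribute may differ.
sample_classes = ["table-time", "datet", "t1533893400-1-1-0-0"]

# Assumed fixed offset for illustration (UTC+10); not taken from the page.
LOCAL = timezone(timedelta(hours=10))

kickoff = kickoff_from_classes(sample_classes)
print(kickoff.strftime("%d %b %Y--%H:%M"))                    # UTC, as in the answer
print(kickoff.astimezone(LOCAL).strftime("%d %b %Y--%H:%M"))  # shifted by ten hours
```

With BeautifulSoup, the helper would be called as `kickoff_from_classes(td['class'])` inside the loop over `td.datet` cells.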


If you also want the matches printed:

for td in soup.select('table#tournamentTable td.datet'):
    for c in td['class']:
        m = re.match(r't(\d+)', c)
        if m:
            unix_timestamp = int(m[1])
            d = datetime.datetime.fromtimestamp(unix_timestamp, datetime.timezone.utc).strftime('%d %b %Y--%H:%M')
            # The match name sits in the table cell that follows the date cell.
            print(d, td.find_next('td').text)


Prints:

10 Aug 2018--09:30 Melbourne Knights - Port Melbourne Sharks
10 Aug 2018--10:15 Pascoe Vale - Dandenong Thunder
11 Aug 2018--05:00 Avondale FC - Bentleigh Greens
11 Aug 2018--05:00 Northcote City - Bulleen
11 Aug 2018--09:00 Hume City - Oakleigh Cannons
12 Aug 2018--06:00 Heidelberg Utd - Green Gully
12 Aug 2018--06:00 South Melbourne - Kingston City
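
To try the date-and-match pairing without hitting the site, here is a rough offline sketch that uses only the standard library's `html.parser` in place of BeautifulSoup. The `SAMPLE_HTML` markup is hypothetical, modelled on the selectors used above, and the class name `FixtureParser` is made up for this example; the real page may be structured differently.

```python
import re
from datetime import datetime, timezone
from html.parser import HTMLParser

# Hypothetical markup modelled on the answer's selectors; the real page may differ.
SAMPLE_HTML = """
<table id="tournamentTable">
  <tr><td class="table-time datet t1533893400-1-1-0-0"></td>
      <td class="name">Melbourne Knights - Port Melbourne Sharks</td></tr>
  <tr><td class="table-time datet t1533896100-1-1-0-0"></td>
      <td class="name">Pascoe Vale - Dandenong Thunder</td></tr>
</table>
"""


class FixtureParser(HTMLParser):
    """Collect (kickoff, match) pairs from date cells and the cell after each."""

    def __init__(self):
        super().__init__()
        self.fixtures = []     # finished (datetime, str) pairs
        self._kickoff = None   # timestamp waiting for its match cell
        self._capture = False  # True while inside the match cell
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "datet" in classes:
            # Date cell: read the Unix timestamp out of the 't<digits>' class.
            for c in classes:
                m = re.match(r"t(\d+)", c)
                if m:
                    self._kickoff = datetime.fromtimestamp(int(m[1]), timezone.utc)
        elif self._kickoff is not None:
            # First cell after a date cell: start collecting its text.
            self._capture = True
            self._text = []

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._capture:
            self.fixtures.append((self._kickoff, "".join(self._text).strip()))
            self._kickoff = None
            self._capture = False


parser = FixtureParser()
parser.feed(SAMPLE_HTML)
for kickoff, match in parser.fixtures:
    print(kickoff.strftime("%d %b %Y--%H:%M"), match)
```

Keeping the pairs in `parser.fixtures` rather than printing them straight away makes it easy to sort, filter, or reformat the dates afterwards.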

    
             
                                                        
            
            
              
                