Python - save requests or BeautifulSoup object locally

一整个雨季 2021-01-18 01:52

I have some code that is quite long, so it takes a long time to run. I want to simply save either the requests object (in this case "name") or the BeautifulSoup object (i

2 Answers
  •  借酒劲吻你
    2021-01-18 02:37

    Storing responses locally and restoring them as BeautifulSoup objects later on

    If you are iterating through the pages of a website, you can store each page with the request code shown here. First create a folder named soupCategory in the same folder as your script.

    Use any recent user-agent string for the headers.

    import time

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    headers = {'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0 Safari/605.1.15'}

    def getCategorySoup():
        # Session that retries failed connections with exponential backoff
        session = requests.Session()
        retry = Retry(connect=7, backoff_factor=0.5)
        adapter = HTTPAdapter(max_retries=retry)
        session.mount('http://', adapter)
        session.mount('https://', adapter)

        basic_url = "https://www.somescrappingdomain.com/apartments?adsWithImages=1&page="
        totalPages = 1525  # put your number of pages here
        t0 = time.time()
        # Pages are numbered from 1, so include totalPages itself
        for i in range(1, totalPages + 1):
            url = basic_url + str(i)
            # Go through the session so the retry adapter is actually used
            r = session.get(url, headers=headers)
            pageName = "./soupCategory/" + str(i) + ".html"
            with open(pageName, mode='w', encoding='UTF-8') as f:
                f.write(r.text)
            print(pageName, end=" ")
        total = time.time() - t0
        print("Total time for getting", totalPages, "category pages is", round(total), "seconds")
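
    Because the full crawl is slow, it may help to make it resumable by skipping any page that is already on disk. A minimal sketch, not part of the original answer: save_page_once is a hypothetical helper name, and fetch is an assumed caller-supplied function that returns the page HTML as text (for example, lambda url: session.get(url, headers=headers).text).

```python
from pathlib import Path
from typing import Callable


def save_page_once(url: str, path: Path, fetch: Callable[[str], str]) -> bool:
    """Download `url` to `path` unless the file already exists.

    `fetch` is a hypothetical caller-supplied function returning the page
    HTML as text. Returns True if a download happened, False if skipped.
    """
    if path.exists():
        return False  # already saved on an earlier run
    # Create the target folder (e.g. soupCategory) if it is missing
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(fetch(url), encoding="utf-8")
    return True
```

    Calling this inside the page loop instead of writing the file directly would let an interrupted run pick up where it stopped.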
    

    Later on you can create the BeautifulSoup object, as @merlin2011 mentioned, with:

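    from bs4 import BeautifulSoup

    # Relative path, matching the folder the pages were saved to
    with open("./soupCategory/1.html", encoding='UTF-8') as f:
        soup = BeautifulSoup(f, "html.parser")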
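
    To reload every saved page rather than just one, the files should be visited in page order; a plain alphabetical sort would put 10.html before 2.html. A rough sketch, assuming the files are named <number>.html inside the soupCategory folder; saved_pages is a hypothetical helper name, not something from the original answer.

```python
from pathlib import Path
from typing import Iterator, Tuple


def saved_pages(folder: str) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, html_text) for each <number>.html file in
    `folder`, in ascending page order. Other files are ignored."""
    numbered = [p for p in Path(folder).glob("*.html") if p.stem.isdecimal()]
    # Sort by the numeric value of the file name, not alphabetically
    for path in sorted(numbered, key=lambda p: int(p.stem)):
        yield int(path.stem), path.read_text(encoding="utf-8")


# Usage (requires bs4 to be installed):
# from bs4 import BeautifulSoup
# for number, html in saved_pages("./soupCategory"):
#     soup = BeautifulSoup(html, "html.parser")
```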
    
