I have some code that is quite long, so it takes a long time to run. I want to simply save either the requests object (in this case "name") or the BeautifulSoup object (i
Since name.content is just HTML, you can dump it to a file and read it back later.
Usually the bottleneck is not the parsing, but instead the network latency of making requests.
from bs4 import BeautifulSoup
import requests

url = 'https://google.com'
name = requests.get(url)

# name.content is bytes, so write the file in binary mode
with open("/tmp/A.html", "wb") as f:
    f.write(name.content)

# read it back in
with open("/tmp/A.html", "rb") as f:
    soup = BeautifulSoup(f, "html.parser")

# do something with soup
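If the script runs repeatedly, the dump-and-reload idea can be wrapped so the download only happens when no saved copy exists yet. This is a minimal sketch, not part of the original answer: the function name `load_html` and its `fetch` parameter are hypothetical, and it assumes the page does not need to be refreshed once saved.

```python
import os


def load_html(url, cache_path, fetch=None):
    """Return the page bytes for url, downloading only if cache_path is missing."""
    # Reuse the saved copy if one already exists on disk.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()

    if fetch is None:
        # Imported lazily so reading from the cache does not require requests.
        import requests

        def fetch(u):
            return requests.get(u).content

    # No saved copy yet: download once and write the raw bytes to disk.
    html = fetch(url)
    with open(cache_path, "wb") as f:
        f.write(html)
    return html


# Hypothetical usage, mirroring the snippet above:
# html = load_html("https://google.com", "/tmp/A.html")
# soup = BeautifulSoup(html, "html.parser")
```

The optional `fetch` argument is only there so the download step can be swapped out, for example with a stub when no network is available. Delete the cache file to force a fresh download.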
Here is some anecdotal evidence that the bottleneck is in the network.
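from bs4 import BeautifulSoup
import requests
import time

url = 'https://google.com'
t1 = time.perf_counter()  # wall-clock timer, so time spent waiting on the network is counted
name = requests.get(url)
t2 = time.perf_counter()
soup = BeautifulSoup(name.content, "html.parser")
t3 = time.perf_counter()
# request time, then parse time, in seconds
print(t2 - t1, t3 - t2)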
Output (request time, then parse time, in seconds), from running on a ThinkPad X1 Carbon with a fast campus network:
0.11 0.02
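A single measurement like this is noisy, so it can help to repeat each step a few times and compare medians. The helper below is a rough sketch; the name `time_call` and the default of five repeats are arbitrary choices for illustration, not something from the measurement above.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Return the median wall-clock duration, in seconds, of calling func()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Hypothetical usage with the names from the snippet above:
# fetch_time = time_call(lambda: requests.get(url))
# parse_time = time_call(lambda: BeautifulSoup(name.content, "html.parser"))
```

Keep in mind that repeated requests to the same URL may be served faster than the first one, so the median can understate the cost of a cold request.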