Question
I have two computers, both running 64-bit Windows 7. One machine has python 32-bit, one is running python 64-bit. Both machines have 8GB of RAM.
I'm using BeautifulSoup to scrape a webpage, but I've been running into issues on my 64-bit Python machine. I've narrowed it down to this: len(str(BeautifulSoup(requests.get("http://www.sampleurl.com").text))) returns only 92520 characters on 64-bit Python, but for the same static site it returns 135000 characters on my 32-bit Python machine.
At some point in the past on my python64-bit machine I had python32-bit, but uninstalled it to install python64-bit because I was having issues installing scipy using pip install (turns out that wasn't the issue).
Anyway, I'm unsure as to why my 64bit python machine isn't returning the entire html string and I was wondering if anyone can help me understand what is going on and how can I fix it.
Answer 1:
This is not a 32-bit / 64-bit issue. You are most likely seeing a parser issue: one machine is using lxml and the other html.parser, for example.
Different parsers deal differently with broken HTML, and lxml is the default only when it is installed.
See for example:
- Beautiful Soup findAll doesn't find them all
- Beautiful Soup 4 find_all don't find links that Beautiful Soup 3 finds
- BeautifulSoup fails to parse long view state
- Beautifulsoup lost nodes
- Missing parts on Beautiful Soup results
etc.
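One way to take the installed-parser question out of the comparison is to name the parser explicitly instead of letting Beautiful Soup choose. The snippet below is only a minimal sketch: the broken markup and the helper name serialized_length are made up for illustration, and it assumes the bs4 package is installed. On the real page you would pass the downloaded text instead of BROKEN_HTML.

```python
from bs4 import BeautifulSoup

# Deliberately broken markup, invented for this example:
# the <p> and <b> tags are never closed.
BROKEN_HTML = "<html><body><p>first<p>second <b>bold</body></html>"


def serialized_length(html, parser="html.parser"):
    """Parse html with an explicitly named parser and return the
    length of the re-serialized document."""
    soup = BeautifulSoup(html, parser)
    return len(str(soup))


# With the parser pinned, both machines run the same parsing code,
# regardless of whether lxml happens to be installed on either one.
print(serialized_length(BROKEN_HTML))
```

Running the same call with "lxml" (where it is installed) and comparing the two lengths should show whether the parsers disagree on a given page.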
Run import lxml on both machines to verify. When you replaced the Python installation on one machine with a 64-bit version, you likely didn't install a compatible lxml version.
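The check can also be scripted so the same file runs on both machines. This is a rough sketch, not Beautiful Soup's own selection logic: the helper names can_import and likely_default_parser are hypothetical, and the ordering (lxml, then html5lib, then the built-in html.parser) is taken from the Beautiful Soup documentation's description of how it picks a parser when none is named.

```python
import importlib


def can_import(name):
    """Return True if the module imports cleanly on this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Covers both "not installed" and a binary build that does not
        # match this interpreter (for example, a leftover 32-bit build).
        return False
    return True


def likely_default_parser():
    """Approximate which HTML parser Beautiful Soup would pick
    when no parser is named explicitly."""
    for name in ("lxml", "html5lib"):
        if can_import(name):
            return name
    return "html.parser"


print("lxml importable:", can_import("lxml"))
print("likely default parser:", likely_default_parser())
```

If the two machines print different parsers, that would be consistent with the length difference described in the question.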
Source: https://stackoverflow.com/questions/28616558/python-64-bit-not-storing-as-long-of-string-as-32-bit-python