Question
I'm trying to read the source code of a website 100 lines at a time.
For example:
self.code = urllib.request.urlopen(uri)
#Get 100 first lines
self.lines = self.getLines()
...
#Get 100 next lines
self.lines = self.getLines()
My getLines code is like this:
def getLines(self):
    res = []
    i = 0
    while i < 100:
        res.append(str(self.code.readline()))
        i += 1
    return res
But the problem is that getLines() always returns the first 100 lines of the code.
I've seen some solutions with next() or tell() and seek(), but it seems that those functions are not implemented in the HTTPResponse class.
Answer 1:
According to the documentation, urllib.request.urlopen(uri) returns a file-like object, so you should be able to do:
from itertools import islice

def getLines(self):
    res = []
    for line in islice(self.code, 100):
        res.append(line)
    return res
There's more information on islice in the itertools documentation. Using iterators avoids the while loop and manual increments.
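A minimal sketch of why this works: islice consumes the underlying iterator, so each call continues where the previous one stopped. Here io.BytesIO stands in for the HTTPResponse (an assumption, made so the example runs without a network connection), and the name get_lines and the parameter n are illustrative only.

```python
import io
from itertools import islice


def get_lines(stream, n=100):
    """Return up to n lines from stream, continuing from its current position."""
    return list(islice(stream, n))


# Stand-in for the response object: 250 lines of bytes in a file-like object.
fake_response = io.BytesIO(b"".join(b"line %d\n" % i for i in range(250)))

first = get_lines(fake_response)    # lines 0..99
second = get_lines(fake_response)   # lines 100..199
third = get_lines(fake_response)    # lines 200..249 (only 50 remain)
fourth = get_lines(fake_response)   # empty list: the stream is exhausted
```

The last two calls show a side benefit: islice simply stops when the stream runs out, so a short final chunk needs no special handling.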
If you absolutely must use readline(), it's advisable to use a for loop, i.e.
for i in range(100):
    ...
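If readline() is kept, one detail worth handling is the end of the stream: readline() returns an empty bytes object once nothing is left, so a fixed-count loop would otherwise pad the result with empty entries. A sketch under a few assumptions: range is the Python 3 spelling, io.BytesIO stands in for the response, and read_chunk is a hypothetical name.

```python
import io


def read_chunk(stream, n=100):
    """Read up to n lines with readline(), stopping early at end of stream."""
    res = []
    for _ in range(n):
        line = stream.readline()
        if not line:        # b"" signals that the stream is exhausted
            break
        res.append(line)
    return res


stream = io.BytesIO(b"a\nb\nc\n")   # stand-in for the HTTP response
chunk = read_chunk(stream, 2)       # first two lines
rest = read_chunk(stream, 2)        # only one line is left
```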
Answer 2:
This worked for me.
#!/usr/bin/env python
# Python 2 code: urllib.urlopen and the print statement do not exist in Python 3.
import urllib

def getLines(code):
    res = []
    i = 0
    while i < 100:
        res.append(str(code.readline()))
        i += 1
    return res

uri = 'http://www.google.com/'
code = urllib.urlopen(uri)
# Get the first 100 lines
lines = getLines(code)
print lines
# Get the next 100 lines
lines = getLines(code)
print lines
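On Python 3 the same approach needs two adjustments: urlopen lives in urllib.request, and the response yields bytes, so str(code.readline()) produces text such as "b'...'" rather than the line itself. The sketch below decodes each line instead. The utf-8 encoding is an assumption (a real page may declare another), iter_chunks is a hypothetical helper, and io.BytesIO replaces the network call so the example runs offline.

```python
import io
from itertools import islice


def iter_chunks(stream, n=100, encoding="utf-8"):
    """Yield lists of up to n decoded lines until the stream is exhausted."""
    while True:
        chunk = [raw.decode(encoding, errors="replace") for raw in islice(stream, n)]
        if not chunk:
            return
        yield chunk


# With a real page this would be: stream = urllib.request.urlopen(uri)
stream = io.BytesIO(b"".join(b"row %d\n" % i for i in range(230)))
sizes = [len(chunk) for chunk in iter_chunks(stream)]
print(sizes)  # [100, 100, 30]
```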
Source: https://stackoverflow.com/questions/10249673/python-read-lines-of-website-source-code-100-lines-at-a-time