python read lines of website source code 100 lines at a time

Posted by 99封情书 on 2019-12-25 05:24:15

Question


I'm trying to read the source code of a website 100 lines at a time.

For example:

self.code = urllib.request.urlopen(uri)

#Get 100 first lines
self.lines = self.getLines()

...

#Get 100 next lines
self.lines = self.getLines()

My getLines code is like this:

def getLines(self):
    res = []
    i = 0

    while i < 100:
        res.append(str(self.code.readline()))
        i+=1

    return res

But the problem is that getLines() always returns the first 100 lines of the code.

I've seen some solutions that use next(), or tell() and seek(), but it seems that those functions are not implemented in the HTTPResponse class.


Answer 1:


According to the documentation, urllib.request.urlopen(uri) returns a file-like object, so you should be able to do:

from itertools import islice

def getLines(self):
    res = []
    # islice stops after 100 items; the next call resumes where this one ended
    for line in islice(self.code, 100):
        res.append(line)
    return res

There's more information on islice in the itertools documentation. Using iterators avoids the while loop and the manual increments.
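
As a minimal sketch of the islice approach, the generator below yields successive batches of up to 100 lines from any file-like object until it is exhausted. The name iter_line_chunks and the io.BytesIO stand-in for the HTTP response are assumptions made for illustration only; with a real page you would pass in the object returned by urllib.request.urlopen(uri).

```python
import io
from itertools import islice

def iter_line_chunks(fileobj, size=100):
    """Yield lists of up to `size` lines until the stream is exhausted."""
    while True:
        # islice consumes exactly `size` items (or fewer at end of stream)
        chunk = list(islice(fileobj, size))
        if not chunk:
            return
        yield chunk

# Hypothetical stand-in for urllib.request.urlopen(uri): 250 lines of bytes.
fake_response = io.BytesIO(b"".join(b"line %d\n" % n for n in range(250)))

chunk_sizes = [len(chunk) for chunk in iter_line_chunks(fake_response)]
print(chunk_sizes)  # [100, 100, 50]
```

Because the stream keeps its own position, each batch starts where the previous one ended, and the last batch is simply shorter.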

If you absolutely must use readline(), it's advisable to use a for loop, i.e.

for i in range(100):
    ... 
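
If you stay with readline(), two details are worth handling. At the end of the stream readline() returns an empty bytes object, and in Python 3 each line is bytes, so str(line) gives text such as "b'...'" rather than the decoded line. Here is a sketch that assumes the page is UTF-8 encoded; the helper name read_lines and the io.BytesIO stand-in are hypothetical:

```python
import io

def read_lines(fileobj, count=100, encoding="utf-8"):
    """Read up to `count` lines with readline(), decoding each one."""
    lines = []
    for _ in range(count):
        raw = fileobj.readline()
        if not raw:  # b"" signals end of stream
            break
        lines.append(raw.decode(encoding))
    return lines

# Hypothetical stand-in for the response; a real one would come from urlopen().
fake_response = io.BytesIO(b"first\nsecond\nthird\n")

first_two = read_lines(fake_response, count=2)
rest = read_lines(fake_response, count=2)
print(first_two)  # ['first\n', 'second\n']
print(rest)       # ['third\n']
```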



Answer 2:


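This worked for me. Note that the script is Python 2: in Python 3, urlopen lives in urllib.request and print is a function.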

#!/usr/bin/env python

import urllib

def getLines(code):
    res = []
    i = 0

    while i < 100:
        res.append(str(code.readline()))
        i+=1

    return res

uri='http://www.google.com/'
code = urllib.urlopen(uri)

#Get 100 first lines
lines = getLines(code)

print lines

#Get 100 next lines
lines = getLines(code)

print lines


Source: https://stackoverflow.com/questions/10249673/python-read-lines-of-website-source-code-100-lines-at-a-time
