urllib2 error no host given

Submitted by 会有一股神秘感。 on 2021-02-08 04:37:25

Question


EDIT (SOLVED): When I read the values in from my file, a newline character (\n) is included at the end of each one, and this splits my request string at that point. I think it has to do with how I saved the values to the file in the first place. Many thanks.

I have the following code:

results = 'http://www.myurl.com/'+str(mystring)
print str(results)
request = urllib2.Request(results)
request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
opener = urllib2.build_opener()
text = opener.open(request).read()

This runs in a loop. After the loop has run a few times, str(mystring) changes to give a different set of results. I can loop the script as many times as I like while the value of str(mystring) stays constant, but every time I change it I get an error saying "no host given" when the code tries to build the opener:

opener = urllib2.build_opener()

Can anyone help please?

TIA,

Paul.

EDIT:

More code here.....

import sys
import string
import httplib
import urllib2
import re
import random
import time


def StripTags(text):
    finished = 0
    while not finished:
        finished = 1
        start = text.find("<")
        if start >= 0:
            stop = text[start:].find(">")
            if stop >= 0:
                text = text[:start] + text[start+stop+1:]
                finished = 0
    return text


mystring="test"

d={}

with open("myfile","r") as f:
    while True:
        page_counter=0
        print str(mystring)

        try:
            while page_counter <20:
                results = 'http://www.myurl.com/'+str(mystring)
                print str(results)
                request = urllib2.Request(results)
                request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
                opener = urllib2.build_opener()
                text = opener.open(request).read()
                finds = (re.findall('([\w\.\-]+'+mystring+')',StripTags(text)))
                for find in finds:
                    d[find]=1
                    uniq_emails=d.keys()
                page_counter = page_counter +1
                print "found this " +str(finds)
                random.seed()
                n = random.random()
                i = n * 5
                print "Pausing script for " + str(i) + " Seconds"
                time.sleep(i)
            mystring=next(f)
        except IOError:
            print "No result found!"

Answer 1:


In the while loop, you're setting results to something which is not a URL:

results = 'myurl+str(mystring)'

It should probably be results = myurl+str(mystring)
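One way to catch this kind of mistake early is to parse the string before handing it to urllib2 and check that it has both a scheme and a host. The following is only a minimal sketch: the helper name has_host is hypothetical, mystring is a placeholder value, and the import fallback is there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urlparse import urlsplit        # Python 2
except ImportError:
    from urllib.parse import urlsplit    # Python 3


def has_host(url):
    """Return True if url has both a scheme and a host part."""
    parts = urlsplit(url)
    return bool(parts.scheme) and bool(parts.netloc)


mystring = "test"  # placeholder value, as in the question

quoted_by_mistake = 'myurl+str(mystring)'            # a literal string, not a URL
built_properly = 'http://www.myurl.com/' + mystring  # real concatenation

print(has_host(quoted_by_mistake))
print(has_host(built_properly))
```

The first call prints False because the quoted string has no scheme or host; the second prints True.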

By the way, it appears there's no need for all the casting to string (str()) you do: (expanded on request)

  • print str(foo): in such a case, str() is never necessary. Python will always print foo's string representation
  • results = 'http://www.myurl.com/'+str(mystring). This is also unnecessary; mystring is already a string, so 'http://www.myurl.com/' + mystring would suffice.
  • print "Pausing script for " + str(i) + " Seconds". Here you would get an error without str(), since you can't concatenate a string and a number (i is a float). However, print "foo", 1, "bar" does work, as do print "foo %i bar" % 1 and print "foo {0} bar".format(1).
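As a rough illustration of the last point, the three ways of building the message can be compared side by side. The value of i below is an arbitrary example (the question computes it as random.random() * 5), and print is called with a single parenthesised argument so the snippet behaves the same on Python 2 and Python 3.

```python
i = 2.5  # example pause length in seconds; an assumed value for illustration

# Three ways to build the same message from a string and a number.
with_str = "Pausing script for " + str(i) + " Seconds"      # explicit conversion
with_percent = "Pausing script for %s Seconds" % i          # %-formatting
with_format = "Pausing script for {0} Seconds".format(i)    # str.format

print(with_str)
print(with_percent)
print(with_format)
```

All three lines print the same text, so the choice is mostly a matter of readability.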



Answer 2:


I found the answer. It's as follows....

The values for mystring were read in from a file. In the script I wrote to create that file, I opened it with "w" instead of "wb".

Each line in the file ended with a newline character, "\n".

When mystring was added to the request string, the newline ended up in the middle of the request string.[1]

This would never have been apparent from my code, because I changed it before posting here in an effort to hide the real URL I am using to get my results.[2]

My actual url looks more like this.....

Myurl.com/mystring/otherstuff/page_counter/morestuff.htm

The \n being read from the file split my URL in two and gave urllib2 problems.
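The effect is easy to reproduce, and stripping each line as it is read avoids it. The sketch below is an illustration under assumptions, not the original script: an in-memory io.StringIO stands in for the real file, the three values are made up, and the URL pieces are the placeholders used in this thread.

```python
import io

# Stand-in for the real file: every line ends with "\n".
# io.StringIO keeps the sketch self-contained (no file on disk needed).
f = io.StringIO(u"first\nsecond\nthird\n")

base = u"http://www.myurl.com/"  # placeholder host from the question

urls = []
for line in f:
    raw = base + line + u"/otherstuff"            # newline lands mid-URL
    clean = base + line.strip() + u"/otherstuff"  # newline removed first
    print(repr(raw))    # repr() makes the hidden "\n" visible
    print(repr(clean))
    urls.append(clean)
```

In the loop from the question, the equivalent change would be mystring = next(f).strip(). Printing repr(results) instead of str(results) is also a quick way to spot hidden characters like this.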

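[1] I use Windows, where a file opened in text mode ("w") has each "\n" written as "\r\n". Opening the file with "wb" avoids that translation, but any "\n" the script writes is still stored, so the line read back by next(f) still ends in a newline. Stripping it on read, for example mystring = next(f).strip(), is the reliable fix.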

[2] Always post your full code, kids. The good people of Stack Overflow can't help you unless they can see what you are doing.

Many thanks all, I hope this helps someone out at some point.

Paul.



Source: https://stackoverflow.com/questions/14649347/urllib2-error-no-host-given
