Google http://maps.google.com/maps/geo query with non-English characters

Submitted by 徘徊边缘 on 2019-12-10 17:53:11

Question


I'm writing a Python parser (using urllib2) for addresses that contain non-English characters. The goal is to find the coordinates of every address.

When I open this URL in Firefox:

http://maps.google.com/maps/geo?q=Czech%20Republic%2010000%20Male%C5%A1ice&output=csv

the address bar changes it to

http://maps.google.com/maps/geo?q=Czech Republic 10000 Malešice&output=csv

and returns

200,6,50.0865113,14.4918052

which is the correct result.
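Firefox's address bar only displays the percent-escapes decoded; both URLs above denote the same request. A minimal Python 3 check, using urllib.parse (where the Python 3 counterparts of urllib2.quote and unquote live), shows the round trip between the two forms:

```python
from urllib.parse import quote, unquote

# Query string exactly as Firefox received it (percent-encoded UTF-8)
encoded = 'Czech%20Republic%2010000%20Male%C5%A1ice'
# Form shown in the address bar after Firefox decodes it for display
decoded = 'Czech Republic 10000 Malešice'

# unquote() decodes %XX escapes as UTF-8 by default
assert unquote(encoded) == decoded
# quote() encodes the text as UTF-8 and escapes spaces as %20
assert quote(decoded) == encoded
print(quote(decoded))
```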

However, if I open the same URL (percent-encoded, with %20 and so on) with urllib2 (or in the Opera browser), the result is

200,4,49.7715220,13.2955410

which is incorrect. How can I open the first URL with urllib2 and get the "200,6,50.0865113,14.4918052" result?
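To compare the two responses programmatically, each one can be split into its four comma-separated fields. This sketch assumes the fields are status code, accuracy, latitude and longitude (the layout of the old geocoder's CSV output; treat the field names as an assumption), and parse_geo_csv is a hypothetical helper:

```python
from typing import NamedTuple


class GeoResult(NamedTuple):
    # Field names are assumed: status code, accuracy, latitude, longitude
    status: int
    accuracy: int
    lat: float
    lng: float


def parse_geo_csv(body: str) -> GeoResult:
    """Hypothetical helper: split an output=csv response into typed fields."""
    status, accuracy, lat, lng = body.strip().split(',')
    return GeoResult(int(status), int(accuracy), float(lat), float(lng))


firefox = parse_geo_csv('200,6,50.0865113,14.4918052')
urllib2_result = parse_geo_csv('200,4,49.7715220,13.2955410')
# Both requests report status 200; only the second and later fields differ
print(firefox)
print(urllib2_result)
```

Note that a status of 200 alone does not tell the two answers apart, so checking the other fields is what reveals the wrong match.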

Edit:

Code used:

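# -*- coding: utf-8 -*-
# (Python 2 needs the line above to allow the non-ASCII literal below)
import urllib2

psc = '10000'       # postal code
name = 'Malešice'   # district name containing a non-ASCII character
# quote() percent-encodes the UTF-8 bytes, so 'š' becomes %C5%A1
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % urllib2.quote('Czech Republic %s %s' % (psc, name))

# Plain GET request with urllib2's default headers
response = urllib2.urlopen(url)
data = response.read()

print 'Parsed url %s, result %s\n' % (url, data)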

Output:

Parsed url http://maps.google.com/maps/geo?q=Czech%20Republic%2010000%20Male%C5%A1ice&output=csv, result 200,4,49.7715220,13.2955410


Answer 1:


I can reproduce this behavior, and at first I was puzzled as to why it happens. Inspecting the HTTP requests with Wireshark showed that the requests sent by Firefox (not surprisingly) contain a few more HTTP headers.

In the end, it turned out to be the Accept-Language header that makes the difference. You only get the correct result if

  • an Accept-Language header is set
  • and it lists a non-English language first (the priorities, i.e. q-values, don't seem to matter)
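The condition observed above (a header is present and a non-English language is listed first, regardless of priorities) can be written down as a small check. The helpers below are hypothetical and only encode that empirical observation; they are not documented Google behavior, and they deliberately ignore q-values to match "the priorities don't seem to matter":

```python
from typing import List


def listed_languages(header: str) -> List[str]:
    """Return language tags in the order they appear, dropping ;q=... parameters."""
    tags = []
    for part in header.split(','):
        tag = part.split(';', 1)[0].strip()
        if tag:
            tags.append(tag.lower())
    return tags


def first_listed_is_non_english(header: str) -> bool:
    """Hypothetical check for the condition observed in this answer."""
    tags = listed_languages(header)
    if not tags:
        return False  # no Accept-Language value at all
    # Compare only the primary subtag: 'de' for 'de-ch', 'en' for 'en-us'
    primary = tags[0].split('-', 1)[0]
    return primary != 'en'


print(first_listed_is_non_english('de-ch,en'))
print(first_listed_is_non_english('en-us,en;q=0.5'))
```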

For example, this Accept-Language header works:

headers = {'Accept-Language': 'de-ch,en'}

To summarize, with this modification your code works for me:

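# -*- coding: utf-8 -*-
import urllib2

psc = '10000'
name = 'Malešice'
# Same percent-encoded URL as before; quote() escapes the UTF-8 bytes of 'š' as %C5%A1
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % urllib2.quote('Czech Republic %s %s' % (psc, name))
# List a non-English language first; this is what changes the geocoder's answer
headers = {'Accept-Language': 'de-ch,en'}

# urlopen() accepts a Request object, which carries the extra header
req = urllib2.Request(url, None, headers)
response = urllib2.urlopen(req)
data = response.read()

print 'Parsed url %s, result %s\n' % (url, data)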
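For readers on Python 3, where urllib2 was split into urllib.request and urllib.parse, the same fix might look like the sketch below. It reuses the question's endpoint and the 'de-ch,en' header value from this answer; the function names build_geo_request and geocode are hypothetical, and this sketch does not verify that the old maps/geo endpoint still answers. Building the request is kept separate from sending it, so the URL and header can be inspected offline:

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

GEO_URL = 'http://maps.google.com/maps/geo?q=%s&output=csv'


def build_geo_request(psc: str, name: str, accept_language: str = 'de-ch,en') -> Request:
    """Build (but do not send) the geocoding request with an Accept-Language header."""
    query = quote('Czech Republic %s %s' % (psc, name))  # UTF-8 percent-encoding
    return Request(GEO_URL % query, headers={'Accept-Language': accept_language})


def geocode(psc: str, name: str) -> str:
    """Send the request and return the raw CSV body (performs network I/O)."""
    with urlopen(build_geo_request(psc, name)) as response:
        return response.read().decode('utf-8')


req = build_geo_request('10000', 'Malešice')
print(req.full_url)
# Request stores header names via str.capitalize(), hence 'Accept-language'
print(req.get_header('Accept-language'))
```

Passing a different accept_language value makes it easy to repeat the experiment with other language orders.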

Note: In my opinion, this is a bug in Google's geocoding API. The Accept-Language header indicates what languages the user agent prefers the content in, but it shouldn't have any effect on how the request is interpreted.



Source: https://stackoverflow.com/questions/12625745/google-http-maps-google-com-maps-geo-query-with-non-english-characters
