Fetching POI Data from Baidu Maps

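This article describes how to fetch POI (point of interest) data from Baidu Maps and how to post-process it. The workflow has two main steps:

POI data retrieval: request POI data from Baidu Maps and save it in JSON format;
Excel import: convert the saved JSON data into an Excel file.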

For the principles behind POI retrieval, see also the article 零基础掌握百度地图兴趣点获取POI爬虫（python语言爬取）（基础篇）.

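The POI data, including coordinates, is obtained from the Baidu Maps Web Service API. The returned POI information includes the name, latitude/longitude coordinates, address, and so on. For details on using the interface, see the Place Search (地点检索) section of the Baidu Maps Web Service API documentation.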

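The documentation shows that the key to fetching POI data is constructing a suitable URL; requesting that URL returns the corresponding POI data. We therefore start with a detailed look at the URL used by the Baidu Maps Web Service API.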

http://api.map.baidu.com/place/v2/search?query=银行&bounds=39.915,116.404,39.975,116.414&output=json&ak={your key} // GET request

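The line above is a sample search URL from the Baidu Maps documentation (the keyword 银行 means "bank"). It can be split into the following parts:

Prefix: every request needs this part, whatever the kind of search and whatever the desired data format
http://api.map.baidu.com/place/v2/search?
Parameters: these customize the request; you can specify the keyword, the search region, the output type, and your ak (access key)
query=银行&bounds=39.915,116.404,39.975,116.414&output=json&ak={your key}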
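The prefix/parameter split can be sketched in Python 3 as follows. This is a minimal illustration, not the code used later in the article: the helper name `build_search_url` and the placeholder key `YOUR_AK` are assumptions, and the keyword is percent-encoded because it contains non-ASCII characters.

```python
from urllib.parse import urlencode

# Fixed prefix shared by every search request
BASE_URL = "http://api.map.baidu.com/place/v2/search?"


def build_search_url(query, bounds, ak, output="json"):
    """Append a percent-encoded parameter string to the fixed prefix.

    bounds is (lat_lower_left, lng_lower_left, lat_upper_right, lng_upper_right).
    """
    params = {
        "query": query,
        "bounds": ",".join(str(v) for v in bounds),
        "output": output,
        "ak": ak,  # hypothetical placeholder, replace with your own key
    }
    # safe="," keeps the commas inside bounds readable instead of %2C
    return BASE_URL + urlencode(params, safe=",")


url = build_search_url("银行", (39.915, 116.404, 39.975, 116.414), "YOUR_AK")
print(url)
```

Building the parameter string from a dictionary keeps each parameter visible on its own line and avoids hand-written `&` concatenation mistakes.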

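The prefix is identical for every request and needs no further explanation, whereas the parameters determine the search results and deserve a closer look. Baidu Maps offers three POI search modes: administrative-region search, nearby search, and rectangular-region search. They differ only in a few parameters; most parameters, and the returned results, are the same. This article therefore uses the request parameters of the rectangular-region search as its example: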

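Parameter	Meaning	Type	Example	Required
query	Search keywords. Nearby search and rectangular-region search support a union search over several keywords, separated by `$`, with at most 10 keywords, e.g. "银行$酒店" (bank$hotel)	string(45)	天安门	Required
bounds	Rectangular search region; the coordinates are separated by ","	string(50)	38.76623,116.43213,39.54321,116.46773, i.e. lat,lng (lower-left corner),lat,lng (upper-right corner)	Required
output	Output format, json or xml	string(50)	json or xml	Optional
scope	Level of detail of the results. 1 or empty returns basic information; 2 returns detailed POI information	string(50)	1, 2	Optional
page_size	Number of POIs returned per request; 10 by default, 20 at most. For a multi-keyword search, the number of records returned is (number of keywords) * page_size	int	10	Optional
page_num	Page index, 0 by default; 0 is the first page, 1 the second page, and so on. Usually used together with page_size	int	0, 1, 2	Optional
coord_type	Coordinate type: 1 (wgs84ll, GPS latitude/longitude), 2 (gcj02ll, National Bureau of Surveying and Mapping latitude/longitude), 3 (bd09ll, Baidu latitude/longitude), 4 (bd09mc, Baidu metric coordinates). Note: "ll" is a lowercase "LL"	int	1, 2, 3 (default), 4	Optional
ret_coordtype	Optional parameter; when added, POIs are returned in National Bureau of Surveying and Mapping latitude/longitude coordinates	string(50)	gcj02ll	Optional
ak	The developer's access key; mandatory. Before v2 this parameter was named key	string(50)		Required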
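As a hedged illustration of the table, the sketch below assembles the parameters of one rectangular-region request and enforces the limits the table states (at most 10 keywords, at most 20 records per page). The function name `rectangle_search_params` and the key `YOUR_AK` are hypothetical.

```python
MAX_KEYWORDS = 10   # limit stated in the query row
MAX_PAGE_SIZE = 20  # limit stated in the page_size row


def rectangle_search_params(keywords, lower_left, upper_right, ak,
                            page_size=10, page_num=0, scope=1):
    """Return the parameter dict for one rectangular-region search request."""
    if not 1 <= len(keywords) <= MAX_KEYWORDS:
        raise ValueError("between 1 and 10 keywords are supported")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and 20")
    lat_min, lng_min = lower_left
    lat_max, lng_max = upper_right
    if lat_min >= lat_max or lng_min >= lng_max:
        raise ValueError("lower_left must lie south-west of upper_right")
    return {
        "query": "$".join(keywords),  # several keywords are joined with $
        "bounds": "%s,%s,%s,%s" % (lat_min, lng_min, lat_max, lng_max),
        "output": "json",
        "scope": scope,
        "page_size": page_size,
        "page_num": page_num,
        "ak": ak,
    }


params = rectangle_search_params(
    ["银行", "酒店"], (38.76623, 116.43213), (39.54321, 116.46773), "YOUR_AK")
print(params["query"], params["bounds"])
```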

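Response fields

Name	Type	Description
status	int	Status of this API call: 0 on success, another number on failure (see the service status codes)
total	int	Total number of POIs matched. The total field only appears when the request sets the page_num field. For data-protection reasons, total is at most 400 for a single request
name	string	POI name
location	object	POI latitude/longitude coordinates
address	string	POI address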
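To make the response fields concrete, here is a small sketch that checks `status`, reads `total`, and works out how many pages a query needs. The sample response is made up for illustration, and the list key `results` is taken from the export script later in this article rather than from the table above.

```python
import json
import math


def summarize_response(text, page_size):
    """Parse one response; return (total, pages_needed, pois)."""
    data = json.loads(text)
    if data.get("status") != 0:
        raise RuntimeError("request failed with status %s" % data.get("status"))
    total = int(data.get("total", 0))
    pages_needed = math.ceil(total / page_size)
    pois = [
        (p["name"], p["location"]["lng"], p["location"]["lat"], p.get("address", ""))
        for p in data.get("results", [])
    ]
    return total, pages_needed, pois


# Hypothetical sample response, not real data
sample = json.dumps({
    "status": 0,
    "total": 45,
    "results": [
        {"name": "Example Bank",
         "location": {"lat": 39.92, "lng": 116.41},
         "address": "Example Road 1"},
    ],
})

total, pages, pois = summarize_response(sample, page_size=20)
print(total, pages, pois)
```

With 45 matches and 20 records per page, three requests (page_num 0, 1, 2) cover the whole result set.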

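Two points need special attention:

To protect its data, Baidu Maps caps total at 400 for a single request. In other words, a search returns at most 400 results; if more than 400 POIs match, only 400 records are shown;
The quota Baidu Maps gives developers is 2,000 requests per day, and concurrent access is limited to 120.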

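The first problem can be solved by subdividing the search region: split the rectangle to be searched into several smaller rectangles and merge their search results to obtain the result for the whole region.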
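The subdivision idea can be sketched as follows. `split_bounds` cuts a rectangle into a grid, and `cells_under_cap` keeps splitting any cell whose reported total reaches the cap. The callback `count_fn` is a hypothetical stand-in for "request page 0 of this cell and read total"; here it is replaced by a fake density function so the example runs offline.

```python
def split_bounds(bounds, rows, cols):
    """Split (lat0, lng0, lat1, lng1) into rows x cols smaller rectangles."""
    lat0, lng0, lat1, lng1 = bounds
    dlat = (lat1 - lat0) / rows
    dlng = (lng1 - lng0) / cols
    return [
        (lat0 + dlat * i, lng0 + dlng * j,
         lat0 + dlat * (i + 1), lng0 + dlng * (j + 1))
        for i in range(rows)
        for j in range(cols)
    ]


def cells_under_cap(bounds, count_fn, cap=400, max_depth=6):
    """Recursively quarter a rectangle until each cell reports fewer than cap hits."""
    if max_depth == 0 or count_fn(bounds) < cap:
        return [bounds]
    cells = []
    for sub in split_bounds(bounds, 2, 2):
        cells.extend(cells_under_cap(sub, count_fn, cap, max_depth - 1))
    return cells


def fake_count(b):
    """Stand-in for a real request: 1000 POIs per unit of area."""
    return int((b[2] - b[0]) * (b[3] - b[1]) * 1000)


print(len(cells_under_cap((0.0, 0.0, 1.0, 1.0), fake_count)))
```

A cell that reports exactly the cap is treated as possibly truncated, which is why the comparison is strict. The `max_depth` guard stops the recursion in case a single location alone exceeds the cap.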

The second problem is solved by applying for several aks, using them in rotation, and slowing down the request rate.
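One possible way to rotate keys and throttle requests is sketched below. The class `KeyRotator` is hypothetical; it counts requests per key against the daily quota mentioned above and enforces a minimum interval between requests, but it does not reset the counters at midnight, which a long-running job would need to handle.

```python
import itertools
import time


class KeyRotator:
    """Hand out aks in round-robin order with a simple quota and throttle."""

    def __init__(self, keys, daily_quota=2000, min_interval=1.0):
        keys = list(dict.fromkeys(keys))  # drop duplicates, keep order
        if not keys:
            raise ValueError("at least one ak is required")
        self._cycle = itertools.cycle(keys)
        self._used = {key: 0 for key in keys}
        self.daily_quota = daily_quota
        self.min_interval = min_interval
        self._last = None

    def next_key(self):
        # Look for the next key that still has quota left
        for _ in range(len(self._used)):
            key = next(self._cycle)
            if self._used[key] < self.daily_quota:
                break
        else:
            raise RuntimeError("every ak has used up its daily quota")
        # Wait so that consecutive requests are at least min_interval apart
        if self._last is not None:
            wait = self.min_interval - (time.monotonic() - self._last)
            if wait > 0:
                time.sleep(wait)
        self._last = time.monotonic()
        self._used[key] += 1
        return key


rotator = KeyRotator(["AK_ONE", "AK_TWO"], min_interval=0.0)
print([rotator.next_key() for _ in range(4)])
```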

The final implementation is as follows:

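# -*- coding: utf-8 -*-
# The coding line above must be the first line; otherwise Python 2 reports a
# non-ASCII character error for the Chinese text in this file.
# This is a Python 2 script (urllib.urlopen and print statements).

import urllib
import json
import time

# The ak must be applied for on the Baidu Maps Open Platform
ak = "XXX"

# Search keyword
query = ["社会福利院"]
page_size = 20
scope = 1

# Search extent:
# lower-left corner  30.379,114.118
# upper-right corner 30.703,114.665
# center             30.541,114.3915
bounds = [
    [30.379, 114.118, 30.541, 114.3915],
    [30.379, 114.3915, 30.541, 114.665],
    [30.541, 114.118, 30.703, 114.3915],
    [30.541, 114.3915, 30.703, 114.665]
]

# Subdivide every block in bounds into col_row rows by col_row columns (3 x 3)
# so that no cell runs into the limit of 400 results per search region
col_row = 3
new_bounds = []

for lst in bounds:
    distance_lat = (lst[2] - lst[0]) / col_row
    distance_lon = (lst[3] - lst[1]) / col_row
    for i in range(col_row):
        for j in range(col_row):
            new_bounds.append([
                lst[0] + distance_lat * i,
                lst[1] + distance_lon * j,
                lst[0] + distance_lat * (i + 1),
                lst[1] + distance_lon * (j + 1)
            ])

queryResults = []

for bound in new_bounds:
    page_num = 0
    total = 0
    while True:
        # Assemble the request URL from the parameters described above;
        # the keyword is percent-encoded because it is not ASCII
        url = ("http://api.map.baidu.com/place/v2/search?ak=" + str(ak)
               + "&output=json&query=" + urllib.quote(query[0])
               + "&scope=" + str(scope)
               + "&page_size=" + str(page_size)
               + "&page_num=" + str(page_num)
               + "&bounds=" + ",".join(str(v) for v in bound))

        # Request the URL and parse the JSON response
        jsonf = urllib.urlopen(url)
        s = json.loads(jsonf.read())
        jsonf.close()

        # Slow down; Baidu Maps requires concurrency below 120
        time.sleep(1)

        # A non-zero status means the request failed and carries no total
        if s.get("status") != 0:
            print "request failed, status: " + str(s.get("status"))
            break

        queryResults.append(s)

        # The total reported by the first page decides how many pages to fetch
        if page_num == 0:
            total = int(s.get("total", 0))
        page_num = page_num + 1
        if page_num * page_size >= total:
            break

    print "search complete"
    print "output: " + str(bound)
    print "total: " + str(total)
    print ""

# A plain relative path is used here: in ".\results.txt" the "\r" would be
# read as a carriage-return escape. The decoded text is encoded as UTF-8
# before writing, because a Python 2 file object cannot write non-ASCII
# unicode directly.
results = open("results.txt", "a")
results.write(str(queryResults).decode("unicode_escape").encode("utf-8"))
results.close()
print "ALL DONE!"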

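The results are saved in results.txt. Because of the way the strings are written out, however, the output contains stray u' prefixes, all of which have to be replaced manually with '. Next, copy the search results from results.txt into a result.js file and add [ and ] at the beginning and end of the file respectively, so that the content forms an array of objects that is easy to traverse during the later Excel import. The export page below refers to each array by a variable name (for example shfly), so each .js file also has to assign its array to the corresponding variable.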
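The manual clean-up could probably be avoided by serializing the results as JSON instead of writing the Python representation of the list. The sketch below assumes Python 3 and is not the author's method; the helper names `save_as_json` and `save_as_js` are hypothetical, and `shfly` is the variable name that the export page uses for the 社会福利院 data.

```python
import json


def save_as_json(query_results, path):
    """Write the collected responses as valid JSON with readable Chinese text."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(query_results, f, ensure_ascii=False, indent=2)


def save_as_js(query_results, var_name, path):
    """Write a .js file that assigns the array to a variable for the export page."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("var %s = " % var_name)
        json.dump(query_results, f, ensure_ascii=False)
        f.write(";\n")


# Made-up data with the same shape as one collected response
demo = [{"status": 0,
         "results": [{"name": "示例福利院",
                      "location": {"lat": 30.5, "lng": 114.3},
                      "address": "示例路1号"}]}]
save_as_json(demo, "results_demo.json")
save_as_js(demo, "shfly", "shfly_demo.js")
```

Because a JSON array literal is also a valid JavaScript array literal, the generated .js file can be loaded by a script tag without further editing.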

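The Excel import works as follows: first traverse all objects in the array and build a table with the same structure as an HTML table, then wrap that table in a specific way so that it is saved as an Excel file. The code is as follows: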

<!DOCTYPE html>
<html>

<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <title>ARRAY TO EXCEL</title>
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <!-- Each .js file defines one array variable (jly, lngy, ...) holding the search results -->
    <script src='敬老院.js'></script>
    <script src='老年公寓.js'></script>
    <script src='老年之家.js'></script>
    <script src='社会福利院.js'></script>
    <script src='社区卫生服务站.js'></script>
    <script src='社区卫生服务中心.js'></script>
    <script src='社区医院.js'></script>
    <script src='养老（老年）服务中心.js'></script>
    <script src='养老机构.js'></script>
    <script src='养老院.js'></script>
</head>

<body>
    <!-- The button label 导出 means "Export" -->
    <input type="button" id="wwo" value="导出" />

    <script type="text/javascript">
        // Clicking the button downloads one .xls file per keyword
        $(document).ready(function() {
            $('#wwo').click(function() {
                ArrayToExcelConvertor(jly, "敬老院");
                ArrayToExcelConvertor(lngy, "老年公寓");
                ArrayToExcelConvertor(lnzj, "老年之家");
                ArrayToExcelConvertor(shfly, "社会福利院");
                ArrayToExcelConvertor(sqwsfwz, "社区卫生服务站");
                ArrayToExcelConvertor(sqyy, "社区医院");
                ArrayToExcelConvertor(ylfwzx, "养老服务中心");
                ArrayToExcelConvertor(yljg, "养老机构");
                ArrayToExcelConvertor(yly, "养老院");
            });
        });

        function ArrayToExcelConvertor(Data, FileName) {
            // Build one table row per POI: name, lng, lat, address
            var excel = '<table>';
            var row = "";
            for (var i = 0; i < Data.length; i++) {
                if (Data[i].results.length > 0) {
                    for (var j = 0; j < Data[i].results.length; j++) {
                        var name = Data[i].results[j].name;
                        var lng = Data[i].results[j].location.lng;
                        var lat = Data[i].results[j].location.lat;
                        var addr = Data[i].results[j].address;
                        row += '<tr>';
                        row += '<td>' + name + '</td>';
                        row += '<td>' + lng + '</td>';
                        row += '<td>' + lat + '</td>';
                        row += '<td>' + addr + '</td>';
                        row += "</tr>";
                    }
                }
            }
            excel += row + "</table>";

            // Wrap the table in the HTML envelope that Excel opens as a workbook
            var excelFile = "<html xmlns:o='urn:schemas-microsoft-com:office:office' xmlns:x='urn:schemas-microsoft-com:office:excel' xmlns='http://www.w3.org/TR/REC-html40'>";
            excelFile += "<head>";
            excelFile += '<meta http-equiv="content-type" content="application/vnd.ms-excel; charset=UTF-8">';
            excelFile += "<!--[if gte mso 9]>";
            excelFile += "<xml>";
            excelFile += "<x:ExcelWorkbook>";
            excelFile += "<x:ExcelWorksheets>";
            excelFile += "<x:ExcelWorksheet>";
            excelFile += "<x:Name>";
            // The original left the literal placeholder {worksheet} here; the file name is used as the sheet name instead
            excelFile += FileName;
            excelFile += "</x:Name>";
            excelFile += "<x:WorksheetOptions>";
            excelFile += "<x:DisplayGridlines/>";
            excelFile += "</x:WorksheetOptions>";
            excelFile += "</x:ExcelWorksheet>";
            excelFile += "</x:ExcelWorksheets>";
            excelFile += "</x:ExcelWorkbook>";
            excelFile += "</xml>";
            excelFile += "<![endif]-->";
            excelFile += "</head>";
            excelFile += "<body>";
            excelFile += excel;
            excelFile += "</body>";
            excelFile += "</html>";

            // Trigger the download through a temporary hidden link
            var uri = 'data:application/vnd.ms-excel;charset=utf-8,' + encodeURIComponent(excelFile);
            var link = document.createElement("a");
            link.href = uri;
            link.style.visibility = "hidden";
            link.download = FileName + ".xls";
            document.body.appendChild(link);
            link.click();
            document.body.removeChild(link);
        }
    </script>
</body>

</html>
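As an alternative to the browser page above, the same flattening step can be sketched in Python 3 by writing a CSV file, which Excel can open directly. This swaps the HTML-table technique for the standard `csv` module and is only an illustration; the function names and the sample data are hypothetical.

```python
import csv


def results_to_rows(data):
    """Flatten a list of responses into [name, lng, lat, address] rows."""
    rows = []
    for page in data:
        for poi in page.get("results", []):
            location = poi.get("location", {})
            rows.append([
                poi.get("name", ""),
                location.get("lng"),
                location.get("lat"),
                poi.get("address", ""),
            ])
    return rows


def write_csv(rows, path):
    """Write the rows with a header; utf-8-sig adds a BOM so Excel detects UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "lng", "lat", "address"])
        writer.writerows(rows)


# Made-up data with the same shape as the collected responses
demo_pages = [
    {"status": 0, "results": [
        {"name": "示例养老院",
         "location": {"lat": 30.5, "lng": 114.3},
         "address": "示例路2号"}]},
    {"status": 0, "results": []},
]
write_csv(results_to_rows(demo_pages), "poi_demo.csv")
```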

Source: 百度地图POI数据获取
