Question
I am a beginner in Python, currently working on a small Python project. I want to build a dynamic script for patent research on patentsview.org.
Here is my code:
import urllib.parse
import urllib.request
#http://www.patentsview.org/api/patents/query?q={"_and":[{"inventor_last_name":author},{"_text_any":{"patent_title":[title]}}]}&o={"matched_subentities_only": "true"}
author = "Jobs"
andreq = "_and"
invln = "inventor_last_name"
text = "_text_any"
patent = "patent_title"
match = "matched_subentities_only"
true = "true"
title = "computer"
urlbasic = "http://www.patentsview.org/api/patents/query"
patentall = {patent:title}
textall = {text:patentall}
invall = {invln:author}
andall = invall.copy()
andall.update(textall)
valuesq = {andreq:andall}
valuesqand = {andreq:andall}
valuesq = {andreq:valuesqand}
valueso = {match:true}
#########
url = "http://www.patentsview.org/api/patents/query"
values = {"q":valuesq,
"o":valueso}
print(values)
data = urllib.parse.urlencode(values)
print(data)
############
data = data.encode("UTF-8")
print(data)
req = urllib.request.Request(url,data)
resp = urllib.request.urlopen(req)
respData = resp.read()
saveFile = open("patents.txt", "w")
saveFile.write(str(respData))
saveFile.close()
I think I got the right start for the dynamic URL, but the encoding seems to give me an HTTP Error 400: Bad Request. If I don't encode, the URL will be like www.somethingsomething.org/o:{....}, which obviously produces an error. Here is the error:
Traceback (most recent call last):
File "C:/Users/Max/PycharmProjects/KlayerValter/testen.py", line 38, in <module>
resp = urllib.request.urlopen(req)
File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
return opener.open(url, data, timeout)
File "C:\Python34\lib\urllib\request.py", line 469, in open
response = meth(req, response)
File "C:\Python34\lib\urllib\request.py", line 579, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python34\lib\urllib\request.py", line 507, in error
return self._call_chain(*args)
File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
result = func(*args)
File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
Process finished with exit code 1
If I encode, I get the same error, since all brackets get converted. The patentsview API works as follows:
http://www.patentsview.org/api/patents/query?q={"_or":[{"_and":[{"inventor_last_name":"Whitney"},{"_text_phrase":{"patent_title":"cotton gin"}}]},{"_and":[{"inventor_last_name":"Hopper"},{"_text_all":{"patent_title":"COBOL"}}]}]}
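The `q` parameter in that URL is plain JSON. A minimal sketch, assuming the API accepts the percent-encoded JSON on a GET request exactly as in the example above, of building that URL from a Python dict with `json.dumps` and `urllib.parse.urlencode` (the helper name `build_get_url` is hypothetical). Percent-encoding the brackets and quotes is standard URL encoding that servers decode back; the important step is serializing with `json.dumps` first.

```python
import json
import urllib.parse

BASE_URL = "http://www.patentsview.org/api/patents/query"


def build_get_url(query, options=None):
    """Serialize query/options as JSON, then percent-encode them as URL parameters."""
    params = {"q": json.dumps(query)}
    if options is not None:
        params["o"] = json.dumps(options)
    return BASE_URL + "?" + urllib.parse.urlencode(params)


# The example query from the question, written as a Python dict.
example_query = {
    "_or": [
        {"_and": [{"inventor_last_name": "Whitney"},
                  {"_text_phrase": {"patent_title": "cotton gin"}}]},
        {"_and": [{"inventor_last_name": "Hopper"},
                  {"_text_all": {"patent_title": "COBOL"}}]},
    ]
}

url = build_get_url(example_query)
print(url)
```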
To build the query dynamically, I had to define variables for all the dictionary key names. If there is a better solution, please help.
Best Regards.
Answer 1:
The API accepts and returns JSON data, so you should use json.dumps to encode your POST data. Then use json.loads on the response if you want a dictionary, or just write it to a file.
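The cause of the 400 in the question can be seen without any network call: `urllib.parse.urlencode` converts each non-string value with `str()`, so a nested dict is sent as its Python repr (single quotes), which is not valid JSON. A small comparison using the question's search values:

```python
import json
import urllib.parse

query = {"_and": [{"inventor_last_name": "Jobs"},
                  {"_text_any": {"patent_title": "computer"}}]}

# What the question's code did: urlencode calls str() on the nested dict,
# producing Python repr syntax with single quotes.
form_encoded = urllib.parse.urlencode({"q": query})

# What the answer does: serialize the whole payload as JSON text.
json_body = json.dumps({"q": query})

print(urllib.parse.unquote_plus(form_encoded))
# q={'_and': [{'inventor_last_name': 'Jobs'}, {'_text_any': {'patent_title': 'computer'}}]}
print(json_body)
# {"q": {"_and": [{"inventor_last_name": "Jobs"}, {"_text_any": {"patent_title": "computer"}}]}}
```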
from urllib.request import Request, urlopen
import json
url = "http://www.patentsview.org/api/patents/query"
author = "Jobs"
title = "computer"
data = {
'q':{
"_and":[
{"inventor_last_name":author},
{"_text_any":{"patent_title":title}}
]
},
'o':{"matched_subentities_only": "true"}
}
resp = urlopen(Request(url, json.dumps(data).encode()))
data = resp.read()
# data = json.loads(data.decode("utf-8"))  # decode first: json.loads accepts bytes only on Python 3.6+
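`resp.read()` returns bytes. On Python 3.6 and later `json.loads` accepts bytes directly, but on the Python 3.4 used in the question it only accepts `str`, so decoding first keeps the code portable. A sketch of a small helper (the name `parse_response` and the sample payload are hypothetical, not real API output; UTF-8 is assumed as the response encoding):

```python
import json


def parse_response(raw, encoding="utf-8"):
    """Turn the raw bytes returned by resp.read() into Python objects.

    Decoding explicitly keeps this working on Python 3.4/3.5,
    where json.loads does not accept bytes.
    """
    return json.loads(raw.decode(encoding))


# Hypothetical payload standing in for a real API response.
sample = b'{"example": [1, 2, 3]}'
result = parse_response(sample)
print(result["example"])  # [1, 2, 3]
```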
As suggested by Christian, you could simply use requests; it's much more convenient than urllib.
import requests
data = requests.post(url, json=data).json()
As for all those variables in your code, they compose a dictionary like the one below:
values = {"q":{andreq:{andreq:{invln:author, text:{patent:title}}}}, "o":{match:true}}
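Reproducing the question's variables also shows a structural difference: that code nests `"_and"` inside `"_and"` and maps it to a dict, whereas the API examples in the question map `"_and"` to a list of conditions. A quick check using the question's own variable names:

```python
author = "Jobs"
title = "computer"
andreq, invln, text, patent = "_and", "inventor_last_name", "_text_any", "patent_title"

# Structure produced by the question's code.
andall = {invln: author, text: {patent: title}}
question_q = {andreq: {andreq: andall}}

# Structure used in the answer, matching the form of the question's API examples.
answer_q = {"_and": [{"inventor_last_name": author},
                     {"_text_any": {"patent_title": title}}]}

print(question_q == answer_q)  # False
print(type(question_q["_and"]).__name__, type(answer_q["_and"]).__name__)  # dict list
```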
I don't see why you would go through all that trouble to build a dictionary, but I could be wrong. However, you could wrap your code in a function with author and title as arguments.
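A sketch of that function using only the standard library, assuming the endpoint from the question is still available. Building the payload is kept separate from sending it, so the query can be inspected without a network call. The names `build_query` and `search_patents` are hypothetical, and the `Content-Type: application/json` header is an assumption (the urllib code above sends the JSON without it).

```python
import json
from urllib.request import Request, urlopen

URL = "http://www.patentsview.org/api/patents/query"


def build_query(author, title):
    """Return the request payload for one inventor last name and title words."""
    return {
        "q": {"_and": [
            {"inventor_last_name": author},
            {"_text_any": {"patent_title": title}},
        ]},
        "o": {"matched_subentities_only": "true"},
    }


def search_patents(author, title, url=URL):
    """POST the payload as JSON and return the decoded response (needs network access)."""
    body = json.dumps(build_query(author, title)).encode("utf-8")
    # Assumption: the API accepts an explicit JSON content type.
    req = Request(url, data=body, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `patents = search_patents("Jobs", "computer")`. Changing the search then only means changing the arguments, not a set of key-name variables.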
With requests you don't have to use json.dumps on your data; just use the json parameter. If you want to save the response content to a file, use the content or text attribute.
import requests
title = "computer"
author = "Jobs"
url = "http://www.patentsview.org/api/patents/query"
data = {
"q":{ "_and":[ {"inventor_last_name":author}, {"_text_any":{"patent_title":title}}] },
"o":{"matched_subentities_only":"true"}
}
resp = requests.post(url, json=data)
with open("patents.txt", "w", encoding="utf-8") as f:
    f.write(resp.text)
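The question's `saveFile.write(str(respData))` writes the repr of the bytes (a `b'...'` string with escaped non-ASCII characters) instead of the response itself. A sketch of the two options mentioned above, usable with either library: write the bytes in binary mode (like `resp.content`), or write decoded text with an explicit encoding (like `resp.text`). The helper names and sample bytes are hypothetical, and UTF-8 is assumed as the response encoding.

```python
import os
import tempfile


def save_response(raw, path):
    """Write response bytes to disk unchanged (the equivalent of resp.content)."""
    with open(path, "wb") as f:
        f.write(raw)


def save_response_text(text, path):
    """Write decoded text (the equivalent of resp.text) with an explicit encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Hypothetical bytes standing in for resp.read() or resp.content.
raw = b'{"example": "caf\xc3\xa9"}'

out_dir = tempfile.mkdtemp()
save_response(raw, os.path.join(out_dir, "patents.json"))
save_response_text(raw.decode("utf-8"), os.path.join(out_dir, "patents.txt"))
```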
Source: https://stackoverflow.com/questions/46059088/patentsview-api-python-3-4