Question
I want to call the HDFS REST API (WebHDFS) to upload a file using httplib.
My program creates the file, but the file has no content.
=====================================================
Here is my code:
import httplib
conn=httplib.HTTPConnection("localhost:50070")
conn.request("PUT","/webhdfs/v1/levi/4?op=CREATE")
res=conn.getresponse()
print res.status,res.reason
conn.close()
conn=httplib.HTTPConnection("localhost:50075")
conn.connect()
conn.putrequest("PUT","/webhdfs/v1/levi/4?op=CREATE&user.name=levi")
conn.endheaders()
a_file=open("/home/levi/4","rb")
a_file.seek(0)
data=a_file.read()
conn.send(data)
res=conn.getresponse()
print res.status,res.reason
conn.close()
==================================================
Here is the output:
307 TEMPORARY_REDIRECT
201 Created
=========================================================
OK, the file was created, but no content was sent.
When I comment out conn.send(data), the result is the same: still no content.
Maybe the file read or the send is wrong; I'm not sure.
Do you know why this happens?
Answer 1:
It looks like your code does not use the "Location" header from the 307 response when it makes the second PUT request.
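Following the Location header can be sketched as below in Python 3, where httplib is named http.client. This is a minimal sketch, not the wrapper's code: split_location and create_file are hypothetical helpers, and the overwrite=true and user.name query parameters are assumptions modeled on the question's URL. Step 1 sends the PUT to the NameNode with no file data and reads the Location header from the 307; step 2 sends the file bytes to exactly that URL, keeping whatever query parameters the NameNode put in it.

```python
import http.client
from urllib.parse import urlencode, urlsplit


def split_location(location):
    """Split a redirect URL into (host, port, path_with_query) for http.client."""
    parts = urlsplit(location)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Assumption: fall back to port 80 when the URL has no explicit port.
    return parts.hostname, parts.port or 80, path


def create_file(namenode_host, namenode_port, hdfs_path, local_path, user):
    """Hypothetical helper: two-step WebHDFS CREATE; returns the final HTTP status."""
    # Step 1: ask the NameNode where to write. No file data is sent here.
    query = urlencode({"op": "CREATE", "overwrite": "true", "user.name": user})
    nn = http.client.HTTPConnection(namenode_host, namenode_port)
    try:
        nn.request("PUT", "/webhdfs/v1" + hdfs_path + "?" + query)
        resp = nn.getresponse()
        resp.read()  # drain the body so the response is complete
        if resp.status != 307:
            raise RuntimeError("expected 307, got %d %s" % (resp.status, resp.reason))
        location = resp.getheader("Location")
    finally:
        nn.close()

    # Step 2: PUT the file bytes to the exact URL the NameNode returned,
    # keeping every query parameter it added.
    host, port, path = split_location(location)
    with open(local_path, "rb") as f:
        data = f.read()
    dn = http.client.HTTPConnection(host, port, timeout=600)
    try:
        # request() with a bytes body adds a Content-Length header automatically.
        dn.request("PUT", path, body=data)
        resp = dn.getresponse()
        resp.read()
        return resp.status  # 201 (Created) on success
    finally:
        dn.close()
```

For the question's setup this would be called as create_file("localhost", 50070, "/levi/4", "/home/levi/4", "levi"); it needs a running cluster, so no output is shown here.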
I've been working on a fork of a Python WebHDFS wrapper that may be of use; you can see the full code here: https://github.com/carlosmarin/webhdfs-py/blob/master/webhdfs/webhdfs.py
The method you'd be interested in is:
def copyfromlocal(self, source_path, target_path, replication=1, overwrite=True):
    # Uses module-level names from webhdfs.py: WEBHDFS_CONTEXT_ROOT, _NameNodeHTTPClient,
    # HTTPConnection, logger and json.
    # The parentheses matter: without them the conditional expression wraps the whole
    # concatenation, so url_path would be just 'false' when overwrite=False.
    url_path = WEBHDFS_CONTEXT_ROOT + target_path + '?op=CREATE&overwrite=' + ('true' if overwrite else 'false')
    with _NameNodeHTTPClient('PUT', url_path, self.namenode_host, self.namenode_port, self.username) as response:
        logger.debug("HTTP Response: %d, %s" % (response.status, response.reason))
        # Step 1: the NameNode answers 307; the DataNode URL is in the Location header.
        redirect_location = response.msg["location"]
        logger.debug("HTTP Location: %s" % redirect_location)
        (redirect_host, redirect_port, redirect_path, query) = self.parse_url(redirect_location)
        # Bug in WebHDFS 0.20.205 => requires param otherwise a NullPointerException is thrown
        redirect_path = redirect_path + "?" + query + "&replication=" + str(replication)
        logger.debug("Redirect: host: %s, port: %s, path: %s " % (redirect_host, redirect_port, redirect_path))
        fileUploadClient = HTTPConnection(redirect_host, redirect_port, timeout=600)
        # Step 2: send the file bytes to the redirect URL; request() sets Content-Length.
        # This requires currently Python 2.6 or higher
        with open(source_path, "rb") as source_file:
            fileUploadClient.request('PUT', redirect_path, source_file.read(), headers={})
        response = fileUploadClient.getresponse()
        logger.debug("HTTP Response: %d, %s" % (response.status, response.reason))
        # Read the body before closing the connection; CREATE usually returns an empty body.
        body = response.read()
        fileUploadClient.close()
        return json.loads(body) if body else None
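A second difference between the question's code and this method is how the body goes out. request() with a bytes body adds a Content-Length header, while putrequest() followed by endheaders() sends only the headers you add yourself, so the question's DataNode request had no Content-Length. Under HTTP/1.1 framing rules (RFC 7230, section 3.3.3), a request with neither Content-Length nor Transfer-Encoding has a zero-length body, so bytes sent afterwards are not part of that PUT. That would also explain why commenting out conn.send(data) changes nothing, although the answer itself does not mention it. The sketch below compares the two header blocks offline in Python 3 http.client; CapturingConnection and header_block are hypothetical helpers that record outgoing bytes instead of opening a socket.

```python
import http.client


class CapturingConnection(http.client.HTTPConnection):
    """Hypothetical helper: records outgoing bytes instead of using a socket."""

    def __init__(self, host, port=None):
        super().__init__(host, port)
        self.sent = b""

    def send(self, data):
        # Override the low-level send so nothing touches the network.
        self.sent += data


def header_block(conn):
    """Return the request line and headers that conn has 'sent' so far."""
    return conn.sent.split(b"\r\n\r\n", 1)[0].decode("latin-1")


# The question's style: headers are finalized before any body is known.
manual = CapturingConnection("localhost", 50075)
manual.putrequest("PUT", "/webhdfs/v1/levi/4?op=CREATE&user.name=levi")
manual.endheaders()
manual.send(b"file contents")

# The answer's style: request() sees the body and sets Content-Length.
auto = CapturingConnection("localhost", 50075)
auto.request("PUT", "/webhdfs/v1/levi/4?op=CREATE&user.name=levi", body=b"file contents")

print("Content-Length" in header_block(manual))  # False
print("Content-Length" in header_block(auto))    # True
```

In the question's original style, the equivalent fix would be calling conn.putheader("Content-Length", str(len(data))) before conn.endheaders(), but simply passing the data to request() is shorter.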
Source: https://stackoverflow.com/questions/15870381/i-want-to-call-hdfs-rest-api-to-upload-a-file