Question
I am using HttpPut with a MultipartEntity to write a file to HDFS through the webHDFS REST API. The request itself goes through and returns the expected responses, 307 and 201. However, the multipart headers are written into the image as well, as shown below, so the file cannot be retrieved and opened as a valid image.
--8DkJ3RkUHahEaNE9Ktw8NC1TFOqegjfA9Ps
Content-Disposition: form-data; name="file"; filename="advert.jpg"
Content-Type: application/octet-stream
ÿØÿàJFIFHHÿÛC
// Rest of the image content
--8DkJ3RkUHahEaNE9Ktw8NC1TFOqegjfA9Ps
Removing the multipart headers from the image file makes it a valid image, but I am not sure how to avoid them in the first place. I am not even sure I have control over this, since webHDFS is responsible for actually writing the file.
Here is my code for it. Is there something else I should be doing?
final String LOCATION = "Location";
final String writeURI = "http://<ip>:50070/webhdfs/v1/user/hadoop/advert.jpg";
HttpPut put = new HttpPut(writeURI);
HttpClient client = HttpClientBuilder.create().build();
HttpResponse response = client.execute(put);
put.releaseConnection();
String redirectUri = null;
Header[] headers = response.getAllHeaders();
for (Header header : headers)
{
    if (LOCATION.equalsIgnoreCase(header.getName()))
    {
        redirectUri = header.getValue();
    }
}
HttpPut realPut = new HttpPut(redirectUri);
realPut.setEntity(buildMultiPartEntity("advert.jpg"));
HttpResponse response2 = client.execute(realPut);

private HttpEntity buildMultiPartEntity(String fileName)
{
    MultipartEntityBuilder multipartEntity = MultipartEntityBuilder.create();
    multipartEntity.setMode(HttpMultipartMode.BROWSER_COMPATIBLE);
    multipartEntity.addPart("file", new FileBody(new File(fileName)));
    return multipartEntity.build();
}
Any help is appreciated.
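For files that have already been written with the wrapper, the manual cleanup mentioned in the question could be automated along these lines. This is only a sketch: the class and method names (`MultipartUnwrap`, `stripMultipartWrapper`) are made up for illustration, and it assumes a single-part body with CRLF line endings and the layout shown in the dump above (boundary line, part headers, blank line, payload, closing boundary).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MultipartUnwrap {

    /**
     * Returns the payload of a single-part multipart body.
     * If the expected wrapper is not found, the input is returned unchanged.
     */
    static byte[] stripMultipartWrapper(byte[] body) {
        // ISO-8859-1 maps every byte to exactly one char,
        // so string indexes can be used as byte offsets.
        String text = new String(body, StandardCharsets.ISO_8859_1);
        if (!text.startsWith("--")) {
            return body;
        }
        int firstLineEnd = text.indexOf("\r\n");
        if (firstLineEnd < 0) {
            return body;
        }
        String delimiter = text.substring(0, firstLineEnd); // "--<boundary>"

        // The part headers end at the first empty line.
        int headersEnd = text.indexOf("\r\n\r\n", firstLineEnd);
        if (headersEnd < 0) {
            return body;
        }
        int start = headersEnd + 4;

        // The payload ends right before the CRLF that precedes the closing delimiter.
        int end = text.indexOf("\r\n" + delimiter, start);
        if (end < 0) {
            return body;
        }
        return Arrays.copyOfRange(body, start, end);
    }
}
```

This only repairs existing files; it does not stop the headers from being written in the first place.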
Answer 1:
I ran into the same issue with Python requests. What finally resolved it was reading the image into memory before sending it, and using a single call to the webHDFS API instead of two. I hope this helps.
import requests

# current_app is the application object of the web framework in use; it is only needed for the config lookups
host_url = current_app.config.get('HDFS_URL', '')
adx_img_path = current_app.config.get('ADX_CUSTOMER_IMAGE', '')
real_path = adx_img_path + remotefile
hdfs_username = current_app.config.get('HDFS_USERNAME', 'xdisk')
parameters = '?user.name=' + hdfs_username + '&op=CREATE&data=true'
# Read the file as raw bytes so that the request body is exactly the image content
with open(localfile, 'rb') as f:
    img = f.read()
url = host_url + real_path + parameters
r = requests.put(url, data=img, headers={"Content-Type": "application/octet-stream"})
It seems that when the image is sent as raw bytes, the extra headers are not added to the file. For the HttpClient you are using, I would suggest trying InputStreamBody or ByteArrayBody.
Answer 2:
Add the image as a FileEntity, ByteArrayEntity or InputStreamEntity with Content-Type "application/octet-stream".
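To illustrate the idea with the two-step flow from the question (307 redirect, then 201), here is a rough sketch that sends the file bytes as the raw request body. To stay dependency-free it uses the JDK's `HttpURLConnection` instead of Apache HttpClient. The class and method names are hypothetical, and it assumes the create URL already carries `op=CREATE` and any other query parameters the cluster needs.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class WebHdfsRawUpload {

    /** Step 1: ask the NameNode where to write and return the Location header of the redirect. */
    static String requestWriteLocation(String createUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(createUrl).openConnection();
        conn.setRequestMethod("PUT");
        conn.setInstanceFollowRedirects(false); // the 307 has to be read by hand
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (status != 307 || location == null) {
                throw new IOException("Expected 307 with a Location header, got " + status);
            }
            return location;
        } finally {
            conn.disconnect();
        }
    }

    /** Step 2: send the bytes as the plain request body, without any multipart wrapper. */
    static int putRawBytes(String dataNodeUrl, byte[] data) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        conn.setFixedLengthStreamingMode(data.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(data);
        }
        try {
            return conn.getResponseCode(); // 201 is the success status seen in the question
        } finally {
            conn.disconnect();
        }
    }

    /** Runs both steps and returns the status code of the second request. */
    static int upload(String createUrl, byte[] data) throws IOException {
        return putRawBytes(requestWriteLocation(createUrl), data);
    }
}
```

A call might look like `WebHdfsRawUpload.upload(writeURI, Files.readAllBytes(Paths.get("advert.jpg")))`. With Apache HttpClient, the equivalent change is to pass one of the entity classes named above to `setEntity` in place of the multipart entity.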
Answer 3:
This is the code that worked for me, based on the accepted answer:
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.FileEntity;
import org.apache.http.impl.client.HttpClientBuilder;

import java.io.File;
import java.io.IOException;

public class Test {

    // Renamed from Test() so the method is not mistaken for a constructor.
    public void upload() {
        try {
            // The request goes straight to the DataNode port (50075), so there is no redirect step.
            final String writeURI = "http://<IP>:50075/webhdfs/v1/user/sample.xml?op=CREATE&user.name=istvan&namenoderpcaddress=quickstart.cloudera:8020&overwrite=true";
            HttpClient client = HttpClientBuilder.create().build();
            HttpPut put = new HttpPut(writeURI);
            // FileEntity sends the file content as the raw request body, without multipart headers.
            put.setEntity(buildFileEntity("C:\\sample.xml"));
            put.setHeader("Content-Type", "application/octet-stream");
            HttpResponse response = client.execute(put);
            System.out.println(response);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static FileEntity buildFileEntity(String fileName) {
        return new FileEntity(new File(fileName));
    }

    public static void main(String[] args) {
        new Test().upload();
    }
}
Maven:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.4</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpmime</artifactId>
    <version>4.3.1</version>
</dependency>
Source: https://stackoverflow.com/questions/23248890/issues-with-uploading-an-image-to-hdfs-via-webhdfs-rest-api