Question
I am processing a bulk JSON payload from S3. The code is as follows:
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import com.amazonaws.services.s3.model.S3Object;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;

public boolean sync(Job job) throws IOException
{
    // Validate the JSON payload from S3.
    try (InputStream s3Stream = readStreamFromS3())
    {
        validationService.validate(s3Stream);
    }
    catch (S3SdkInteractionException e)
    {
        logger.error(e.getLocalizedMessage());
    }

    // Process the JSON payload from S3 (a second GET, so a fresh stream).
    try (InputStream s3Stream = readStreamFromS3())
    {
        return syncService.process(s3Stream);
    }
    catch (S3SdkInteractionException e)
    {
        logger.error(e.getLocalizedMessage());
        return false;
    }
}

public InputStream readStreamFromS3()
{
    // s3Object is the S3Object returned by the SDK's getObject() call.
    return s3Object.getObjectContent();
}
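For context, readStreamFromS3() wraps a plain GET from the v1 SDK, roughly like this (simplified; the client setup, bucket, and key names here are placeholders, not my real code):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

public class S3StreamReader
{
    private final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();

    // Placeholder bucket/key parameters for illustration.
    public S3ObjectInputStream readStreamFromS3(String bucket, String key)
    {
        S3Object s3Object = s3Client.getObject(bucket, key);
        // The stream is only valid while the HTTP connection behind the
        // S3Object stays open; closing it early triggers the abort warning.
        return s3Object.getObjectContent();
    }
}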
// process() will sync the user data in the S3 stream.
// I am not closing the stream until the entire stream is processed; I
// need to handle this as stream processing.
// I don't want to keep the contents in memory for processing; that is
// not feasible for my use case.
public boolean process(InputStream s3Stream) throws IOException
{
    JsonFactory jsonFactory = objectMapper.getFactory();
    try (JsonParser jsonParser = jsonFactory.createParser(s3Stream)) {
        JsonToken jsonToken = jsonParser.nextToken();
        List<HttpResponseFuture<UserResponse>> userFutures = new ArrayList<>(20);
        while (true) {
            // Read up to 20 user records into the current batch.
            for (int i = 0; i < 20; i++)
            {
                try {
                    // Stream has been processed fully.
                    if (jsonToken == null || jsonToken == JsonToken.END_OBJECT) { break; }
                    while (jsonToken != null && !jsonToken.isStructStart()) {
                        jsonToken = jsonParser.nextToken();
                    }
                    // Fetch the user record from the stream.
                    if (jsonToken != null && jsonToken.isStructStart()) {
                        Map<String, Object> userNode = jsonParser.readValueAs(Map.class);
                        // Call the external service and collect the future response.
                        userFutures.add(executeAsync(httpClient, userNode));
                        // Move to the next user record.
                        if (jsonToken == JsonToken.START_OBJECT) {
                            jsonToken = jsonParser.nextToken();
                        }
                    }
                }
                catch (JsonParseException jpe) {
                    logger.error(jpe.getLocalizedMessage());
                    break;
                }
            }
            // Wait for the current batch of service calls to complete.
            for (ListenableFuture<UserResponse> responseFuture : Futures.inCompletionOrder(userFutures)) {
                try {
                    UserResponse response = responseFuture.get();
                }
                catch (InterruptedException | ExecutionException e) {
                    logger.error(e.getLocalizedMessage());
                }
            }
            userFutures.clear();
            // Exit once the whole stream has been consumed.
            if (jsonToken == null || jsonToken == JsonToken.END_OBJECT) { break; }
        }
    }
    return false;
}
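executeAsync and HttpResponseFuture are simplified above. Roughly, the helper submits one user record to serviceC and returns a listenable future, along these lines (a sketch with stand-in types; the real client and response types differ):

import java.util.Map;
import java.util.concurrent.Executors;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

// Stand-in types so the sketch is self-contained; not the real ones.
interface HttpClient { UserResponse syncUser(Map<String, Object> userNode); }
class UserResponse { }

public class AsyncUserSync
{
    private final ListeningExecutorService executor =
            MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(20));

    // Submits one user record to serviceC and returns a future for its response.
    public ListenableFuture<UserResponse> executeAsync(HttpClient httpClient, Map<String, Object> userNode)
    {
        return executor.submit(() -> httpClient.syncUser(userNode));
    }
}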
There is a serviceA through which we ingest data (JSON payload) into S3. Another serviceB (the pseudocode shown above) processes the S3 data and calls a serviceC to sync the data (JSON payload) into the underlying store.
Problem:
I am seeing a repeated S3 warning in our logs:
com.amazonaws.services.s3.internal.S3AbortableInputStream Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use
The validation phase executes as expected without any issues. However, on syncing the data (i.e. syncService.process()), the s3Stream is getting closed before the entire payload is processed. Since the stream is closed before I have processed it fully, I end up in an inconsistent state.
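From the SDK's own message, the warning fires when an S3ObjectInputStream is closed with unread bytes still on the wire. If the remainder genuinely isn't needed, the two remedies it suggests look roughly like this (a sketch, assuming direct access to the S3ObjectInputStream; drainInputStream comes from aws-java-sdk-core's IOUtils):

import com.amazonaws.services.s3.model.S3ObjectInputStream;
import com.amazonaws.util.IOUtils;

public class S3StreamCleanup
{
    // Read and discard the remaining bytes so the HTTP connection can be
    // reused; reasonable for small remainders, not for multi-GB payloads.
    public static void drain(S3ObjectInputStream in)
    {
        IOUtils.drainInputStream(in);
    }

    // Abort the underlying HTTP connection explicitly instead; the
    // connection is discarded rather than returned to the pool.
    public static void abort(S3ObjectInputStream in)
    {
        in.abort();
    }
}

In my case, though, the stream should not be closing early at all: the whole payload still needs to be read.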
Dependency information is as follows:
aws-java-sdk-s3:1.11.411
guava:guava-25.0-jre
jackson-core:2.9.6
The JSON payload can vary from a few MBs to 2 GB.
Any help would be appreciated.
Source: https://stackoverflow.com/questions/55227573/s3stream-is-getting-closed-before-processing-the-entire-payload