Question
I have two approaches to initializing the HttpClient in order to make an API call from a ParDo in Apache Beam.
Approach 1:
Initialise the HttpClient object in StartBundle and close it in FinishBundle. The code is as follows:
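import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> {
    // Fields, so that every lifecycle method sees the same objects;
    // transient because the DoFn itself is serialized and the client is not.
    private transient HttpClient client;
    private transient HttpRequest request;

    @StartBundle
    public void startBundle() {
        client = HttpClient.newHttpClient();
        request = HttpRequest.newBuilder()
            .uri(URI.create(<Custom_URL>))
            .build();
    }

    @ProcessElement
    public void processElement() {
        // Use the client and do an external API call
    }

    @FinishBundle
    public void finishBundle() {
        client.close(); // HttpClient.close() requires Java 21 or later
    }
}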
Approach 2:
Have a separate class in which all the connections are managed using a connection pool.
public class ExternalConnection {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(<Custom_URL>))
        .build();

    public Response getResponse() {
        // use the client, send request and get response
    }
}

public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> {
    @ProcessElement
    public void processElement() {
        Response response = new ExternalConnection().getResponse();
    }
}
Which of the two approaches above is better in terms of performance and coding design standards?
Answer 1:
Either approach would work fine. The StartBundle/FinishBundle one is more contained IMHO, but it has the disadvantage of not working well if your bundles are very small, because the client is then created and closed once per bundle. An even better approach might be to use the DoFn's Setup/Teardown methods (the @Setup and @Teardown annotations), which can span an arbitrary number of bundles but are tied to the lifetime of the DoFn instance (leveraging the pooling of DoFn instances that the Beam SDKs already do).
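To make the trade-off concrete, here is a minimal plain-Java sketch (no Beam dependency) that only counts how many clients each strategy would construct on a single DoFn instance. It is a simulation, not Beam code: the class name `ClientLifecycleSketch`, the `Strategy` enum and the bundle sizes are hypothetical and chosen purely for illustration, and real bundle sizes are decided by the runner.

```java
public class ClientLifecycleSketch {

    /** Where the (hypothetical) client is constructed. */
    enum Strategy {
        PER_ELEMENT,   // like calling new ExternalConnection() inside processElement
        PER_BUNDLE,    // like creating the client in a @StartBundle method
        PER_INSTANCE   // like creating the client in a @Setup method
    }

    /**
     * Simulates one DoFn instance processing `bundles` bundles of
     * `elementsPerBundle` elements each, and returns how many clients
     * would be constructed under the given strategy.
     */
    static int clientsCreated(Strategy strategy, int bundles, int elementsPerBundle) {
        int created = 0;
        if (strategy == Strategy.PER_INSTANCE) {
            created++; // once for the lifetime of the instance
        }
        for (int b = 0; b < bundles; b++) {
            if (strategy == Strategy.PER_BUNDLE) {
                created++; // once at the start of every bundle
            }
            for (int e = 0; e < elementsPerBundle; e++) {
                if (strategy == Strategy.PER_ELEMENT) {
                    created++; // once for every element
                }
                // a real DoFn would send the request here
            }
        }
        return created;
    }

    public static void main(String[] args) {
        // Assumed workload: many tiny bundles, the case the answer warns about.
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": " + clientsCreated(s, 1000, 2));
        }
    }
}
```

With the assumed 1000 bundles of 2 elements each, this prints `PER_ELEMENT: 2000`, `PER_BUNDLE: 1000` and `PER_INSTANCE: 1`. With tiny bundles, per-bundle initialisation saves little over per-element initialisation, while per-instance initialisation pays the construction cost only once.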
Source: https://stackoverflow.com/questions/62778595/better-approach-to-call-external-api-in-apache-beam