I am trying to configure checkpoints for Flink jobs in GCS. Everything works fine if I run a test job locally (no Docker and no cluster setup), but it fails with an error when I deploy it to a cluster.
The problem is the missing implementation of the gs:// scheme, the protocol used to connect to GCS. A Java program should be able to run if you add the following dependency:
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>1.35.0</version>
</dependency>
In this link you will find how to add this dependency for any other programming language.
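As a quick sanity check that the dependency and your credentials are wired up correctly, a minimal sketch along these lines should be able to reach GCS. The bucket name is a placeholder, and authentication is assumed to come from Application Default Credentials (e.g. the GOOGLE_APPLICATION_CREDENTIALS environment variable):

import com.google.cloud.storage.Bucket;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class GcsSmokeTest {
    public static void main(String[] args) {
        // Build a client from Application Default Credentials
        // (e.g. the key file pointed to by GOOGLE_APPLICATION_CREDENTIALS).
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // "my-flink-checkpoints" is a placeholder bucket name.
        Bucket bucket = storage.get("my-flink-checkpoints");
        System.out.println(bucket != null
                ? "Bucket is reachable: " + bucket.getName()
                : "Bucket not found");
    }
}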
Finally, I found the solution here:
You must create your own image and put the gcs-connector into the lib directory. Otherwise you'll always get classloading issues (user code vs. system classloaders).
To build a custom Docker image, we write the following Dockerfile:
FROM registry.platform.data-artisans.net/trial/v1.0/flink:1.4.2-dap1-scala_2.11

RUN wget -O lib/gcs-connector-latest-hadoop2.jar https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar && \
    wget http://ftp.fau.de/apache/flink/flink-1.4.2/flink-1.4.2-bin-hadoop28-scala_2.11.tgz && \
    tar xf flink-1.4.2-bin-hadoop28-scala_2.11.tgz && \
    mv flink-1.4.2/lib/flink-shaded-hadoop2* lib/ && \
    rm -r flink-1.4.2*

RUN mkdir etc-hadoop
COPY <name of key file>.json etc-hadoop/
COPY core-site.xml etc-hadoop/

ENTRYPOINT ["/docker-entrypoint.sh"]
EXPOSE 6123 8081
CMD ["jobmanager"]
The Docker image is based on the Flink image we’re providing as part of the dA Platform trial. We add the Google Cloud Storage connector, Flink’s Hadoop package, the service account key file, and the Hadoop configuration file (core-site.xml).
To build the custom image, the following files must be in your current directory: core-site.xml, the Dockerfile, and the key file (.json).
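For reference, a minimal core-site.xml for the GCS connector could look like the following. The property keys are the connector's documented ones; the project id, the key file name, and the service-account approach are assumptions you need to adapt:

<configuration>
    <property>
        <name>fs.gs.impl</name>
        <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.gs.impl</name>
        <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    </property>
    <property>
        <name>fs.gs.project.id</name>
        <value><your project id></value>
    </property>
    <property>
        <name>google.cloud.auth.service.account.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>google.cloud.auth.service.account.json.keyfile</name>
        <value>/opt/flink/etc-hadoop/<name of key file>.json</value>
    </property>
</configuration>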
To trigger the build of the custom image, we run the following command:
$ docker build -t flink-1.4.2-gs .
Once the image has been built, we will upload the image to Google’s Container Registry. To configure Docker to properly access the registry, run this command once:
$ gcloud auth configure-docker
Next, we’ll tag and upload the container:
$ docker tag flink-1.4.2-gs:latest eu.gcr.io/<your project id>/flink-1.4.2-gs
$ docker push eu.gcr.io/<your project id>/flink-1.4.2-gs
Once the upload is completed, we need to set the custom image for an Application Manager deployment. Send the following PATCH request:
PATCH /api/v1/deployments/<your AppMgr deployment id>
spec:
  template:
    spec:
      flinkConfiguration:
        fs.hdfs.hadoopconf: /opt/flink/etc-hadoop/
      artifact:
        flinkImageRegistry: eu.gcr.io
        flinkImageRepository: <your project id>/flink-1.4.2-gs
        flinkImageTag: latest
Alternatively, use the following curl command:
$ curl -X PATCH \
    --header 'Content-Type: application/yaml' \
    --header 'Accept: application/yaml' \
    -d 'spec:
  template:
    spec:
      flinkConfiguration:
        fs.hdfs.hadoopconf: /opt/flink/etc-hadoop/
      artifact:
        flinkImageRegistry: eu.gcr.io
        flinkImageRepository: <your project id>/flink-1.4.2-gs
        flinkImageTag: latest' \
    'http://localhost:8080/api/v1/deployments/<your AppMgr deployment id>'
With this change implemented, you’ll be able to checkpoint to Google Cloud Storage, specifying checkpoint directories with the pattern gs://<your bucket>/checkpoints. For savepoints, set the state.savepoints.dir Flink configuration option.
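For example, a Flink 1.4 job could point its state backend at that directory. This is a minimal sketch using FsStateBackend; the bucket name and checkpoint interval are placeholders:

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointToGcsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60 seconds to the GCS bucket ("<your bucket>" is a placeholder).
        env.enableCheckpointing(60_000);
        env.setStateBackend(new FsStateBackend("gs://<your bucket>/checkpoints"));

        // Placeholder pipeline; replace with your actual job.
        env.fromElements(1, 2, 3).print();

        env.execute("checkpoint-to-gcs");
    }
}

For savepoints, the corresponding entry in flink-conf.yaml would be along the lines of: state.savepoints.dir: gs://<your bucket>/savepoints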