Cloud Data Fusion creates a new Dataproc cluster for every pipeline run. I already have a Dataproc cluster set up that runs 24x7, and I would like to use that cluster to run my pipelines.
Following this process, I get a "java.io.IOException: com.jcraft.jsch.JSchException: invalid privatekey" exception when launching the pipeline.
I tried both with and without an expiration time in the public key. I also tried setting up the key at both the master instance level and the project level. Connecting to the instance with "ssh -i private-key-file user@external-ip" works.
Error stack:
2020-10-27 14:28:06,718 - ERROR [runtime-scheduler-2:i.c.c.i.a.r.d.r.RemoteExecutionTwillRunnerService@528] - Fail to start program run program_run:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.99b4015f-1860-11eb-b3cf-bae7e12abd00
java.io.IOException: com.jcraft.jsch.JSchException: invalid privatekey: [B@28e946fe
    at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:88) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillPreparer.launch(RemoteExecutionTwillPreparer.java:110) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.distributed.remote.AbstractRuntimeTwillPreparer.lambda$start$1(AbstractRuntimeTwillPreparer.java:466) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillRunnerService$ControllerFactory.lambda$create$0(RemoteExecutionTwillRunnerService.java:504) ~[na:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_252]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_252]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_252]
Caused by: com.jcraft.jsch.JSchException: invalid privatekey: [B@28e946fe
    at com.jcraft.jsch.KeyPair.load(KeyPair.java:664) ~[com.jcraft.jsch-0.1.54.jar:na]
    at com.jcraft.jsch.IdentityFile.newInstance(IdentityFile.java:46) ~[com.jcraft.jsch-0.1.54.jar:na]
    at com.jcraft.jsch.JSch.addIdentity(JSch.java:442) ~[com.jcraft.jsch-0.1.54.jar:na]
    at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:71) ~[na:na]
    ... 10 common frames omitted
2020-10-27 14:28:06,719 - DEBUG [runtime-scheduler-2:i.c.c.i.a.r.d.r.SSHRemoteProcessController@101] - Force stopping program run program_run:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.99b4015f-1860-11eb-b3cf-bae7e12abd00
2020-10-27 14:28:06,736 - WARN [runtime-scheduler-2:i.c.c.i.a.r.d.r.RemoteExecutionTwillRunnerService@538] - Force termination of remote process for program_run:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.99b4015f-1860-11eb-b3cf-bae7e12abd00 failed
java.io.IOException: com.jcraft.jsch.JSchException: invalid privatekey: [B@2bc057da
    at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:88) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.distributed.remote.SSHRemoteProcessController.killProcess(SSHRemoteProcessController.java:107) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.distributed.remote.SSHRemoteProcessController.kill(SSHRemoteProcessController.java:102) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillRunnerService$ControllerFactory.lambda$create$2(RemoteExecutionTwillRunnerService.java:536) ~[na:na]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[na:1.8.0_252]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[na:1.8.0_252]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[na:1.8.0_252]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[na:1.8.0_252]
    at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillRunnerService$ControllerFactory.lambda$create$0(RemoteExecutionTwillRunnerService.java:506) ~[na:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_252]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_252]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_252]
Caused by: com.jcraft.jsch.JSchException: invalid privatekey: [B@2bc057da
    at com.jcraft.jsch.KeyPair.load(KeyPair.java:664) ~[com.jcraft.jsch-0.1.54.jar:na]
    at com.jcraft.jsch.IdentityFile.newInstance(IdentityFile.java:46) ~[com.jcraft.jsch-0.1.54.jar:na]
    at com.jcraft.jsch.JSch.addIdentity(JSch.java:442) ~[com.jcraft.jsch-0.1.54.jar:na]
    at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:71) ~[na:na]
    ... 15 common frames omitted
2020-10-27 14:28:06,742 - INFO [runtime-scheduler-2:i.c.c.i.a.r.d.AbstractTwillProgramController@77] - Twill program terminated: program_run:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.99b4015f-1860-11eb-b3cf-bae7e12abd00, twill runId: 99b4015f-1860-11eb-b3cf-bae7e12abd00
2020-10-27 14:28:06,742 - DEBUG [runtime-scheduler-2:i.c.c.i.a.r.d.AbstractTwillProgramController@84] - Twill program termination status: FAILED
2020-10-27 14:28:06,743 - DEBUG [runtime-scheduler-2:i.c.c.i.a.r.d.DistributedProgramRunner@637] - Cleanup tmp files for program:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow: /var/tmp/cdap/data/tmp/1603808877320-0
2020-10-27 14:28:06,743 - DEBUG [pcontroller-program:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow-99b4015f-1860-11eb-b3cf-bae7e12abd00:i.c.c.a.r.AbstractProgramRuntimeService@551] - Removing RuntimeInfo: Workflow DataPipelineWorkflow 99b4015f-1860-11eb-b3cf-bae7e12abd00
2020-10-27 14:28:06,743 - DEBUG [pcontroller-program:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow-99b4015f-1860-11eb-b3cf-bae7e12abd00:i.c.c.a.r.AbstractProgramRuntimeService@554] - RuntimeInfo removed: RuntimeInfo{programId=program:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow, twillRunId=99b4015f-1860-11eb-b3cf-bae7e12abd00}
2020-10-27 14:28:07,325 - DEBUG [provisioning-service-2:i.c.c.i.p.t.ProvisioningTask@121] - Executing DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.99b4015f-1860-11eb-b3cf-bae7e12abd00.
2020-10-27 14:28:07,354 - WARN [provisioning-service-2:i.c.c.r.s.p.r.RemoteHadoopProvisioner@138] - Unable to clean up resources for program DataPipelineWorkflow run 99b4015f-1860-11eb-b3cf-bae7e12abd00 on the remote cluster. The run directory may need to be manually deleted on cluster node 35.233.87.155.
java.io.IOException: com.jcraft.jsch.JSchException: invalid privatekey: [B@5f165c7e
    at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:88) ~[na:na]
    at io.cdap.cdap.internal.provision.DefaultSSHContext.createSSHSession(DefaultSSHContext.java:120) ~[na:na]
    at io.cdap.cdap.runtime.spi.ssh.SSHContext.createSSHSession(SSHContext.java:92) ~[na:na]
    at io.cdap.cdap.runtime.spi.ssh.SSHContext.createSSHSession(SSHContext.java:80) ~[na:na]
    at io.cdap.cdap.runtime.spi.provisioner.remote.RemoteHadoopProvisioner.createSSHSession(RemoteHadoopProvisioner.java:80) ~[na:na]
    at io.cdap.cdap.runtime.spi.provisioner.remote.RemoteHadoopProvisioner.deleteCluster(RemoteHadoopProvisioner.java:133) ~[na:na]
    at io.cdap.cdap.runtime.spi.provisioner.Provisioner.deleteClusterWithStatus(Provisioner.java:142) [na:na]
    at io.cdap.cdap.internal.provision.task.ClusterDeleteSubtask.execute(ClusterDeleteSubtask.java:42) [na:na]
    at io.cdap.cdap.internal.provision.task.ProvisioningSubtask.execute(ProvisioningSubtask.java:54) [na:na]
    at io.cdap.cdap.internal.provision.task.ProvisioningTask.lambda$executeOnce$0(ProvisioningTask.java:123) [na:na]
    at io.cdap.cdap.common.service.Retries.callWithRetries(Retries.java:183) ~[na:na]
    at io.cdap.cdap.common.service.Retries.callWithInterruptibleRetries(Retries.java:257) ~[na:na]
    at io.cdap.cdap.internal.provision.task.ProvisioningTask.executeOnce(ProvisioningTask.java:123) [na:na]
    at io.cdap.cdap.internal.provision.ProvisioningService.lambda$null$21(ProvisioningService.java:637) ~[na:na]
    at io.cdap.cdap.internal.provision.ProvisioningService.callWithProgramLogging(ProvisioningService.java:813) ~[na:na]
    at io.cdap.cdap.internal.provision.ProvisioningService.lambda$null$22(ProvisioningService.java:635) ~[na:na]
    at io.cdap.cdap.common.async.KeyedExecutor$2.run(KeyedExecutor.java:99) ~[na:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_252]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_252]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_252]
Caused by: com.jcraft.jsch.JSchException: invalid privatekey: [B@5f165c7e
    at com.jcraft.jsch.KeyPair.load(KeyPair.java:664) ~[com.jcraft.jsch-0.1.54.jar:na]
    at com.jcraft.jsch.IdentityFile.newInstance(IdentityFile.java:46) ~[com.jcraft.jsch-0.1.54.jar:na]
    at com.jcraft.jsch.JSch.addIdentity(JSch.java:442) ~[com.jcraft.jsch-0.1.54.jar:na]
    at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:71) ~[na:na]
    ... 23 common frames omitted
2020-10-27 14:28:07,354 - DEBUG [provisioning-service-2:i.c.c.i.p.t.ProvisioningTask@125] - Completed DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.99b4015f-1860-11eb-b3cf-bae7e12abd00.
2020-10-27 14:28:07,370 - DEBUG [provisioning-service-2:i.c.c.i.p.t.ProvisioningTask@121] - Executing DEPROVISION subtask POLLING_DELETE for program run program_run:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.99b4015f-1860-11eb-b3cf-bae7e12abd00.
2020-10-27 14:28:07,481 - DEBUG [provisioning-service-2:i.c.c.i.p.t.ProvisioningTask@125] - Completed DEPROVISION subtask POLLING_DELETE for program run program_run:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.99b4015f-1860-11eb-b3cf-bae7e12abd00.
2020-10-27 14:28:07,497 - DEBUG [provisioning-service-2:i.c.c.i.p.t.ProvisioningTask@112] - Completed DEPROVISION task for program run program_run:default.apache-logs-ingest_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.99b4015f-1860-11eb-b3cf-bae7e12abd00.
This can be achieved by setting up a new compute profile that uses the Remote Hadoop Provisioner, under System Admin -> Configuration -> System Compute Profiles -> Create New Profile. This feature ("Execution environment selection") is available only in the Enterprise edition of Cloud Data Fusion.
Here are the detailed steps.
SSH Setup on Dataproc Cluster
a. Navigate to the Dataproc console on Google Cloud Platform. Go to “Cluster details” by clicking your Dataproc cluster name.
b. Under “VM Instances”, click the “SSH” button to connect to the Dataproc VM.
c. Follow the steps here to create a new SSH key, format the public key file to enforce an expiration time, and add the newly created SSH public key at the project or instance level.
d. If SSH is set up successfully, you should see the SSH key you just added in the Metadata section of your Compute Engine console, as well as in the authorized_keys file on your Dataproc VM.
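Step c mentions formatting the public key to enforce an expiration time. Below is a minimal Python sketch of that formatting, assuming the `USERNAME:KEY_TYPE KEY_VALUE google-ssh {"userName":"USERNAME","expireOn":"TIMESTAMP"}` layout that the Compute Engine SSH-key documentation described (verify against the current docs before relying on it). The function name `gce_ssh_key_entry` and the user name `cdap` are hypothetical.

```python
import json
from datetime import datetime, timezone


def gce_ssh_key_entry(username: str, public_key_line: str, expire_on: datetime) -> str:
    """Build a Compute Engine ssh-keys metadata entry with an expiration time.

    Assumes the "USERNAME:KEY_TYPE KEY_VALUE google-ssh {...}" layout; expire_on
    must be timezone-aware so it can be converted to UTC unambiguously.
    """
    # A .pub file line is "KEY_TYPE KEY_VALUE [COMMENT]"; the comment is dropped
    # and replaced by the google-ssh JSON block.
    parts = public_key_line.split()
    if len(parts) < 2:
        raise ValueError("expected 'KEY_TYPE KEY_VALUE [COMMENT]'")
    key_type, key_value = parts[0], parts[1]
    # Compute Engine expects a UTC timestamp such as 2021-01-31T23:59:59+0000.
    expire_str = expire_on.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+0000")
    # Compact JSON (no spaces), keys in the order userName, expireOn.
    payload = json.dumps({"userName": username, "expireOn": expire_str}, separators=(",", ":"))
    return f"{username}:{key_type} {key_value} google-ssh {payload}"


if __name__ == "__main__":
    expiry = datetime(2021, 1, 31, 23, 59, 59, tzinfo=timezone.utc)
    print(gce_ssh_key_entry("cdap", "ssh-rsa AAAAB3NzaC1yc2E cdap@laptop", expiry))
```

The resulting line is what would be pasted as one entry of the `ssh-keys` metadata value at the project or instance level.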
Create a customized system compute profile for your Data Fusion instance
a. Navigate to your Data Fusion instance console by clicking “View Instance”.
b. Click “System Admin” in the top right corner.
c. Under the “Configuration” tab, expand “System Compute Profiles”. Click “Create New Profile”, and choose “Remote Hadoop Provisioner” on the next page.
d. Fill out the general information for the profile.
e. You can find the SSH host IP on the “VM instance details” page in Compute Engine.
f. Copy the SSH private key created in step 1 and paste it into the “SSH Private Key” field.
g. Click “Create” to create the profile.
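One likely cause of the "invalid privatekey" error in the question, offered as an assumption rather than a confirmed diagnosis: the stack trace shows the key being parsed by `com.jcraft.jsch-0.1.54.jar` (`KeyPair.load`), and that JSch release does not read the `-----BEGIN OPENSSH PRIVATE KEY-----` format that `ssh-keygen` writes by default since OpenSSH 7.8, while the OpenSSH client does. That would explain why `ssh -i` works but the pipeline fails. A common workaround is to generate the key in PEM format (`ssh-keygen -m PEM -t rsa -b 4096 -f KEYFILE`) or convert an existing key in place (`ssh-keygen -p -m PEM -f KEYFILE`, which rewrites the file, so back it up first). The sketch below only inspects the key's first line before you paste it; the header list is a conservative assumption, not JSch's exhaustive list of supported formats.

```python
import sys

# Traditional PEM headers. Assumption: a conservative subset of what
# JSch 0.1.54 parses, not its exhaustive list; RSA is the safest choice.
PEM_HEADERS = (
    "-----BEGIN RSA PRIVATE KEY-----",
    "-----BEGIN DSA PRIVATE KEY-----",
    "-----BEGIN EC PRIVATE KEY-----",
)
# Default private-key format written by ssh-keygen since OpenSSH 7.8.
OPENSSH_HEADER = "-----BEGIN OPENSSH PRIVATE KEY-----"


def classify_private_key(key_text: str) -> str:
    """Return 'pem', 'openssh', or 'unknown' based on the key's first line."""
    lines = key_text.strip().splitlines()
    first = lines[0].strip() if lines else ""
    if first in PEM_HEADERS:
        return "pem"
    if first == OPENSSH_HEADER:
        return "openssh"
    return "unknown"


def main(path: str) -> int:
    """Check a private key file before pasting it into the compute profile."""
    with open(path, encoding="utf-8") as fh:
        kind = classify_private_key(fh.read())
    if kind == "openssh":
        print("OpenSSH-format key; convert it with: ssh-keygen -p -m PEM -f " + path)
    else:
        print("key format: " + kind)
    # Non-zero exit status for anything that is not traditional PEM.
    return 0 if kind == "pem" else 1


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Run it as `python check_key.py KEYFILE` (the script name is hypothetical); a `pem` result means the key at least has the header layout older JSch versions expect.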
Configure your Data Fusion pipeline to use the customized profile
a. Click the pipeline you want to run against the remote Hadoop cluster.
b. Click Configure -> Compute config and choose the Remote Hadoop Provisioner profile you created.
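As an alternative to selecting the profile in the UI, CDAP lets a run pick its compute profile through the `system.profile.name` runtime argument, with a scope prefix such as `SYSTEM:` for system compute profiles. The sketch below builds (but does not send) a start request against the CDAP lifecycle endpoint `POST /v3/namespaces/<namespace>/apps/<app>/workflows/DataPipelineWorkflow/start`, using the namespace and app name visible in the question's log. The API endpoint URL, the token, the profile name `remote-dataproc`, and the function name `build_start_request` are placeholders or hypothetical; check the CDAP REST and profile documentation for your version.

```python
import json
import urllib.parse
import urllib.request


def build_start_request(api_endpoint: str, token: str, namespace: str, app: str,
                        profile: str, scope: str = "SYSTEM") -> urllib.request.Request:
    """Build (but do not send) a CDAP request that starts a batch pipeline run
    using the given compute profile, passed as a runtime argument."""
    path = "/v3/namespaces/{}/apps/{}/workflows/DataPipelineWorkflow/start".format(
        urllib.parse.quote(namespace, safe=""), urllib.parse.quote(app, safe=""))
    # Runtime arguments are sent as a flat JSON object of string -> string.
    body = json.dumps({"system.profile.name": f"{scope}:{profile}"}).encode("utf-8")
    return urllib.request.Request(
        api_endpoint.rstrip("/") + path,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_start_request("https://CDAP_API_ENDPOINT/api", "ACCESS_TOKEN",
                              "default", "apache-logs-ingest_v1", "remote-dataproc")
    print(req.get_method(), req.full_url)
```

To actually start the run, pass the request to `urllib.request.urlopen(req)`; for a Data Fusion instance the bearer token could come from `gcloud auth print-access-token`.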