CDAP Source plugin to read data from Sftp server

主宰稳场 提交于 2019-12-24 06:26:35

问题


I want to read a csv file that is available to Sftp server by using a cdap source plugin.

I came across FTP Batch Source plugin that does the same. But when running this i am getting below exception.

Caused by: java.io.IOException: No FileSystem for scheme: sftp
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
    at co.cask.hydrator.format.plugin.AbstractFileSource.prepareRun(AbstractFileSource.java:129) ~[na:na]
    at co.cask.hydrator.format.plugin.AbstractFileSource.prepareRun(AbstractFileSource.java:63) ~[na:na]
    at co.cask.cdap.etl.common.plugin.WrappedBatchSource$1.call(WrappedBatchSource.java:53) ~[na:na]
    at co.cask.cdap.etl.common.plugin.WrappedBatchSource$1.call(WrappedBatchSource.java:50) ~[na:na]
    at co.cask.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
    at co.cask.cdap.etl.common.plugin.StageLoggingCaller.call(StageLoggingCaller.java:40) ~[na:na]
    at co.cask.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:50) ~[na:na]
    at co.cask.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:36) ~[na:na]
    at co.cask.cdap.etl.common.plugin.WrappedBatchSource$1.call(WrappedBatchSource.java:53) ~[na:na]
    at co.cask.cdap.etl.common.plugin.WrappedBatchSource$1.call(WrappedBatchSource.java:50) ~[na:na]
    at co.cask.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
    at co.cask.cdap.etl.common.plugin.StageLoggingCaller.call(StageLoggingCaller.java:40) ~[na:na]
    at co.cask.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:50) ~[na:na]
    at co.cask.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:36) ~[na:na]
    at co.cask.cdap.etl.common.submit.SubmitterPlugin$3.run(SubmitterPlugin.java:83) ~[na:na]
    at co.cask.cdap.internal.app.runtime.AbstractContext$2.run(AbstractContext.java:534) ~[na:na]
    at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:224) ~[na:na]
    ... 18 common frames omitted

I am using below version of libraries which is also a ristriction.

  1. Hadoop - 2.7.3
  2. Spark - 2.3.0

I also came across this question which suggest using this and setting proeprty fs.sftp.impl to org.apache.hadoop.fs.sftp.SFTPFileSystem will solve the issue but not sure how use above code and set this proeprty.


回答1:


You need to set a file system properties under the Advanced section when using SFTP as the protocol:

{
  "fs.sftp.impl": "org.apache.hadoop.fs.sftp.SFTPFileSystem"
}


来源:https://stackoverflow.com/questions/58688246/cdap-source-plugin-to-read-data-from-sftp-server

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!