问题
I want to read a csv file that is available to Sftp server by using a cdap source plugin.
I came across FTP Batch Source plugin that does the same. But when running this i am getting below exception.
Caused by: java.io.IOException: No FileSystem for scheme: sftp
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at co.cask.hydrator.format.plugin.AbstractFileSource.prepareRun(AbstractFileSource.java:129) ~[na:na]
at co.cask.hydrator.format.plugin.AbstractFileSource.prepareRun(AbstractFileSource.java:63) ~[na:na]
at co.cask.cdap.etl.common.plugin.WrappedBatchSource$1.call(WrappedBatchSource.java:53) ~[na:na]
at co.cask.cdap.etl.common.plugin.WrappedBatchSource$1.call(WrappedBatchSource.java:50) ~[na:na]
at co.cask.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
at co.cask.cdap.etl.common.plugin.StageLoggingCaller.call(StageLoggingCaller.java:40) ~[na:na]
at co.cask.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:50) ~[na:na]
at co.cask.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:36) ~[na:na]
at co.cask.cdap.etl.common.plugin.WrappedBatchSource$1.call(WrappedBatchSource.java:53) ~[na:na]
at co.cask.cdap.etl.common.plugin.WrappedBatchSource$1.call(WrappedBatchSource.java:50) ~[na:na]
at co.cask.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
at co.cask.cdap.etl.common.plugin.StageLoggingCaller.call(StageLoggingCaller.java:40) ~[na:na]
at co.cask.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:50) ~[na:na]
at co.cask.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:36) ~[na:na]
at co.cask.cdap.etl.common.submit.SubmitterPlugin$3.run(SubmitterPlugin.java:83) ~[na:na]
at co.cask.cdap.internal.app.runtime.AbstractContext$2.run(AbstractContext.java:534) ~[na:na]
at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:224) ~[na:na]
... 18 common frames omitted
I am using below version of libraries which is also a ristriction.
Hadoop - 2.7.3
Spark - 2.3.0
I also came across this question which suggest using this and setting proeprty fs.sftp.impl
to org.apache.hadoop.fs.sftp.SFTPFileSystem
will solve the issue but not sure how use above code and set this proeprty.
回答1:
You need to set a file system properties under the Advanced section when using SFTP as the protocol:
{
"fs.sftp.impl": "org.apache.hadoop.fs.sftp.SFTPFileSystem"
}
来源:https://stackoverflow.com/questions/58688246/cdap-source-plugin-to-read-data-from-sftp-server