We have an HDInsight cluster running in Azure, but it doesn't allow spinning up an edge/gateway node at cluster creation time. So I was creating this edge/gateway node by installing:
echo 'deb http://private-repo-1.hortonworks.com/HDP/ubuntu14/2.x/updates/2.4.2.0 HDP main' >> /etc/apt/sources.list.d/HDP.list
echo 'deb http://private-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/ubuntu14 HDP-UTILS main' >> /etc/apt/sources.list.d/HDP.list
echo 'deb [arch=amd64] https://apt-mo.trafficmanager.net/repos/azurecore/ trusty main' >> /etc/apt/sources.list.d/azure-public-trusty.list
gpg --keyserver pgp.mit.edu --recv-keys B9733A7A07513CAD
gpg -a --export 07513CAD | apt-key add -
gpg --keyserver pgp.mit.edu --recv-keys B02C46DF417A0893
gpg -a --export 417A0893 | apt-key add -
apt-get -y install openjdk-7-jdk
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
apt-get -y install hadoop hadoop-hdfs hadoop-yarn hadoop-mapreduce hadoop-client openssl libhdfs0 liblzo2-2 liblzo2-dev hadoop-lzo phoenix hive hive-hcatalog tez mysql-connector-java* oozie oozie-client sqoop flume flume-agent spark spark-python spark-worker spark-yarn-shuffle
Then I copied the following directories:
/usr/lib/python2.7/dist-packages/hdinsight_common/
/usr/share/java/
/usr/lib/hdinsight-datalake/
/etc/spark/conf/
/etc/hadoop/conf/
But when I run spark-shell, I get the following error:
java.io.IOException: No FileSystem for scheme: wasb
Here is the full stack trace: https://gist.github.com/anonymous/ebb6c9d71865c9c8e125aadbbdd6a5bc
I am not sure which package/jar is missing here.
Does anyone have any clue what I am doing wrong?
Thanks
Another way of setting up Azure Storage (wasb and wasbs files) in spark-shell is:
- Copy the azure-storage and hadoop-azure jars into the ./jars directory of the Spark installation.
- Run spark-shell with the parameter --jars [a comma-separated list of paths to those jars]. Example:
$ bin/spark-shell --master "local[*]" --jars jars/hadoop-azure-2.7.0.jar,jars/azure-storage-2.0.0.jar
- Add the following lines to the Spark context:
sc.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
sc.hadoopConfiguration.set("fs.azure.account.key.my_account.blob.core.windows.net", "my_key")
- Run a simple query:
sc.textFile("wasb://my_container@my_account_host/myfile.txt").count()
- Enjoy :)
With these settings you can easily set up a Spark application, passing the same parameters to the 'hadoopConfiguration' of the current Spark context, as sketched below.
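For illustration, here is a minimal sketch of the same settings in a standalone Spark application instead of spark-shell. It reuses the placeholder account, key, container and host names from above; the object name WasbExample is just for illustration, and the application still needs the hadoop-azure and azure-storage jars supplied via --jars at submit time:

import org.apache.spark.{SparkConf, SparkContext}

object WasbExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WasbExample")
    val sc = new SparkContext(conf)

    // Same settings as in spark-shell, applied to the application's own context
    sc.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
    sc.hadoopConfiguration.set("fs.azure.account.key.my_account.blob.core.windows.net", "my_key")

    // Read directly from Azure Blob Storage over the wasb scheme
    val count = sc.textFile("wasb://my_container@my_account_host/myfile.txt").count()
    println(s"Line count: $count")

    sc.stop()
  }
}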
Hai Ning from Microsoft has written an excellent blog post on how to set up wasb on an Apache Hadoop installation.
Here is the summary:
1. Add hadoop-azure-*.jar and azure-storage-*.jar to the Hadoop classpath.
1.1 Find the jars in your local installation. On an HDInsight cluster they are in the /usr/hdp/current/hadoop-client folder.
1.2 Update the HADOOP_CLASSPATH variable in hadoop-env.sh. Use the exact jar names, as the Java classpath doesn't support partial wildcards.
2. Update core-site.xml:
<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
<property>
  <name>fs.azure.account.key.my_blob_account_name.blob.core.windows.net</name>
  <value>my_blob_account_key</value>
</property>
<!-- optionally set the default file system to a container -->
<property>
  <name>fs.defaultFS</name>
  <value>wasb://my_container_name@my_blob_account_name.blob.core.windows.net</value>
</property>
See exact steps here: https://github.com/hning86/articles/blob/master/hadoopAndWasb.md
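Once the jars are on the Hadoop classpath and core-site.xml is updated, a quick way to verify the setup from spark-shell is to resolve the wasb scheme through the Hadoop FileSystem API. This is only a minimal sketch, assuming the same placeholder container and account names as in the core-site.xml above:

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Uses the Hadoop configuration that spark-shell picked up from /etc/hadoop/conf
val fs = FileSystem.get(
  new URI("wasb://my_container_name@my_blob_account_name.blob.core.windows.net"),
  sc.hadoopConfiguration)

// Fails with "No FileSystem for scheme: wasb" if hadoop-azure is still missing from the classpath
fs.listStatus(new Path("/")).foreach(status => println(status.getPath))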
Source: https://stackoverflow.com/questions/38254771/spark-shell-error-no-filesystem-for-scheme-wasb