We have an HDInsight cluster running in Azure, but it doesn't allow spinning up an edge/gateway node at the time of cluster creation. So I was creating this edge/gateway node by i
Hai Ning from Microsoft has written an excellent blog post on how to set up WASB on an Apache Hadoop installation.
Here is the summary:
1. Add hadoop-azure-*.jar and azure-storage-*.jar to the Hadoop classpath
   1.1 Find the jars in your local installation. On an HDInsight cluster they are in the /usr/hdp/current/hadoop-client folder.
   1.2 Update the HADOOP_CLASSPATH variable in hadoop-env.sh. Use the exact jar names, since the Java classpath doesn't support partial wildcards (an illustrative entry is shown after this list).
2. Update core-site.xml:
<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
<property>
  <name>fs.azure.account.key.my_blob_account_name.blob.core.windows.net</name>
  <value>my_blob_account_key</value>
</property>
<!-- optionally set the default file system to a container -->
<property>
  <name>fs.defaultFS</name>
  <value>wasb://my_container_name@my_blob_account_name.blob.core.windows.net</value>
</property>
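As an illustration of step 1.2, the hadoop-env.sh entry might look like the following (the paths and jar versions here are examples only; use the exact names and locations you found in step 1.1):

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/hdp/current/hadoop-client/hadoop-azure-2.7.1.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar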
See exact steps here: https://github.com/hning86/articles/blob/master/hadoopAndWasb.md
Another way of configuring Azure Storage (wasb and wasbs filesystems) in spark-shell is:
Run spark-shell with the parameter --jars [a comma-separated list of paths to those jars]. Example:
$ bin/spark-shell --master "local[*]" --jars jars/hadoop-azure-2.7.0.jar,jars/azure-storage-2.0.0.jar
Add the following settings to the Spark context's Hadoop configuration:
sc.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
sc.hadoopConfiguration.set("fs.azure.account.key.my_account.blob.core.windows.net", "my_key")
Run a simple query:
sc.textFile("wasb://my_container@my_account_host/myfile.txt").count()
With these settings you can easily set up a Spark application, passing the same parameters to the hadoopConfiguration of the current Spark context.
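For instance, a minimal sketch of such an application (the account name, key, container, and file path below are placeholders) could look like this:

import org.apache.spark.{SparkConf, SparkContext}

object WasbCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WasbCount")
    val sc = new SparkContext(conf)

    // Same settings as in spark-shell, applied to the application's own context
    sc.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
    sc.hadoopConfiguration.set("fs.azure.account.key.my_account.blob.core.windows.net", "my_key")

    val count = sc.textFile("wasb://my_container@my_account.blob.core.windows.net/myfile.txt").count()
    println(s"Lines: $count")

    sc.stop()
  }
}

Remember to submit the application with the hadoop-azure and azure-storage jars on the classpath (for example via the --jars option of spark-submit), just as for spark-shell above.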