spark-shell error: No FileSystem for scheme: wasb

無奈伤痛 2021-01-02 23:16

We have an HDInsight cluster running in Azure, but it doesn't allow spinning up an edge/gateway node at the time of cluster creation. So I was creating this edge/gateway node by i

2 Answers
  • 2021-01-03 00:01

    Hai Ning from Microsoft has written an excellent blog post on how to set up WASB on an Apache Hadoop installation.

    Here is the summary:

    1. Add hadoop-azure-*.jar and azure-storage-*.jar to the Hadoop classpath

      1.1 Find the jars in your local installation. On an HDInsight cluster they are in the /usr/hdp/current/hadoop-client folder.

      1.2 Update the HADOOP_CLASSPATH variable in hadoop-env.sh. Use the exact jar names, since the Java classpath doesn't support partial wildcards.
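      For example, hadoop-env.sh could end up with a line like the following (a sketch only; the version numbers and the lib/ path below are placeholders, use the exact file names and locations you found in step 1.1):

      
      export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/hdp/current/hadoop-client/hadoop-azure-2.7.3.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar
      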

    2. Update core-site.xml

      <property>
              <name>fs.AbstractFileSystem.wasb.impl</name>
              <value>org.apache.hadoop.fs.azure.Wasb</value>
      </property>

      <property>
              <name>fs.azure.account.key.my_blob_account_name.blob.core.windows.net</name>
              <value>my_blob_account_key</value>
      </property>

      <!-- optionally set the default file system to a container -->
      <property>
              <name>fs.defaultFS</name>
              <value>wasb://my_container_name@my_blob_account_name.blob.core.windows.net</value>
      </property>
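
      Once the jars are on the classpath and core-site.xml is updated, a quick way to verify that the wasb scheme resolves (using the same placeholder container and account names as in the XML above) is:

      
      $ hadoop fs -ls wasb://my_container_name@my_blob_account_name.blob.core.windows.net/
      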
      

    See exact steps here: https://github.com/hning86/articles/blob/master/hadoopAndWasb.md

  • 2021-01-03 00:18

    Another way to set up Azure Storage (wasb and wasbs schemes) in spark-shell:

    1. Copy the azure-storage and hadoop-azure jars into the ./jars directory of the Spark installation.
    2. Run spark-shell with the --jars parameter (a comma-separated list of paths to those jars). Example:

      
      $ bin/spark-shell --master "local[*]" --jars jars/hadoop-azure-2.7.0.jar,jars/azure-storage-2.0.0.jar
      
    3. Add the following settings to the Spark context's Hadoop configuration:

      
      sc.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
      sc.hadoopConfiguration.set("fs.azure.account.key.my_account.blob.core.windows.net", "my_key")
      
    4. Run a simple query:

      
      sc.textFile("wasb://my_container@my_account_host/myfile.txt").count()
      
    5. Enjoy :)

    With these settings you can easily set up a Spark application, passing the same parameters to the hadoopConfiguration of the current Spark context, as sketched below.
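
    For example, here is a minimal sketch of a standalone Spark application doing the same thing (assuming Spark 2.x or later, where SparkSession is available; the account, container, key, and file names are placeholders):

      
      import org.apache.spark.sql.SparkSession
      
      object WasbExample {
        def main(args: Array[String]): Unit = {
          // Keys prefixed with "spark.hadoop." are copied into the Hadoop
          // configuration of the SparkContext at startup.
          val spark = SparkSession.builder()
            .appName("WasbExample")
            .master("local[*]")
            .config("spark.hadoop.fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
            .config("spark.hadoop.fs.azure.account.key.my_account.blob.core.windows.net", "my_key")
            .getOrCreate()
      
          // Same query as in step 4, now from a compiled application.
          val count = spark.sparkContext
            .textFile("wasb://my_container@my_account.blob.core.windows.net/myfile.txt")
            .count()
          println(s"Line count: $count")
      
          spark.stop()
        }
      }
      
    Remember to submit it with the hadoop-azure and azure-storage jars on the classpath, e.g. via spark-submit --jars, just like in step 2.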
