hdinsight

AzureException: Unable to access container using anonymous credentials, and no credentials found for them in the configuration

廉价感情. submitted on 2019-12-04 07:17:30
I am trying to use Hadoop on Azure HDInsight. I log into the cluster over ssh and run the following: hadoop jar jar_name class_name wasb://container@storagename.core.windows.net/inputdir wasb://container@storagename.core.windows.net/outputdir But I get the following exception: Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Unable to access container xxx in account yyy.core.windows.net using anonymous credentials, and no credentials found for them in the configuration. I am using azure cli and I ran "azure login" before
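
For reference, here is a minimal sketch of the URI form and the configuration property name that the hadoop-azure (WASB) connector documents. The names container, storagename and inputdir are the placeholders from the question, and the two helper functions are hypothetical, written only for illustration. The documented host suffix is .blob.core.windows.net, whereas the command in the question uses .core.windows.net; whether that difference is the cause here is an assumption, not something the question confirms.

```shell
#!/bin/sh
# Sketch only: "container", "storagename" and "inputdir" are the placeholders
# used in the question, not real resources. Both functions are hypothetical
# helpers for illustration.

# Print a WASB URI in the form documented for the hadoop-azure connector:
#   wasb://<container>@<account>.blob.core.windows.net/<path>
# Arguments: $1 = container, $2 = storage account name, $3 = path
wasb_uri() {
  printf 'wasb://%s@%s.blob.core.windows.net/%s\n' "$1" "$2" "$3"
}

# Print the name of the configuration property under which hadoop-azure
# looks up the access key for a storage account.
# Arguments: $1 = storage account name
account_key_property() {
  printf 'fs.azure.account.key.%s.blob.core.windows.net\n' "$1"
}

wasb_uri container storagename inputdir
account_key_property storagename
```

The access key itself would typically be supplied under that property name in the Hadoop configuration (for example in core-site.xml); logging in with the Azure CLI does not by itself add it there.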

ConcurrentModificationException when using Spark collectionAccumulator

对着背影说爱祢 submitted on 2019-12-04 03:36:48
I'm trying to run a Spark-based application on an Azure HDInsight on-demand cluster, and am seeing lots of SparkExceptions (caused by ConcurrentModificationExceptions) being logged. The application runs without these errors when I start a local Spark instance. I've seen reports of similar errors when using accumulators, and my code is indeed using a CollectionAccumulator; however, I have placed synchronized blocks everywhere I use it, and it makes no difference. The accumulator-related code looks like this: class MySparkClass(sc : SparkContext) { val myAccumulator = sc.collectionAccumulator

How to connect Hive to asp.net project

拈花ヽ惹草 submitted on 2019-12-01 13:32:22
Hi, I'm very new to Hadoop. I have installed Microsoft HDInsight on my local system. Now I want to connect to Hive and HBase, but for the Hive connection I have to specify a connection string, port, username, and password, and I can't figure out how to get these values. I have tried localhost with port 8085, but this doesn't work. I also tried the localhost IP and my system IP. Please help with this and let me know how I should proceed for HBase connectivity. Your best bet is probably to use Microsoft's Hive SDK (also available on nuget as Microsoft.Hadoop.Hive). There is a

spark-shell error : No FileSystem for scheme: wasb

怎甘沉沦 submitted on 2019-11-30 15:57:07
We have an HDInsight cluster running in Azure, but it doesn't allow spinning up an edge/gateway node at cluster creation time. So I was creating this edge/gateway node by installing: echo 'deb http://private-repo-1.hortonworks.com/HDP/ubuntu14/2.x/updates/2.4.2.0 HDP main' >> /etc/apt/sources.list.d/HDP.list echo 'deb http://private-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/ubuntu14 HDP-UTILS main' >> /etc/apt/sources.list.d/HDP.list echo 'deb [arch=amd64] https://apt-mo.trafficmanager.net/repos/azurecore/ trusty main' >> /etc/apt/sources.list.d/azure-public-trusty.list gpg --keyserver

How to load CSVs with timestamps in custom format?

与世无争的帅哥 submitted on 2019-11-30 08:33:48
Question: I have a timestamp field in a csv file that I load into a dataframe using the Spark csv library. The same piece of code works on my local machine with Spark 2.0 but throws an error on Azure Hortonworks HDP 3.5 and 3.6. I have checked, and Azure HDInsight 3.5 uses the same Spark version, so I don't think it's a problem with the Spark version. import org.apache.spark.sql.types._ val sourceFile = "C:\\2017\\datetest" val sourceSchemaStruct = new StructType() .add("EventDate",DataTypes

Remotely execute a Spark job on an HDInsight cluster

余生长醉 submitted on 2019-11-29 21:43:59
Question: I am trying to automatically launch a Spark job on an HDInsight cluster from Microsoft Azure. I am aware that several methods exist to automate Hadoop job submission (provided by Azure itself), but so far I have not been able to find a way to run a Spark job remotely without setting up an RDP session with the master instance. Is there any way to achieve this? Answer 1: Spark-jobserver provides a RESTful interface for submitting and managing Apache Spark jobs, jars, and job contexts. https://github.com

Differences between Azure Block Blob and Page Blob?

余生颓废 submitted on 2019-11-28 16:30:38
As I recently started working with Windows Azure, I've run into the question of which to choose between the Block Blob & Page Blob. I'm currently uploading some text, csv or dat files to blob storage and then running a MapReduce program over them from my C# program. Yes, I've gone through some articles such as article1, article2, but couldn't get a clear idea from them. To cut it short: Block Blob vs Page Blob. Any help would be appreciated. The differences are very well documented on MSDN, here. TL;DR: Block blobs are for your discrete storage objects like jpg's,

Differences between Azure Block Blob and Page Blob?

此生再无相见时 submitted on 2019-11-27 09:46:29
Question: As I recently started working with Windows Azure, I've run into the question of which to choose between the Block Blob & Page Blob. I'm currently uploading some text, csv or dat files to blob storage and then running a MapReduce program over them from my C# program. Yes, I've gone through some articles such as article1, article2, but couldn't get a clear idea from them. To cut it short: Block Blob vs Page Blob. Any help would be appreciated. Answer 1: The differences are