MapR

Hiveserver2: Failed to create/change scratchdir permissions to 777: Could not create FileClient

扶醉桌前 submitted on 2019-12-24 21:39:49
Question: I'm running a MapR Community Edition Hadoop cluster (M3). Unfortunately, the HiveServer2 service crashes and, according to the log file in /opt/mapr/hive/hive-0.13/logs/mapr/hive.log, there's a problem with permissions on the scratch directory: 2015-02-24 21:21:08,187 WARN [main]: server.HiveServer2 (HiveServer2.java:init(74)) - Failed to create/change scratchdir permissions to 777: Could not create FileClient java.io.IOException: Could not create FileClient I checked the settings for the
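The usual fix (assuming the underlying MapR-FS client is healthy) is to make the directory configured by `hive.exec.scratchdir` world-writable, e.g. with `hadoop fs -chmod -R 777` on the scratch path. The local-filesystem equivalent of the permission state HiveServer2 is trying to establish can be sketched in Python (the directory name here is a stand-in, not the real scratch path):

```python
import os
import stat
import tempfile

# Create a stand-in for the Hive scratch directory.
scratch = tempfile.mkdtemp(prefix="hive-scratch-")

# HiveServer2 wants the scratch dir world-writable (mode 777).
os.chmod(scratch, 0o777)

# Verify the permission bits -- the step that fails in the log above.
mode = stat.S_IMODE(os.stat(scratch).st_mode)
print(oct(mode))  # 0o777
```

Note that the "Could not create FileClient" part of the error points at the MapR client itself, so if `hadoop fs -ls /` also fails, the permissions are a symptom rather than the cause.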

How to use a .jar in a Pig file

安稳与你 submitted on 2019-12-24 02:38:05
Question: I have two input files, smt.txt and smo.txt. The jar file reads the text files and splits the data according to rules described in a Java file, and the Pig script loads that data and writes it to output files via MapReduce. register 'maprfs:///user/username/fl.jar'; DEFINE FixedLoader fl(); mt = load 'maprfs:///user/username/smt.txt' using FixedLoader('-30','30-33',...........) AS (.........); mo = load 'maprfs:///user/username/smo.txt*' using FixedLoader('-30','30-33',.....) AS (.....
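FixedLoader is a custom loader inside fl.jar, so its exact behavior isn't shown; assuming it slices each line at fixed column offsets (as the '-30','30-33' arguments suggest), the splitting logic can be sketched in Python. The record and field ranges below are illustrative, not taken from the question:

```python
def parse_fixed_width(line, ranges):
    """Slice a line at fixed column offsets.

    Each range is 'start-end' (0-based, end exclusive); a leading
    empty start such as '-30' means 'from column 0'.
    """
    fields = []
    for r in ranges:
        start_s, end_s = r.split("-")
        start = int(start_s) if start_s else 0
        fields.append(line[start:int(end_s)].strip())
    return fields

# Illustrative record: a 30-char name field followed by a 3-char code.
record = "John Smith".ljust(30) + "042"
print(parse_fixed_width(record, ["-30", "30-33"]))  # ['John Smith', '042']
```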

Spark and Hive table schema out of sync after external overwrite

回眸只為那壹抹淺笑 submitted on 2019-12-20 20:31:11
Question: I am having issues with the schema for Hive tables being out of sync between Spark and Hive on a MapR cluster with Spark 2.1.0 and Hive 2.1.1. I need to resolve this problem specifically for managed tables, but the issue can be reproduced with unmanaged/external tables. Overview of steps: Use saveAsTable to save a dataframe to a given table. Use mode("overwrite").parquet("path/to/table") to overwrite the data for the previously saved table. I am actually modifying the data through a

pip install pandas couldn't find any downloads that satisfy the requirement pandas

不羁的心 submitted on 2019-12-14 04:20:17
Question: While I'm trying to install pandas, I get the error below. Can you suggest how to solve it? [mapr@csdssqwqasw22 ~]$ pip install pandas Downloading/unpacking pandas Cannot fetch index base URL https://pypi.python.org/simple/ Could not find any downloads that satisfy the requirement pandas Cleaning up... No distributions at all found for pandas Storing debug log for failure in /home/mapr/.pip/pip.log Source: https://stackoverflow.com/questions/28577947/pip-install-pandas-couldnt-find-any

MapR: File Read and Write Process

耗尽温柔 submitted on 2019-12-13 02:35:13
Question: I am not able to find a specific link that explains how file metadata is distributed in MapR. When I look at Cloudera/Hortonworks/Apache Hadoop, I know the metadata is stored in the NameNode's memory, which is then fetched to locate the nodes that hold the blocks. How it works in MapR is what I am trying to understand. Any help would be greatly appreciated. Answer 1: MapR natively implemented a Network File System (NFS) interface to MapR-FS so that any reads and writes
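For the metadata question itself: MapR-FS has no NameNode. The namespace is split into replicated containers spread across the data nodes, and a small CLDB (Container Location Database) service only maps container IDs to the nodes holding their replicas; file metadata lives inside the containers themselves. A toy model of that lookup path (all node names, container IDs, and function names here are illustrative, not the real API):

```python
# Toy model: the CLDB maps container IDs to the nodes holding replicas;
# file metadata itself lives *inside* containers, not in the CLDB.
cldb = {
    2049: ["nodeA", "nodeB", "nodeC"],   # container 2049 replica locations
    2050: ["nodeB", "nodeD", "nodeA"],
}

# Which container holds a given file's metadata.
file_to_container = {"/user/alice/data.txt": 2049}

def locate(path):
    """Return the nodes a client would contact for `path`'s metadata."""
    container = file_to_container[path]
    return cldb[container]

print(locate("/user/alice/data.txt"))  # ['nodeA', 'nodeB', 'nodeC']
```

Because the CLDB only tracks container locations (not every file and block), its memory footprint stays small and the metadata load is distributed across the cluster.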

Failed to load class for data source: Libsvm in Spark ML (PySpark/Scala)

為{幸葍}努か submitted on 2019-12-12 04:16:32
Question: When I try to import a libsvm file in PySpark/Scala using "sqlContext.read.format("libsvm").load", I get the following error: "Failed to load class for data source: Libsvm." At the same time, if I use "MLUtils.loadLibSVMFile" it works perfectly fine. I need to use both Spark ML (to get class probabilities) and MLlib for an evaluation. I have attached the error screenshot. This is a MapR cluster, Spark version 1.5.2. Error Answer 1: The libsvm data source format is available since Spark 1.6. Answer 2:
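Both loaders read the same on-disk format: one record per line, `label index:value index:value ...` with 1-based, ascending feature indices. A minimal parser for a single line, to illustrate the format (this is a sketch, not Spark's implementation):

```python
def parse_libsvm_line(line):
    """Parse one libsvm record into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx)] = float(val)
    return label, features

print(parse_libsvm_line("1 3:0.5 7:2.0"))  # (1.0, {3: 0.5, 7: 2.0})
```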

What is the easiest way to get started with Hadoop on EC2, preferably MapR

余生颓废 submitted on 2019-12-12 02:16:03
Question: I want to get a small working MapR Hadoop instance up on EC2 so I can play around with it and begin to learn more about it. How would I proceed? The MapR site (1) mentions starting with VMware Player (2). So, does one install VMware Player on an EC2 AMI and then install MapR, or are there AMIs available with VMware Player already installed and/or with VMware Player and MapR already installed? (1) http://mapr.com/download (2) https://my.vmware.com/web/vmware/downloads Answer 1: This question is also answered at http:/

maprstream with Spring Integration Java client

余生颓废 submitted on 2019-12-11 16:30:15
Question: I am looking for a solution to use maprstream with Spring Integration. I was able to create the stream and topic, and also to consume/publish messages using the stream:topic combination, using the Kafka client by referring to the link. But I am struggling to consume/publish messages using Spring Integration, and I couldn't find any sample programs explaining this. Can someone please help me with this? Answer 1: spring-integration-kafka 2.0 (via spring-kafka 1.0) uses the 0.9 Kafka API. See the

Store documents (.pdf, .doc and .txt files) in MaprDB

会有一股神秘感。 submitted on 2019-12-11 08:25:11
Question: I need to store documents such as .pdf, .doc and .txt files in MapR-DB. I saw one example in HBase where files are stored in binary and retrieved as files in Hue, but I'm not sure how it could be implemented. Any idea how a document can be stored in MapR-DB? Answer 1: First off, I'm not familiar with MapR-DB as I'm using Cloudera, but I have experience storing many types of objects in HBase as byte arrays, as mentioned below. The most primitive way of storing in HBase or any other DB is byte
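The answer's approach — store the raw file bytes under a row key, then write them back out to reconstruct the document — round-trips for any file type. A minimal sketch using a plain dict in place of the real MapR-DB/HBase table (the `table`, column name, and row-key scheme here are stand-ins, not the actual client API):

```python
import tempfile

# Stand-in for a MapR-DB / HBase table: row key -> {column: bytes}.
table = {}

def put_document(row_key, path):
    """Store a file's raw bytes under row_key (works for .pdf, .doc, .txt)."""
    with open(path, "rb") as f:
        table[row_key] = {"cf:doc": f.read()}

def get_document(row_key, out_path):
    """Write the stored bytes back out as a file."""
    with open(out_path, "wb") as f:
        f.write(table[row_key]["cf:doc"])

# Round-trip a sample .txt document.
src = tempfile.NamedTemporaryFile(suffix=".txt", delete=False)
src.write(b"hello maprdb")
src.close()
put_document("doc#0001", src.name)
get_document("doc#0001", src.name + ".copy")
with open(src.name + ".copy", "rb") as f:
    print(f.read())  # b'hello maprdb'
```

With a real table the puts and gets would go through the HBase/MapR-DB client, but the byte-array round-trip is the same idea.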

Spark YARN mode: how to get applicationId from spark-submit

那年仲夏 submitted on 2019-12-08 02:04:25
Question: When I submit a Spark job using spark-submit with master yarn and deploy-mode cluster, it doesn't print/return any applicationId, and once the job is completed I have to manually check the MapReduce JobHistory or Spark History Server to get the job details. My cluster is used by many users, and it takes a lot of time to spot my job in the JobHistory/History Server. Is there any way to configure spark-submit to return the applicationId? Note: I found many similar questions but their solutions retrieve
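In yarn-cluster mode the YARN client does log a line on spark-submit's stderr containing the application ID (format `application_<clusterTimestamp>_<sequence>`), so one common workaround is to capture that output and extract the ID rather than polling the History Server. A sketch of the extraction (the sample log line below is illustrative):

```python
import re

def extract_application_id(log_text):
    """Pull the first YARN application ID out of spark-submit's stderr."""
    match = re.search(r"application_\d+_\d+", log_text)
    return match.group(0) if match else None

# Illustrative stderr line from the YARN client:
stderr = ("INFO yarn.Client: Submitting application "
          "application_1519050588875_0102 to ResourceManager")
print(extract_application_id(stderr))  # application_1519050588875_0102
```

In practice you would launch spark-submit via subprocess (or a shell pipe), feed its combined output through this function, and then use the ID with `yarn application -status` or the REST API.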