Question
We have been trying to create a simple Hive UDF to mask some fields in a Hive table. We use an external file (placed on HDFS) to grab a piece of text as a salt for the masking process. It seems we are doing everything right, but when we try to create the function it throws the error:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.Mask
This is our code for the UDF:
package co.company;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.commons.codec.digest.DigestUtils;

@Description(
    name = "masker",
    value = "_FUNC_(str) - mask a string",
    extended = "Example: \n" +
        " SELECT masker(column) FROM hive_table; "
)
public class Mask extends UDF {

    private static final String arch_clave = "/user/username/filename.dat";
    private static String clave = null;

    public static String getFirstLine(String arch) {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataInputStream in = fs.open(new Path(arch));
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String ret = br.readLine();
            br.close();
            return ret;
        } catch (Exception e) {
            System.out.println("out: Error Message: " + arch + " exc: " + e.getMessage());
            return null;
        }
    }

    public Text evaluate(Text s) {
        clave = getFirstLine(arch_clave);
        Text to_value = new Text(DigestUtils.shaHex(s + clave));
        return to_value;
    }
}
We are uploading the jar file and creating the UDF through HUE's interface (sadly, we don't yet have console access to the Hadoop cluster).
On Hue's Hive Interface, our commands are:
add jar hdfs:///user/my_username/myJar.jar
And then, to create the function, we execute:
CREATE TEMPORARY FUNCTION masker as 'co.company.Mask';
Sadly, the error thrown when we tried to create the UDF is not very helpful. This is the log for the creation of the UDF. Any help is greatly appreciated. Thank you very much.
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO parse.ParseDriver: Parsing command: CREATE TEMPORARY FUNCTION enmascarar as 'co.bancolombia.analitica.Enmascarar'
14/12/10 08:32:15 INFO parse.ParseDriver: Parse Completed
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=parse start=1418218335753 end=1418218335754 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO parse.FunctionSemanticAnalyzer: analyze done
14/12/10 08:32:15 INFO ql.Driver: Semantic Analysis Completed
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1418218335754 end=1418218335757 duration=3 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=compile start=1418218335753 end=1418218335757 duration=4 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=acquireReadWriteLocks from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO lockmgr.DummyTxnManager: Creating lock manager of type org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
14/12/10 08:32:15 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=server1.domain:2181,server2.domain.corp:2181,server3.domain:2181 sessionTimeout=600000 watcher=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager$DummyWatcher@2ebe4e81
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=acquireReadWriteLocks start=1418218335760 end=1418218335797 duration=37 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO ql.Driver: Starting command: CREATE TEMPORARY FUNCTION enmascarar as 'co.company.Mask'
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1418218335760 end=1418218335798 duration=38 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=task.FUNCTION.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 ERROR ql.Driver: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.MasK
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1418218335797 end=1418218335800 duration=3 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO ZooKeeperHiveLockManager: about to release lock for default
14/12/10 08:32:15 INFO ZooKeeperHiveLockManager: about to release lock for colaboradores
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1418218335800 end=1418218335822 duration=22 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 ERROR operation.Operation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.Mask
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:147)
at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Answer 1:
This issue was solved, but it wasn't related to the code. The code above is fine for reading a file in HDFS from a Hive UDF (awfully inefficient, because it reads the file each time the evaluate function is called, but it does manage to read the file).
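(For completeness, a minimal sketch of how that per-row read could be avoided by lazily caching the salt in the existing static field of the Mask class from the question; this caching tweak is only an illustration, not part of the fix described below:)

public Text evaluate(Text s) {
    // Hypothetical tweak: read the salt only once per JVM instead of on every row.
    if (clave == null) {
        clave = getFirstLine(arch_clave);
    }
    return new Text(DigestUtils.shaHex(s + clave));
}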
It turns out that when you create a Hive UDF through HUE, you upload the jar and then create the function. However, if you change your function and re-upload the jar, it still keeps the previous definition of the function.
We defined the same UDF class in another package in the jar, dropped the original function in Hive, and created the function again (with the new class) through HUE:
add jar hdfs:///user/my_username/myJar2.jar;
drop function if exists masker;
create temporary function masker as 'co.company.otherpackage.Mask';
It seems a bug report is needed for Hive (or HUE? Thrift?); I still need to understand better which part of the system is at fault.
I hope it helps someone in the future.
Answer 2:
This will not work, because new Configuration() is initialized by default with only core-default.xml and core-site.xml; see the sources.
At the same time, you may (and should) have hdfs-site.xml etc.
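For illustration, a minimal sketch of that behaviour (the file names below are just the standard Hadoop configuration resource names, nothing specific to this setup):

// new Configuration() only loads core-default.xml and core-site.xml from the classpath;
// hdfs-site.xml and friends have to be added as resources explicitly:
Configuration conf = new Configuration();
conf.addResource("hdfs-site.xml");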
Unfortunately, I didn't find a reliable way to get the Configuration inside a Hive UDF, and it's a long story why.
In general, IMHO, you have to try the following approaches, one by one:
- public void configure(MapredContext context) on your UDF; nevertheless, it may not be invoked due to a defect with vectorization and/or the use of execution engines other than MR, or local execution (... limit 5 will trigger the issue), etc.
- SessionState.get().getConf(), if SessionState.get() is not null
- Initialize a Configuration and add more resources than the defaults (see the list in the Configuration sources)
- Use the RHive approach and load all .xml files from the Hadoop configuration directory (FSUtils.java):
public static Configuration getConf() throws IOException {
    if (conf != null) {
        return conf;
    }
    conf = new Configuration();
    String hadoopConfPath = System.getProperty("HADOOP_CONF_DIR");
    if (StringUtils.isNotEmpty(hadoopConfPath)) {
        File dir = new File(hadoopConfPath);
        if (!dir.exists() || !dir.isDirectory()) {
            return conf;
        }
        File[] files = dir.listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.endsWith("xml");
            }
        });
        for (File file : files) {
            try {
                URL url = new URL("file://" + file.getCanonicalPath());
                conf.addResource(url);
            } catch (Exception e) {
            }
        }
    }
    return conf;
}
So, here is the complete solution.
In the UDF, add the setters:
public abstract class BasicUDF extends GenericUDF implements Configurable {

    /**
     * Invocation context
     */
    private MapredContext mapReduceContext = null;

    /**
     * Hadoop Configuration
     */
    private Configuration hadoopConfiguration = null;

    /**
     * Save MR context, if arrived
     */
    @Override
    public void configure(MapredContext context) {
        if (context != null) {
            this.mapReduceContext = context;
            this.propertyReader.addHiveConfigurationSource(context);
            this.resourceFinder.addHiveJobConfiguration(context.getJobConf());
            log.debug("Non-empty MapredContext arrived");
        } else {
            log.error("Empty MapredContext arrived");
        }
    }

    /**
     * Save Hadoop configuration, if arrived
     */
    @Override
    public void setConf(Configuration conf) {
        this.hadoopConfiguration = conf;
        this.propertyReader.addHadoopConfigurationSource(conf);
        this.resourceFinder.addHadoopConfigurationSource(conf);
    }
And then, wherever you need the configuration:
public Configuration findConfiguration() {
    if (hiveJobConfiguration != null) {
        log.debug("Starting with hiveJobConfiguration");
        return hiveJobConfiguration;
    }
    if (SessionState.get() != null && SessionState.get().getConf() != null) {
        log.debug("Starting with SessionState configuration");
        return SessionState.get().getConf();
    }
    if (hadoopConfiguration != null) {
        log.debug("Starting with hadoopConfiguration");
        return hadoopConfiguration;
    }
    log.debug("No existing configuration found, falling back to manually initialized");
    return createNewConfiguration();
}

private Configuration createNewConfiguration() {
    // load defaults, "core-default.xml" and "core-site.xml"
    Configuration configuration = new Configuration();
    // load expected configuration: mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml
    configuration.addResource("mapred-default.xml");
    configuration.addResource("mapred-site.xml");
    configuration.addResource("hdfs-default.xml");
    configuration.addResource("hdfs-site.xml");
    // load Hadoop configuration from the FS, if any and if requested
    if (fallbackReadHadoopFilesFromFS) {
        log.debug("Configured manual read of Hadoop configuration from FS");
        try {
            addFSHadoopConfiguration(configuration);
        } catch (RuntimeException re) {
            log.error("Reading of Hadoop configuration from FS failed", re);
        }
    }
    return configuration;
}
@edu.umd.cs.findbugs.annotations.SuppressFBWarnings(
    value = {"REC_CATCH_EXCEPTION", "SIC_INNER_SHOULD_BE_STATIC_ANON"},
    justification = "Findbugs bug, missed IOException from file.getCanonicalPath(); don't like the idea with a static anon"
)
private void addFSHadoopConfiguration(Configuration configuration) {
    log.debug("Started addFSHadoopConfiguration to load configuration from FS");
    String hadoopConfPath = System.getProperty("HADOOP_CONF_DIR");
    if (StringUtils.isEmpty(hadoopConfPath)) {
        log.error("HADOOP_CONF_DIR is not set, skipping FS load in addFSHadoopConfiguration");
        return;
    } else {
        log.debug("Found configuration dir, it points to " + hadoopConfPath);
    }
    File dir = new File(hadoopConfPath);
    if (!dir.exists() || !dir.isDirectory()) {
        log.error("HADOOP_CONF_DIR points to an invalid place: " + hadoopConfPath);
        return;
    }
    File[] files = dir.listFiles(new FilenameFilter() {
        public boolean accept(File dir, String name) {
            return name.endsWith("xml");
        }
    });
    if (files == null) {
        log.error("Configuration dir does not denote a directory, or an I/O error occurred. Dir used: " + hadoopConfPath);
        return;
    }
    for (File file : files) {
        try {
            URL url = new URL("file://" + file.getCanonicalPath());
            configuration.addResource(url);
        } catch (Exception e) {
            log.error("Failed to open configuration file " + file.getPath(), e);
        }
    }
}
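Tying this back to the original question: inside the UDF, the HDFS file would then be opened with the resolved configuration rather than a bare new Configuration(). A rough sketch (findConfiguration() is the method shown above; the path is the example path from the question):

FileSystem fs = FileSystem.get(findConfiguration());
try (FSDataInputStream in = fs.open(new Path("/user/username/filename.dat"));
     BufferedReader br = new BufferedReader(new InputStreamReader(in))) {
    String clave = br.readLine();   // first line of the file, used as the salt
}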
Works like a charm
Source: https://stackoverflow.com/questions/27402442/read-an-hdfs-file-from-a-hive-udf-execution-error-return-code-101-functiontas