HiveServer2 generates a lot of directories in HDFS /tmp/hive/hive

Submitted by 限于喜欢 on 2019-12-13 03:00:21

Question


We created a new cluster with HiveServer2 (on the Hortonworks HDP 2.2 distribution). After some time we had more than 1048576 directories in /tmp/hive/hive on HDFS, because HiveServer2 creates them in this location.

Has anyone had a similar problem? Logs from HiveServer2:

2015-08-31 06:48:15,828 WARN  [HiveServer2-Handler-Pool: Thread-1104]: conf.HiveConf (HiveConf.java:initialize(2499)) - HiveConf of name hive.heapsize does not exist
2015-08-31 06:48:15,829 WARN  [HiveServer2-Handler-Pool: Thread-1104]: conf.HiveConf (HiveConf.java:initialize(2499)) - HiveConf of name hive.server2.enable.impersonation does not exist
2015-08-31 06:48:15,829 WARN  [HiveServer2-Handler-Pool: Thread-1104]: conf.HiveConf (HiveConf.java:initialize(2499)) - HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
2015-08-31 06:48:15,833 INFO  [HiveServer2-Handler-Pool: Thread-1104]: thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(232)) - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6
2015-08-31 06:48:15,835 INFO  [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:createPath(558)) - Created local directory: /tmp/ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff_resources
2015-08-31 06:48:15,883 INFO  [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:createPath(558)) - Created HDFS directory: /tmp/hive/hive/ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff
2015-08-31 06:48:15,884 INFO  [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:createPath(558)) - Created local directory: /tmp/hive/ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff
2015-08-31 06:48:16,064 INFO  [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:createPath(558)) - Created HDFS directory: /tmp/hive/hive/ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff/_tmp_space.db
2015-08-31 06:48:16,065 INFO  [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:start(460)) - No Tez session required at this point. hive.execution.engine=mr.

The method HiveServer2 runs when it creates a session:

 /**
   * Create dirs & session paths for this session:
   * 1. HDFS scratch dir
   * 2. Local scratch dir
   * 3. Local downloaded resource dir
   * 4. HDFS session path
   * 5. Local session path
   * 6. HDFS temp table space
   * @param userName
   * @throws IOException
   */
  private void createSessionDirs(String userName) throws IOException {
    HiveConf conf = getConf();
    Path rootHDFSDirPath = createRootHDFSDir(conf);
    // Now create session specific dirs
    String scratchDirPermission = HiveConf.getVar(conf, HiveConf.ConfVars.SCRATCHDIRPERMISSION);
    Path path;
    // 1. HDFS scratch dir
    path = new Path(rootHDFSDirPath, userName);
    hdfsScratchDirURIString = path.toUri().toString();
    createPath(conf, path, scratchDirPermission, false, false);
    // 2. Local scratch dir
    path = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.LOCALSCRATCHDIR));
    createPath(conf, path, scratchDirPermission, true, false);
    // 3. Download resources dir
    path = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.DOWNLOADED_RESOURCES_DIR));
    createPath(conf, path, scratchDirPermission, true, false);
    // Finally, create session paths for this session
    // Local & non-local tmp location is configurable. however it is the same across
    // all external file systems
    String sessionId = getSessionId();
    // 4. HDFS session path
    hdfsSessionPath = new Path(hdfsScratchDirURIString, sessionId);
    createPath(conf, hdfsSessionPath, scratchDirPermission, false, true);
    conf.set(HDFS_SESSION_PATH_KEY, hdfsSessionPath.toUri().toString());
    // 5. Local session path
    localSessionPath = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.LOCALSCRATCHDIR), sessionId);
    createPath(conf, localSessionPath, scratchDirPermission, true, true);
    conf.set(LOCAL_SESSION_PATH_KEY, localSessionPath.toUri().toString());
    // 6. HDFS temp table space
    hdfsTmpTableSpace = new Path(hdfsSessionPath, TMP_PREFIX);
    createPath(conf, hdfsTmpTableSpace, scratchDirPermission, false, true);
    conf.set(TMP_TABLE_SPACE_KEY, hdfsTmpTableSpace.toUri().toString());
  }
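
To make the growth pattern concrete, here is a minimal sketch that mirrors steps 1, 4 and 6 of the method above using plain strings. It is not Hive code: the class SessionPathSketch and its method names are hypothetical, and the _tmp_space.db suffix is assumed from the log output above rather than from the definition of TMP_PREFIX. It only illustrates that every new session ID produces a fresh directory under /tmp/hive/<user>, so the directory count follows the number of sessions unless something removes them.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical illustration only; not part of Hive.
public class SessionPathSketch {
    // Step 1: per-user HDFS scratch dir, e.g. /tmp/hive/hive
    public static String userScratchDir(String rootDir, String userName) {
        return rootDir + "/" + userName;
    }

    // Step 4: per-session HDFS dir under the user's scratch dir
    public static String sessionDir(String rootDir, String userName, String sessionId) {
        return userScratchDir(rootDir, userName) + "/" + sessionId;
    }

    // Step 6: temp table space inside the session dir (suffix assumed from the logs)
    public static String tmpTableSpace(String rootDir, String userName, String sessionId) {
        return sessionDir(rootDir, userName, sessionId) + "/_tmp_space.db";
    }

    public static void main(String[] args) {
        // Each simulated session gets a random ID, as the UUID-style names in the logs suggest.
        Set<String> created = new LinkedHashSet<>();
        for (int i = 0; i < 3; i++) {
            String sessionId = UUID.randomUUID().toString();
            created.add(sessionDir("/tmp/hive", "hive", sessionId));
        }
        created.forEach(System.out::println);
        System.out.println(created.size() + " session directories for 3 sessions");
    }
}
```

With the session ID from the logs, sessionDir("/tmp/hive", "hive", "ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff") yields the same HDFS path that the "Created HDFS directory" log line reports.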

Answer 1:


We faced a similar issue earlier. Hive uses temporary folders both on the machine running the Hive client and on the default HDFS instance. These folders store per-query temporary/intermediate data sets and are normally cleaned up by the Hive client when the query finishes. However, if the Hive client terminates abnormally, some data may be left behind. The configuration details are as follows:

On the HDFS cluster this is set to /tmp/hive- by default and is controlled by the configuration variable hive.exec.scratchdir

On the client machine, this is hardcoded to /tmp/

Note that when writing data to a table/partition, Hive will first write to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table. This applies in all cases, whether tables are stored in HDFS (the normal case) or in file systems like S3 or even NFS.

Source

So you can clean the temp location at a regular interval with a manual script or a scheduled job, for example a cron-driven shell script that removes data older than 30 or 60 days.
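
As a rough sketch of the age-based selection such a cleanup job needs, the following Java class picks out directories whose last-modified time is older than a threshold. The class name StaleScratchDirs, the method selectStale, and the example paths are all hypothetical. It deliberately works on an in-memory map so it runs without Hadoop on the classpath; in a real job the timestamps would presumably come from Hadoop's FileSystem.listStatus and FileStatus.getModificationTime(), and the deletion would go through FileSystem.delete(path, true) or hdfs dfs -rm -r. Treating modification time as a proxy for "session no longer in use" is an assumption, so the threshold should be well above the longest expected session.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; it only decides what is stale and deletes nothing.
public class StaleScratchDirs {
    /**
     * Returns the paths whose last-modified time is strictly older than maxAge,
     * in the iteration order of the input map.
     */
    public static List<String> selectStale(Map<String, Instant> lastModified,
                                           Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastModified.entrySet()) {
            if (entry.getValue().isBefore(cutoff)) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2015-08-31T06:48:15Z");
        // Example data; the directory names are made up for illustration.
        Map<String, Instant> dirs = new LinkedHashMap<>();
        dirs.put("/tmp/hive/hive/old-session", now.minus(Duration.ofDays(45)));
        dirs.put("/tmp/hive/hive/recent-session", now.minus(Duration.ofDays(2)));
        for (String path : selectStale(dirs, now, Duration.ofDays(30))) {
            System.out.println("would delete: " + path);
        }
    }
}
```

Keeping the selection separate from the deletion makes it easy to do a dry run first and review the list before anything is removed.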




Answer 2:


The answer is here: https://issues.apache.org/jira/browse/HIVE-15068

Newer versions of Hive solve this problem. For older versions, writing a cron job will work.



Source: https://stackoverflow.com/questions/32306404/hiveserver2-generate-a-lot-of-directories-in-hdfs-tmp-hive-hive
