Unable to load libhdfs

Posted by 时间秒杀一切 on 2021-01-28 01:50:26

Question


I am trying to use pyarrow to access a file on HDFS and cannot get it working. The code and traceback are below; thank you very much in advance.

[rxie@cedgedev03 code]$ python
Python 2.7.12 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

>>> import pyarrow
>>> import os
>>> os.environ["JAVA_HOME"] = "/usr/java/jdk1.8.0_121"
>>> from pyarrow import hdfs
>>> fs = hdfs.connect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs


Answer 1:


You might need to locate this file manually and point to it with the ARROW_LIBHDFS_DIR environment variable.

Find the file using locate -l 1 libhdfs.so. In my case, the file is located under /opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib.

Then restart your Python REPL with the environment variable ARROW_LIBHDFS_DIR set to this path. In my case, the command looks like:

ARROW_LIBHDFS_DIR=/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib python

This should solve this particular problem.

(Inspired by https://gist.github.com/priancho/357022fbe63fae8b097a563e43dd885b)
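Instead of prefixing the interpreter invocation, you can also set the variable from inside Python, as long as you do so before pyarrow first tries to load libhdfs. A minimal sketch, assuming the same libhdfs.so location found above (adjust both paths for your own cluster):

```python
import os

# Point Arrow at the directory containing libhdfs.so BEFORE connecting.
# These paths are the ones from the answer above; yours will likely differ.
os.environ["ARROW_LIBHDFS_DIR"] = "/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib"
os.environ["JAVA_HOME"] = "/usr/java/jdk1.8.0_121"

# Only now import and connect (commented out here, since it requires a
# running HDFS cluster and a matching Hadoop install):
# from pyarrow import hdfs
# fs = hdfs.connect()
```

Setting the variables in the environment of the launching shell (as in the command above) is the more robust option, since it guarantees they are visible to every library loaded by the process.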



Source: https://stackoverflow.com/questions/53027961/unable-to-load-libhdfs
