Question
I'm trying to use pyarrow to access an HDFS file and can't get it working; below is the code. Thank you very much in advance.
[rxie@cedgedev03 code]$ python
Python 2.7.12 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import pyarrow
>>> import os
>>> os.environ["JAVA_HOME"] = "/usr/java/jdk1.8.0_121"
>>> from pyarrow import hdfs
>>> fs = hdfs.connect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs
Answer 1:
You might need to locate this file manually and point the ARROW_LIBHDFS_DIR environment variable at its directory.

Find the file with locate -l 1 libhdfs.so. In my case, it is located under /opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib.

Then restart your Python REPL with ARROW_LIBHDFS_DIR set to that path. In my case, the command looks like:

ARROW_LIBHDFS_DIR=/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib python

This should solve this particular problem.
(Inspired by https://gist.github.com/priancho/357022fbe63fae8b097a563e43dd885b)
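If restarting the shell with the variable on the command line is inconvenient, it can also be set from inside Python, as long as that happens before the first hdfs.connect() call. A minimal sketch; the path is the example directory from this answer and must be replaced with whatever locate reports on your system:

```python
import os

# Point pyarrow at the directory containing libhdfs.so.
# Example path from this answer -- adjust to the directory
# that `locate -l 1 libhdfs.so` reported on your machine.
os.environ["ARROW_LIBHDFS_DIR"] = "/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib"

# Only import and connect after the variable is set, since
# pyarrow reads it when it tries to load libhdfs:
# from pyarrow import hdfs
# fs = hdfs.connect()
```

Setting JAVA_HOME (as in the question) the same way is also fine, provided both are in place before the connect call.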
Source: https://stackoverflow.com/questions/53027961/unable-to-load-libhdfs