Question
I'm trying to use pyarrow to access an HDFS file and can't get it working; below is the code. Thank you very much in advance.
[rxie@cedgedev03 code]$ python
Python 2.7.12 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import pyarrow
>>> import os
>>> os.environ["JAVA_HOME"] = "/usr/java/jdk1.8.0_121"
>>> from pyarrow import hdfs
>>> fs = hdfs.connect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs
Answer 1:
You might need to locate this file manually and point the ARROW_LIBHDFS_DIR environment variable at its directory.

Find the file with locate -l 1 libhdfs.so. In my case, it is located under /opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib.

Then restart your Python REPL with ARROW_LIBHDFS_DIR set to that path. In my case, the command looks like:

ARROW_LIBHDFS_DIR=/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib python

This should solve this particular problem.
(Inspired by https://gist.github.com/priancho/357022fbe63fae8b097a563e43dd885b)
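If restarting the shell with the variable on the command line is inconvenient, it can also be set from inside Python, as long as that happens before the first hdfs.connect() call. A minimal sketch; the path is the example directory from this answer and must be replaced with whatever locate reports on your system:

```python
import os

# Point pyarrow at the directory containing libhdfs.so.
# Example path from this answer -- adjust to the directory
# that `locate -l 1 libhdfs.so` reported on your machine.
os.environ["ARROW_LIBHDFS_DIR"] = "/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib"

# Only import and connect after the variable is set, since
# pyarrow reads it when it tries to load libhdfs:
# from pyarrow import hdfs
# fs = hdfs.connect()
```

Setting JAVA_HOME (as in the question) the same way is also fine, provided both are in place before the connect call.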
Source: https://stackoverflow.com/questions/53027961/unable-to-load-libhdfs