Connect to Impala using impyla client with Kerberos auth

栀梦 2021-01-01 03:23

I'm on a Windows 8 machine, where I use Python (Anaconda distribution) to connect to Impala in our Hadoop cluster using the

7 Answers
  • 2021-01-01 03:50

    Try this to list the tables on a Kerberized cluster. In my case the cluster runs CDH-5.14.2-1.

    Make sure you have a valid ticket before running this code.
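A minimal sketch of how the ticket check could be automated before connecting. It assumes the MIT Kerberos `klist` tool is on the PATH and supports the `-s` (silent) flag; the built-in Windows `klist` behaves differently, so treat this as an illustration only. The function name `has_valid_ticket` and the injectable `run` parameter are hypothetical.

```python
import subprocess


def has_valid_ticket(run=subprocess.call):
    """Return True if `klist -s` reports a usable Kerberos ticket cache."""
    try:
        # -s: silent mode; exit status is 0 only when valid credentials are cached
        return run(["klist", "-s"]) == 0
    except OSError:
        # klist is not installed or not on PATH
        return False


if __name__ == "__main__":
    if has_valid_ticket():
        print("ticket ok")
    else:
        print("no valid ticket, run kinit first")
```

The `run` parameter exists only so the check can be exercised without a real Kerberos installation.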

    Tested with Python 2.7 and the following packages:

    thrift==0.9.3
    thriftpy==0.3.8
    thrift_sasl==0.3.0
    impyla==0.14.2.2
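Since the fix here depends on exact package versions, it can help to compare what is installed against the pinned list. The sketch below is one way to do that; it assumes Python 3.8+ (for `importlib.metadata`), which is newer than the Python 2.7 used in this answer. The helper name `find_mismatches` is hypothetical.

```python
from importlib import metadata

# Versions listed in the answer above
PINS = {
    "thrift": "0.9.3",
    "thriftpy": "0.3.8",
    "thrift_sasl": "0.3.0",
    "impyla": "0.14.2.2",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (wanted, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINS).items():
        print("%s: wanted %s, found %s" % (name, wanted, installed))
```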
    

    Working Code

    from impala.dbapi import connect
    from impala.util import as_pandas
    
    # 21050 is the Impala daemon's HiveServer2 port, which impyla uses
    # (21000 is the Beeswax port used by impala-shell).
    conn = connect(host='yourHost', port=21050, auth_mechanism='GSSAPI')
    
    cursor = conn.cursor()
    cursor.execute("SHOW TABLES")
    # After running .execute(), Impala will store the result sets on the server
    # until it is fetched. Use the method .fetchall() to pull the entire result
    # set over the network (you should only do it if you know dataset is small)
    tables = cursor.fetchall()
    
    print("Displaying list of tables")
    # the result is a list of tuples
    for t in tables:
        # each row in the SHOW TABLES result
        # contains only one column: the table name
        print(t[0])
        # uncomment exit() here to stop after the first table
    
    print("eol >>>")
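The comments in the code above warn against calling `.fetchall()` on large result sets. A minimal sketch of the alternative, assuming only the standard DB-API `fetchmany()` method (which impyla cursors provide); the helper name `iter_rows` and the default batch size are illustrative choices, not part of impyla.

```python
def iter_rows(cursor, batch_size=1000):
    """Yield rows from a DB-API cursor in batches instead of all at once."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            # an empty batch means the result set is exhausted
            break
        for row in batch:
            yield row
```

With the cursor from the answer above, usage would look like `for row in iter_rows(cursor): print(row[0])` after `cursor.execute(...)`.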
    
  • 2021-01-01 03:54

    To connect to Impala from Python, follow these steps:

    1. Install the Cloudera ODBC Driver for Impala.
    2. Create a DSN using the 64-bit ODBC driver and enter your server details.
    3. Use the code snippet below to connect:

        import pandas as pd
        import pyodbc

        # "impala_con" is the DSN created in step 2
        with pyodbc.connect("DSN=impala_con", autocommit=True) as conn:
            # put your SQL query in the empty string
            df = pd.read_sql("", conn)

  • 2021-01-01 03:57

    Install the kerberos Python package; that should fix your issue.

  • 2021-01-01 04:00

    I ran into the same issue and fixed it by installing the right versions of the required libraries.

    Install the following Python libraries using pip:

    six==1.12.0
    bit_array==0.1.0
    thrift==0.9.3
    thrift_sasl==0.2.1
    sasl==0.2.1
    impyla==0.13.8
    

    The code below works with Python 2.7 and 3.4.

    import os

    from impala.dbapi import connect

    # Obtain a Kerberos ticket first (kinit prompts for the password)
    os.system("kinit")

    conn = connect(host='hostname.io', port=21050, use_ssl=True, database='default',
                   user='urusername', kerberos_service_name='impala',
                   auth_mechanism='GSSAPI')
    cur = conn.cursor()
    cur.execute('SHOW DATABASES;')
    result = cur.fetchall()
    for data in result:
        print(data)
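Calling `os.system("kinit")` prompts for a password, which does not work well in unattended scripts. A possible alternative, sketched under the assumption that MIT Kerberos `kinit` is installed and that you have a keytab file: the principal and keytab path shown in the usage note are hypothetical placeholders, and the helper names are not part of any library.

```python
import subprocess


def kinit_command(principal, keytab):
    """Build the argument list for a non-interactive kinit using a keytab."""
    # -k: use a keytab instead of a password; -t: path to the keytab file
    return ["kinit", "-k", "-t", keytab, principal]


def obtain_ticket(principal, keytab, run=subprocess.check_call):
    """Run kinit; with the default runner a non-zero exit raises CalledProcessError."""
    run(kinit_command(principal, keytab))
```

Usage would be along the lines of `obtain_ticket("user@EXAMPLE.COM", "/path/to/user.keytab")` before calling `connect(...)`.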
    
  • 2021-01-01 04:00

    For me, installing this package fixed it: libsasl2-modules-gssapi-mit

  • 2021-01-01 04:05

    For me, the following connection parameters worked. I did not have to install any additional packages in Python.

    connect(host="your_host", port=21050, auth_mechanism='GSSAPI', timeout=100000, use_ssl=False, ca_cert=None, ldap_user=None, ldap_password=None, kerberos_service_name='impala')
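The `connect(...)` call above returns a connection that should eventually be closed. A minimal sketch of one way to wrap it, assuming only that the connection and cursor expose `close()`, `cursor()`, `execute()` and `fetchall()` as DB-API objects do; `show_tables` is a hypothetical helper, and the `connect` function is passed in so the sketch does not depend on impyla being importable.

```python
from contextlib import closing


def show_tables(connect, **params):
    """Open a connection, run SHOW TABLES, and close everything afterwards."""
    with closing(connect(**params)) as conn, closing(conn.cursor()) as cur:
        cur.execute("SHOW TABLES")
        # each row holds a single column: the table name
        return [row[0] for row in cur.fetchall()]
```

With impyla this could be called as `show_tables(connect, host="your_host", port=21050, auth_mechanism='GSSAPI', kerberos_service_name='impala')`, where `connect` comes from `impala.dbapi`.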
    