How to Access Hive via Python?

小蘑菇 2020-11-30 17:11

https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python appears to be outdated.

When I add this to /etc/profile:

export PYTHONP         


        
16 Answers
  • 2020-11-30 17:34

    pyhs2 is no longer maintained. A better alternative is impyla

    Don't be confused that some of the examples below are about Impala; just change the port to 10000 (the default for HiveServer2), and it will work the same way as in the Impala examples. The same protocol (Thrift) is used for both Impala and Hive.

    https://github.com/cloudera/impyla

    It has many more features than pyhs2; for example, it supports Kerberos authentication, which is a must for us.

    from impala.dbapi import connect

    conn = connect(host='my.host.com', port=10000)
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM mytable LIMIT 100')
    print(cursor.description)  # prints the result set's schema
    results = cursor.fetchall()

    # Alternatively, iterate over the cursor row by row
    cursor.execute('SELECT * FROM mytable LIMIT 100')
    for row in cursor:
        process(row)  # process() is whatever per-row handling you need
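
    Since impyla supports Kerberos, a connection sketch for a Kerberized HiveServer2 might look like the following (the hostname is a placeholder, the parameter names are those of recent impyla releases, and the SASL extras have to be installed):

    from impala.dbapi import connect

    # 'GSSAPI' selects Kerberos; HiveServer2's Kerberos service name is usually 'hive'
    conn = connect(host='my.host.com',          # placeholder hostname
                   port=10000,
                   auth_mechanism='GSSAPI',
                   kerberos_service_name='hive')
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM mytable LIMIT 100')
    print(cursor.fetchall())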
    

    Cloudera is now putting more effort into hs2client (https://github.com/cloudera/hs2client), a C/C++ HiveServer2/Impala client. It might be a better option if you push a lot of data to/from Python. It has Python bindings too: https://github.com/cloudera/hs2client/tree/master/python

    Some more information on impyla:

    • http://blog.cloudera.com/blog/2014/04/a-new-python-client-for-impala/
    • https://github.com/cloudera/impyla/blob/master/README.md
  • 2020-11-30 17:35

    The examples above are a bit out of date. Here is a newer one:

    import pyhs2 as hive
    import getpass

    DEFAULT_DB = 'default'
    DEFAULT_SERVER = '10.37.40.1'
    DEFAULT_PORT = 10000
    DEFAULT_DOMAIN = 'PAM01-PRD01.IBM.COM'

    u = raw_input('Enter PAM username: ')
    s = getpass.getpass()

    # Connect to HiveServer2 using LDAP authentication
    connection = hive.connect(host=DEFAULT_SERVER,
                              port=DEFAULT_PORT,
                              authMechanism='LDAP',
                              user=u + '@' + DEFAULT_DOMAIN,
                              password=s)

    statement = "select * from user_yuti.Temp_CredCard where pir_post_dt = '2014-05-01' limit 100"
    cur = connection.cursor()

    cur.execute(statement)
    df = cur.fetchall()  # fetchall() returns the result set as a list of rows
    

    In addition to the standard Python library, a few packages need to be installed so that Python can connect to the Hadoop database (a minimal install sketch follows the list):

    1. pyhs2, the Python HiveServer2 client driver

    2. sasl, Cyrus-SASL bindings for Python

    3. thrift, Python bindings for the Apache Thrift RPC system

    4. PyHive, a Python interface to Hive
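
    A minimal install sketch (these are the PyPI package names; the sasl package may additionally require the Cyrus SASL development headers on your system):

    pip install pyhs2 sasl thrift PyHive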

    Remember to make the script executable and then run it:

    chmod +x test_hive2.py
    ./test_hive2.py

    Hope it helps. Reference: https://sites.google.com/site/tingyusz/home/blogs/hiveinpython

  • 2020-11-30 17:36

    The Python program below should work for accessing Hive tables from Python by shelling out to the hive CLI:

    import subprocess  # replaces the Python 2 'commands' module, which was removed in Python 3

    cmd = "hive -S -e 'SELECT * FROM db_name.table_name LIMIT 1;'"

    status, output = subprocess.getstatusoutput(cmd)

    if status == 0:
        print(output)
    else:
        print("error")
    
  • 2020-11-30 17:37

    By using the Python client driver pyhs2:

    pip install pyhs2
    

    Then

    import pyhs2

    with pyhs2.connect(host='localhost',
                       port=10000,
                       authMechanism="PLAIN",
                       user='root',
                       password='test',
                       database='default') as conn:
        with conn.cursor() as cur:
            # Show databases
            print(cur.getDatabases())

            # Execute query
            cur.execute("select * from table")

            # Return column info from query
            print(cur.getSchema())

            # Fetch table results
            for i in cur.fetch():
                print(i)

    Refer to: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver

  • 2020-11-30 17:38

    To connect using a username/password and specifying ports, the code looks like this:

    from pyhive import presto

    cursor = presto.connect(host='host.example.com',
                            port=8081,
                            username='USERNAME:PASSWORD').cursor()

    sql = 'select * from table limit 10'

    cursor.execute(sql)

    print(cursor.fetchone())
    print(cursor.fetchall())
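
    Since the question is about Hive rather than Presto, a similar sketch using PyHive's hive module might look like this (assuming HiveServer2 is listening on the default port 10000; host and username are placeholders):

    from pyhive import hive

    cursor = hive.connect(host='host.example.com',
                          port=10000,
                          username='USERNAME').cursor()

    cursor.execute('select * from table limit 10')
    print(cursor.fetchall())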
    
  • 2020-11-30 17:38

    Here's a generic approach that makes things easy for me because I keep connecting to several servers (SQL Server, Teradata, Hive, etc.) from Python, so I use the pyodbc connector. Here are some basic steps to get going with pyodbc (in case you have never used it):

    • Prerequisite: you should have the relevant ODBC connection (DSN) configured in your Windows setup before following the steps below. If you don't have one, set it up first.

    Once that's in place: STEP 1. Install the package: pip install pyodbc (the relevant ODBC driver can be downloaded from Microsoft's website).

    STEP 2. Import it in your Python script:

    import pyodbc
    

    STEP 3. Finally, go ahead and give the connection details as follows:

    conn_hive = pyodbc.connect('DSN=YOUR_DSN_NAME;UID=USER_ID;PWD=PSWD')  # keyword=value pairs separated by semicolons
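
    From there, querying follows the usual DB-API pattern; a minimal sketch (the table name is a placeholder):

    cursor = conn_hive.cursor()
    cursor.execute('SELECT * FROM db_name.table_name LIMIT 10')
    for row in cursor.fetchall():
        print(row)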
    

    The best part of using pyodbc is that I have to import just one package to connect to almost any data source.
