How to Access Hive via Python?

后端未结

关注

 16  767

https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python appears to be outdated.

When I add this to /etc/profile:

export PYTHONP


                      
              相关标签:


      
      
        
          16条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  北海茫月        
                
              
                            
                2020-11-30 17:39
              
            
            
                                                                       
Similar to eycheu's solution, but a little more detailed.

Here is an alternative solution specifically for hive2 that does not require PyHive or installing system-wide packages. I am working on a linux environment that I do not have root access to so installing the SASL dependencies as mentioned in Tristin's post was not an option for me:


  If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get or yum or whatever package manager for your distribution. 


Specifically, this solution focuses on leveraging the python package: JayDeBeApi. In my experience installing this one extra package on top of a python Anaconda 2.7 install was all I needed. This package leverages java (JDK). I am assuming that is already set up.

Step 1: Install JayDeBeApi

pip install jaydebeap


Step 2: Download appropriate drivers for your environment:



Here is a link to the jars required for an enterprise CDH environment
Another post that talks about where to find jdbc drivers for Apache Hive


Store all .jar files in a directory. I will refer to this directory as /path/to/jar/files/.

Step 3: Identify your systems authentication mechanism:

In the pyhive solutions listed I've seen PLAIN listed as the authentication mechanism as well as Kerberos. Note that your jdbc connection URL will depend on the authentication mechanism you are using. I will explain  Kerberos solution without passing a username/password. Here is more information Kerberos authentication and options.

Create a Kerberos ticket if one is not already created

$ kinit


Tickets can be viewed via klist. 

You are now ready to make the connection via python:

import jaydebeapi
import glob
# Creates a list of jar files in the /path/to/jar/files/ directory
jar_files = glob.glob('/path/to/jar/files/*.jar')

host='localhost'
port='10000'
database='default'

# note: your driver will depend on your environment and drivers you've
# downloaded in step 2
# this is the driver for my environment (jdbc3, hive2, cloudera enterprise)
driver='com.cloudera.hive.jdbc3.HS2Driver'

conn_hive = jaydebeapi.connect(driver,
        'jdbc:hive2://'+host+':' +port+'/'+database+';AuthMech=1;KrbHostFQDN='+host+';KrbServiceName=hive'
                           ,jars=jar_files)


If you only care about reading, then you can read it directly into a panda's dataframe with ease via eycheu's solution:

import pandas as pd
df = pd.read_sql("select * from table", conn_hive)


Otherwise, here is a more versatile communication option:

cursor = conn_hive.cursor()
sql_expression = "select * from table"
cursor.execute(sql_expression)
results = cursor.fetchall()


You could imagine, if you wanted to create a table, you would not need to "fetch" the results, but could submit a create table query instead. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  滥情空心        
                
              
                            
                2020-11-30 17:43
              
            
            
                                                                       
I assert that you are using HiveServer2, which is the reason that makes the code doesn't work.

You may use pyhs2 to access your Hive correctly and the example code like that:

import pyhs2

with pyhs2.connect(host='localhost',
               port=10000,
               authMechanism="PLAIN",
               user='root',
               password='test',
               database='default') as conn:
    with conn.cursor() as cur:
        #Show databases
        print cur.getDatabases()

        #Execute query
        cur.execute("select * from table")

        #Return column info from query
        print cur.getSchema()

        #Fetch table results
        for i in cur.fetch():
            print i


Attention that you may install python-devel.x86_64 cyrus-sasl-devel.x86_64 before installing pyhs2 with pip.

Wish this can help you.

Reference: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  囚心锁ツ        
                
              
                            
                2020-11-30 17:47
              
            
            
                                                                       
You can use hive library,for that you want to import hive Class from hive import ThriftHive

Try This example:

import sys

from hive import ThriftHive
from hive.ttypes import HiveServerException

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
  transport = TSocket.TSocket('localhost', 10000)
  transport = TTransport.TBufferedTransport(transport)
  protocol = TBinaryProtocol.TBinaryProtocol(transport)
  client = ThriftHive.Client(protocol)
  transport.open()
  client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE)")
  client.execute("LOAD TABLE LOCAL INPATH '/path' INTO TABLE r")
  client.execute("SELECT * FROM r")
  while (1):
    row = client.fetchOne()
    if (row == None):
       break
    print row

  client.execute("SELECT * FROM r")
  print client.fetchAll()
  transport.close()
except Thrift.TException, tx:
  print '%s' % (tx.message)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一生所求        
                
              
                            
                2020-11-30 17:49
              
            
            
                                                                       
It is a common practice to prohibit for a user to download and install packages and libraries on cluster nodes. In this case solutions of @python-starter and @goks are working perfect, if hive run on the same node. Otherwise, one can use a beeline  instead of hive command line tool. See details
#python 2
import commands

cmd = 'beeline -u "jdbc:hive2://node07.foo.bar:10000/...<your connect string>" -e "SELECT * FROM db_name.table_name LIMIT 1;"'

status, output = commands.getstatusoutput(cmd)

if status == 0:
   print output
else:
   print "error"

.
#python 3
import subprocess

cmd = 'beeline -u "jdbc:hive2://node07.foo.bar:10000/...<your connect string>" -e "SELECT * FROM db_name.table_name LIMIT 1;"'

status, output = subprocess.getstatusoutput(cmd)

if status == 0:
   print(output)
else:
   print("error")

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  夕颜        
                
              
                            
                2020-11-30 17:51
              
            
            
                                                                       
This can be a quick hack to connect hive and python,

from pyhive import hive
cursor = hive.connect('YOUR_HOST_NAME').cursor()
cursor.execute('SELECT * from table_name LIMIT 5',async=True)
print cursor.fetchall()


Output: List of Tuples
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  失恋的感觉        
                
              
                            
                2020-11-30 17:54
              
            
            
                                                                       
None of the answers demonstrate how to fetch and print the table headers. Modified the standard example from PyHive which is  widely used and actively maintained.
from pyhive import hive
cursor = hive.connect(host="localhost", 
                      port=10000, 
                      username="shadan", 
                      auth="KERBEROS", 
                      kerberos_service_name="hive"
                      ).cursor()
cursor.execute("SELECT * FROM my_dummy_table LIMIT 10")
columnList = [desc[0] for desc in cursor.description]
headerStr = ",".join(columnList)
headerTuple = tuple(headerStr.split (",")
print(headerTuple)
print(cursor.fetchone())
print(cursor.fetchall())

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
3
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复