Access Hive Data Using Python

有刺的猬 2020-12-09 19:13

I have some data in HDFS that I need to access using Python. Can anyone tell me how to access Hive data from Python?

4 Answers
  • 2020-12-09 19:29

    You can use the hive library to access Hive from Python. To do so, import the ThriftHive class with from hive import ThriftHive.

    Here is an example:

    import sys

    from hive import ThriftHive
    from hive.ttypes import HiveServerException

    from thrift import Thrift
    from thrift.transport import TSocket
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol

    try:
      # Connect to the Hive Thrift server on localhost:10000
      transport = TSocket.TSocket('localhost', 10000)
      transport = TTransport.TBufferedTransport(transport)
      protocol = TBinaryProtocol.TBinaryProtocol(transport)
      client = ThriftHive.Client(protocol)
      transport.open()
      client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE)")
      client.execute("LOAD DATA LOCAL INPATH '/path' INTO TABLE r")
      # Fetch the result one row at a time
      client.execute("SELECT * FROM r")
      while True:
        row = client.fetchOne()
        if row is None:
          break
        print(row)

      # Or fetch all rows at once
      client.execute("SELECT * FROM r")
      print(client.fetchAll())
      transport.close()
    except Thrift.TException as tx:
      print('%s' % (tx.message))
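
    With this legacy Thrift interface, each fetched row typically arrives as a single string rather than a tuple. A minimal sketch for splitting such rows into fields, assuming the columns are tab-delimited (an assumption about the server's output format; split_row is a hypothetical helper, not part of the hive library):

```python
def split_row(row, delimiter="\t"):
    """Split one fetched row string into a list of column values.

    Assumes columns are separated by `delimiter` (tab by default);
    all values come back as strings and need casting by the caller.
    """
    return row.split(delimiter)


# Sample strings stand in for what client.fetchOne() might return.
sample_rows = ["foo\t1\t2.5", "bar\t2\t3.0"]
parsed = [split_row(r) for r in sample_rows]
for a, b, c in parsed:
    print(a, int(b), float(c))
```

    If the table's field delimiter differs, pass it as the second argument.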
    
  • 2020-12-09 19:39

    I tried almost every possible solution to connect to Hive from a remote Windows server, and nothing seemed to work. PyHive and pyhs2 use SASL, and SASL is not supported on Windows. Installing it through Cygwin didn't help either. The only solution that worked for me was pyodbc. You just need to configure a DSN on your system.

  • 2020-12-09 19:52

    To use PyHive, you'll need to install these libraries:

    pip install sasl
    pip install thrift
    pip install thrift-sasl
    pip install PyHive
    

    If you're on Linux, you may need to install SASL separately before running the above: install the package libsasl2-dev using apt-get, yum, or whichever package manager you use. For Windows there are some options on GNU.org. On a Mac, SASL should be available if you've installed the Xcode developer tools (xcode-select --install).
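
    To confirm that the four packages installed cleanly, one option is to check that each is importable before writing any connection code. A small sketch using only the standard library; the import names below (thrift_sasl for thrift-sasl, pyhive for PyHive) are the usual ones for these packages, and missing_modules is a hypothetical helper:

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for: sasl, thrift, thrift-sasl, PyHive
required = ["sasl", "thrift", "thrift_sasl", "pyhive"]
missing = missing_modules(required)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

    If sasl shows up as missing, the system SASL library is the likely culprit.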

    After installation, you can execute a hive query like this:

    from pyhive import hive
    conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")
    

    Now that you have the Hive connection, you have several options for using it. You can query it directly:

    cursor = conn.cursor()
    cursor.execute("SELECT cool_stuff FROM hive_table")
    for result in cursor.fetchall():
      use_result(result)
    

    ...or use the connection to build a pandas DataFrame:

    import pandas as pd
    df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
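
    Depending on the Hive configuration, a SELECT * result may come back with column names prefixed by the table name (for example hive_table.cool_stuff). A hedged sketch for stripping that prefix; strip_table_prefix is a hypothetical helper and the sample column names are made up for illustration:

```python
def strip_table_prefix(columns):
    """Drop a leading 'table.' prefix from each column name, if present."""
    return [str(col).split(".", 1)[-1] for col in columns]


# Made-up column names standing in for df.columns
columns = ["hive_table.cool_stuff", "hive_table.other_stuff", "plain"]
print(strip_table_prefix(columns))
```

    With a DataFrame, this could be applied as df.columns = strip_table_prefix(df.columns).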
    
  • 2020-12-09 19:52

    A much simpler solution if you're on Windows uses pyodbc:

      import pyodbc
      import pandas as pd
    
      # connect odbc to data source name
      conn = pyodbc.connect("DSN=<your_dsn>", autocommit=True)
    
      # read data into dataframe
      hive_df = pd.read_sql("SELECT * FROM <table_name>", conn)
    

    As long as you have an ODBC driver and a DSN, that's all you need.
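
    For tables too large to load into memory in one go, the cursor can be read in batches instead. A minimal sketch using fetchmany, which DB-API cursors such as pyodbc's provide; iter_rows is a hypothetical helper, and an in-memory sqlite3 database stands in for the Hive connection so the example runs without a DSN:

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows from an executed DB-API cursor, batch_size at a time."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# sqlite3 stands in for pyodbc.connect("DSN=<your_dsn>", autocommit=True)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (a TEXT, b INTEGER)")
conn.executemany("INSERT INTO r VALUES (?, ?)", [("x", 1), ("y", 2), ("z", 3)])

cursor = conn.cursor()
cursor.execute("SELECT a, b FROM r ORDER BY b")
rows = list(iter_rows(cursor, batch_size=2))
print(rows)
conn.close()
```

    Against Hive, the same generator would take the cursor from the pyodbc connection after calling execute on it.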
