How can I populate a pandas DataFrame with the result of a Snowflake SQL query?

感动是毒 2021-02-13 16:10

Using the Python Connector I can query Snowflake:

import snowflake.connector

# Gets the version
ctx = snowflake.connector.connect(
    user=USER,
    password=P         


        
2 Answers
  • 2021-02-13 16:27

    There is now a cursor method, fetch_pandas_all(), for this, so SQLAlchemy is no longer needed.

    Note that you need to install the Snowflake connector with its pandas extra:

    pip install "snowflake-connector-python[pandas]"
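
fetch_pandas_all() relies on pandas and pyarrow, which the pandas extra pulls in. As a quick sanity check before running a query, something like the sketch below can report whether both are importable. The helper name missing_pandas_extras is hypothetical and not part of the connector.

```python
import importlib.util


def missing_pandas_extras():
    """Return the names of required packages that cannot be imported.

    Hypothetical helper: it only checks that the modules can be found,
    not that their versions are compatible with the connector.
    """
    required = ("pandas", "pyarrow")
    return [name for name in required if importlib.util.find_spec(name) is None]


missing = missing_pandas_extras()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("pandas and pyarrow are importable")
```

An empty list means the imports needed by fetch_pandas_all() should resolve.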
    

    Full documentation is available in the Snowflake Connector for Python docs.

    import pandas as pd
    import snowflake.connector
    
    conn = snowflake.connector.connect(
        user="xxx",
        password="xxx",
        account="xxx",
        warehouse="xxx",
        database="MYDB",
        schema="MYSCHEMA",
    )
    
    cur = conn.cursor()
    
    # Execute a statement that will generate a result set.
    sql = "select * from MYTABLE limit 10"
    cur.execute(sql)
    # Fetch the whole result set from the cursor and return it as a pandas DataFrame.
    df = cur.fetch_pandas_all()
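
The snippet above leaves the cursor open. A minimal sketch of wrapping the same calls in a function that always closes the cursor is shown below. The function name query_to_dataframe is hypothetical, and conn is assumed to be an already open connection returned by snowflake.connector.connect().

```python
from contextlib import closing


def query_to_dataframe(conn, sql):
    """Run sql on an open Snowflake connection and return a pandas DataFrame.

    Hypothetical helper: conn is assumed to come from
    snowflake.connector.connect(), with the pandas extra installed.
    """
    # closing() calls cur.close() on exit, even if execute() raises.
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetch_pandas_all()
```

Usage would then be a single call such as df = query_to_dataframe(conn, "select * from MYTABLE limit 10"), followed by conn.close() once the connection is no longer needed.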
    
  • 2021-02-13 16:28

    You can use DataFrame.from_records(), or pandas.read_sql() with snowflake-sqlalchemy. The snowflake-sqlalchemy option has a simpler API.

    pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])
    

    This returns a DataFrame with the proper column names taken from the SQL result. iter(cur) turns the cursor into an iterator over the result rows, and cur.description gives the name and type of each column.

    The complete code is:

    import snowflake.connector
    import pandas as pd
    
    # Open a connection (here authenticating through Okta)
    ctx = snowflake.connector.connect(
        user=USER,
        password=PASSWORD,
        account=ACCOUNT,
        authenticator='https://XXXX.okta.com',
    )
    ctx.cursor().execute('USE warehouse MY_WH')
    ctx.cursor().execute('USE MYDB.MYSCHEMA')
    
    
    query = '''
    select * from MYDB.MYSCHEMA.MYTABLE
    LIMIT 10;
    '''
    
    cur = ctx.cursor().execute(query)
    df = pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])
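
The cur.description lookup is part of the Python DB-API, so the column-name extraction can be tried without a Snowflake account. The sketch below uses the standard library's sqlite3 module purely as a stand-in for the Snowflake cursor; the table and its contents are made up for illustration.

```python
import sqlite3

# sqlite3 stands in for Snowflake here: both expose DB-API cursors.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])

cur = conn.cursor()
cur.execute("SELECT * FROM mytable ORDER BY id LIMIT 10")

# The first element of each description entry is the column name.
columns = [col[0] for col in cur.description]
# Iterating the cursor yields one tuple per result row.
rows = list(iter(cur))
conn.close()

print(columns)
print(rows)
```

Passing these to pd.DataFrame.from_records(rows, columns=columns) gives the same kind of DataFrame as the one-liner above. With Snowflake, expect unquoted column names to come back in upper case.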
    

    If you prefer pandas.read_sql, you can use snowflake-sqlalchemy instead:

    import pandas as pd
    
    from sqlalchemy import create_engine
    from snowflake.sqlalchemy import URL
    
    
    url = URL(
        account='xxxx',
        user='xxxx',
        password='xxxx',
        database='xxx',
        schema='xxxx',
        warehouse='xxx',
        role='xxxxx',
        authenticator='https://xxxxx.okta.com',
    )
    engine = create_engine(url)
    
    
    connection = engine.connect()
    
    query = '''
    select * from MYDB.MYSCHEMA.MYTABLE
    LIMIT 10;
    '''
    
    df = pd.read_sql(query, connection)
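
For result sets too large to hold comfortably in memory, pd.read_sql also accepts a chunksize argument and then returns an iterator of DataFrames instead of a single one. The sketch below shows both forms against an in-memory sqlite3 database, which is only a stand-in for the Snowflake engine connection; the table is made up for illustration.

```python
import sqlite3

import pandas as pd

# sqlite3 stands in for the Snowflake connection so the example runs locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)
conn.commit()

query = "SELECT * FROM mytable ORDER BY id LIMIT 10"

# Without chunksize: one DataFrame holding the whole result.
df = pd.read_sql(query, conn)

# With chunksize: an iterator of DataFrames of at most 2 rows each.
# The chunks must be consumed before the connection is closed.
chunks = list(pd.read_sql(query, conn, chunksize=2))
conn.close()

print(df)
print([len(chunk) for chunk in chunks])
```

With the SQLAlchemy setup above, the same calls would take connection in place of the sqlite3 conn.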
    