Create/Insert Json in Postgres with requests and psycopg2

一向 2021-02-04 23:09

Just started a project with PostgreSQL. I would like to make the leap from Excel to a database and I am stuck on create and insert. Once I run this I will have to s

1 Answer
Answered 2021-02-04 23:22

    It looks like you want to create a table with a single column named "data" of type JSON. (I would recommend creating one column per field, but it's up to you.)

    In this case, the variable data (read from the response) is a list of dicts. As I mentioned in my comment, you can loop over data and insert the rows one at a time, because in psycopg2 executemany() is not faster than calling execute() in a loop.
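If the number of rows gets large, the thing that usually costs time is one round trip per row. A common alternative is to send many rows in a single INSERT with several VALUES groups; psycopg2 ships psycopg2.extras.execute_values (and execute_batch) for exactly this. The sketch below hand-rolls the same idea only to show what such a statement looks like. build_multirow_insert is a hypothetical helper, not part of psycopg2, and the table name is interpolated directly, so it must be a trusted, hard-coded name.

```python
from typing import Any, List, Sequence, Tuple


def build_multirow_insert(
    table: str, rows: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build one INSERT covering all rows, plus a flat list of parameters.

    Hypothetical helper. `table` is pasted into the SQL text, so only pass a
    trusted, hard-coded table name; row values go through %s parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of values")
    # one "(%s, %s, ...)" group per row
    group = "(" + ", ".join(["%s"] * width) + ")"
    sql = f"INSERT INTO {table} VALUES " + ", ".join([group] * len(rows))
    # flatten [[a, b], [c, d]] into [a, b, c, d] so it lines up with the placeholders
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multirow_insert("t_skaters", [('{"a": 1}',), ('{"a": 2}',)])
print(sql)     # INSERT INTO t_skaters VALUES (%s), (%s)
print(params)  # ['{"a": 1}', '{"a": 2}']
# with a real cursor: cur.execute(sql, params)  -> one round trip for all rows
```

For very large lists you would still split the rows into chunks (execute_values does this via its page_size argument) rather than build one enormous statement.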

    What I did was the following:

    1. Create a list of fields that you care about.
    2. Loop over the elements of data
    3. For each item in data, extract the fields into my_data
    4. Call execute() and pass in json.dumps(my_data), which converts my_data from a dict into a JSON string
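Steps 1 to 4 can be tried out without a database. The sketch below pulls them into a pure function; to_json_rows is a hypothetical name, FIELDS is a shortened subset of the field list used in the answer, and the sample record is made up for illustration.

```python
import json
from typing import Any, Dict, Iterable, List

# 1. the fields we care about (illustrative subset of the full list)
FIELDS = ["seasonId", "playerName", "playerId"]


def to_json_rows(data: Iterable[Dict[str, Any]], fields: List[str]) -> List[str]:
    """Return one JSON string per record, keeping only `fields`."""
    rows = []
    # 2. loop over the elements of data
    for item in data:
        # 3. extract just the wanted fields into my_data
        my_data = {field: item[field] for field in fields}
        # 4. json.dumps turns the dict into a JSON string, which is what gets
        #    passed as the single parameter in cur.execute(..., (row,))
        rows.append(json.dumps(my_data))
    return rows


# hypothetical sample record; "goals" is dropped because it is not in FIELDS
sample = [{"seasonId": 20172018, "playerName": "Jane Doe", "playerId": 1, "goals": 10}]
print(to_json_rows(sample, FIELDS))
# ['{"seasonId": 20172018, "playerName": "Jane Doe", "playerId": 1}']
```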

    Try this:

    #!/usr/bin/env python
    import requests
    import psycopg2
    import json
    
    conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')
    
    req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018') 
    
    # stop early if the request failed (e.g. an HTTP 4xx/5xx response)
    req.raise_for_status()
    
    # data here is a list of dicts
    data = req.json()['data']
    
    cur = conn.cursor()
    # create a table with one column of type JSON
    # IF NOT EXISTS lets the script be re-run without failing on an existing table
    cur.execute("CREATE TABLE IF NOT EXISTS t_skaters (data json);")
    
    fields = [
        'seasonId',
        'playerName',
        'playerFirstName',
        'playerLastName',
        'playerId',
        'playerHeight',
        'playerPositionCode',
        'playerShootsCatches',
        'playerBirthCity',
        'playerBirthCountry',
        'playerBirthStateProvince',
        'playerBirthDate',
        'playerDraftYear',
        'playerDraftRoundNo',
        'playerDraftOverallPickNo'
    ]
    
    for item in data:
        my_data = {field: item[field] for field in fields}
        # the trailing comma makes a one-element tuple of parameters
        cur.execute("INSERT INTO t_skaters (data) VALUES (%s)", (json.dumps(my_data),))
    
    
    # commit changes
    conn.commit()
    # Close the connection
    conn.close()
    

    I'm not 100% sure all of the Postgres syntax here is correct (I don't have access to a PG database to test it), but I believe this logic should work for what you're trying to do.

    Update For Separate Columns

    You can modify your CREATE TABLE statement to create multiple columns, but that requires knowing the data type of each column. Here's some pseudocode you can follow:

    # same imports, connection and request code as above
    cur = conn.cursor()
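    # create a table with one column per field
    # (fill in the remaining columns and their types; note that Postgres folds
    # unquoted names such as seasonId to lowercase, i.e. seasonid)
    cur.execute(
        """CREATE TABLE t_skaters (seasonId INTEGER, playerName VARCHAR, ...);"""
    )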
    
    fields = [
        'seasonId',
        'playerName',
        'playerFirstName',
        'playerLastName',
        'playerId',
        'playerHeight',
        'playerPositionCode',
        'playerShootsCatches',
        'playerBirthCity',
        'playerBirthCountry',
        'playerBirthStateProvince',
        'playerBirthDate',
        'playerDraftYear',
        'playerDraftRoundNo',
        'playerDraftOverallPickNo'
    ]
    
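    # one %s placeholder per field, e.g. "%s, %s, %s"
    placeholders = ", ".join(["%s"] * len(fields))
    
    for item in data:
        my_data = [item[field] for field in fields]
        # values are matched to columns by position, so the column order in
        # CREATE TABLE must match the order of fields (or name the columns
        # explicitly; see the Postgres docs on INSERT)
        cur.execute("INSERT INTO t_skaters VALUES (" + placeholders + ")", tuple(my_data))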
    
    
    # commit changes
    conn.commit()
    # Close the connection
    conn.close()
    

    Replace the ... with the appropriate values for your data.
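If you'd rather not type out every column by hand, one option is to derive both statements from the fields list and a sample record. This is a hedged sketch: pg_type, quote_ident, create_table_sql and insert_sql are hypothetical helpers, and guessing a type from a single Python value is only a heuristic (a date that arrives as a string, for example, becomes TEXT, and a field that is None in the sample also falls back to TEXT). Names are double-quoted so Postgres keeps camelCase such as "seasonId"; the catch is that every later query must quote them the same way. psycopg2's own psycopg2.sql.Identifier is the library route for quoting identifiers.

```python
from typing import Any, Dict, List


def pg_type(value: Any) -> str:
    """Guess a PostgreSQL column type from one sample Python value (heuristic)."""
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE PRECISION"
    if isinstance(value, (dict, list)):
        return "JSONB"
    # strings, None and anything else fall back to TEXT
    return "TEXT"


def quote_ident(name: str) -> str:
    """Double-quote an identifier so Postgres keeps its exact case."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table: str, fields: List[str], sample: Dict[str, Any]) -> str:
    cols = ", ".join(
        f"{quote_ident(name)} {pg_type(sample.get(name))}" for name in fields
    )
    return f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({cols});"


def insert_sql(table: str, fields: List[str]) -> str:
    # naming the columns removes any dependence on their order in the table
    cols = ", ".join(quote_ident(name) for name in fields)
    placeholders = ", ".join(["%s"] * len(fields))
    return f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({placeholders})"


# hypothetical sample record
sample = {"seasonId": 20172018, "playerName": "Jane Doe"}
print(create_table_sql("t_skaters", ["seasonId", "playerName"], sample))
# CREATE TABLE IF NOT EXISTS "t_skaters" ("seasonId" BIGINT, "playerName" TEXT);
print(insert_sql("t_skaters", ["seasonId", "playerName"]))
# INSERT INTO "t_skaters" ("seasonId", "playerName") VALUES (%s, %s)
```

With a real cursor you would then run cur.execute(create_table_sql(...)) once and cur.execute(insert_sql(...), tuple(my_data)) inside the loop. Check the generated types against your actual data before relying on them.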
