How to transform a pandas dataframe for insertion via an executemany() statement?

Asked by 天命终不由人 on 2020-12-30 04:01

I have a fairly big pandas dataframe - 50 or so headers and a few hundred thousand rows of data - and I'm looking to transfer this data to a database using the executemany() statement.

3 Answers
  • 2020-12-30 04:32

    You can try this:

    cursor.executemany(sql_str, your_dataframe.values.tolist())
    

    Hope it helps.
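    As a runnable sketch of that one-liner (using Python's built-in sqlite3 purely as a stand-in driver; the dataframe and table here are hypothetical, and your ODBC module and connection string will differ):

```python
import sqlite3
import pandas as pd

# Hypothetical example frame; your dataframe will have its own columns.
df = pd.DataFrame({"name": ["Ann", "Bob"], "city": ["Oslo", "Lima"]})

# sqlite3 stands in for your database module here, purely for illustration.
cnxn = sqlite3.connect(":memory:")
cursor = cnxn.cursor()
cursor.execute("CREATE TABLE people (name TEXT, city TEXT)")

# values.tolist() yields one plain Python list per row, which is
# exactly the "sequence of sequences" executemany() expects.
sql_str = "INSERT INTO people (name, city) VALUES (?, ?)"
cursor.executemany(sql_str, df.values.tolist())
cnxn.commit()
print(cursor.execute("SELECT COUNT(*) FROM people").fetchone()[0])  # 2
```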

  • 2020-12-30 04:46

    I managed to figure this out in the end. So if you have a pandas dataframe which you want to write to a database using ceODBC (which is the module I used), the code is as follows.

    With all_data as the dataframe, map the dataframe values to strings and store each row as a tuple in a list of tuples:

    for r in all_data.columns.values:
        all_data[r] = all_data[r].map(str)
        all_data[r] = all_data[r].map(str.strip)   
    tuples = [tuple(x) for x in all_data.values]
    

    For the list of tuples, change all null-value signifiers - which were captured as strings in the conversion above - into a null type which can be passed to the target database. This was an issue for me; it might not be for you.

    string_list = ['NaT', 'nan', 'NaN', 'None']
    
    def remove_wrong_nulls(x):
        for r in range(len(x)):
            for i,e in enumerate(tuples):
                for j,k in enumerate(e):
                    if k == x[r]:
                        temp=list(tuples[i])
                        temp[j]=None
                        tuples[i]=tuple(temp)
    
    remove_wrong_nulls(string_list)
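    The same null replacement can also be done in a single pass with a comprehension - a sketch of the same idea, with hypothetical sample rows standing in for the converted dataframe values:

```python
# Null-signifier strings produced by the str conversion above.
string_list = ['NaT', 'nan', 'NaN', 'None']

# Hypothetical sample rows standing in for the converted dataframe values.
tuples = [('1', 'nan', 'a'), ('NaT', '2', 'None')]

# Rebuild each tuple, swapping any null-signifier string for a real None.
tuples = [tuple(None if v in string_list else v for v in t) for t in tuples]
print(tuples)  # [('1', None, 'a'), (None, '2', None)]
```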
    

    Create a connection to the database:

    cnxn=ceODBC.connect('DRIVER={SOMEODBCDRIVER};DBCName=XXXXXXXXXXX;UID=XXXXXXX;PWD=XXXXXXX;QUIETMODE=YES;', autocommit=False)
    cursor = cnxn.cursor()
    

    Define a function to split the list of tuples into new_list, a list of chunks of 1000 tuples each. This was necessary for me because my database's SQL query could not exceed 1 MB:

    def chunks(l, n):
        n = max(1, n)
        return [l[i:i + n] for i in range(0, len(l), n)]
    
    new_list = chunks(tuples, 1000)
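    For example, chunking a hypothetical 2,500-row list at 1000 yields three batches:

```python
def chunks(l, n):
    # Slice l into consecutive pieces of at most n elements.
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

rows = [(i,) for i in range(2500)]  # hypothetical row tuples
batches = chunks(rows, 1000)
print([len(b) for b in batches])  # [1000, 1000, 500]
```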
    

    Define your query:

    query = """insert into XXXXXXXXXXXX("XXXXXXXXXX", "XXXXXXXXX", "XXXXXXXXXXX") values(?,?,?)"""
    

    Run through new_list, calling executemany() for each group of 1000 tuples. Follow this by committing and closing the connection, and that's it :)

    for i in range(len(new_list)):
        cursor.executemany(query, new_list[i])
    cnxn.commit()
    cnxn.close()
    
  • 2020-12-30 04:51

    Might be a little late to answer this question, but maybe it can still help someone. executemany() is not implemented by many database drivers; one that does support it is MySQL's. When the docs refer to a sequence of parameters, they mean:

    parameters=[{'name':'Jorge', 'age':22, 'sex':'M'}, 
                {'name':'Karen', 'age':25, 'sex':'F'}, 
                {'name':'James', 'age':29, 'sex':'M'}]
    

    and for a query statement it would look something like:

    SQL = "INSERT IGNORE INTO WORKERS (NAME, AGE, SEX) VALUES (%(name)s, %(age)s, %(sex)s)"
    

    Which it looks like you've got. A couple of things I want to point out in case they help: pandas has a to_sql function that inserts into a database if you provide it the connector object, and it chunks the data as well.
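    A minimal sketch of that to_sql route (using sqlite3 and a made-up two-row frame for illustration; for most other databases pandas expects a SQLAlchemy engine rather than a raw connection):

```python
import sqlite3
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"name": ["Jorge", "Karen"], "age": [22, 25]})

conn = sqlite3.connect(":memory:")
# chunksize batches the INSERTs, much like the manual chunking above.
df.to_sql("workers", conn, index=False, if_exists="replace", chunksize=1000)
print(conn.execute("SELECT COUNT(*) FROM workers").fetchone()[0])  # 2
```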

    To rapidly create a sequence of parameters from a pandas dataframe I found the following two methods helpful:

    # creates list of dict, list of parameters
    # REF: https://groups.google.com/forum/#!topic/pydata/qna3Z3WmVpM
    parameters = [df.iloc[line, :].to_dict() for line in range(len(df))]
    
    # Cleaner Way
    parameters = df.to_dict(orient='records')
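    Both snippets produce the same list of dicts; a quick check on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"name": ["Jorge", "Karen"], "age": [22, 25]})

# Row-by-row method via iloc, and the cleaner orient='records' method.
slow = [df.iloc[line, :].to_dict() for line in range(len(df))]
fast = df.to_dict(orient='records')
assert slow == fast
print(fast[0]["name"])  # Jorge
```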
    