Progress bar for pandas.DataFrame.to_sql

不羁岁月 提交于 2019-12-10 04:37:03

问题


I want to migrate data from a large csv file to sqlite3 database.

My code on Python 3.5 using pandas:

con = sqlite3.connect(DB_FILENAME)
df = pd.read_csv(MLS_FULLPATH)
df.to_sql(con=con, name="MLS", if_exists="replace", index=False)

Is it possible to print current status (progress bar) of execution of to_sql method?

I looked the article about tqdm, but didn't find how to do this.


回答1:


Unfortuantely DataFrame.to_sql does not provide a chunk-by-chunk callback, which is needed by tqdm to update its status. However, you can process the dataframe chunk by chunk:

import sqlite3
import pandas as pd
from tqdm import tqdm

DB_FILENAME='/tmp/test.sqlite'

def chunker(seq, size):
    # from http://stackoverflow.com/a/434328
    return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))

def insert_with_progress(df, dbfile):
    con = sqlite3.connect(dbfile)
    chunksize = int(len(df) / 10) # 10%
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            pbar.update(chunksize)

df = pd.DataFrame({'a': range(0,100000)})
insert_with_progress(df, DB_FILENAME)

Note I'm generating the DataFrame inline here for the sake of having a complete workable example without dependency.

The result is quite stunning:




回答2:


I wanted to share a variant of the solution posted by miraculixx - that I had to alter for SQLAlchemy:

#these need to be customized - myDataFrame, myDBEngine, myDBTable

df=myDataFrame

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df):
    con = myDBEngine.connect()
    chunksize = int(len(df) / 10)
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(name="myDBTable", con=conn, if_exists="append", index=False) 
            pbar.update(chunksize)
            tqdm._instances.clear()

insert_with_progress(df)


来源:https://stackoverflow.com/questions/39494056/progress-bar-for-pandas-dataframe-to-sql

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!