During an ETL process I needed to extract and load a JSON column from one Postgres database to another. We use Pandas for this since it offers so many ways to read and write data.
If you (re-)create the JSON column using json.dumps(), you're all set. The data can then be written not only with pandas' .to_sql() method but also with PostgreSQL's much faster COPY (via psycopg2's copy_expert() or sqlalchemy's raw_connection()); a sketch of both write paths follows the conversion example below.
For the sake of simplicity, let's assume that we have a column of dictionaries that should be written into a JSON(B) column:
import json
import pandas as pd

df = pd.DataFrame([['row1', {'a': 1, 'b': 2}],
                   ['row2', {'a': 3, 'b': 4, 'c': 'some text'}]],
                  columns=['r', 'kv'])

# conversion function:
def dict2json(dictionary):
    return json.dumps(dictionary, ensure_ascii=False)

# overwrite the dict column with json-strings
df['kv'] = df.kv.map(dict2json)
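To make the two write paths concrete, here is a minimal sketch; the connection URL and the target table df_json (assumed to already exist with columns r text, kv jsonb) are illustrative assumptions, not part of the original example:

import io
import sqlalchemy

# assumed connection details and pre-created target table (r text, kv jsonb)
engine = sqlalchemy.create_engine('postgresql://user:password@localhost:5432/mydb')

# 1) pandas' .to_sql(): the JSON strings are sent as text and cast to
#    json(b) by Postgres, because the target column already has that type
df.to_sql('df_json', engine, if_exists='append', index=False)

# 2) PostgreSQL COPY via psycopg2's copy_expert(), reached through
#    sqlalchemy's raw_connection(); considerably faster for large frames
buf = io.StringIO()
df.to_csv(buf, header=False, index=False)
buf.seek(0)

conn = engine.raw_connection()
try:
    with conn.cursor() as cur:
        cur.copy_expert(
            "COPY df_json (r, kv) FROM STDIN WITH (FORMAT csv)", buf)
    conn.commit()
finally:
    conn.close()

Using CSV format for COPY lets pandas' quoting handle the commas and quotes that naturally occur inside the JSON strings.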