Question
I'm using pandas and exporting data as JSON like this:
import pandas as pd
df = pd.DataFrame({'a': ['Têst']})
df.to_json(orient='records', lines=True)
> u'{"a":"T\\u00east"}'
This makes sense: the non-ASCII character ê (U+00EA) becomes the JSON escape sequence \u00ea, and the backslash appears doubled in the Python repr.
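For context, this escaping is the default behaviour of both the stdlib json module (ensure_ascii=True) and pandas' to_json (force_ascii=True); a quick check on Python 3:

import json

# Non-ASCII characters are escaped to \uXXXX sequences by default;
# the repr below shows the single backslash doubled:
json.dumps({'a': 'Têst'})   # '{"a": "T\\u00east"}'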
But then I import the JSON strings into Postgres with COPY:
import cStringIO

buffer = cStringIO.StringIO()
buffer.write(df.to_json(orient='records', lines=True))
buffer.seek(0)
with connection.cursor() as cursor:
    cursor.copy_expert(sql="""
        COPY tmp (json_data) FROM STDIN WITH NULL AS '' ENCODING 'UTF8';
    """, file=buffer)
The problem is that the result in the database ends up being
{"a": "Tu00east"}
and as you can see the backslash is gone.
I tried using CSV as the COPY mode, but it messes things up since there are commas in some of the data, and trying to set the ESCAPE character and DELIMITER to something else always seems to cause failures.
The table column has a jsonb type. I read in the docs that PG doesn't like non-ASCII Unicode escapes over \x003f unless the database encoding is UTF8, which it is in my case, so that shouldn't be an issue.
I'm trying to figure out why the escape characters are removed here, and how to import into Postgres while preserving the escaping.
Answer 1:
Use the csv option for COPY, with DELIMITER e'\x01' QUOTE e'\x02'. The backslashes vanish because COPY's default text format treats backslash as its own escape character, so \u is folded to u on input; csv mode does not treat backslash specially, and the \x01 and \x02 control characters can never appear literally in valid JSON, which makes them safe choices for delimiter and quote. I'm not sure whether this works for all possible valid JSON, but I've never had it fail.
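For completeness, the default text format could also be made to work by doubling every backslash before the write, since text-format COPY folds \\ back to \ on input. A rough, untested sketch against the asker's DataFrame:

# Double each backslash so text-format COPY unescapes \\ back to \:
payload = df.to_json(orient='records', lines=True).replace('\\', '\\\\')

The csv approach demonstrated below avoids that extra pass entirely.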
$ psql -X testdb -c 'create table t(d jsonb)'
CREATE TABLE
$ cat foo.json
{"a":"Têst"}
$ cat foo.json | psql -X testdb -c "COPY t from stdin csv delimiter e'\x01' quote e'\x02'"
COPY 1
$ psql -X testdb -c 'select * from t'
d
---------------
{"a": "Têst"}
(1 row)
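Applied to the question's Python code, the same trick would look roughly like this (a sketch, assuming psycopg2 and the tmp table from the question; io.StringIO stands in for cStringIO on Python 3):

import io
import pandas as pd

df = pd.DataFrame({'a': ['Têst']})

buffer = io.StringIO()
buffer.write(df.to_json(orient='records', lines=True))
buffer.seek(0)

# `connection` is assumed to be an open psycopg2 connection.
with connection.cursor() as cursor:
    # csv mode does not treat backslash specially, and \x01/\x02 cannot
    # occur literally in valid JSON, so the \u escapes survive intact.
    cursor.copy_expert(
        sql=r"COPY tmp (json_data) FROM STDIN csv DELIMITER e'\x01' QUOTE e'\x02'",
        file=buffer,
    )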
Source: https://stackoverflow.com/questions/53112698/issue-with-encoding-when-importing-json-into-postgres