Python psql \copy CSV to remote server

人走茶凉 submitted on 2021-01-28 13:02:11

Question


I am attempting to copy a CSV file (which has a header and uses " as the quote character) with Python 3.6 to a table on a remote Postgres 10 server. It is a large CSV (2.5M rows, 800MB). I previously imported it into a dataframe and then used dataframe.to_sql, but this was very memory intensive, so I switched to using COPY.

Using COPY with psycopg2 or sqlalchemy would work fine but the remote server does not have access to the local file system.

Using psql in the terminal I have successfully run the query below to populate the table. I don't think using \copy is possible with psycopg2 or sqlalchemy.

\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '"' NULL ''

However when I try to use a one line psql -c command like below, it does not work and I get the error:

ERROR: COPY quote must be a single one-byte character.

psql -U user -h ip -d db -w pw -c "\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '"' NULL ''"

Could you tell me why this is the case?

This one-line psql -c statement would be easier to run with the subprocess module in Python than opening a terminal and executing a command, which I'm not sure how to do. If you could suggest a workaround or a different methodology, that would be great.

======

Per Andrew's suggestion to escape the quote character, this worked on the command line. However, when implementing it in Python as below, a new error comes up:

/bin/sh: -c: line 0: unexpected EOF while looking for matching `''

/bin/sh: -c: line 1: syntax error: unexpected end of file

copy_statement = "\"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''\""
cmd = f'psql -U {user} -h {ip} -d {db} -w {pw} -c {copy_statement}'
subprocess.call(cmd, shell=True)

Answer 1:


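Try not to use shell=True if you can avoid it. It is better to tokenize the command yourself, so that no shell has to parse it.

subprocess.call(["psql", "-U", user, "-h", ip, "-d", db, "-w", "-c", copy_statement])

In this case the copy statement is passed to psql verbatim, because there are no shell quoting issues to take into account. (N.B. you still have to escape it for Python: the inner double quote is written \" and the leading backslash \\, but the outer \"...\" pair is dropped because it was only there for the shell.) Note also that psql's -w (--no-password) takes no value; psql reads the password from the PGPASSWORD environment variable or from ~/.pgpass.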
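
That the list form really delivers the statement untouched can be checked without a database. The following is only a sketch under stated assumptions: a child Python process stands in for psql (an assumption made purely so the example runs anywhere) and writes back the single argument it receives, and copy_statement is the statement from the question written as an ordinary Python literal.

```python
import subprocess
import sys

# Python-level escaping only: "\\" is one backslash, "\"" is one double quote.
copy_statement = "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"

# Stand-in for psql (assumption): a child interpreter that writes back argv[1].
echo_child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]

# No shell=True, so each list element becomes exactly one argument of the child.
completed = subprocess.run(
    echo_child + [copy_statement],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
received = completed.stdout

print(received == copy_statement)
```

If received equals copy_statement, the child saw the same characters that were typed at the interactive psql prompt, with no extra layer of quoting needed.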


If you still want to use shell=True, then you have to escape the string literal for both Python and the shell:

"\"\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\\\"' NULL ''\""

will create a string in Python which will be

"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"

Which is what we found out we needed on our shell in the first place!
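
If the command must go through a shell, hand-escaping can be avoided by letting the standard library's shlex.quote produce the shell-level quoting. This is a sketch rather than the approach used above: build_shell_command is a hypothetical helper, the connection values are placeholders, and shlex.split serves only as an approximation of how a POSIX shell would split the line back into words.

```python
import shlex


def build_shell_command(user, host, dbname, copy_statement):
    """Join psql arguments into one line intended for use with shell=True."""
    parts = ["psql", "-U", user, "-h", host, "-d", dbname, "-c", copy_statement]
    # shlex.quote wraps each argument so a POSIX shell reads it as one word.
    return " ".join(shlex.quote(part) for part in parts)


copy_statement = "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"

# Placeholder connection values (assumptions for illustration only).
cmd = build_shell_command("user", "127.0.0.1", "db", copy_statement)

# Splitting the line with POSIX rules recovers the statement intact.
words = shlex.split(cmd)
print(words[-1] == copy_statement)
```

The resulting cmd could then be handed to subprocess.call(cmd, shell=True), with only the Python-level escaping left to write by hand.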


Edit (clarifying something from the comments):

subprocess.call, when not using shell=True, takes an iterable of arguments.

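So you could have:

import os
import subprocess

# No shell is involved, so only Python-level escaping is needed:
# "\\" is one backslash and "\"" is one double quote.
psql_command = "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"
# user, hostname, password, dbname all defined elsewhere above.
command = ["psql",
    "-U", user,
    "-h", hostname,
    "-d", dbname,
    "-w",  # --no-password: never prompt; this flag takes no value
    "-c", psql_command,
]

# psql reads the password from the PGPASSWORD environment variable.
subprocess.call(command, env={**os.environ, "PGPASSWORD": password})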

See https://docs.python.org/2/library/subprocess.html#subprocess.call or https://docs.python.org/3/library/subprocess.html#subprocess.call

Extra edit: Please note that to avoid shell injection, you should use the method described here (an argument list, without shell=True). See the warning section of https://docs.python.org/2/library/subprocess.html#frequently-used-arguments
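
As for a different methodology: psycopg2 can do what \copy does, because COPY ... FROM STDIN reads from the client connection rather than from the server's file system, and cursor.copy_expert streams a local file object into it. The following is a minimal sketch under assumptions: copy_csv_from_client is a hypothetical helper, conn is assumed to be an open psycopg2 connection, and the table and column names are pasted into the SQL as-is, so they must be trusted values.

```python
def copy_csv_from_client(conn, csv_path, table, columns):
    """Stream a local CSV file to the server via COPY ... FROM STDIN.

    conn is assumed to be an open psycopg2 connection, for example one
    returned by psycopg2.connect(...). table and columns are inserted
    into the SQL text unquoted, so pass trusted identifiers only.
    """
    sql = "COPY {} ({}) FROM STDIN WITH (FORMAT csv, HEADER true, QUOTE '\"', NULL '')".format(
        table, ", ".join(columns)
    )
    cur = conn.cursor()
    try:
        # The file object is handed to the driver, which streams it to the server.
        with open(csv_path, "r", newline="") as csv_file:
            cur.copy_expert(sql, csv_file)
        conn.commit()
    finally:
        cur.close()
```

With the placeholder names from the question, a call might look like copy_csv_from_client(conn, "file_location", "table", ["col1", "col2"]).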



Source: https://stackoverflow.com/questions/46758865/python-psql-copy-csv-to-remote-server
