问题
I want to read some CSV/Excel files on a secure SFTP folder, make some changes (fixed changes in each file like remove column 2) in those files, upload them to a Postgre DB and also the upload them to a different SFTP path in Python
What's the best way to this?
I have made a connection to the SFTP using pysftp library and am reading the Excel:
import pysftp
import pandas as pd
myHostname = "*****"
myUsername = "****"
myPassword = "***8"
cnopts =pysftp.CnOpts()
cnopts.hostkeys = None
sftp=pysftp.Connection(host=myHostname, username=myUsername,
password=myPassword,cnopts=cnopts)
print ("Connection succesfully stablished ... ")
sftp.chdir('test/test')
#sftp.pwd
a=[]
for i in sftp.listdir_attr():
with sftp.open(i.filename) as f:
df=pd.read_csv(f)
How should I proceed with the upload to DB and making those changes to the CSV permanent?
回答1:
You have the download part done.
For the upload part, see How to Transfer Pandas DataFrame to .csv on SFTP using Paramiko Library in Python? – While it's for Paramiko, pysftp Connection.open method behaves identically to Paramiko SFTPClient.open, so the code is the same.
Full code can be like:
with sftp.open("/remote/path/data.csv", "r+", bufsize=32768) as f:
# Download CSV contents from SFTP to memory
df = pd.read_csv(f)
# Modify as you need (just an example)
df.at[0, 'Name'] = 'changed'
# Upload the in-memory data back to SFTP
f.seek(0)
df.to_csv(f, index=False)
# Truncate the remote file in case the new version of the contents is smaller
f.truncate(f.tell())
The above updates the same file. If you want to upload to a different file, use this:
# Download CSV contents from SFTP to memory
with sftp.open("/remote/path/source.csv", "r") as f:
df = pd.read_csv(f)
# Modify as you need (just an example)
df.at[0, 'Name'] = 'changed'
# Upload the in-memory data back to SFTP
with sftp.open("/remote/path/target.csv", "w", bufsize=32768) as f:
df.to_csv(f, index=False)
For the purpose of bufsize
, see:
Writing to a file on SFTP server opened using pysftp "open" method is slow
Obligatory warning: Do not set cnopts.hostkeys = None
, unless you do not care about security. For the correct solution see Verify host key with pysftp.
回答2:
That's several questions in one question :)
I would suggest go with that approach:
- Make a local copy of the file (not sure how big it is, no point to shuffle it around between your local machine and sftp server. You cna use get method
- Make operations on your data with pandas, then dump it back to csv with to_csv method
- load data to the postgree using either
pandas.io
or pureSQLAlchemy
. Check the docs here - Upload the file to the destination you want with put method
来源:https://stackoverflow.com/questions/61817226/read-csv-excel-files-from-sftp-file-make-some-changes-in-those-files-using-pand