FTP to Google Storage


Question


Some files get uploaded on a daily basis to an FTP server and I need those files under Google Cloud Storage. I don't want to bug the users that upload the files to install any additional software and just let them keep using their FTP client. Is there a way to use GCS as an FTP server? If not, how can I create a job that periodically picks up the files from an FTP location and puts them in GCS? In other words: what's the best and simplest way to do it?


Answer 1:


You could write your own FTP server that uploads to GCS, for example based on pyftpdlib.

Define a custom handler that stores the file to GCS when it is received:

import os
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer
from pyftpdlib.authorizers import DummyAuthorizer
from google.cloud import storage

class MyHandler(FTPHandler):
    def on_file_received(self, file):
        # upload the completed file to GCS, then remove the local copy
        storage_client = storage.Client()
        bucket = storage_client.get_bucket('your_gcs_bucket')
        blob = bucket.blob(file[5:])  # strip the leading '/tmp/'
        blob.upload_from_filename(file)
        os.remove(file)
    # implement other events as needed, e.g. on_incomplete_file_received

def main():
    authorizer = DummyAuthorizer()
    authorizer.add_user('user', 'password', homedir='/tmp', perm='elradfmw')

    handler = MyHandler
    handler.authorizer = authorizer
    handler.masquerade_address = 'your.public.ip'  # replace with the server's public IP
    handler.passive_ports = range(60000, 61000)    # passive ports 60000-60999

    server = FTPServer(('0.0.0.0', 21), handler)  # listen on all interfaces
    server.serve_forever()

if __name__ == "__main__":
    main()

I've successfully run this on Google Container Engine (it takes some effort to get passive FTP working properly), but it should be pretty simple to do on Compute Engine. With the configuration above, open port 21 and ports 60000-60999 on the firewall.

To run it: python my_ftp_server.py. If you want to listen on port 21 you'll need root privileges.
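
For reference, a matching firewall rule can be created with gcloud; this is only a sketch, and the rule name and the ftp-server target tag are placeholders you would adapt to your project:

# allow the FTP control port and the passive data ports configured above
gcloud compute firewall-rules create allow-ftp \
    --allow=tcp:21,tcp:60000-60999 \
    --source-ranges=0.0.0.0/0 \
    --target-tags=ftp-server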




Answer 2:


You could set up a cron job that syncs between the FTP server and Google Cloud Storage using gsutil rsync or the open-source rclone tool.

If you can't run those commands on the FTP server itself periodically, you could mount the FTP server as a local filesystem or drive (Linux, Windows).
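
As a rough sketch of the cron approach (the paths and bucket name are placeholders), a crontab entry that mirrors the FTP upload directory into a bucket every 15 minutes could look like this:

# sync new FTP uploads to GCS every 15 minutes
*/15 * * * * gsutil -m rsync -r /srv/ftp/uploads gs://your-gcs-bucket/uploads

An equivalent rclone sync command would do the same job once a GCS remote is configured.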




Answer 3:


I have successfully set up an FTP proxy to GCS using gcsfs on a Google Compute Engine VM (mentioned by jkff in a comment on my question), following these instructions: http://ilyapimenov.com/blog/2015/01/19/ftp-proxy-to-gcs.html

Some changes are needed though:

  • In /etc/vsftpd.conf, change #write_enable=YES to write_enable=YES, i.e. uncomment it (see the sketch after this list)
  • Add firewall rules in your GC project to allow access to port 21 and the passive ports 15393 to 15592 (https://console.cloud.google.com/networking/firewalls/list)
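
As a sketch of the vsftpd.conf settings involved (the passive port range matches the firewall rule above; pasv_address is a placeholder you may or may not need, depending on your network setup):

# allow uploads and pin the passive ports to the range the firewall permits
write_enable=YES
pasv_enable=YES
pasv_min_port=15393
pasv_max_port=15592
# placeholder: set to the VM's external IP if clients connect from outside
pasv_address=your.external.ip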

Some possible problems:

  • If you can access the FTP server using the local IP but not the remote IP, it's probably because you haven't set up the firewall rules
  • If you can access the FTP server but are unable to write, it's probably because you still need write_enable=YES
  • If you are trying to read the folder you created under /mnt but get an I/O error, it's probably because the bucket in gcsfs_config is not right

Also, your FTP client needs its transfer mode set to "passive".




Answer 4:


Set up a VM in Google Cloud using some *nix flavor. Set up FTP on it and point it at a folder abc. Use Cloud Storage FUSE (gcsfuse) to mount abc as a GCS bucket. Voila - files move back and forth between GCS and FTP without writing any software. (Small print: fuse rolls up and dies if you push too much data through it, so bounce it periodically, once a week or once a day; also you might need to configure the mount or fuse to allow permissions for all users.)
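
As a minimal sketch of the mount step (the bucket name and mount point are placeholders), using gcsfuse:

# mount the bucket at the folder served by the FTP daemon,
# readable/writable by other users (may need user_allow_other in /etc/fuse.conf)
gcsfuse -o allow_other --implicit-dirs your-gcs-bucket /home/ftp/abc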



Source: https://stackoverflow.com/questions/43486480/ftp-to-google-storage
