Downloading text files from z/OS with Python and ftplib.FTP

無奈伤痛 2021-01-05 18:31

I'm trying to automate downloading of some text files from a z/OS PDS, using Python and ftplib.

Since the host files are EBCDIC, I can't simply use FTP.retrbinary(

4 answers
  • 2021-01-05 19:03

    Your writelineswitheol method appends '\r\n' instead of '\n' and then writes the result to a file opened in text mode. The effect, no matter what platform you are running on, will be an unwanted '\r'. Just append '\n' and you will get the appropriate line ending.

    Proper error handling should not be relegated to a "bells and whistles" version. Set up your callback object so that the open() call is in a try/except and the object keeps a reference to the output file handle, the write call is in a try/except, and there is a callback_obj.close() method that you call when retrlines() returns to explicitly close the file handle (also in a try/except). That way you get explicit error messages of the form "can't (open|write to|close) file X because Y", and you no longer have to think about when your files will be implicitly closed or whether you risk running out of file handles.
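
    A minimal sketch of such a callback object, assuming Python 3.6 or later. The class name LineWriter and the exact message wording are hypothetical and not part of ftplib; the only thing taken from ftplib is that retrlines() calls the callback once per line with the terminator stripped.

```python
class LineWriter:
    """Callback object for FTP.retrlines() that owns its output file.

    Hypothetical helper for illustration; not part of ftplib.
    """

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        try:
            # Text mode: "\n" is translated to the platform line ending.
            self.handle = open(path, "w", encoding=encoding)
        except OSError as exc:
            raise RuntimeError(f"can't open file {path} because {exc}") from exc

    def __call__(self, line):
        # retrlines() passes each line without its terminator.
        try:
            self.handle.write(line + "\n")
        except OSError as exc:
            raise RuntimeError(
                f"can't write to file {self.path} because {exc}"
            ) from exc

    def close(self):
        try:
            self.handle.close()
        except OSError as exc:
            raise RuntimeError(
                f"can't close file {self.path} because {exc}"
            ) from exc
```

    With a connected ftplib.FTP instance named ftp, usage would look like writer = LineWriter("member.txt"), then ftp.retrlines("RETR " + name, writer) inside a try block, with writer.close() in the matching finally.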

    Python 3.x ftplib.FTP.retrlines() should give you str objects which are in effect Unicode strings, and you will need to encode them before you write them -- unless the default encoding is latin1 which would be rather unusual for a Windows box. You should have test files with (1) all possible 256 bytes (2) all bytes that are valid in the expected EBCDIC codepage.
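
    The first kind of test data can be generated locally. This is a sketch that assumes cp500 as the code page (substitute whatever your host actually uses); the function name is made up for illustration. In Python's cp500 codec each of the 256 byte values decodes to a distinct character in the Latin-1 range, which is why latin1 is the one output encoding that needs no further thought.

```python
def decode_all_bytes(codepage="cp500"):
    """Return (raw, text) where raw holds every byte value once.

    codepage is an assumption; it must match the EBCDIC code page on the host.
    """
    raw = bytes(range(256))
    text = raw.decode(codepage)
    return raw, text


raw, text = decode_all_bytes()

# The round trip through the code page should be lossless.
print(text.encode("cp500") == raw)

# latin-1 can hold every decoded character; ASCII cannot.
text.encode("latin-1")
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii cannot hold the decoded text:", exc.reason)
```

    Uploading raw to the host in binary mode and pulling it back with retrlines() would then show which characters survive the server-side translation.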

    [a few "sanitation" remarks]

    1. You should consider upgrading your Python from 3.0 (a "proof of concept" release) to 3.1.

    2. To facilitate better understanding of your code, use "i" as an identifier only as a sequence index and only if you irredeemably acquired the habit from FORTRAN 3 or more decades ago :-)

    3. Two of the problems discovered so far (appending line terminator to each character, wrong line terminator) would have shown up the first time you tested it.

  • 2021-01-05 19:06

    Just came across this question as I was trying to figure out how to recursively download datasets from z/OS. I've been using a simple Python script for years now to download EBCDIC files from the mainframe. It effectively just does this:

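    # Assumes ftp is a connected ftplib.FTP instance and filename is the member name.
    with open(filename, "w") as out:
        def writeline(line):
            # retrlines() strips the line terminator, so restore it here.
            out.write(line + "\n")

        ftp.retrlines("RETR " + filename, writeline)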
    
  • 2021-01-05 19:08

    When you use ftplib's retrlines to download a file from z/OS, the lines passed to the callback have no trailing '\n'.

    This differs from the Windows ftp command 'get xxx'.

    One fix is to add a 'retrlines_zos' function to ftplib.py, based on 'retrlines'.

    Copy the whole code of retrlines and change the 'callback' line to:

    ...

    callback(line + "\n")

    ...

    I tested and it worked.
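
    A similar effect may be possible without editing ftplib.py, by wrapping the callback instead of copying retrlines. This is a swapped-in alternative rather than the method tested above; the name retrlines_zos is reused only for continuity, and ftp is assumed to be a connected ftplib.FTP instance.

```python
def retrlines_zos(ftp, cmd, callback):
    """Run ftp.retrlines(), passing each line on with a newline appended.

    Hypothetical wrapper for illustration; not part of ftplib.
    """
    def with_newline(line):
        # retrlines() delivers lines with the terminator already removed.
        callback(line + "\n")

    return ftp.retrlines(cmd, with_newline)
```

    A call such as retrlines_zos(ftp, "RETR " + name, out.write), with out opened in text mode, would then write one line per record.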

  • 2021-01-05 19:14

    You should be able to download the file as a binary (using retrbinary) and use the codecs module to convert from EBCDIC to whatever output encoding you want. You should know the specific EBCDIC code page being used on the z/OS system (e.g. cp500). If the files are small, you could even do something like (for a conversion to UTF-8):

    # cp500 is only an example; use the code page of your z/OS system.
    with open(ebcdic_filename, "rb") as src:
        data = src.read()
    converted = data.decode("cp500").encode("utf8")
    with open(utf8_filename, "wb") as dst:
        dst.write(converted)
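
    The download step itself could be combined with the conversion, for small files, roughly as follows. This is a sketch: fetch_as_text is a hypothetical helper, ftp is assumed to be a connected ftplib.FTP instance, and the cp500 default is an assumption that must match the host.

```python
import io


def fetch_as_text(ftp, remote_name, codepage="cp500"):
    """Download remote_name in binary mode and decode it from EBCDIC.

    Hypothetical helper for illustration; not part of ftplib.
    """
    buffer = io.BytesIO()
    # retrbinary() calls the callback with each block of raw bytes received.
    ftp.retrbinary("RETR " + remote_name, buffer.write)
    return buffer.getvalue().decode(codepage)
```

    A binary transfer performs no conversion at all, so how record boundaries show up in the data depends on the dataset's record format; it is worth inspecting a sample before relying on line splitting.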
    

    Update: If you need to use retrlines to get the lines and your lines are coming back in the correct encoding, your approach will not work, because the callback is called once for each line. In the callback, sequence will be the line itself, so your for loop will write the individual characters of the line to the output, each on its own line. You probably want to do self.write(sequence + "\r\n") rather than the for loop. It still doesn't feel especially right to subclass file just to add this utility method, though - it probably needs to be in a different class in your bells-and-whistles version.
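
    The per-character effect is easy to reproduce without an FTP server. In this sketch the function names are hypothetical, io.StringIO stands in for the output file, and "\n" is used as the terminator to keep the output short.

```python
import io


def write_each_item(out, sequence):
    # Mirrors the bug: iterating over a str yields single characters.
    for item in sequence:
        out.write(item + "\n")


def write_whole_line(out, line):
    # One callback invocation is one line, so write it in one go.
    out.write(line + "\n")


buggy, fixed = io.StringIO(), io.StringIO()
write_each_item(buggy, "AB")
write_whole_line(fixed, "AB")
print(repr(buggy.getvalue()))  # 'A\nB\n'
print(repr(fixed.getvalue()))  # 'AB\n'
```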
