Python pickle: fix \r characters before loading

忘掉有多难 2021-01-02 13:38

I got a pickled object (a list with a few numpy arrays in it) that was created on Windows and apparently saved to a file opened in text mode rather than binary mode (i.e. with

4 Answers
  • 2021-01-02 14:08

    Can't you -- on Windows -- just open the file in text mode, the same way it was written, read it in and then write it out to another file opened properly in binary mode?

  • 2021-01-02 14:25

    Have you tried unpickling in text mode? That is,

    x = pickle.load(open(filename, 'r'))
    

    (On Windows, of course.)

  • 2021-01-02 14:28

    Presuming that the file was created with the default protocol=0 ASCII-compatible method, you should be able to load it anywhere by using open('pickled_file', 'rU') i.e. universal newlines.

    If this doesn't work, show us the first few hundred bytes: print repr(open('pickled_file', 'rb').read(200)) and paste the results into an edit of your question.

    Update after file contents were published:

    Your file starts with '\x80\x02'; it was dumped with protocol 2, the latest/best. Protocols 1 and 2 are binary protocols. Your file was written in text mode on Windows. This has resulted in each '\n' being converted to '\r\n' by the C runtime. Pickle files should be opened in binary mode for both writing and reading, like this:

    import pickle

    # Write the pickle: 'wb' opens the file in binary mode.
    with open('result.pickle', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

    # Read it back: 'rb' opens the file in binary mode.
    with open('result.pickle', 'rb') as f:
        obj = pickle.load(f)
    

    See the pickle module documentation. This code will work portably on both Windows and non-Windows systems.

    You can recover the original pickle image by reading the file in binary mode and then reversing the damage by replacing all occurrences of '\r\n' by '\n'. Note: This recovery procedure is necessary whether you are trying to read it on Windows or not.
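
    A minimal sketch of that recovery, assuming Python 3 and assuming the only damage is the text-mode translation of '\n' to '\r\n'. The function names repair_text_mode_pickle and load_damaged_pickle are hypothetical, chosen only for illustration:

```python
import pickle


def repair_text_mode_pickle(damaged: bytes) -> bytes:
    """Undo the Windows text-mode translation of b'\\n' into b'\\r\\n'.

    Text-mode writing puts a b'\\r' in front of every b'\\n', so every
    b'\\r\\n' in the damaged data stands for a single original b'\\n'.
    """
    return damaged.replace(b"\r\n", b"\n")


def load_damaged_pickle(path):
    """Read the file in binary mode, repair the bytes, then unpickle them."""
    with open(path, "rb") as f:  # binary mode: no newline translation on read
        damaged = f.read()
    return pickle.loads(repair_text_mode_pickle(damaged))
```

    Repairing the bytes in memory and calling pickle.loads leaves the damaged file untouched, so the original is still there if the repair turns out not to be enough. Only unpickle data from a source you trust.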

  • 2021-01-02 14:31

    A newline on Windows isn't just '\r'; it's CRLF, i.e. '\r\n'.

    Give file.read().replace('\r\n', '\n') a try. You were previously deleting carriage returns that may not have actually been part of newlines.
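
    To see why deleting every '\r' is riskier than replacing '\r\n', here is a small self-contained comparison on made-up data (Python 3 assumed). The text-mode damage is simulated with a bytes replace rather than an actual Windows write:

```python
import pickle

# Made-up payload: the string holds a genuine '\r', and the integer 10 is
# stored as the byte b'\n' in a protocol 2 pickle.
original = pickle.dumps(["a\rb", 10], protocol=2)

# Simulate writing in text mode on Windows: every b'\n' becomes b'\r\n'.
damaged = original.replace(b"\n", b"\r\n")

# Strategy 1: delete every carriage return (also removes the genuine one).
stripped = damaged.replace(b"\r", b"")

# Strategy 2: collapse only the CRLF pairs back into LF.
repaired = damaged.replace(b"\r\n", b"\n")

print("replace CRLF restores the original:", repaired == original)
print("strip all CR restores the original:", stripped == original)
```

    Only the second strategy leaves carriage returns that were part of the data in place.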
