Fastest way to convert a file from latin1 to UTF-8 in Python

一生所求 2021-02-10 04:17

I need the fastest way to convert files from latin1 to UTF-8 in Python. The files are large, around 2 GB (I am moving DB data). So far I have:

import codecs
infile = codec         


        
3 answers
  •  走了就别回头了
    2021-02-10 04:48

    If you are desperate to do it in Python (or any other language), at least do the I/O in bigger chunks than lines, and avoid the codecs overhead.

    # tmpfile and tmpfile1 are assumed to hold the input and output paths
    BLOCKSIZE = 65536  # experiment with size
    with open(tmpfile, 'rb') as infile, open(tmpfile1, 'wb') as outfile:
        while True:
            block = infile.read(BLOCKSIZE)
            if not block:
                break
            # latin1 is single-byte, so a block boundary never splits a character
            outfile.write(block.decode('latin1').encode('utf8'))
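
To experiment with the block size, the loop can be wrapped in a function and checked on a small sample before running it on a 2 GB file. This is only a minimal sketch: the names `latin1_to_utf8` and `demo`, the sample text, and the temporary file names are hypothetical and not part of the original answer.

```python
import os
import tempfile


def latin1_to_utf8(src_path, dst_path, blocksize=65536):
    """Re-encode src_path (latin1) into dst_path (UTF-8), one block at a time."""
    with open(src_path, 'rb') as infile, open(dst_path, 'wb') as outfile:
        while True:
            block = infile.read(blocksize)
            if not block:
                break
            # latin1 is a single-byte encoding, so a block boundary can
            # never fall in the middle of a character.
            outfile.write(block.decode('latin1').encode('utf8'))


def demo():
    """Convert a small sample file and return the converted bytes."""
    sample = 'caf\xe9 na\xefve \xfcber\n' * 1000
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, 'in.txt')
        dst = os.path.join(tmpdir, 'out.txt')
        with open(src, 'wb') as f:
            f.write(sample.encode('latin1'))
        # A tiny blocksize forces many iterations of the loop.
        latin1_to_utf8(src, dst, blocksize=7)
        with open(dst, 'rb') as f:
            return f.read()


if __name__ == '__main__':
    converted = demo()
    print(len(converted), 'bytes written')
```

The chunked approach relies on latin1 mapping every byte to exactly one character. For a multi-byte source encoding, a block could end mid-character, and an incremental decoder would be needed instead.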
    

    Otherwise, go with iconv ... I haven't looked under the hood, but I'd be surprised if it doesn't special-case latin1 input :-)
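
If the conversion still has to be started from a Python script, iconv can be run as a subprocess. The sketch below assumes an `iconv` executable is on the PATH and that it accepts the encoding names `ISO-8859-1` and `UTF-8`; the function name `iconv_latin1_to_utf8` is hypothetical and not from the original answer.

```python
import shutil
import subprocess


def iconv_latin1_to_utf8(src_path, dst_path):
    """Convert src_path from latin1 to UTF-8 by running the iconv command."""
    if shutil.which('iconv') is None:
        raise RuntimeError('iconv was not found on PATH')
    with open(dst_path, 'wb') as outfile:
        # -f names the source encoding and -t the target encoding.
        # iconv writes to stdout, which is redirected into the output file,
        # so the data never passes through Python itself.
        subprocess.run(
            ['iconv', '-f', 'ISO-8859-1', '-t', 'UTF-8', src_path],
            stdout=outfile,
            check=True,
        )
```

With `check=True`, a non-zero exit status from iconv raises `subprocess.CalledProcessError` instead of silently leaving a partial output file.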
