UTF-8 and os.listdir()

情书的邮戳 2021-01-07 05:00

I'm having a bit of trouble with a file containing the "ș" character (that's \xC8\x99 in UTF-8 - LATIN SMALL LETTER S WITH COMMA BELOW).

I'm crea

1 answer
  • 2021-01-07 05:31

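    The OS X filesystem (HFS+) mostly stores filenames with decomposed characters rather than their precomposed (combined) form, so "ș" (U+0219) comes back from os.listdir() as "s" followed by U+0326 COMBINING COMMA BELOW. You'll need to normalise the filenames back to the composed NFC form: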

    import os
    import unicodedata
    # Normalise every name to NFC so it compares equal to precomposed literals.
    files = [unicodedata.normalize('NFC', f) for f in os.listdir(u'.')]

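    Passing a unicode path (u'.') makes os.listdir() return the filenames as unicode. If you pass a bytestring path instead, you get bytestrings back and need to decode each one to unicode (as UTF-8) before normalising.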
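The decode step can be sketched as below. This is only an illustration: to_nfc is a hypothetical helper name, and it assumes the filenames are UTF-8 encoded, which may not hold on every system.

```python
import unicodedata


def to_nfc(name, encoding="utf-8"):
    """Return a filename as NFC-normalised text (hypothetical helper)."""
    if isinstance(name, bytes):
        # Byte-string names must be decoded before they can be normalised.
        # The UTF-8 default is an assumption about the filesystem encoding.
        name = name.decode(encoding)
    return unicodedata.normalize("NFC", name)


# "s" followed by U+0326 COMBINING COMMA BELOW, encoded as UTF-8 bytes,
# which is how a decomposing filesystem would hand the name back.
raw_name = b"s\xcc\xa6.txt"
print(to_nfc(raw_name) == u"\u0219.txt")
```

With a helper like this, the list comprehension becomes [to_nfc(f) for f in os.listdir('.')], whether the names arrive as bytes or as text.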

    Also see the unicodedata.normalize() function documentation.
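To see why the comparison fails without normalisation, here is a small sketch using the character from the question. It uses only unicodedata.normalize(); the variable names are just for illustration.

```python
import unicodedata

composed = u"\u0219"  # LATIN SMALL LETTER S WITH COMMA BELOW, one code point
decomposed = unicodedata.normalize("NFD", composed)  # "s" + U+0326

# The two strings render identically but hold different code points,
# so a plain equality test between them is False.
print(len(composed), len(decomposed))
print(composed == decomposed)

# Their UTF-8 encodings differ as well: \xc8\x99 versus s\xcc\xa6.
print(composed.encode("utf-8"), decomposed.encode("utf-8"))

# Normalising the decomposed string to NFC restores the single code point.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The same applies in the other direction: normalising both sides to NFD would also make them compare equal, so what matters is picking one form and using it consistently.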
