Reading UTF-8 text files with ReadList

攒了一身酷 2021-01-13 03:25

Is it possible to read UTF-8 (or otherwise encoded) text files with ReadList[..., Word], or is it ASCII-only? If it's ASCII-only...

2 Answers
  • 2021-01-13 04:04

    This seems to work

    FromCharacterCode[ToCharacterCode[ReadList["raw.php.txt", Word]], "UTF-8"]
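
    A small sketch of why this round trip can work, assuming ReadList returned each byte of the file as a separate character (this depends on the version and on $CharacterEncoding, so treat it as an assumption). The string "caf\[EAcute]" is only an illustrative sample, not data from the test file:

    ```wolfram
    (* Bytes of the sample string when it is stored as UTF-8 *)
    bytes = ToCharacterCode["caf\[EAcute]", "UTF-8"]
    (* ==> {99, 97, 102, 195, 169} *)

    (* Treating each byte as its own character gives mojibake,
       which is what a byte-wise read would return *)
    garbled = FromCharacterCode[bytes];

    (* Converting back to codes and decoding them as UTF-8 restores the text *)
    FromCharacterCode[ToCharacterCode[garbled], "UTF-8"] === "caf\[EAcute]"
    (* ==> True *)
    ```

    If ReadList already decodes the file correctly on a given system, the extra round trip is unnecessary and may fail on non-ASCII input.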
    

    The timings I get for the linked test file are

    FromCharacterCode[ToCharacterCode[ReadList["test.txt", Word]], "UTF-8"]; // Timing
    
    (* ==> {0.000195, Null} *)
    
    Import["test.txt", "Text"]; // Timing
    
    (* ==> {0.01784, Null} *)
    
  • 2021-01-13 04:10

    If I leave out Word, this works:

    $CharacterEncoding = "UTF-8";
    
    ReadList["UTF8.txt"]
    

    This, however, does not solve the problem: without a type argument, ReadList reads the file as expressions, so the data is not returned as strings.

    Please try this on a larger file and report its performance:

    FromCharacterCode[BinaryReadList["UTF8.txt"], "UTF-8"]
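
    The expression above returns the whole file as one string, whereas the question asks for words. A minimal sketch of one way to get a word list from it, where readUTF8Words is a hypothetical helper name and StringSplit's default whitespace splitting is assumed to be close enough to what ReadList[..., Word] would produce:

    ```wolfram
    (* readUTF8Words is a hypothetical helper, not a built-in function *)
    readUTF8Words[file_String] :=
      StringSplit[FromCharacterCode[BinaryReadList[file], "UTF-8"]]

    (* Read the words and time the call; "UTF8.txt" is the file name used above *)
    AbsoluteTiming[words = readUTF8Words["UTF8.txt"];]
    ```

    BinaryReadList reads raw bytes, so the result should not depend on the current value of $CharacterEncoding.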
    