How to check encoding of a CSV file

醉梦人生 2020-12-05 12:41

I have a CSV file and I want to find out its encoding. Is there a menu option in Microsoft Excel that can help me detect it, or do I need to use programming?

6 answers
  • 2020-12-05 13:07

    You can use Notepad++ to check a file's encoding without writing any code. The encoding Notepad++ detects for the open file is shown at the far right of the status bar at the bottom of the window. To see the list of supported encodings, go to Settings -> Preferences -> New Document/Default Directory and open the drop-down.

  • 2020-12-05 13:08

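    You can also use the Python chardet library:

    # install chardet first: run "pip install chardet" in a terminal
    # (in a Jupyter notebook cell you can run "!pip install chardet" instead)

    # import the chardet library
    import chardet

    # open the file in binary mode ('rb') so chardet sees the raw bytes,
    # then let detect() guess the encoding; it returns a dict with
    # 'encoding', 'confidence' and 'language' keys
    with open("test.csv", 'rb') as file:
        print(chardet.detect(file.read()))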
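    chardet.detect() reads the whole file into memory and returns a guess. If the real question is "is this file UTF-8?", a strict decode answers it without a third-party library: invalid bytes prove the file is not UTF-8, and a clean decode means it is valid UTF-8 (plain ASCII also passes, since ASCII is a subset of UTF-8). The sketch below uses only the standard library; the helper name is_valid_utf8 and the chunk size are illustrative choices, not part of chardet. For large files, chardet itself also offers an incremental UniversalDetector.

```python
import codecs
import os
import tempfile


def is_valid_utf8(path, chunk_size=64 * 1024):
    """Return True if the whole file decodes as strict UTF-8, else False.

    The file is read in chunks so it is never held in memory all at once.
    An incremental decoder buffers a multi-byte character that is split
    across two chunks instead of reporting a false error.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                decoder.decode(chunk)
            # final=True makes a truncated sequence at end of file an error
            decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


# Usage: write the same text as UTF-8 and as CP1252, then check both files.
text = "name,city\nJosé,Zürich\n"
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "utf8.csv")
    cp1252_path = os.path.join(tmp, "cp1252.csv")
    with open(utf8_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    with open(cp1252_path, "w", encoding="cp1252", newline="") as f:
        f.write(text)
    # chunk_size=1 forces every multi-byte UTF-8 character to be split
    utf8_ok = is_valid_utf8(utf8_path, chunk_size=1)
    cp1252_ok = is_valid_utf8(cp1252_path, chunk_size=1)

print(utf8_ok, cp1252_ok)  # True False
```

    In CP1252, "é" is the single byte 0xE9, which UTF-8 treats as the start of a three-byte sequence; the following comma is not a valid continuation byte, so the decode fails.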
    
  • 2020-12-05 13:20

    On Linux, you can use the file command. It reports its best guess at the encoding, based on the bytes in the file.

    Sample:

    file blah.csv
    

    Output:

    blah.csv: ISO-8859 text, with very long lines
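    Both file and chardet infer the encoding from the bytes, so they can only guess. The exception is a file that starts with a byte-order mark (BOM), a fixed byte sequence that identifies UTF-8, UTF-16 or UTF-32 directly. A minimal standard-library check is sketched below; the helper name encoding_from_bom is hypothetical, and the returned values are Python codec names chosen so that passing them to open() strips the BOM. A missing BOM tells you nothing, because UTF-8 files often have none.

```python
import codecs
import os
import tempfile

# (BOM bytes, codec name to pass to open()). Order matters: the UTF-32-LE BOM
# (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE), so UTF-32 is checked first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(path):
    """Return a codec name if the file starts with a Unicode BOM, else None."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM (UTF-32) is 4 bytes
    for bom, codec_name in BOMS:
        if head.startswith(bom):
            return codec_name
    return None  # no BOM: the encoding is still unknown


# Usage: Python's utf-8-sig, utf-16 and utf-32 codecs write a BOM; plain utf-8 does not.
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for enc in ("utf-8-sig", "utf-16", "utf-32", "utf-8"):
        path = os.path.join(tmp, enc + ".csv")
        with open(path, "w", encoding=enc) as f:
            f.write("name,city\nJosé,Zürich\n")
        results[enc] = encoding_from_bom(path)

print(results)
```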
    
  • 2020-12-05 13:20

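    In Python, you can try every codec that Python knows about and print the ones pandas can read the file with:

    import pandas as pd
    from encodings.aliases import aliases

    # the values of the aliases dict are the canonical codec names
    alias_values = set(aliases.values())

    for encoding in sorted(alias_values):
        try:
            df = pd.read_csv("test.csv", encoding=encoding)
            print('successful', encoding)
        except Exception:
            # wrong codec (UnicodeDecodeError), non-text codec (LookupError), etc.
            pass

    Several encodings will usually succeed, so check the decoded data to pick the right one.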
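    The loop above prints every encoding that reads the file without an error, which is not the same as the right one: Latin-1 (ISO-8859-1) maps all 256 byte values to characters, so it never fails. The standard-library sketch below (decodable_encodings is a hypothetical helper, and the three candidates are only examples) decodes the same CP1252 bytes strictly with each candidate to make that visible.

```python
def decodable_encodings(data, candidates):
    """Return the candidate encodings that decode `data` without errors."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)  # errors="strict" is the default
        except (UnicodeError, LookupError):
            # UnicodeError: bytes invalid for this codec.
            # LookupError: unknown name, or a non-text codec such as "base64_codec".
            continue
        ok.append(name)
    return ok


# Usage: CP1252 bytes for text containing "ü" (0xFC) and "€" (0x80).
text = "city,price\nZürich,€5\n"
sample = text.encode("cp1252")

matches = decodable_encodings(sample, ["utf-8", "cp1252", "latin-1"])
print(matches)  # ['cp1252', 'latin-1']

# Both "succeed", but only one gives back the original text:
print(sample.decode("cp1252") == text)   # True
print(sample.decode("latin-1") == text)  # False: 0x80 becomes a control character, not "€"
```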
    
  • 2020-12-05 13:25

    Use chardet https://github.com/chardet/chardet (its documentation is short and easy to read).

    Install Python, run pip install chardet, and then use the chardetect command-line tool that comes with it (for example, chardetect test.csv).

    I tested it on GB2312 text and it is quite accurate. (Make sure the sample contains at least a few characters; a sample with only one character can easily be misdetected.)

    As you can see from the output above, file is not as reliable.

  • 2020-12-05 13:27

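    If you use Python, printing the file object shows the encoding Python opened it with. Note that with no encoding= argument this is your platform's default encoding, not one detected from the file's contents. For example: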

    with open('file_name.csv') as f:
        print(f)
    

    The output is something like this:

    <_io.TextIOWrapper name='file_name.csv' mode='r' encoding='utf8'>
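    The encoding shown by print(f) or f.encoding comes from Python's defaults, not from the file's bytes: with no encoding= argument, open() uses the platform default (what locale.getpreferredencoding(False) returns, or UTF-8 when Python's UTF-8 mode is on). The sketch below (reported_encoding is a hypothetical helper) writes the same text as UTF-16 and as CP1252 and shows that both files report the same, default encoding.

```python
import codecs
import locale
import os
import tempfile


def reported_encoding(path):
    """Return the encoding Python picks when opening `path` in text mode."""
    with open(path) as f:  # no encoding= argument, so Python uses its default
        return f.encoding


text = "name,city\nJosé,Zürich\n"
reported = {}
with tempfile.TemporaryDirectory() as tmp:
    for enc in ("utf-16", "cp1252"):
        path = os.path.join(tmp, enc + ".csv")
        with open(path, "w", encoding=enc) as f:
            f.write(text)
        reported[enc] = reported_encoding(path)

# Both files report the same value, whatever bytes they actually contain.
print(reported)

# Normalise spellings such as "UTF-8" vs "utf-8" before comparing with the default.
same_as_default = codecs.lookup(reported["utf-16"]).name == codecs.lookup(
    locale.getpreferredencoding(False)
).name
print(same_as_default)
```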
    