“ValueError: embedded null character” when using open()

一整个雨季 2021-01-04 06:48

I am taking Python at my college and I am stuck on my current assignment. We are supposed to take two files and compare them. I am simply trying to open the files so I can

4 Answers
  • 2021-01-04 07:30

    If you are trying to open a file, build its path with os.path.join rather than typing the separators by hand, like so:

    import os
    # Join the components with the separator that is correct for the current OS
    file_path = os.path.join("path", "to", "the", "file")
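
    As a minimal sketch of how such a joined path can then be passed to open(): the directory and file names below are hypothetical and are created in a temporary directory only for the demonstration.

```python
import os
import tempfile

# Hypothetical directory and file names, created only for this demo.
with tempfile.TemporaryDirectory() as base_dir:
    file_path = os.path.join(base_dir, "data", "first.txt")
    os.makedirs(os.path.dirname(file_path))

    # Write a small file, then read it back through the joined path.
    with open(file_path, "w", encoding="utf-8") as f:
        f.write("hello\n")
    with open(file_path, encoding="utf-8") as f:
        content = f.read()
```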
    
  • 2021-01-04 07:32

    It seems that you have a problem with the "\" and "/" characters. If you use them in the path you enter, try replacing one with the other.
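
    One way a backslash can produce this exact error, assuming the path is written as a string literal in the code (the question does not show this, and the path below is made up): in a normal Python string literal "\0" is the escape sequence for a null character. A raw string or forward slashes avoid it. A rough sketch:

```python
# Hypothetical path, used only to illustrate the escape sequence.
bad_path = "folder\0data.txt"    # "\0" becomes a single null character
raw_path = r"folder\0data.txt"   # raw string: backslash and "0" stay as typed
slash_path = "folder/0data.txt"  # forward slash: no escape sequence involved

# open() rejects any path that contains a null character.
raised_value_error = False
try:
    open(bad_path)
except ValueError:
    raised_value_error = True
```

    Here bad_path is one character shorter than raw_path, because the two characters "\0" collapsed into one null character.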

  • 2021-01-04 07:32

    The problem is due to bytes data that needs to be decoded.

    When you evaluate a variable in the interpreter, it displays the value's repr(), whereas print() uses its str(). For a string the two contain the same characters, but repr() shows unprintable characters such as \x00 and \x01 as escape sequences, while with print() the terminal typically shows nothing or a placeholder for them.

    One solution is to "decode" file1_content by dropping the unprintable characters:

    file1_content = ''.join(x for x in file1_content if x.isprintable())
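
    One caveat with this filter: str.isprintable() is also False for newlines and tabs, so the line above removes line breaks as well. The sketch below uses a made-up sample string to show the effect, plus a variant (an assumption about what is wanted, not part of the answer) that keeps common whitespace:

```python
# Made-up sample: a null character in the middle and a line break.
raw = "abc\x00def\nghi"

# Same filter as above: drops "\x00", but also drops "\n".
only_printable = "".join(ch for ch in raw if ch.isprintable())

# Variant that keeps newlines, carriage returns and tabs.
keep_whitespace = "".join(
    ch for ch in raw if ch.isprintable() or ch in "\n\r\t"
)
```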
    
  • 2021-01-04 07:37

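    The default encoding that Python 3.5 assumes for source files is 'utf-8'.

    Text files created on Windows tend to use a different encoding, and open() without an encoding argument falls back to the locale's preferred encoding.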

    If you intend to open two text files, you may try this:

    import locale

    file1 = input("Enter the name of the first file: ")
    # Encoding reported for the current locale (may be None if it cannot be determined)
    encoding = locale.getdefaultlocale()[1]
    with open(file1, encoding=encoding) as file1_open:
        file1_content = file1_open.read()
    

    The standard library does not seem to offer general-purpose automatic detection for text files, so you may create your own:

    import locale


    def guess_encoding(csv_file):
        """Guess the encoding of the given file."""
        # Read the first few bytes to look for a byte order mark (BOM)
        with open(csv_file, "rb") as f:
            data = f.read(5)
        if data.startswith(b"\xEF\xBB\xBF"):  # UTF-8 with a BOM
            return "utf-8-sig"
        elif data.startswith(b"\xFF\xFE") or data.startswith(b"\xFE\xFF"):  # UTF-16 BOM
            return "utf-16"
        else:  # no BOM: on Windows we cannot assume UTF-8, so we have to try
            try:
                with open(csv_file, encoding="utf-8") as f:
                    f.read(222222)  # decoding a preview raises if it is not valid UTF-8
                    return "utf-8"
            except UnicodeDecodeError:
                # Fall back to the encoding reported for the current locale
                return locale.getdefaultlocale()[1]
    

    and then use it like this:

    file1 = input("Enter the name of the first file: ")
    with open(file1, encoding=guess_encoding(file1)) as file1_open:
        file1_content = file1_open.read()
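
    The BOM check in guess_encoding can be tried out on in-memory bytes without touching any files. The sketch below is not the function above: sniff_bom is a hypothetical helper that covers only the BOM step and uses the BOM constants from the codecs module instead of hand-written byte strings.

```python
import codecs


def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None  # no BOM found; the caller has to fall back to trial decoding


# Made-up sample text encoded three ways; only the first two carry a BOM.
samples = {
    "utf-8-sig": "héllo".encode("utf-8-sig"),
    "utf-16": "héllo".encode("utf-16"),
    "utf-8": "héllo".encode("utf-8"),
}
detected = {name: sniff_bom(raw) for name, raw in samples.items()}
```

    Note that a UTF-32 little-endian BOM also begins with the bytes FF FE, so a check like this would report such a file as "utf-16".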
    