How can I detect DOS line breaks in a file?

半阙折子戏 2020-12-29 09:46

I have a bunch of files. Some have Unix line endings, many have DOS line endings. I'd like to test each file to see if it is DOS-formatted before I switch the line endings.

How

7 Answers
  • 2020-12-29 10:07

    DOS line breaks are \r\n; Unix uses only \n. So just search for \r\n.
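
Building on that idea, here is a minimal sketch that counts each kind of line break in the raw bytes, so a file can be labelled DOS, Unix, or mixed. The function name `classify_line_endings` and the returned labels are illustrative choices, not part of any library.

```python
def classify_line_endings(data):
    """Label raw bytes as 'dos', 'unix', 'mixed', or 'none' (hypothetical helper)."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair also contains an LF, so subtract to get the bare LF count.
    lf_only = data.count(b"\n") - crlf
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "dos"
    if lf_only:
        return "unix"
    return "none"
```

To use it, open each file in binary mode (`open(path, "rb")`) so that no newline translation happens, and pass the result of `read()` to the function. Bare \r endings (old Mac style) are not counted in this sketch.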

  • 2020-12-29 10:17

    You could search the file's contents for \r\n, which is the DOS-style line ending.

    EDIT: Take a look at this

  • 2020-12-29 10:18

    As a complete Python newbie & just for fun, I tried to find some minimalistic way of checking this for one file. This seems to work:

    with open("/path/file.txt", "rb") as f:
        # Binary mode leaves "\r\n" untouched; the b"" literal works in Python 2.6+ and 3
        if b"\r\n" in f.read():
            print("DOS line endings found")
    

    Edit: simplified as per John Machin's comment (no need to use regular expressions).
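
For large files, or a long list of them, the same check can be done without loading a whole file into memory. The following is only a sketch: the name `has_dos_line_endings` and the default chunk size are assumptions made for illustration.

```python
def has_dos_line_endings(path, chunk_size=64 * 1024):
    """Return True as soon as a CRLF pair is found, reading the file in chunks."""
    prev_ended_with_cr = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # reached end of file without finding CRLF
            # A pair can straddle two chunks: CR ends one, LF starts the next.
            if prev_ended_with_cr and chunk.startswith(b"\n"):
                return True
            if b"\r\n" in chunk:
                return True
            prev_ended_with_cr = chunk.endswith(b"\r")
```

Because it returns at the first match, a DOS-formatted file is usually identified after reading a single chunk.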

  • 2020-12-29 10:25

    (Python 2 only:) If you just want to read text files, either DOS or Unix-formatted, this works:

    print open('myfile.txt', 'U').read()
    

    That is, Python's "universal" file reader automatically recognizes all the different end-of-line markers and translates them to "\n".

    http://docs.python.org/library/functions.html#open

    (Thanks handle!)

  • 2020-12-29 10:27

    Python can automatically detect which newline convention a file uses, thanks to "universal newlines" mode (the 'U' flag in Python 2, the default for text mode in Python 3), and you can read Python's guess from the newlines attribute of file objects:

    f = open('myfile.txt')  # Python 2: open('myfile.txt', 'U'); the 'U' flag was removed in Python 3.11
    f.readline()  # Reads a line
    # f.newlines now holds the newline ending(s) seen so far:
    # "\r\n" (Windows), "\n" (Unix), or "\r" (Mac OS pre-OS X).
    # If no newline has been read yet, it is None.
    print(repr(f.newlines))
    f.close()
    

    This gives the newline ending of the first line (Unix, DOS, etc.), if any.

    As John M. pointed out, if you happen to have a pathological file that mixes newline conventions, f.newlines becomes a tuple of all the conventions found so far once enough lines have been read.
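
To check a whole file for mixed endings this way, one option is to read it to the end and then inspect the attribute. This is a Python 3 sketch; the helper name `newline_styles` is made up for illustration, and the `latin-1` encoding is an assumption chosen only because it can decode any byte sequence.

```python
def newline_styles(path):
    """Return a tuple of the newline conventions seen in the file (Python 3)."""
    # Text mode with the default newline=None enables universal newlines.
    with open(path, encoding="latin-1") as f:
        for _ in f:  # read to the end so every line ending is seen
            pass
        seen = f.newlines
    if seen is None:  # no line ending found at all
        return ()
    if isinstance(seen, str):  # exactly one convention found
        return (seen,)
    return tuple(seen)  # several conventions found
```

A result with more than one element means the file mixes conventions.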

    Reference: http://docs.python.org/2/library/functions.html#open

    If you just want to convert a file, you can simply do:

    with open('myfile.txt') as infile:  # Python 2: use mode 'U' to get universal newlines
        text = infile.read()  # Universal-newlines read: every line ending becomes "\n"
    with open('myfile.txt', 'w') as outfile:
        outfile.write(text)  # Writes newlines for the platform running the program
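
If the goal is Unix endings regardless of the platform the script runs on, a byte-level conversion avoids any newline translation. This is a sketch under that assumption; `dos_to_unix` is a hypothetical name.

```python
def dos_to_unix(path):
    """Replace CRLF with LF in place; return True if the file was modified."""
    with open(path, "rb") as f:
        data = f.read()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # no CRLF pairs present: leave the file untouched
    with open(path, "wb") as f:
        f.write(converted)
    return True
```

The return value doubles as the test the question asks for: only files that actually contained DOS line endings are rewritten.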
    
  • 2020-12-29 10:27

    You can use the following function (which should work in Python 2 and Python 3) to get the newline representation used in an existing text file. All three possible kinds are recognized. The function reads the file only up to the first newline to decide. This is faster and uses less memory for larger text files, but it does not detect mixed newline endings.

    In Python 3, you can then pass the output of this function to the newline parameter of the open function when writing the file. This way you can alter the content of a text file without changing its newline representation.

    def get_newline(filename):
        with open(filename, "rb") as f:
            while True:
                c = f.read(1)
                if not c or c == b'\n':
                    break
                if c == b'\r':
                    if f.read(1) == b'\n':
                        return '\r\n'
                    return '\r'
        return '\n'
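
As a rough illustration of the second paragraph, here is how a detected newline string could be passed back to `open` in Python 3. The helper `rewrite_with_newline`, its `transform` argument, and the UTF-8 default are assumptions made for this sketch.

```python
def rewrite_with_newline(path, transform, newline, encoding="utf-8"):
    """Apply transform(text) to a file and write it back using `newline`."""
    # Reading with the default newline=None translates every line ending to "\n".
    with open(path, encoding=encoding) as f:
        text = f.read()
    # On writing, each "\n" is translated to the string passed as `newline`.
    with open(path, "w", encoding=encoding, newline=newline) as f:
        f.write(transform(text))
```

For example, `rewrite_with_newline("notes.txt", str.upper, get_newline("notes.txt"))` would upper-case a file while keeping its original line endings ("notes.txt" is a placeholder file name).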
    