Reading files with DOS line endings using fgets() on Linux

小蘑菇 2021-01-22 16:15

I have a file with DOS line endings that I receive at run-time, so I cannot convert the line endings to UNIX-style offline. Also, my app runs on both Windows and Linux. My app d

4 answers
  • 2021-01-22 16:43

    On Unix, fgets() reads each line up to and including the newline \n, so the buffer also holds the carriage return \r that precedes it. You need to trim both off the end.
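
    Trimming both characters could look roughly like the sketch below. The helper name trim_line_ending is hypothetical (it is not part of the C library or of this answer), and the sketch assumes the buffer holds a NUL-terminated line as returned by fgets().

    ```c
    #include <string.h>

    /* Hypothetical helper: removes every trailing '\r' and '\n' byte from a
       NUL-terminated line and returns the new length. */
    size_t trim_line_ending(char *line)
    {
        size_t len = strlen(line);
        while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
            line[--len] = '\0';   /* shorten the string by one byte */
        }
        return len;
    }
    ```

    Called on a buffer holding "hello\r\n", this leaves "hello" and returns 5. A \r in the middle of the line is left alone, because only the end of the string is examined.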

  • 2021-01-22 16:49

    You'll get what's actually in the file, including the \r characters. On Unix there is no distinction between text files and binary files, there are just files, and stdio does no conversion. After reading a line into a buffer with fgets, you can do:

    char *p = strrchr(buffer, '\r');
    if(p && p[1]=='\n' && p[2]=='\0') {
        p[0] = '\n';
        p[1] = '\0';
    }
    

    That will change a terminating \r\n\0 into \n\0. Or you could just do p[0]='\0' if you don't want to keep the \n.

    Note the use of strrchr, not strchr. There's nothing that prevents multiple \rs from being present in the middle of a line, and you probably don't want to truncate the line at the first one.
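
    To make the strrchr point concrete, here is the same snippet wrapped in a function. The name crlf_to_lf and the return value are assumptions added for illustration only.

    ```c
    #include <string.h>

    /* Hypothetical wrapper around the snippet above: rewrites a trailing
       "\r\n" as "\n" and leaves every other '\r' untouched.
       Returns 1 if the buffer was changed, 0 otherwise. */
    int crlf_to_lf(char *buffer)
    {
        char *p = strrchr(buffer, '\r');   /* last '\r' in the line, not the first */
        if (p && p[1] == '\n' && p[2] == '\0') {
            p[0] = '\n';
            p[1] = '\0';
            return 1;
        }
        return 0;
    }
    ```

    For a buffer holding "a\rb\r\n", the call returns 1 and leaves "a\rb\n": the embedded \r survives. With strchr instead, p would point at the first \r, the check on p[1] would fail, and the trailing \r\n would stay as it is.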

    Answer to the EDIT section of the question: yes, the "b" in "rb" is a no-op on Unix.

  • 2021-01-22 17:00

    fgets() keeps line endings.

    http://msdn.microsoft.com/en-us/library/c37dh6kf(v=vs.80).aspx

    fgets() itself doesn't have any special options for converting line endings, but on Windows, you can choose to either open a file in "binary" mode, or in "text" mode. In text mode Windows converts the CR/LF sequence (C string: "\r\n") into just a newline (C string: "\n"). It's a feature so that you can write the same code for Windows and Linux and it will work (you don't need "\r\n" on Windows and just "\n" on Linux).

    http://msdn.microsoft.com/en-US/library/yeby3zcb(v=vs.80)

    Note that the Windows call to fopen() takes the same arguments as the call to fopen() in Linux. "Binary" mode is selected by adding a 'b' to the file mode (the 'b' is part of standard C), while "text" mode is the default. So I suggest you just use the same code lines for Windows and Linux; the Windows version of fopen() is designed for that.

    The Linux version of the C library doesn't have any tricky features. If the text file has CR/LF line endings, then that is what you get when you read it. Linux fopen() will accept a 'b' in the options, but ignores it!
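
    Since a CR/LF file read on Linux arrives with the \r intact, one way to get the same result on both platforms is to open the file with "rb" everywhere and strip the line ending by hand. The sketch below is only an illustration: read_line is a hypothetical helper, and it assumes every line fits in the buffer (a longer line could be split between its \r and its \n).

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: reads one line with fgets() and removes a trailing
       "\n" or "\r\n". Returns buf, or NULL on end of file or read error.
       Intended for streams opened with "rb", so that Windows does no
       translation and both platforms see the same bytes. */
    char *read_line(char *buf, int size, FILE *fp)
    {
        if (fgets(buf, size, fp) == NULL)
            return NULL;
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n')
            buf[--len] = '\0';              /* drop the newline */
        if (len > 0 && buf[len - 1] == '\r')
            buf[--len] = '\0';              /* drop the carriage return before it */
        return buf;
    }
    ```

    A caller would open the file with fopen(path, "rb") and loop while read_line() returns non-NULL; each buffer then holds the line without its terminator, whether the file used "\r\n" or "\n".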

    http://linux.die.net/man/3/fopen

    http://linux.die.net/man/3/fgets

  • 2021-01-22 17:05

    Although the other answers give satisfying information regarding what kind of line ending is returned for a DOS file read under UNIX, I'd like to mention an alternative way to chop off such line endings.

    The significant difference is that the following approach is multi-byte-character safe, as it does not index individual characters directly. Note that it cuts the line at the first \r or \n it finds, so a stray \r in the middle of a line also truncates it:

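    if (pszLine)
    {
      /* strcspn returns the index of the first '\r' or '\n',
         or the string length if neither is present */
      size_t size = strcspn(pszLine, "\r\n");
      pszLine[size] = 0; /* no length check needed; a lone "\n" is stripped too */
    }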
    