This function is supposed to search through a text file for the new line character. When it finds the newline character, it increments the newLine
counter, and when
The logic looks correct if you have Unix line endings. If you have Windows CRLF line endings but are processing the file on Unix, you have a CR before each LF, and the CR resets newLine
to zero, so you get the message for each newline.
This would explain what you're seeing.
It would also explain why everyone else is saying your logic is correct (it is — provided that the lines end with just LF and not CRLF) but you are seeing an unexpected result.
How to resolve it?
Fair question. One major option is to use dos2unix
or an equivalent mechanism to convert the DOS file into a Unix file. There are many questions on the subject on SO.
If you don't need the CR ('\r'
in C) characters at all, you can simply delete (not print, and not zero newLine
) those.
If you need to preserve the CRLF line endings, you'll need to be a bit more careful. You'll have to record that you got a CR, then check that you get an LF, then print the pair, and then check whether you get any more CRLF sequences and suppress those, etc.
dupnl.c
This program only reads from standard input; this is more flexible than
only reading from a fixed file name. Learn to avoid writing code which
only works with one file name; it will save you lots of recompilation
over time. Th code handles Unix-style files with newlines ("\n"
) only
at the end; it also handles DOS files with CRLF ("\r\n"
) endings; and
it also handles (old style) Mac (Mac OS 9 and earlier) files with CR
("\r"
) line endings. In fact, it handes arbitrary interleavings of
the different line ending styles. If you want enforcement of a single
mode, you have to do some work to decide which mode, and then use an
appropriate subset of this code.
#include
int main(void)
{
FILE *fp = stdin; // Instead of fopen()
int newLine = 1;
int c;
while ((c = fgetc(fp)) != EOF)
{
if (c == '\n')
{
/* Unix NL line ending */
if (newLine++ == 0)
putchar(c);
}
else if (c == '\r')
{
int c1 = fgetc(fp);
if (c1 == '\n')
{
/* DOS CRLF line ending */
if (newLine++ == 0)
{
putchar(c);
putchar(c1);
}
}
else
{
/* MAC CR line ending */
if (newLine++ == 0)
putchar(c);
if (c1 != EOF && c1 != '\r')
ungetc(c1, stdin);
}
}
else
{
putchar(c);
newLine = 0;
}
}
return 0;
}
$ cat test.unx
data long enough to be seen 1 - Unix
data long enough to be seen 2 - Unix
data long enough to be seen 3 - Unix
data long enough to be seen 4 - Unix
data long enough to be seen 5 - Unix
$ sed 's/Unix/DOS/g' test.unx | ule -d > test.dos
$ cat test.dos
data long enough to be seen 1 - DOS
data long enough to be seen 2 - DOS
data long enough to be seen 3 - DOS
data long enough to be seen 4 - DOS
data long enough to be seen 5 - DOS
$ sed 's/Unix/Mac/g' test.unx | ule -m > test.mac
$ cat test.mac
$ ta long enough to be seen 5 - Mac
$ odx test.mac
0x0000: 0D 0D 64 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 ..data long enou
0x0010: 67 68 20 74 6F 20 62 65 20 73 65 65 6E 20 31 20 gh to be seen 1
0x0020: 2D 20 4D 61 63 0D 0D 64 61 74 61 20 6C 6F 6E 67 - Mac..data long
0x0030: 20 65 6E 6F 75 67 68 20 74 6F 20 62 65 20 73 65 enough to be se
0x0040: 65 6E 20 32 20 2D 20 4D 61 63 0D 64 61 74 61 20 en 2 - Mac.data
0x0050: 6C 6F 6E 67 20 65 6E 6F 75 67 68 20 74 6F 20 62 long enough to b
0x0060: 65 20 73 65 65 6E 20 33 20 2D 20 4D 61 63 0D 64 e seen 3 - Mac.d
0x0070: 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 67 68 20 ata long enough
0x0080: 74 6F 20 62 65 20 73 65 65 6E 20 34 20 2D 20 4D to be seen 4 - M
0x0090: 61 63 0D 0D 0D 0D 64 61 74 61 20 6C 6F 6E 67 20 ac....data long
0x00A0: 65 6E 6F 75 67 68 20 74 6F 20 62 65 20 73 65 65 enough to be see
0x00B0: 6E 20 35 20 2D 20 4D 61 63 0D 0D 0D n 5 - Mac...
0x00BC:
$ dupnl < test.unx
data long enough to be seen 1 - Unix
data long enough to be seen 2 - Unix
data long enough to be seen 3 - Unix
data long enough to be seen 4 - Unix
data long enough to be seen 5 - Unix
$ dupnl < test.dos
data long enough to be seen 1 - DOS
data long enough to be seen 2 - DOS
data long enough to be seen 3 - DOS
data long enough to be seen 4 - DOS
data long enough to be seen 5 - DOS
$ dupnl < test.mac
$ ta long enough to be seen 5 - Mac
$ dupnl < test.mac | odx
0x0000: 64 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 67 68 data long enough
0x0010: 20 74 6F 20 62 65 20 73 65 65 6E 20 31 20 2D 20 to be seen 1 -
0x0020: 4D 61 63 0D 64 61 74 61 20 6C 6F 6E 67 20 65 6E Mac.data long en
0x0030: 6F 75 67 68 20 74 6F 20 62 65 20 73 65 65 6E 20 ough to be seen
0x0040: 32 20 2D 20 4D 61 63 0D 64 61 74 61 20 6C 6F 6E 2 - Mac.data lon
0x0050: 67 20 65 6E 6F 75 67 68 20 74 6F 20 62 65 20 73 g enough to be s
0x0060: 65 65 6E 20 33 20 2D 20 4D 61 63 0D 64 61 74 61 een 3 - Mac.data
0x0070: 20 6C 6F 6E 67 20 65 6E 6F 75 67 68 20 74 6F 20 long enough to
0x0080: 62 65 20 73 65 65 6E 20 34 20 2D 20 4D 61 63 0D be seen 4 - Mac.
0x0090: 64 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 67 68 data long enough
0x00A0: 20 74 6F 20 62 65 20 73 65 65 6E 20 35 20 2D 20 to be seen 5 -
0x00B0: 4D 61 63 0D Mac.
0x00B4:
$
The lines starting $ ta
are where the prompt overwrites the previous output (and the 'long enough to be seen' part is because my prompt is normally longer than just $
).
odx
is a hex dump program. ule
is for 'uniform line endings' and analyzes or transforms data so it has uniform line endings.
Usage: ule [-cdhmnsuzV] [file ...]
-c Check line endings (default)
-d Convert to DOS (CRLF) line endings
-h Print this help and exit
-m Convert to MAC (CR) line endings
-n Ensure line ending at end of file
-s Write output to standard output (default)
-u Convert to Unix (LF) line endings
-z Check for zero (null) bytes
-V Print version information and exit