C++ cutting off character(s) when reading lines from a file

Submitted by 可紊 on 2019-12-12 04:56:17

Question


I know this has to do with the differences between the end-of-line designators in Windows and linux but I don't know how to fix it.

I did look at the post "Getting std::ifstream to handle LF, CR, and CRLF?", but when I used a simplified version of the code from that post (straight reads instead of buffered reads; I know there is a performance penalty, but I wanted to keep it simple for now), it did not solve my problem, so I am hoping for some guidance here. I did test my modified version, and it successfully found and replaced characters (including a tab that I temporarily used as a test scenario), so the logic is working, but I still have the problem.
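For reference, the idea behind that post can be sketched as a replacement for std::getline that accepts \n, \r\n, or a lone \r as the line terminator. This is a minimal sketch using unbuffered get()/peek() calls, not the code from the linked post; the names portable_getline and split_lines are hypothetical. It assumes the stream was opened with std::ios::binary, so that the runtime does not translate line endings before the code sees them.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one line, accepting "\n", "\r\n", or a lone "\r"
// as the terminator. The terminator is consumed but not stored in `line`.
std::istream& portable_getline(std::istream& in, std::string& line)
{
    line.clear();
    char ch;
    while (in.get(ch))
    {
        if (ch == '\n')
        {
            return in;              // linux-style terminator
        }
        if (ch == '\r')
        {
            if (in.peek() == '\n')
            {
                in.get();           // Windows-style "\r\n": skip the '\n' as well
            }
            return in;              // otherwise an old-Mac-style lone '\r'
        }
        line += ch;
    }
    // get() failed. If that was plain end of input and the final line has text
    // but no terminator, drop failbit so the caller still sees that last line.
    if (in.eof() && !line.empty())
    {
        in.clear(std::ios_base::eofbit);
    }
    return in;
}

// Hypothetical convenience wrapper: splits in-memory text into lines.
std::vector<std::string> split_lines(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (portable_getline(in, line))
    {
        lines.push_back(line);
    }
    return lines;
}
```

With a file, the same loop would be `while ( portable_getline( rc_input_file_holder, file_line ) )` on a stream opened with `std::ios::in | std::ios::binary`.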

I know I am missing something very basic here and I am likely going to feel very stupid when someone helps me figure this out, so I would rather not admit my stupidity publicly but I have been working on this for a week now and cannot solve it so I am reaching out for help.

I am new to C++, so please be gentle in your answers if I am doing something really newbie-ish here :-)

I have the following one-file program that I have created to prototype what I want to do. So this is a simple example, but I need to get this to work to go further. This is NOT a homework problem; I really need to get this solved to create an application.

The program (shown below):

  • compiles without error or warnings and runs cleanly on a CentOS box;

  • cross compiles without error or warnings using mingw32 on a CentOS box and runs cleanly on Windows;

  • produces the correct (expected) output on both linux and Windows when I use an input text file created on linux;

  • does NOT produce the correct (expected) output when I use an input text file created in Windows.

So yes, it has something to do with the different file formats between linux and Windows and it likely has to do with the newline codes, but I have tried to accommodate that and it does not work.

To make it more complicated, I have discovered that old Mac newline characters are different yet again:

  • linux = \n
  • Windows = \r\n
  • Mac = \r

PLEASE help! ...

I want to:

  1. read in the contents of a txt file
  2. run some validation checks on the content (not done here; will do next)
  3. output a report to another txt file

So I need to check the file, determine which newline character(s) it uses, and handle them accordingly.
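One way to do the "determine the newline character(s)" step is to scan for the first terminator and classify it. The sketch below is only an illustration: the LineEnding enum and detect_line_ending function are hypothetical names, it needs C++11 for enum class, and it assumes every line in the file uses the same convention as the first one.

```cpp
#include <istream>
#include <sstream>
#include <string>

enum class LineEnding { None, LF, CRLF, CR };

// Hypothetical helper: reports the first line terminator found in the stream.
LineEnding detect_line_ending(std::istream& in)
{
    char ch;
    while (in.get(ch))
    {
        if (ch == '\n')
        {
            return LineEnding::LF;      // linux
        }
        if (ch == '\r')
        {
            // "\r\n" is Windows; a '\r' followed by anything else is old Mac
            return (in.peek() == '\n') ? LineEnding::CRLF : LineEnding::CR;
        }
    }
    return LineEnding::None;            // no terminator anywhere in the input
}

// Convenience overload for text that is already in memory.
LineEnding detect_line_ending(const std::string& text)
{
    std::istringstream in(text);
    return detect_line_ending(in);
}
```

For a real file, the stream would need to be opened with std::ios::binary (otherwise a Windows runtime has already turned \r\n into \n), and rewound afterwards with `in.clear(); in.seekg(0);` before the lines are read.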

Any suggestions?

My current (simplified) code (with no validation checks yet) is:

[code]

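#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <cstring>   // strcpy
#include <fstream>   // std::ifstream, std::ofstream
#include <iomanip>   // std::setw
#include <iostream>  // std::cout
#include <string>    // std::string, getline

int main(int argc, char** argv)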
{
    std::string rc_input_file_name = "rc_input_file.txt";
    std::string rc_output_file_name = "rc_output_file.txt";

    char * RC_INPUT_FILE_NAME = new char[ rc_input_file_name.length() + 1 ];
    strcpy( RC_INPUT_FILE_NAME, rc_input_file_name.c_str() );
    char * RC_OUTPUT_FILE_NAME = new char[ rc_output_file_name.length() + 1 ];
    strcpy( RC_OUTPUT_FILE_NAME, rc_output_file_name.c_str() );

    bool failure_flag = false;

    std::ifstream rc_input_file_holder;
    rc_input_file_holder.open( RC_INPUT_FILE_NAME , std::ios::in );

    if ( ! rc_input_file_holder.is_open() )
    {
       std::cout << "Error - Could not open the input file" << std::endl;
       failure_flag = true;
    }
    else
    {
       std::ofstream rc_output_file_holder;
       rc_output_file_holder.open( RC_OUTPUT_FILE_NAME , std::ios::out | std::ios::trunc );

       if ( ! rc_output_file_holder.is_open() )
       {
          std::cout << "Error - Could not open or create the output file" << std::endl;
          failure_flag = true;
       }
       else
       {
          std::streampos char_num = 0;

          long int line_num = 0;
          long int starting_char_pos = 0;

          std::string file_line = "";
          while ( getline( rc_input_file_holder , file_line ) )
          {
             line_num = line_num + 1;
             long int file_line_length = file_line.length() +1 ;
             long int char_num = 0;
             for ( char_num = 0 ; char_num < file_line_length ;  char_num++ )
             {
                if ( file_line[ char_num ] == '\r' )
                {
                    if ( char_num == file_line_length - 1 )
                    {
                       file_line[ char_num ] = '-';
                    }
                    else
                    {
                       if ( file_line[ char_num + 1 ] == '\n' )
                       {
                          file_line[ char_num ] = ' ';
                       }
                       else
                       {
                          file_line[ char_num ] = ' ';
                       }
                    }
                }
             }

             int field_display_width = 4;
             std::cout << "Line " << std::setw( field_display_width ) << line_num << 
                    ", starting at character position " << std::setw( field_display_width ) << starting_char_pos << 
                    ", contains " << file_line << "." << std::endl;

             starting_char_pos = rc_input_file_holder.tellg();

             rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl;
          }

          rc_input_file_holder.close();
          rc_output_file_holder.close();
          delete [] RC_INPUT_FILE_NAME;
          delete [] RC_OUTPUT_FILE_NAME;
       }
    }

    if ( failure_flag )
    {
       return EXIT_FAILURE;
    }
    else
    {
       return EXIT_SUCCESS;
    }
}

[/code]
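A note on the inner loop: std::getline extracts and discards the '\n' delimiter, so a '\n' never appears inside file_line. What can remain, when a Windows-format file is read without newline translation, is a single '\r' at the end of the line. A minimal sketch of handling just that case follows; strip_trailing_cr and read_lines are hypothetical names. It covers \n and \r\n files but not old Mac files, where getline would not split on the lone \r at all.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: removes one trailing '\r', if present.
void strip_trailing_cr(std::string& line)
{
    if (!line.empty() && line[line.size() - 1] == '\r')
    {
        line.erase(line.size() - 1);
    }
}

// Reads every line with std::getline and normalizes "\r\n" endings.
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))   // the '\n' is already removed here
    {
        strip_trailing_cr(line);     // drop the '\r' left over from "\r\n"
        lines.push_back(line);
    }
    return lines;
}
```

It can be tried without a file by passing a std::istringstream that contains mixed line endings.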

The same code, with lots of comments (for my benefit as a learning experience) is:

[code]

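#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <cstring>   // strcpy
#include <fstream>   // std::ifstream, std::ofstream
#include <iomanip>   // std::setw
#include <iostream>  // std::cout
#include <string>    // std::string, getline

/*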
 * The main function, from which all else is accessed
 */
int main(int argc, char** argv)
{


    /*
    *Program to:
    *  1) read from a text file
    *  2) do some validation checks on the content of that text file
    *  3) output a report to another text file
    */

    // Set the filenames to be used in this file-handling program
    std::string rc_input_file_name = "rc_input_file.txt";
    std::string rc_output_file_name = "rc_output_file.txt";

    // Note that when the filenames are used in the .open statements below
    //   they have to be in a cstring format, not a string format
    //   so the conversion is done here once
    // Use the Capitalized form of the file name to indicate the converted value
    //   (remember, variable names are case-sensitive in C/C++ so NAME is different than name)
    // This conversion could be done 3 ways:
    // - done each time the cstring is needed: 
    //          file_holder_name.open( string_file_name.c_str() )
    // - done once and referred to each time
    //     simple method: 
    //          const char * converted_file_name = string_file_name.c_str()
    //     explicit method (2-step):              
    //          char * converted_file_name = new char[ string_file_name.length() + 1 ];
    //          strcpy( converted_file_name, string_file_name.c_str() );
    // This program uses the explicit method to do it once for each filename
    // because by doing so, the char array created has variable length
    // and you do not risk buffer overflow
    char * RC_INPUT_FILE_NAME = new char[ rc_input_file_name.length() + 1 ];
    strcpy( RC_INPUT_FILE_NAME, rc_input_file_name.c_str() );
    char * RC_OUTPUT_FILE_NAME = new char[ rc_output_file_name.length() + 1 ];
    strcpy( RC_OUTPUT_FILE_NAME, rc_output_file_name.c_str() );

    // This will be set to true if either the input or output file cannot be opened
    bool failure_flag = false;

    // Open the input file
    std::ifstream rc_input_file_holder;
    rc_input_file_holder.open( RC_INPUT_FILE_NAME , std::ios::in );

    // Validate that the input file was properly opened/created
    // If not, set failure flag
    if ( ! rc_input_file_holder.is_open() )
    {
       // Could not open the input file; set failure flag to true
       std::cout << "Error - Could not open the input file" << std::endl;
       failure_flag = true;
    }
    else
    {
       // Open the output file
       // Create one if none previously existed
       // Erase the contents if it already existed
       std::ofstream rc_output_file_holder;
       rc_output_file_holder.open( RC_OUTPUT_FILE_NAME , std::ios::out | std::ios::trunc );

       // Validate that the output file was properly opened/created
       // If not, set failure flag
       if ( ! rc_output_file_holder.is_open() )
       {
          // Could not open the output file; set failure flag to true
          std::cout << "Error - Could not open or create the output file" << std::endl;
          failure_flag = true;
       }
       else
       {
          // Get the current position where the character pointer is at
          // Get it before the getline is executed so it gives you where the current line starts
          std::streampos char_num = 0;

          // Initialize the line_number and starting character position to 0
          long int line_num = 0;
          long int starting_char_pos = 0;

          std::string file_line = "";
          while ( getline( rc_input_file_holder , file_line ) )
          {
             // Set the line number counter to the current line (first line is Line 1, not 0)
             line_num = line_num + 1;


             // Check if the new line designator uses the standard for:
             //   - linux (\n)
             //   - Windows (\r\n)
             //   - Old Mac (\r)
             // Convert any non-linux new line designator to linux new line designator (\n)
             long int file_line_length = file_line.length() +1 ;
             long int char_num = 0;
             for ( char_num = 0 ; char_num < file_line_length ;  char_num++ )
             {
                // If a \r character is found, decide what to do with it
                if ( file_line[ char_num ] == '\r' )
                {
                    // If the \r char is the last line character (before the null terminator)
                    //   the file uses the old Mac format to indicate new line
                    //   so replace the \r with \n
                    if ( char_num == file_line_length - 1 )
                    {
                       file_line[ char_num ] = '-';
                    }
                    else
                    // If the \r char is NOT the last line character (before the null terminator)
                    {
                       // If the next character is a \n, the file uses the Windows format to indicate new line
                       //   so replace the \r with space
                       if ( file_line[ char_num + 1 ] == '\n' )
                       {
                          file_line[ char_num ] = ' ';
                       }
                       // If the next char is NOT a \n (and the pointer is NOT at the last line character)
                       //   then for some reason, there is a \r in the interior of the string
                       // At this point, I do not know why this would be
                       //   but I don't want it left there, so replace it with a space
                       // Yes, I know this is the same as the above action,
                       //   but I left it separate to allow for future flexibility
                       else
                       {
                          file_line[ char_num ] = '-';
                       }
                    }
                }
             }


             // Output the contents of the line just fetched
             // This is done in this prototype file as a placeholder
             // In the real program, this is where the validation check(s) for the line would occur
             //   and would likely be done in a function or class
             // The setw() function requires #include <iomanip>
             int field_display_width = 4;
             std::cout << "Line " << std::setw( field_display_width ) << line_num << 
                    ", starting at character position " << std::setw( field_display_width ) << starting_char_pos << 
                    ", contains " << file_line << "." << std::endl;

             // Reset the character pointer to the end of this line => start of next line
             starting_char_pos = rc_input_file_holder.tellg();

             // Output the (edited) contents of the line just fetched
             // This is done in this prototype file as a placeholder
             // In the real program, this is where the results of the validation checks would be recorded
             // You could put this in an if statement and record nothing if the line was valid
             rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl;
          }

          // Clean up by:
          //  - closing the files that were opened (input and output)
          //  - deleting the character arrays created
          rc_input_file_holder.close();
          rc_output_file_holder.close();
          delete [] RC_INPUT_FILE_NAME;
          delete [] RC_OUTPUT_FILE_NAME;
       }
    }

    // Check to see if all operations have successfully completed
    // If so exit this program with success indicated
    // If not, exit this program with failure indicated
    if ( failure_flag )
    {
       return EXIT_FAILURE;
    }
    else
    {
       return EXIT_SUCCESS;
    }
}

[/code]
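On the file-name comments in the code above: the pointer returned by c_str() stays valid as long as the string is neither modified nor destroyed, so the first option listed there (calling c_str() at the point of use) needs no new[]/delete[] at all, and since C++11 open() also accepts a std::string directly. Note too that in the code above the delete [] statements are only reached when both files open successfully. The following is a sketch of the same read-and-report skeleton under those assumptions; number_lines and number_lines_in_file are hypothetical names, and the validation step is left out as in the prototype.

```cpp
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Core logic, written against generic streams so it can be tried without files.
void number_lines(std::istream& in, std::ostream& out)
{
    std::string line;
    long line_num = 0;
    while (std::getline(in, line))
    {
        ++line_num;   // first line is Line 1, not 0
        out << "Line " << line_num << ": " << line << '\n';
    }
}

// File wrapper: c_str() is called where the name is needed, with no manual
// char arrays. Both streams close themselves when they go out of scope.
bool number_lines_in_file(const std::string& in_name, const std::string& out_name)
{
    std::ifstream in(in_name.c_str());
    if (!in.is_open())
    {
        return false;   // could not open the input file
    }
    std::ofstream out(out_name.c_str(), std::ios::out | std::ios::trunc);
    if (!out.is_open())
    {
        return false;   // could not open or create the output file
    }
    number_lines(in, out);
    return true;
}
```

main would then reduce to calling number_lines_in_file("rc_input_file.txt", "rc_output_file.txt") and returning EXIT_SUCCESS or EXIT_FAILURE based on the result.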

I have all the proper includes and there are no errors or warnings generated when I compile for linux or cross-compile for Windows.

The input file I am using has just 5 lines of (silly) text:

A new beginning
just in case
the file was corrupted
and the darn program was working fine ...
at least it was on linux

and the output on linux is, as expected:

Line    1, starting at character position    0, contains A new beginning.
Line    2, starting at character position   16, contains just in case.
Line    3, starting at character position   29, contains the file was corrupted.
Line    4, starting at character position   52, contains and the darn program was working fine ....
Line    5, starting at character position   94, contains at least it was on linux.

The output in Windows is the same when I import the text file created on linux, but when I use Notepad to manually recreate the same file in Windows, the output is:

Line    1, starting at character position    0, contains A new beginning.
Line    2, starting at character position   20, contains t in case.
Line    3, starting at character position   33, contains e file was corrupted.
Line    4, starting at character position   56, contains nd the darn program was working fine ....
Line    5, starting at character position   98, contains at least it was on linux.

Note the differences in the starting character position for lines 2, 3, 4 and 5. Note also the missing characters at the start of lines 2, 3 and 4:

  • 3 characters missing from line 2
  • 2 characters missing from line 3
  • 1 character missing from line 4
  • 0 characters missing from line 5

Any and all ideas welcome ...


Answer 1:


See resolution at

cross-compiler out of date

In short, the mingw cross-compiler installed via apt-get install was out of date. Once I manually installed an updated cross-compiler and adjusted the settings to prevent some error messages, everything worked fine.



Source: https://stackoverflow.com/questions/29681515/c-cutting-off-characters-when-read-lines-from-file
