C++ program for reading a CSV file of unknown size (containing only floats) with a constant but unknown number of columns into an array

别那么骄傲 2021-01-16 09:24

I was wondering if someone could give me a hand. I'm trying to build a program that reads a large block of floats of unknown size from a CSV file. I already wrote this i

4 answers
  • 2021-01-16 09:52

    I intended this as an edit to Dietmar Kuhl's solution, but it was rejected as too large an edit...

    The usual reason given for converting Matlab to C++ is performance. So I benchmarked these two solutions. I compiled with G++ 4.7.3 for cygwin with the following options "-Wall -Wextra -std=c++0x -O3 -fwhole-program". I tested on a 32-bit Intel Atom N550.

    As input I used two 10,000-line files. The first file had 10 "0.0" values per line, the second had 100 "0.0" values per line.

    I timed each run from the command line using time and report the sum of user+sys, averaged over three runs.

    I modified the second program to read from std::cin as in the first program.

    Finally, I ran the tests again with std::cin.sync_with_stdio(false);

    Results (time in seconds):

                   sync                no sync
            10/line  100/line     10/line  100/line
    prog A    1.839    16.873       0.721     6.228
    prog B    1.741    16.098       0.721     5.563
    

    The obvious conclusion is that program B is slightly faster, but more importantly, you should disable syncing with stdio: in these measurements that alone cut the run time by more than half for both programs.

  • 2021-01-16 09:53
    int fclose(infile);
    

    This line is wrong. Because of the leading int, the compiler reads it as a declaration of a new variable named fclose that is initialized with infile (a FILE*), not as a function call. If you're simply trying to close the file, drop the int:

    fclose(infile);
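
    For context, a minimal sketch of the whole open/read/close sequence with C stdio might look like the following. The function name read_first_float and its parameters are assumptions made for this illustration; the original question's code is not shown here.

```cpp
#include <cstdio>

// Hypothetical helper: open `path`, read the first float from it, and close
// the file. Returns true only if the file could be opened, a value was
// parsed, and the file was closed without error.
bool read_first_float(const char* path, float& out)
{
    std::FILE* infile = std::fopen(path, "r");
    if (infile == nullptr) {
        return false;                              // could not open the file
    }
    const bool parsed = std::fscanf(infile, "%f", &out) == 1;
    const bool closed = std::fclose(infile) == 0;  // a call, not a declaration
    return parsed && closed;
}
```

    std::fclose returns 0 on success, so its result can be checked like any other call.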
    
  • 2021-01-16 10:00

    I would, obviously, just use IOStreams. Reading a homogeneous array of arrays from a CSV file without having to bother with any quoting is fairly trivial:

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
    
    std::istream& comma(std::istream& in)
    {
        if ((in >> std::ws).peek() != std::char_traits<char>::to_int_type(',')) {
            in.setstate(std::ios_base::failbit);
        }
        return in.ignore();
    }
    
    int main()
    {
        std::vector<std::vector<double>> values;
        std::istringstream in;
        for (std::string line; std::getline(std::cin, line); )
        {
            in.clear();
            in.str(line);
            std::vector<double> tmp;
            for (double value; in >> value; in >> comma) {
                tmp.push_back(value);
            }
            values.push_back(tmp);
        }
    
        for (auto const& vec: values) {
            for (auto val: vec) {
                std::cout << val << ", ";
            }
            std::cout << "\n";
        }
    }
    

    Given the simple structure of the file, the logic can actually be simplified: instead of reading the values individually, each line can be viewed as a sequence of values if the separators are skipped automatically. Since a comma won't be skipped automatically, the commas are replaced by spaces before creating the string stream for each line. The corresponding code becomes:

    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <sstream>
    #include <string>
    #include <vector>
    
    int main()
    {
        std::vector<std::vector<double> > values;
        std::ifstream fin("textread.csv");
        for (std::string line; std::getline(fin, line); )
        {
            std::replace(line.begin(), line.end(), ',', ' ');
            std::istringstream in(line);
            values.push_back(
                std::vector<double>(std::istream_iterator<double>(in),
                                    std::istream_iterator<double>()));
        }
    
        for (std::vector<std::vector<double> >::const_iterator
                 it(values.begin()), end(values.end()); it != end; ++it) {
            std::copy(it->begin(), it->end(),
                      std::ostream_iterator<double>(std::cout, ", "));
            std::cout << "\n";
        }
    }
    

    Here is what happens:

    1. The destination values is defined as a vector of vectors of double. There isn't anything guaranteeing that the different rows are the same size but this is trivial to check once the file is read.
    2. A std::ifstream is defined and initialized with the file. It may be worth checking the file after construction to see if it could be opened for reading (if (!fin) { std::cout << "failed to open...\n"; }).
    3. The file is processed one line at a time. The lines are simply read using std::getline() to read them into a std::string. When std::getline() fails it couldn't read another line and the conversion ends.
    4. Once the line is read, all commas are replaced by spaces.
    5. From the modified line, a string stream for reading the line is constructed. The original code reused a std::istringstream declared outside the loop to save the cost of constructing the stream each time. Since the stream goes bad when the line is completed, it first needed to be in.clear()ed before its content was set with in.str(line).
    6. The individual values are iterated using a std::istream_iterator<double>, which just reads a value from the stream it is constructed with. The iterator constructed from in is the start of the sequence and the default-constructed iterator is the end of the sequence.
    7. The sequence of values produced by the iterators is used to immediately construct a temporary std::vector<double> representing a row.
    8. The temporary vector is pushed to the end of the target array.

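    Everything after that simply prints the content of the produced matrix. The first version does this with C++11 features (range-based for and variables with automatically deduced type); the second version uses explicit const_iterators and std::copy with a std::ostream_iterator instead.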
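
    The row-size check mentioned in step 1 could be sketched as follows. The function name is_rectangular is hypothetical, and treating an empty matrix as valid is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true when every row holds as many values as
// the first row. An empty matrix is treated as rectangular.
bool is_rectangular(const std::vector<std::vector<double> >& values)
{
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i].size() != values[0].size()) {
            return false;   // row i has a different number of columns
        }
    }
    return true;
}
```

    Calling is_rectangular(values) once after the reading loop is enough to confirm that the file really had a constant number of columns.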

  • 2021-01-16 10:09

    As proposed here, changing the getline delimiter to a comma may help you read the CSV file more easily, but you then need to convert each field from a string to a numeric type (float for this data).

    To deal with any number of rows and columns, you can use a multidimensional vector (a vector inside a vector, as described here): each row is stored in one vector, and all the rows are stored in an outer vector.
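
    A minimal sketch of both ideas together, assuming every field is a valid number and there are no empty fields, might look like this. The names parse_row and read_rows are made up for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one CSV line on ',' and convert each field to
// float. std::stof throws std::invalid_argument if a field is not a number.
std::vector<float> parse_row(const std::string& line)
{
    std::vector<float> row;
    std::istringstream fields(line);
    for (std::string field; std::getline(fields, field, ','); ) {
        row.push_back(std::stof(field));
    }
    return row;
}

// Hypothetical helper: read every non-empty line of `in` as one row of the
// outer vector.
std::vector<std::vector<float> > read_rows(std::istream& in)
{
    std::vector<std::vector<float> > rows;
    for (std::string line; std::getline(in, line); ) {
        if (!line.empty()) {
            rows.push_back(parse_row(line));
        }
    }
    return rows;
}
```

    With an opened std::ifstream passed as in, rows.size() gives the number of rows and rows[0].size() the number of columns, neither of which has to be known in advance.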
