I was wondering if someone could give me a hand. I'm trying to build a program that reads in a big data block of floats with unknown size from a CSV file. I already wrote this i
I intended this as an edit to Dietmar Kuhl's solution, but it was rejected as too large an edit...
The usual reason given for converting Matlab to C++ is performance. So I benchmarked these two solutions. I compiled with G++ 4.7.3 for cygwin with the following options "-Wall -Wextra -std=c++0x -O3 -fwhole-program". I tested on a 32-bit Intel Atom N550.
As input I used two 10,000-line files. The first file had 10 "0.0" values per line, the second had 100 "0.0" values per line.
I timed from the command line using time, taking the sum of user+sys for each run and averaging it over three runs.
I modified the second program to read from std::cin, as in the first program.
Finally, I ran the tests again with std::cin.sync_with_stdio(false);
Results (time in seconds):
               sync                no sync
          10/line  100/line   10/line  100/line
prog A     1.839    16.873     0.721     6.228
prog B     1.741    16.098     0.721     5.563
The obvious conclusion is that version B is slightly faster, but more importantly, you should disable syncing with stdio.
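As a minimal sketch of where that call goes: synchronization is switched off once, before any input is read. The helper names disable_stdio_sync and count_lines are made up for illustration and are not part of either benchmarked program.

```cpp
#include <cstddef>
#include <iostream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: turns off synchronization between the C++ standard
// streams and C stdio. It should run once, before any I/O is performed.
void disable_stdio_sync()
{
    std::ios_base::sync_with_stdio(false);
}

// Hypothetical helper: counts the lines that can be read from a stream.
std::size_t count_lines(std::istream& in)
{
    std::size_t count = 0;
    for (std::string line; std::getline(in, line); ) {
        ++count;
    }
    return count;
}
```

In main this would be used as disable_stdio_sync(); followed by count_lines(std::cin). Once syncing is off, mixing C stdio calls such as printf with std::cout in the same program can produce output in an unexpected order, so it is safest to stick to one of the two.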
    int fclose(infile);

This line is wrong. Because of the leading int, the compiler treats it as a declaration of an int variable named fclose initialized with a FILE*, which is not a valid conversion. If you're simply trying to close the file, it should be:

    fclose(infile);
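For context, a small sketch of the open/close pairing with the C file API. The function name can_open_for_reading is hypothetical and only serves to show where the plain fclose call belongs.

```cpp
#include <cstdio>

// Hypothetical helper: reports whether `path` can be opened for reading.
bool can_open_for_reading(const char* path)
{
    std::FILE* infile = std::fopen(path, "r");
    if (infile == nullptr) {
        return false;            // opening failed, so there is nothing to close
    }
    std::fclose(infile);         // a plain function call, with no type in front
    return true;
}
```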
I would, obviously, just use IOStreams. Reading a homogeneous array of arrays from a CSV file without having to bother with any quoting is fairly trivial:

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    std::istream& comma(std::istream& in)
    {
        if ((in >> std::ws).peek() != std::char_traits<char>::to_int_type(',')) {
            in.setstate(std::ios_base::failbit);
        }
        return in.ignore();
    }

    int main()
    {
        std::vector<std::vector<double>> values;
        std::istringstream in;
        for (std::string line; std::getline(std::cin, line); )
        {
            in.clear();
            in.str(line);
            std::vector<double> tmp;
            for (double value; in >> value; in >> comma) {
                tmp.push_back(value);
            }
            values.push_back(tmp);
        }

        for (auto const& vec: values) {
            for (auto val: vec) {
                std::cout << val << ", ";
            }
            std::cout << "\n";
        }
    }

Given the simple structure of the file, the logic can actually be simplified: instead of reading the values individually, each line can be viewed as a sequence of values if the separators are skipped automatically. Since a comma won't be skipped automatically, the commas are replaced by spaces before creating the string stream for each line. The corresponding code becomes:

    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <sstream>
    #include <string>
    #include <vector>

    int main()
    {
        std::vector<std::vector<double> > values;
        std::ifstream fin("textread.csv");
        for (std::string line; std::getline(fin, line); )
        {
            std::replace(line.begin(), line.end(), ',', ' ');
            std::istringstream in(line);
            values.push_back(
                std::vector<double>(std::istream_iterator<double>(in),
                                    std::istream_iterator<double>()));
        }

        for (std::vector<std::vector<double> >::const_iterator
                 it(values.begin()), end(values.end()); it != end; ++it) {
            std::copy(it->begin(), it->end(),
                      std::ostream_iterator<double>(std::cout, ", "));
            std::cout << "\n";
        }
    }

Here is what happens:

1. values is defined as a vector of vectors of double. There isn't anything guaranteeing that the different rows are the same size, but this is trivial to check once the file is read.
2. A std::ifstream is defined and initialized with the file. It may be worth checking the file after construction to see if it could be opened for reading (if (!fin) { std::cout << "failed to open...\n"; }).
3. The lines are read using std::getline(), which reads them into a std::string. When std::getline() fails, it couldn't read another line and the conversion ends.
4. Once a line is read, all commas are replaced by spaces.
5. From the modified line a string stream for reading the line is constructed. The original code reused a std::istringstream which was declared outside the loop to save the cost of constructing the stream all the time. Since the stream goes bad when a line is completed, it first needed to be in.clear()ed before its content was set with in.str(line).
6. The values are read using a std::istream_iterator<double>, which just reads a value from the stream it is constructed with. The iterator given in is the start of the sequence and the default-constructed iterator is the end of the sequence.
7. The sequence of values is used to construct a std::vector<double> representing a row, which is appended to values.

Everything after that is simply printing the content of the produced matrix. The first program does this with C++11 features (range-based for and variables with automatically deduced type); the second uses std::copy with a std::ostream_iterator.
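The row-size check mentioned above could look roughly like this. The name is_rectangular is hypothetical, and treating an empty matrix as rectangular is an assumption rather than something the answer specifies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true when every row holds the same number of
// values. An empty matrix is treated as rectangular (an assumption).
bool is_rectangular(const std::vector<std::vector<double>>& values)
{
    if (values.empty()) {
        return true;
    }
    const std::size_t width = values.front().size();
    for (const auto& row : values) {
        if (row.size() != width) {
            return false;        // found a row of a different length
        }
    }
    return true;
}
```

It would be called once after the reading loop, for example to report an error before the matrix is used.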
As proposed here, changing the delimiter passed to getline may make reading the CSV file easier, but you then need to convert each field from a string to a number.
To deal with any number of rows and columns you can use a multi-dimensional vector (a vector inside a vector, as described here): each row is stored in one vector, and all rows are stored in an outer vector.
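Putting the two suggestions together, a rough sketch might look like the following. The name read_csv_rows is hypothetical; the code assumes every field is a plain number with no quoting, and it converts to double rather than int because the data in the question are floats.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-separated numbers, one row per input line.
std::vector<std::vector<double>> read_csv_rows(std::istream& input)
{
    std::vector<std::vector<double>> rows;
    for (std::string line; std::getline(input, line); ) {
        std::istringstream fields(line);
        std::vector<double> row;
        // Passing ',' as the delimiter makes getline stop at each comma.
        for (std::string field; std::getline(fields, field, ','); ) {
            row.push_back(std::stod(field));   // throws if the field is not a number
        }
        rows.push_back(row);                   // one inner vector per row
    }
    return rows;
}
```

Since std::stod throws std::invalid_argument on an empty or non-numeric field, a real program would probably want to catch that and report the offending line.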