How to extract data from a line whose fields are separated by the '|' character in C++?

Submitted by 為{幸葍}努か on 2019-12-03 09:04:22

I prefer to use the String Toolkit. The String Toolkit will take care of converting the numbers as it parses.

Here is how I would solve it.

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <strtk.hpp>   // http://www.partow.net/programming/strtk

// using strings instead of character arrays
class Employee
{
    public:
    int index;
    int employee_number;
    std::string name;
    std::string department;
    std::string band;
    std::string location;
};


// Wrapped in a function (the name load_employees is only illustrative)
// so that the early return on failure is valid.
bool load_employees(const std::string& filename, std::vector<Employee>& employee_data)
{
    // assuming the file is text
    std::fstream fs;
    fs.open(filename.c_str(), std::ios::in);

    if (fs.fail()) return false;

    const char *whitespace = " \t\r\n\f";
    const char *delimiter  = "|";

    std::string line;

    // process each line in turn
    while (std::getline(fs, line))
    {
        // removing leading and trailing whitespace
        // can prevent parsing problems from different line endings.
        strtk::remove_leading_trailing(whitespace, line);

        // strtk::parse combines multiple delimiters in these cases
        Employee e;

        if (strtk::parse(line, delimiter, e.index, e.employee_number, e.name, e.department, e.band, e.location))
        {
            std::cout << "succeed" << std::endl;
            employee_data.push_back(e);
        }
    }
    return true;
}

// usage:
//   std::vector<Employee> employee_data;
//   load_employees("empdata.txt", employee_data);

AFAIK, there is nothing that does it out of the box. But you have all the tools to build it yourself.

The C way

You read each line into a char buffer (with cin.getline()) and then use strtok and strcpy.
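
A minimal sketch of the C way, under a few assumptions: the record type EmpRecord and the helpers copy_field and parse_line_c are hypothetical names, and the field sizes mirror the ones used for empdata elsewhere in this answer. It uses strncpy rather than strcpy so that an over-long field is truncated instead of overflowing the buffer. Keep in mind that strtok modifies the buffer and treats consecutive separators as one, so empty fields are silently skipped.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical record type; the field sizes are assumptions.
struct EmpRecord {
    int  sl;
    int  empNO;
    char name[20];
    char department[20];
    char band[3];
    char location[20];
};

// Copies src into a fixed-size buffer, truncating if needed
// and always writing a terminating '\0'.
static void copy_field(char* dst, std::size_t dst_size, const char* src) {
    std::strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

// Splits `line` in place on '|' (strtok overwrites each separator with '\0').
// Returns true only if all six fields were found.
bool parse_line_c(char* line, EmpRecord& rec) {
    char* tok = std::strtok(line, "|");
    if (!tok) return false;
    rec.sl = static_cast<int>(std::strtol(tok, NULL, 10));

    tok = std::strtok(NULL, "|");
    if (!tok) return false;
    rec.empNO = static_cast<int>(std::strtol(tok, NULL, 10));

    tok = std::strtok(NULL, "|");
    if (!tok) return false;
    copy_field(rec.name, sizeof rec.name, tok);

    tok = std::strtok(NULL, "|");
    if (!tok) return false;
    copy_field(rec.department, sizeof rec.department, tok);

    tok = std::strtok(NULL, "|");
    if (!tok) return false;
    copy_field(rec.band, sizeof rec.band, tok);

    tok = std::strtok(NULL, "|");
    if (!tok) return false;
    copy_field(rec.location, sizeof rec.location, tok);

    return true;
}
```

To use it, read a line into a writable char array (for example with cin.getline()) and pass that array to parse_line_c together with an EmpRecord to fill.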

The getline way

The getline function accepts a third parameter that specifies a delimiter. You can use it to split the line through an istringstream. Something like:

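#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// empdata is assumed to match the struct from the question
struct empdata {
    int sl;
    int empNO;
    char name[20];
    char department[20];
    char band[3];
    char location[20];
};

int main() {
    std::string line, temp;
    std::ifstream myfile("file.txt");
    std::getline(myfile, line);            // skip the header line
    // testing getline itself also handles a last line without a trailing newline
    while (std::getline(myfile, line)) {
        empdata data;
        std::istringstream istr(line);
        std::getline(istr, temp, '|');
        data.sl = ::strtol(temp.c_str(), NULL, 10);
        std::getline(istr, temp, '|');
        data.empNO = ::strtol(temp.c_str(), NULL, 10);
        // a field longer than its buffer puts istr in a failed state
        istr.getline(data.name, sizeof(data.name), '|');
        istr.getline(data.department, sizeof(data.department), '|');
        istr.getline(data.band, sizeof(data.band), '|');
        istr.getline(data.location, sizeof(data.location), '|');
    }
    return 0;
}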

This is the C++ version of the previous one.

The find way

You read each line into a string (as you currently do) and use string::find(char sep, size_t pos) to find the next occurrence of the separator, then copy the data (from string::c_str()) between the start of the substring and the separator into your fields.
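
As a rough sketch of the find way: the helper split_on is a hypothetical name, and it returns the fields as std::string values (using substr instead of copying from c_str()) so that the caller can convert or truncate them afterwards. Unlike strtok, it keeps empty fields.

```cpp
#include <string>
#include <vector>

// Splits `line` on every occurrence of `sep`, keeping empty fields.
std::vector<std::string> split_on(const std::string& line, char sep) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            // no more separators: the rest of the line is the last field
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;   // continue just after the separator
    }
    return fields;
}
```

Calling split_on(line, '|') on a data line from the question should give six strings; the first two can then be converted with strtol and the others copied into the char fields.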

The manual way

You just iterate over the string. If the character is a separator, you put a null terminator ('\0') at the end of the current field and move on to the next field. Otherwise, you write the character at the current position of the current field.

Which to choose?

If you are more used to one of them, stick to it.

Following is just my opinion.

The getline way will be the simplest to code and to maintain.

The find way is mid level. It is still at a rather high level and avoids the usage of istringstream.

The manual way will be really low level, so you should structure it to make it maintainable. For example, you could use an explicit description of the lines as an array of fields, each with a maximum size and a current position. As you have both int and char[] fields, it will be tricky, but you can easily configure it the way you want.

For example, your code only allows 20 characters for the department field, whereas Research and Development in line 2 is longer. Without special processing, the getline way will leave the istringstream in a bad state and will not read anything more. Even if you clear the state, you will be badly positioned. So you should first read into a std::string and then copy the beginning to the char * field.

Here is a working manual implementation:
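
One possible way to do the "read into a std::string, then copy the beginning" step is sketched here. The helper name read_field is hypothetical, and it assumes the destination buffer has room for at least one character.

```cpp
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Reads one field of any length up to `sep`, then stores as much of it
// as fits in dst (always null-terminated). Assumes dst_size >= 1.
// Returns false when no more fields can be read.
bool read_field(std::istream& in, char* dst, std::size_t dst_size, char sep = '|') {
    std::string temp;
    if (!std::getline(in, temp, sep)) {
        return false;
    }
    std::size_t n = temp.size() < dst_size - 1 ? temp.size() : dst_size - 1;
    std::memcpy(dst, temp.data(), n);
    dst[n] = '\0';
    return true;
}
```

Because the whole field is consumed from the stream before anything is copied, an over-long department is merely truncated and the stream stays correctly positioned for the band field.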

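#include <cstddef>
#include <fstream>
#include <string>

// empdata is the struct from the question:
// int sl, empNO; char name[20], department[20], band[3], location[20]

class Field {
public:
    virtual ~Field() {}
    virtual void reset() = 0;
    virtual void add(empdata& data, char c) = 0;
};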

class IField: public Field {
private:
    int (empdata::*data_field);
    bool ok;

public:
    IField(int (empdata::*field)): data_field(field) {
        ok = true;
        reset();
    }
    void reset() { ok = true; }
    void add(empdata& data, char c);
};

void IField::add(empdata& data, char c) {
    if (ok) {
        if ((c >= '0') && (c <= '9')) {
            data.*data_field = data.*data_field * 10  + (c - '0');
        }
        else {
            ok = false;
        }
    }
}


class CField: public Field {
private:
    char (empdata::*data_field);
    size_t current_pos;
    size_t size;

public:
    CField(char (empdata::*field), size_t size): data_field(field), size(size) {
        reset();
    }
    void reset() { current_pos = 0; }
    void add(empdata& data, char c);
};

void CField::add(empdata& data, char c) {
    if (current_pos < size) {
        char *ix = &(data.*data_field);
        ix[current_pos ++] = c;
        if (current_pos == size) {
            ix[size -1] = '\0';
            current_pos +=1;
        }
    }
}

int main() {
    std::string line;
    std::ifstream myfile("file.txt");
    // The field objects live on the stack, so no manual cleanup is needed.
    IField sl(&empdata::sl);
    IField empNO(&empdata::empNO);
    CField name(reinterpret_cast<char empdata::*>(&empdata::name), 20);
    CField department(reinterpret_cast<char empdata::*>(&empdata::department), 20);
    CField band(reinterpret_cast<char empdata::*>(&empdata::band), 3);
    CField location(reinterpret_cast<char empdata::*>(&empdata::location), 20);
    Field* fields[] = { &sl, &empNO, &name, &department, &band, &location, NULL };

    std::getline(myfile, line);            // skip the header line
    // testing getline itself also handles a last line without a trailing newline
    while (std::getline(myfile, line)) {
        Field** f = fields;
        empdata data = {0};
        (*f)->reset();                     // the first field is never reached through a '|'
        for (std::string::const_iterator it = line.begin(); it != line.end(); ++it) {
            char c = *it;
            if (c == '|') {
                f += 1;
                if (*f == NULL) {
                    break;                 // more separators than fields: ignore the rest
                }
                (*f)->reset();
            }
            else {
                (*f)->add(data, c);
            }
        }
        // do something with data ...
    }
    return 0;
}

It is robust, efficient and maintainable: adding a field is easy, and it tolerates errors in the input file. But it is much longer than the other ones, and would need many more tests. So I would not advise using it without a specific reason (the need to accept multiple separators, optional fields, a dynamic field order, ...).

Forhad

Try this simple Java code segment. It reads the file line by line and prints each line; you can later process each line as you need.

Data: as provided by you, in a file named data.txt.

package com.demo;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

public class Demo {

    public static void main(String a[]) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader bufferReader = new BufferedReader(new FileReader(new File("data.txt")))) {
            String data;

            while ((data = bufferReader.readLine()) != null) {
                System.out.println(data);
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

In the console you will get output like this:

Sl|EmployeeID|Name|Department|Band|Location
1|327427|Brock Mcneil|Research and Development|U2|Pune
2|310456|Acton Golden|Advertising|P3|Hyderabad
3|305540|Hollee Camacho|Payroll|U3|Bangalore
4|218801|Simone Myers|Public Relations|U3|Pune
5|144051|Eaton Benson|Advertising|P1|Chennai

This is just the basic idea; from here you can process each line as you need.

dau_sama

In C++ you can change the locale to add an extra character to the separator list of the current locale:

#include <locale>
#include <iostream>

struct pipe_is_space : std::ctype<char> {
  pipe_is_space() : std::ctype<char>(get_table()) {}
  static mask const* get_table()
  {
    static mask rc[table_size];
    rc['|'] = std::ctype_base::space;
    rc['\n'] = std::ctype_base::space;
    return &rc[0];
  }
};

int main() {
  using std::string;
  using std::cin;
  using std::locale;

  cin.imbue(locale(cin.getloc(), new pipe_is_space));

  string word;
  while(cin >> word) {
    std::cout << word << "\n";
  }
}
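
The same facet can also be imbued into an istringstream rather than cin. The following is only a sketch: the helper name split_with_facet is hypothetical, and the facet is repeated so that the block compiles on its own. Because the table starts out zeroed, an ordinary space is not treated as a separator, so a name such as Brock Mcneil stays in one piece. On the other hand, consecutive '|' characters count as a single run of whitespace, so empty fields disappear.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Same facet as above: only '|' and '\n' are classified as whitespace.
struct pipe_is_space : std::ctype<char> {
    pipe_is_space() : std::ctype<char>(get_table()) {}
    static mask const* get_table() {
        static mask rc[table_size];   // static storage, so it starts out zeroed
        rc['|']  = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Extracts every '|' or newline separated token from `text`.
std::vector<std::string> split_with_facet(const std::string& text) {
    std::istringstream in(text);
    // the locale takes ownership of the facet and deletes it when done
    in.imbue(std::locale(in.getloc(), new pipe_is_space));

    std::vector<std::string> fields;
    std::string word;
    while (in >> word) {
        fields.push_back(word);
    }
    return fields;
}
```

For a file with a fixed number of columns, the returned tokens can be consumed six at a time to rebuild each record.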