How to split a text file on tab characters in C?

情书的邮戳 2021-01-26 10:44

I am doing socket programming, and I have the following details in a text file. I want to split that text file on tab characters and assign the fields to variabl

3 answers
  • 2021-01-26 10:58

    Your problem is one of the rare instances where reading formatted input with scanf (or reading a line and parsing it with sscanf) actually makes sense. If your records are tab-separated values, you can craft a scanf format string that reads each field in a reasonably clean manner.

    The key for using scanf is to always validate the return (the number of successful conversions that took place based on the number of format specifiers in your format string). You must also protect array widths if you are reading strings into fixed buffers to prevent writing beyond array or allocation bounds by using the appropriate field-width modifiers.

    Putting those pieces together, you could do something like the following:

        int rtn = scanf (" %7s\t%63[^\t]\t%u\t%lf",  /* save scanf return */
                            tmp.idx, tmp.desc, &tmp.n, &tmp.price);
    

    (with your storage sized appropriately -- I guessed at what your fields were)

    Speaking of storage, any time you have data of different types that you need to coordinate, you should think struct. Here, for example purposes, a struct with reasonably sized fixed buffers is fine, e.g.

    /* constants - index, description width, max records */
    enum { IDX = 8, DESC = 64, MAXR = 128 };
    
    typedef struct {    /* struct to hold each record values */
        char idx[IDX],  /* 1st field */
            desc[DESC]; /* 2nd field */
        double price;   /* 4th & 3rd - ordered to put smallest last */
        unsigned n;
    } rec_t;
    

    To handle your storage needs, you simply declare an array of rec_t, e.g.

        rec_t record[MAXR] = {{ .idx = "" }},   /* array of struct */
            tmp = { .idx = "" };                /* tmp struct for read */
    

    When reading into an array of struct, it is often useful to use a temporary struct to fill with values from scanf (or whatever you are using), and then after validating your scanf return (conversions), you can simply assign the tmp struct to the next element in your array and increment the array index, e.g.

            if (rtn == 4)           /* validate 4 conversions */
                record[n++] = tmp;  /* assign tmp to record[n], increment */
    

    I find it easier to simply loop continually when taking input with scanf and then validate the return, checking for EOF and then simply break your read-loop if you encounter EOF (or you otherwise satisfy your input needs).

    Putting all the pieces together, you could do something like the following, which happily skips the empty lines in your file shown above, only storing values when the scanf return indicates all successful conversions took place. The program reads data on stdin, though you can easily modify the code to open a given filename for the read.

    #include <stdio.h>
    #include <stdlib.h>
    
    /* constants - index, description width, max records */
    enum { IDX = 8, DESC = 64, MAXR = 128 };
    
    typedef struct {    /* struct to hold each record values */
        char idx[IDX],  /* 1st field */
            desc[DESC]; /* 2nd field */
        double price;   /* 4th & 3rd - ordered to put smallest last */
        unsigned n;
    } rec_t;
    
    int main (void) {
    
        rec_t record[MAXR] = {{ .idx = "" }},   /* array of struct */
            tmp = { .idx = "" };                /* tmp struct for read */
        unsigned n = 0;                         /* n - records */
    
        while (n < MAXR) {  /* loop while n < max records (MAXR) */
            int rtn = scanf (" %7s\t%63[^\t]\t%u\t%lf",  /* save scanf return */
                            tmp.idx, tmp.desc, &tmp.n, &tmp.price);
            if (rtn == EOF)         /* return EOF? */
                break;
            if (rtn == 4)           /* validate 4 conversions */
                record[n++] = tmp;  /* assign tmp to record[n], increment */
        }
    
        for (unsigned i = 0; i < n; i++)    /* output array */
            printf ("record[%3u]:  %-8s %-24s %3u    %9.2f\n", i, record[i].idx, 
                    record[i].desc, record[i].n, record[i].price);
    }
    

    Example Use/Output

    $ ./bin/readtsv <dat/file.tsv
    record[  0]:  001      Coffee maker              10      3000.00
    record[  1]:  002      Pressure cooker            4      7000.00
    record[  2]:  003      Blender                   10      2500.00
    record[  3]:  004      Pillow                    10       300.00
    record[  4]:  005      Camera                     5     25000.00
    record[  5]:  006      Washer                     5     25000.00
    record[  6]:  007      Headphone                  3      5000.00
    record[  7]:  008      Mattresses                 5      6000.00
    record[  8]:  009      Heater                     3      1000.00
    record[  9]:  010      Cookware                   2     10000.00
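
    If you would rather read from a named file than from stdin, the loop above should carry over almost unchanged by swapping scanf for fscanf. The following is only a sketch under that assumption: read_records is a hypothetical helper name, not part of the program above, and the field sizes are the same guesses as before.

```c
#include <stdio.h>

/* same constants and record layout as in the program above */
enum { IDX = 8, DESC = 64, MAXR = 128 };

typedef struct {
    char idx[IDX],
        desc[DESC];
    double price;
    unsigned n;
} rec_t;

/* Hypothetical helper: read up to max records from the file named fname.
 * Returns the number of records stored, or -1 if the file cannot be opened. */
int read_records (const char *fname, rec_t *rec, int max)
{
    FILE *fp = fopen (fname, "r");
    int n = 0;

    if (!fp)                        /* validate the file is open for reading */
        return -1;

    while (n < max) {
        rec_t tmp = { .idx = "" };
        int rtn = fscanf (fp, " %7s\t%63[^\t]\t%u\t%lf",
                          tmp.idx, tmp.desc, &tmp.n, &tmp.price);
        if (rtn == EOF)             /* end of file (or read error) */
            break;
        if (rtn == 4)               /* store only fully converted records */
            rec[n++] = tmp;
    }

    fclose (fp);
    return n;
}
```

    A caller would declare rec_t record[MAXR] as before, call read_records ("dat/file.tsv", record, MAXR), and treat a return of -1 as a failed open.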
    

    Look things over and let me know if you have any questions.

  • 2021-01-26 11:01

    Here is sample code that reads a file line by line and splits each line at tab characters.

    strtok modifies the string passed to it and keeps its parsing state in a static buffer, so it is not thread-safe. If the line contains no delimiter, the first call to strtok returns the whole string, so you need to handle that case.

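    #define _POSIX_C_SOURCE 200809L     /* getline() is POSIX, not ISO C */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    /* print each tab-separated token of line; strtok() modifies line */
    void split_string(char *line) {
        const char delimiter[] = "\t\n";    /* '\n' strips the line ending */
        char *tmp;
    
        tmp = strtok(line, delimiter);
        if (tmp == NULL)
            return;
    
        printf("%s\n", tmp);
    
        for (;;) {
            tmp = strtok(NULL, delimiter);
            if (tmp == NULL)
                break;
            printf("%s\n", tmp);
        }
    }
    
    int main(void)
    {
        char *line = NULL;      /* getline() allocates and grows this buffer */
        size_t size = 0;
        FILE *fp = fopen("split_text.txt", "r");
        if (fp == NULL) {
            perror("split_text.txt");
            return EXIT_FAILURE;
        }
        while (getline(&line, &size, fp) != -1) {
            split_string(line);
        }
    
        free(line);
        fclose(fp);
        return 0;
    }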
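
    If thread safety matters, which may well be the case in a socket program, POSIX provides strtok_r, which keeps its position in a caller-supplied pointer instead of a static buffer. A minimal sketch, assuming a POSIX system; split_fields is a hypothetical helper name, not part of the code above:

```c
#define _POSIX_C_SOURCE 200809L     /* strtok_r() is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place on tabs (and the line ending),
 * storing at most max field pointers in fields[].
 * Returns the number of fields stored. */
size_t split_fields (char *line, char **fields, size_t max)
{
    char *save = NULL;      /* parser state lives here, not in a static buffer */
    size_t n = 0;

    for (char *tok = strtok_r (line, "\t\n", &save);
         tok != NULL && n < max;
         tok = strtok_r (NULL, "\t\n", &save))
        fields[n++] = tok;

    return n;
}
```

    The pointers in fields[] point into line itself, so line must stay alive while they are in use. Like strtok, strtok_r still treats a run of tabs as a single delimiter, so empty fields are skipped.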
    
  • 2021-01-26 11:15

    You can implement a simplistic parser with sscanf() or strtok() but be aware that neither of these will handle empty fields correctly. scanf's %[^\t] conversion specifier will fail if there are no characters before the next tab, and strtok() will consider any sequence of tabs a single delimiter.

    Here is a solution with an ad hoc utility function that behaves similarly to strtok() but without its shortcomings:

    #include <stdio.h>
    #include <stdlib.h>
    
    char *getfield(char **pp, char sep) {
        char *p, *res;
        for (res = p = *pp;; p++) {
            if (*p == sep) {
                *p++ = '\0';
                *pp = p;
                return res;
            }
            if (*p == '\0')
                return NULL;
        }
    }
    
    int main() {
        char line[256];
        char filename[] = "input_file.txt";
        int lineno = 0;
        FILE *fp = fopen(filename, "r");
        if (fp != NULL) {
            while (fgets(line, sizeof line, fp)) {
                char *p = line;
                char *reference = getfield(&p, '\t');
                char *description = getfield(&p, '\t');
                char *quantity = getfield(&p, '\t');
                char *price = getfield(&p, '\n');
                lineno++;
                if (reference && description && quantity && price) {
                    /* all fields were parsed correctly */
                    printf("reference: %s\n", reference);
                    printf("description: %s\n", description);
                    printf("quantity: %d\n", atoi(quantity));
                    printf("price: %.2f\n\n", strtod(price, NULL));
                } else {
                    fprintf(stderr, "%s:%d: invalid line\n", filename, lineno);
                }
            }
            fclose(fp);
        }
        return 0;
    }
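
    To make the empty-field problem mentioned at the top of this answer concrete, here is a small sketch comparing how many fields strtok() reports with how many the line really has. Both function names are hypothetical, made up for this illustration:

```c
#include <stddef.h>
#include <string.h>

/* Count the tokens strtok() reports for line.
 * A run of consecutive tabs counts as ONE delimiter. Modifies line. */
size_t count_strtok (char *line)
{
    size_t n = 0;

    for (char *t = strtok (line, "\t"); t != NULL; t = strtok (NULL, "\t"))
        n++;

    return n;
}

/* Count tab-separated fields, empty ones included: fields = tabs + 1. */
size_t count_fields (const char *line)
{
    size_t n = 1;

    for (const char *p = strchr (line, '\t'); p != NULL;
         p = strchr (p + 1, '\t'))
        n++;

    return n;
}
```

    For the line "001\t\t10\t3000" (an empty description), count_fields returns 4 while count_strtok returns 3, so with strtok every field after the empty one would land in the wrong variable.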
    