Storing each line of a text file into an array

南笙 2020-12-22 00:39

I am trying to save each line of a text file into an array. The way I am doing it, which works fine so far, is this:

char *lines[40];
char line[50];
int i = 0;


        
2 Answers
  • 2020-12-22 01:13

    There are many ways to approach this problem. Either declare a static 2D array of char (e.g. char lines[40][50] = {{""}};) or declare a pointer to an array of type char [50], which is probably the easiest for dynamic allocation because it needs only a single allocation. With constants MAXL = 40 and MAXC = 50, you simply need:

    char (*lines)[MAXC] = NULL;
    ...
    lines = malloc (MAXL * sizeof *lines);
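
    For comparison, the static 2D array alternative mentioned above could be sketched as follows. The function name read_static is hypothetical (it is not part of the original answer), and this simplified version does not handle over-long lines: any line of more than 48 characters is split across two rows.

```c
#include <stdio.h>
#include <string.h>

enum { MAXL = 40, MAXC = 50 };

/* Fill a caller-provided 2D array with up to MAXL lines from fp.
 * Returns the number of rows stored. No long-line handling here:
 * a line that does not fit in one row spills into the next row. */
int read_static (FILE *fp, char lines[MAXL][MAXC])
{
    int n = 0;

    while (n < MAXL && fgets (lines[n], MAXC, fp)) {
        lines[n][strcspn (lines[n], "\n")] = 0;   /* trim trailing '\n' */
        n++;
    }
    return n;
}
```

    A caller would declare char lines[MAXL][MAXC] = {{""}}; and call read_static (fp, lines);. Nothing needs to be freed, but the array size is fixed at compile time.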
    

    Reading each line with fgets is a simple task of:

    while (i < MAXL && fgets (lines[i], MAXC, fp)) {...
    

    When you are done, all you need to do is call free (lines). Putting the pieces together, you can do something like:

    #include <stdio.h>
    #include <stdlib.h>
    
    enum { MAXL = 40, MAXC = 50 };
    
    int main (int argc, char **argv) {
    
        char (*lines)[MAXC] = NULL; /* pointer to array of type char [MAXC] */
        int i, n = 0;
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    
        if (!fp) {  /* validate file open for reading */
            fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
            return 1;
        }
    
        if (!(lines = malloc (MAXL * sizeof *lines))) { /* allocate MAXL arrays */
            fprintf (stderr, "error: virtual memory exhausted 'lines'.\n");
            return 1;
        }
    
        while (n < MAXL && fgets (lines[n], MAXC, fp)) { /* read each line */
            char *p = lines[n];                  /* assign pointer */
            for (; *p && *p != '\n'; p++) {}     /* find 1st '\n'  */
            *p = 0, n++;                         /* nul-terminate  */
        }
        if (fp != stdin) fclose (fp);   /* close file if not stdin */
    
        /* print lines */
        for (i = 0; i < n; i++) printf (" line[%2d] : '%s'\n", i + 1, lines[i]);
    
        free (lines);   /* free allocated memory */
    
        return 0;
    }
    

    Note: you will also want to check whether fgets read the whole line each time (say the file contains a line of more than 48 chars, which does not fit in a 50-char buffer along with its '\n' and the nul-terminating character). You do this by checking whether *p is '\n' before overwriting it with the nul-terminating character, and discarding the rest of the line if it is not (e.g. if (*p != '\n') { int c; while ((c = fgetc (fp)) != '\n' && c != EOF) {} }). That ensures the next read with fgets will begin with the next line, instead of the remaining characters in the current line.

    To include the check you could do something similar to the following (note: I changed the read loop counter from i to n to eliminate the need for assigning n = i; following the read loop).

        while (n < MAXL && fgets (lines[n], MAXC, fp)) { /* read each line */
            char *p = lines[n];                 /* assign pointer  */
            for (; *p && *p != '\n'; p++) {}    /* find 1st '\n'   */
            if (*p != '\n') {                   /* check line read */
                int c;  /* discard remainder of line with fgetc    */
                while ((c = fgetc (fp)) != '\n' && c != EOF) {}
            }
            *p = 0, n++;                        /* nul-terminate   */
        }
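
    The same check can also be written with strcspn in place of the manual pointer walk. The sketch below wraps the loop in a function; the name read_lines and its parameters are illustrative, not part of the code above.

```c
#include <stdio.h>
#include <string.h>

enum { MAXC = 50 };

/* Read up to maxl lines from fp into lines, discarding any characters
 * that do not fit in a row. Returns the number of lines stored. */
int read_lines (FILE *fp, char (*lines)[MAXC], int maxl)
{
    int n = 0;

    while (n < maxl && fgets (lines[n], MAXC, fp)) {
        size_t len = strcspn (lines[n], "\n");  /* index of '\n' or of nul */
        if (lines[n][len] != '\n') {            /* no '\n': line was cut   */
            int c;
            while ((c = fgetc (fp)) != '\n' && c != EOF) {}
        }
        lines[n][len] = 0;                      /* overwrite '\n' (or nul) */
        n++;
    }
    return n;
}
```

    strcspn returns the number of characters before the first '\n', or the full string length if there is none, so a single call both locates the terminator position and tells you whether the line was truncated.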
    

    It is up to you whether you discard or keep the remainder of lines that exceed the length of your array. However, it is a good idea to always check. (The lines of text in my example input below are limited to 17 chars, so there was no possibility of a long line, but you generally cannot guarantee the line length.)
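
    If you would rather keep the remainder than discard it, one option is to read each line into a buffer that grows as needed. This is only a sketch of that idea: the function name read_full_line and the starting capacity of 64 are arbitrary choices, and it trades the single allocation used above for one allocation per line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one complete line of any length from fp into a malloc'd buffer.
 * The trailing '\n' is removed. Returns NULL at end of input or on
 * allocation failure; the caller frees the result. */
char *read_full_line (FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc (cap);

    if (!buf) return NULL;

    while (fgets (buf + len, (int)(cap - len), fp)) {
        len += strlen (buf + len);
        if (len && buf[len - 1] == '\n') {   /* got the whole line */
            buf[len - 1] = 0;
            return buf;
        }
        if (len + 1 < cap) break;            /* buffer not full: hit EOF */
        char *tmp = realloc (buf, cap * 2);  /* buffer full: double it   */
        if (!tmp) { free (buf); return NULL; }
        buf = tmp;
        cap *= 2;
    }
    if (len == 0) { free (buf); return NULL; }  /* nothing read: EOF */
    return buf;                              /* last line had no '\n' */
}
```

    On POSIX systems, getline provides similar behavior without the hand-written loop.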

    Example Input

    $ cat dat/40lines.txt
    line of text -  1
    line of text -  2
    line of text -  3
    line of text -  4
    line of text -  5
    line of text -  6
    ...
    line of text - 38
    line of text - 39
    line of text - 40
    

    Example Use/Output

    $ ./bin/fgets_ptr2array <dat/40lines.txt
     line[ 1] : 'line of text -  1'
     line[ 2] : 'line of text -  2'
     line[ 3] : 'line of text -  3'
     line[ 4] : 'line of text -  4'
     line[ 5] : 'line of text -  5'
     line[ 6] : 'line of text -  6'
    ...
     line[38] : 'line of text - 38'
     line[39] : 'line of text - 39'
     line[40] : 'line of text - 40'
    

    Now include the length check in the code and add a long line to the input, e.g.:

    $ cat dat/40lines+long.txt
    line of text -  1
    line of text -  2
    line of text -  3 + 123456789 123456789 123456789 123456789 65->|
    line of text -  4
    ...
    

    Rerun the program and you can confirm you have now protected against long lines in the file mucking up your sequential read of lines from the file.


    Dynamically Reallocating lines

    If you have an unknown number of lines in your file and you reach your initial allocation of 40 in lines, then all you need do to keep reading additional lines is realloc storage for lines. For example:

        int i, n = 0, maxl = MAXL;
        ...
        while (fgets (lines[n], MAXC, fp)) {     /* read each line */
            char *p = lines[n];                  /* assign pointer */
            for (; *p && *p != '\n'; p++) {}     /* find 1st '\n'  */
            *p = 0;                              /* nul-terminate  */
            if (++n == maxl) { /* if limit reached, realloc lines  */
                void *tmp = realloc (lines, 2 * maxl * sizeof *lines);
                if (!tmp) {     /* validate realloc succeeded */
                    fprintf (stderr, "error: realloc - virtual memory exhausted.\n");
                    break;      /* on failure, exit with existing data */
                }
                lines = tmp;    /* assign reallocated block to lines */
                maxl *= 2;      /* update maxl to reflect new size */
            }
        }
    

    Now it doesn't matter how many lines are in your file; you will simply keep reallocating lines until your entire file is read or you run out of memory. (Note: currently the code doubles the memory for lines on each reallocation. You are free to add as much or as little as you like. For example, you could allocate for maxl + 40 lines to simply add 40 more lines each time.)

    Edit In Response To Comment Inquiry

    If you do want to use a fixed increase in the number of lines rather than scaling by some factor, you must allocate for a fixed number of additional lines, each of which takes sizeof *lines bytes; you can't simply add 40 bytes, e.g.

            if (++n == maxl) { /* if limit reached, realloc lines  */
                void *tmp = realloc (lines, (maxl + 40) * sizeof *lines);
                if (!tmp) {     /* validate realloc succeeded */
                    fprintf (stderr, "error: realloc - virtual memory exhausted.\n");
                    break;      /* on failure, exit with existing data */
                }
                lines = tmp;    /* assign reallocated block to lines */
                maxl += 40;     /* update maxl to reflect new size */
            }
    

    Recall that lines is a pointer to an array of char [50], so each additional line needs storage for 50 chars (i.e. sizeof *lines). A fixed increase of 40 lines is therefore realloc (lines, (maxl + 40) * sizeof *lines);, after which you must update your max-lines-allocated count (maxl) to reflect the increase, e.g. maxl += 40;.
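
    If you grow the block in more than one place, the realloc-and-update step can be pulled into a small helper so the size calculation and the count always change together. This is a sketch only; grow_lines and its parameter names are hypothetical and do not appear in the program above.

```c
#include <stdlib.h>

enum { MAXC = 50 };

/* Grow the block behind *lines by extra rows of MAXC chars each.
 * On success updates *lines and *maxl and returns 1. On failure
 * leaves both untouched (the old block stays valid) and returns 0. */
int grow_lines (char (**lines)[MAXC], int *maxl, int extra)
{
    void *tmp = realloc (*lines, (size_t)(*maxl + extra) * sizeof **lines);

    if (!tmp) return 0;

    *lines = tmp;       /* assign reallocated block    */
    *maxl += extra;     /* keep the row count in sync  */
    return 1;
}
```

    Inside the read loop this would be used as if (++n == maxl && !grow_lines (&lines, &maxl, 40)) break;. Note that sizeof **lines is the size of one row (50 bytes here), not the size of a pointer.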

    Example Input

    $ cat dat/80lines.txt
    line of text -  1
    line of text -  2
    ...
    line of text - 79
    line of text - 80
    

    Example Use/Output

    $ ./bin/fgets_ptr2array_realloc <dat/80lines.txt
     line[ 1] : 'line of text -  1'
     line[ 2] : 'line of text -  2'
    ...
     line[79] : 'line of text - 79'
     line[80] : 'line of text - 80'
    

    Look it over and let me know if you have any questions.

  • 2020-12-22 01:15

    As an aside, I tested the exact code you show above for getting the line count (by counting newline characters) on a file containing more than 1000 lines, with some lines 4000 chars long. The problem is not there. The seg fault is therefore likely due to the way you are allocating memory for each line buffer. You may be attempting to write a long line to a short buffer. (Maybe I missed it in your post, but I could not find where you addressed line length.)

    Two things are useful when allocating memory for storing the strings from a file: the number of lines and the maximum line length in the file. These can be used to create the array of char arrays.

    You can get both line count and longest line by looping on fgets(...): (a variation on your theme, essentially letting fgets find the newlines)

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    char **create2D(char **a, int lines, int longest);
    void free2D(char **a, int lines);

    int countLines(FILE *fp, int *longest)
    {
        int i = 0;
        int len = 0;
        char line[4095];  // scratch buffer; a longer line is counted in pieces
        *longest = 0;
        while(fgets(line, sizeof line, fp))
        {
            len = (int)strlen(line);
            if(len > *longest) *longest = len; //record longest
            i++; //track line count
        }
        return i;
    }

    int main(void)
    {
        int longest = 0;
        int count = 0;
        char **strArr = NULL;
        FILE *fp = fopen("C:\\dev\\play\\text.txt", "r");
        if(!fp)
        {
            perror("fopen");
            return 1;
        }
        count = countLines(fp, &longest);
        printf("%d\n", count);
        rewind(fp); // back to the start so the lines can be read again
        // use count and longest to create memory
        strArr = create2D(strArr, count, longest);
        if(strArr)
        {
           //use strArr, e.g. fgets(strArr[i], longest + 1, fp) for each i < count
           //free strArr
           free2D(strArr, count);
        }
        // ...and so on
        fclose(fp);
        return 0;
    }

    char ** create2D(char **a, int lines, int longest)
    {
        int i;
        a = malloc(lines * sizeof *a);
        if(!a) return NULL;
        for(i=0;i<lines;i++)
        {
            a[i] = malloc(longest+1); // +1 for the nul terminator
            if(!a[i])
            {
                free2D(a, i); // release the rows allocated so far
                return NULL;
            }
        }
        return a;
    }

    void free2D(char **a, int lines)
    {
        int i;
        if(!a) return;
        for(i=0;i<lines;i++)
        {
            free(a[i]); // free(NULL) is a no-op
        }
        free(a);
    }
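
    The "use strArr" step is left open above. One way to fill it in, sketched here as a single self-contained function, is to rewind after the counting pass and read each line into its row. The name load_lines is hypothetical, and the sketch assumes fp is a seekable file (not a pipe), since it has to be read twice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Two-pass load: measure the file, rewind, then read each line into a
 * row sized for the longest line. Stores the line count in *count and
 * returns the array, or NULL on failure. Lines longer than 4095 bytes
 * are split across rows. */
char **load_lines (FILE *fp, int *count)
{
    char scratch[4096];
    int n = 0, longest = 0, i;
    char **rows;

    while (fgets (scratch, sizeof scratch, fp)) {    /* pass 1: measure */
        int len = (int)strlen (scratch);
        if (len > longest) longest = len;
        n++;
    }
    rewind (fp);

    rows = malloc ((size_t)(n ? n : 1) * sizeof *rows);
    if (!rows) return NULL;

    for (i = 0; i < n; i++) {                        /* pass 2: store */
        rows[i] = malloc ((size_t)longest + 1);
        if (!rows[i] || !fgets (rows[i], longest + 1, fp)) {
            while (i >= 0) free (rows[i--]);         /* undo on failure */
            free (rows);
            return NULL;
        }
        rows[i][strcspn (rows[i], "\n")] = 0;        /* trim '\n' */
    }
    *count = n;
    return rows;
}
```

    The caller frees each rows[i] and then rows itself. Every row is sized for the longest line, which wastes some memory when line lengths vary widely but keeps the allocation logic simple.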
    