Read integers from a specific line from a file in C

一笑奈何 posted on 2021-02-11 13:55:30

Question


Hello there. I've been struggling with this problem for some time, and after some research on the internet that didn't bear any fruit, I decided to ask for some help.

I need to read some integers from a file and from a specific line and then do something with them.

I know this trick for handling strings of characters

 /* file_pointer is the FILE* returned by fopen("file_name.txt", "r"); line starts at 0 */
 while (fgets(pointer_to_string, length, file_pointer)) {
     line++;       /* increment line (an integer) by 1 */

     if (line == your_line) {
         /* do something with the string at that line */
     }
 }

I know that 'fgets()' will read everything until it reaches a newline '\n', so that makes it easy, but my problem is a bit different. I need to read integers from a file, for example:

5
1 78 45 32 2

In my particular case the number on the first line represents the number of integers located on the second line, separated by blank spaces, so I need to read the first number and then create a pointer to an array for which I will allocate memory:

int a[20];
int num; /*number on first line*/
int* p;
p = a;
p = (int*)malloc(num*sizeof(int));

Of course the memory allocation will be done after I read the first number from the file.

So I guess it'd be easier to just show you my struggle:

int main()
{

FILE* file = fopen("files.txt", "r");

int a[20], first_num, j = 0;
int* p = a, line = 1;
while(!feof(file))
{

    if ( line == 1 )
    {


        fscanf(file, "%d", &first_num);
        p = (int*)malloc(first_num*sizeof(int));
    }
    else
    {


        for ( j = 0; j <  first_num; j++)
            fscanf(file, "%d", (p + j));
    }


    line++;

}

for ( j = 0; j < first_num; j++)
{
    printf("\t%d\t", *(p + j));
}
printf("\n%d", first_num);
free(p);

fclose(file);


return 0;
}

Weirdly enough, this program actually works for this example (number of elements on the 1st line and the array on the 2nd), but I have a feeling that it is flawed, or at least I can't call it "clean", mostly because I'm not really sure how that loop works. I know that the 'feof' function is used to detect the end of a file, so as long as I'm not there yet '!feof(file)' will be non-zero, and that's why I can store the number on the 1st line, but I don't know when and how it checks the loop condition. At first I thought that it does it at the end of every line, but that would imply that if I were to change the 'else' to:

else if ( line == 2 )

it would still work properly, which it doesn't. So I would appreciate some explanation of how that loop actually works.

My guess is that I need a loop in the 'while' to check when I reached the end of a line or something like that but I'm really stuck.

My real question is for how to read integers separated by space from a specific line from a file and not necessarily the example I gave you ( that one is for someone who wouldn't mind helping me out )


Answer 1:


Let's start with some basics. When reading lines from a file (or lines of input from the user), you will generally want to use a line-oriented input function such as fgets or POSIX getline to make sure you read an entire line at a time and not have what is left in your input buffer depend on which scanf conversion specifier was used last. With fgets you will need to provide a buffer of sufficient size to hold the entire line, or dynamically allocate and realloc as needed until an entire line is read (getline handles this for you). You validate an entire line was read by checking the last character is the '\n' character or that the length of the buffer is less than the maximum size (both are left to you below).
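The full-line check described above is left to the reader in the code that follows, so here is one way it might look. This is a minimal sketch and not part of the original answer: the helper name read_full_line and its return codes are assumptions made for illustration, and size is assumed to be at least 2.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): read one line into
 * buf and report whether the whole line fit.
 * Returns  1 if a complete line was read (trailing '\n' stripped),
 *          0 if the line was longer than the buffer (the rest is discarded),
 *         -1 on EOF or read error with nothing read. */
int read_full_line (FILE *fp, char *buf, size_t size)
{
    if (!fgets (buf, (int)size, fp))
        return -1;

    size_t len = strcspn (buf, "\n");   /* index of '\n', or length if none */

    if (buf[len] == '\n') {             /* newline found: complete line */
        buf[len] = '\0';
        return 1;
    }
    if (len < size - 1)                 /* no newline and buffer not full: */
        return 1;                       /* final line of a file lacking '\n' */

    /* buffer is full with no newline stored: look at what follows */
    int c = getc (fp);
    if (c == '\n' || c == EOF)          /* line ended exactly at buffer end */
        return 1;

    while ((c = getc (fp)) != '\n' && c != EOF)
        ;                               /* discard the rest of the long line */
    return 0;
}
```

In the program below, a call like this could stand in for a bare fgets (buf, MAXC, fp), with a return of 0 treated as a format error.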

Once you have a line of text read, you have two options. You can use sscanf to convert the digits in your buffer to integer values (either knowing the number of values contained in the line beforehand and providing an adequate number of conversion specifiers, or by converting each individually and using the "%n" specifier to report the number of characters consumed by that conversion and advancing the start within your buffer by that amount for the next conversion).

Your other option, and by far the most flexible and robust from an error checking and reporting standpoint, is to use strtol and use the endptr parameter for its intended purpose of providing a pointer to one past the last digit converted, allowing you to walk down your buffer directly, converting values as you go. See: strtol(3) - Linux manual page. strtol provides the ability to discriminate between a failure where no digits were converted and one where overflow or underflow occurred (setting errno to an appropriate value), and allows you to test whether additional characters remain after the conversion through the endptr parameter for control of your value conversion loop.

As with any code you write, validating each necessary step will ensure you can respond appropriately.
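As one illustration of validating each step, consider the loop in the question. The end-of-file indicator tested by feof is only set after a read has already run into the end of the file, and "%d" skips any leading whitespace, newlines included, so the line counter there counts loop iterations rather than lines. If the file ends with a newline, a third iteration runs in which every fscanf fails; with else if (line == 2) nothing is read on that iteration, the indicator is never set, and the loop does not end. The sketch below checks the return value of each fscanf call instead. The helper name read_counted_ints is hypothetical, and rejecting a count of zero or less is an assumption of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a count, then that many integers, validating
 * the return value of every fscanf call instead of testing feof().
 * On success returns a malloc'd array (caller frees) and stores the count
 * in *count; returns NULL on bad format, short data, or allocation failure. */
int *read_counted_ints (FILE *fp, int *count)
{
    int n;

    if (fscanf (fp, "%d", &n) != 1 || n <= 0)   /* count missing or invalid */
        return NULL;

    int *arr = malloc ((size_t)n * sizeof *arr);
    if (!arr)
        return NULL;

    for (int i = 0; i < n; i++) {
        if (fscanf (fp, "%d", &arr[i]) != 1) {  /* EOF or non-numeric text */
            free (arr);
            return NULL;
        }
    }
    *count = n;
    return arr;
}
```

Because "%d" treats a newline like any other whitespace, this version cannot tell which line a value came from, which is the reason the rest of this answer reads whole lines first.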

Let's start with your sample input file:

Example Input File

$ cat dat/int_file.txt
5
1 78 45 32 2

When faced with a single value on the first line, a majority of the time you will simply want to convert the original value with fscanf (file, "%d", &ival);, which is fine, but the pitfall of using any of the scanf family is that YOU must account for any characters left in the input buffer following conversion. While the "%d" conversion specifier will provide the needed conversion, character extraction stops with the last digit, leaving the '\n' unread. As long as you account for that fact, it's fine to use fscanf to grab the first value. However, you must validate each step along the way.

Let's look at the beginning of an example doing just that, opening a file (or reading from stdin if no filename is given), validating the file is open, and then validating first_num is read, e.g.

#include <stdio.h>
#include <stdlib.h> /* for malloc/free & EXIT_FAILURE */
#include <errno.h>  /* for strtol validation */
#include <limits.h> /* for INT_MIN/INT_MAX */

#define MAXC 1024   /* don't skimp on buffer size */

int main (int argc, char **argv) {

    int first_num,      /* your first_num */
        *arr = NULL,    /* a pointer to block to fill with int values */
        nval = 0;       /* the number of values converted */
    char buf[MAXC];     /* buffer to hold subsequent lines read */
    /* open file passed as 1st argument (default: stdin if no argument) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r"): stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("fopen-file");
        exit (EXIT_FAILURE);
    }

    if (fscanf (fp, "%d", &first_num) != 1) {   /* read/validate int */
        fputs ("error: invalid file format, integer not first.\n", stderr);
        exit (EXIT_FAILURE);
    }

At this point, your input buffer contains:

\n
1 78 45 32 2

Since you are going to embark on a line-oriented read of the remaining lines in the file, you can simply make your first call to fgets for the purpose of reading and discarding the '\n', e.g.

    if (!fgets (buf, MAXC, fp)) {   /* read/discard '\n' */
        fputs ("error: non-POSIX ending after 1st integer.\n", stderr);
        exit (EXIT_FAILURE);
    }

(note: the validation. If the file had ended with a non-POSIX line end (e.g. no '\n'), fgets would fail and unless you are checking, you are likely to invoke undefined behavior by attempting to later read from a file stream where no characters remain to be read and thereafter attempting to read from a buffer with indeterminate contents)

You can allocate storage for first_num number of integers at this point and assign the starting address for that new block to arr for filling with integer values, e.g.

    /* allocate/validate storage for first_num integers */
    if (!(arr = malloc (first_num * sizeof *arr))) {
        perror ("malloc-arr");
        exit (EXIT_FAILURE);
    }

For reading the remaining values in your file, you could just make a single call to fgets and then turn to converting the integer values contained within the buffer filled, but with just a little forethought, you can craft an approach that will read as many lines as needed until first_num integers have been converted or EOF is encountered. Whether you are taking input or converting values in a buffer, a robust approach is the same: Loop Continually Until You Get What You Need Or Run Out Of Data, e.g.

while (fgets (buf, MAXC, fp)) { /* read lines until conversions made */
    char *p = buf,  /* nptr & endptr for strtol conversion */
        *endptr;
    if (*p == '\n')     /* skip blank lines */
        continue;
    while (nval < first_num) {  /* loop until nval == first_num */
        errno = 0;              /* reset errno for each conversion */
        long tmp = strtol (p, &endptr, 0);  /* call strtol */
        if (p == endptr && tmp == 0) {  /* validate digits converted */
            /* no digits converted - scan forward to next +/- or [0-9] */
            do
                p++;
            while (*p && *p != '+' && *p != '-' && 
                    ( *p < '0' || '9' < *p));
            if (*p)     /* valid start of numeric sequence? */
                continue;   /* go attempt next conversion */
            else
                break;      /* go read next line */
        }
        else if (errno) {   /* validate successful conversion */
            fputs ("error: overflow/underflow in conversion.\n", stderr);
            exit (EXIT_FAILURE);
        }
        else if (tmp < INT_MIN || INT_MAX < tmp) {  /* validate int */
            fputs ("error: value exceeds range of 'int'.\n", stderr);
            exit (EXIT_FAILURE);
        }
        else {  /* valid conversion - in range of int */
            arr[nval++] = tmp;      /* add value to array */
            if (*endptr && *endptr != '\n') /* if chars remain */
                p = endptr;         /* update p to endptr */
            else        /* otherwise */
                break;  /* bail */
        }
    }
    if (nval == first_num)  /* are all values filled? */
        break;
}

Now let's unpack this a bit. Each pass of the outer loop reads a line from your file into buf with fgets, then declares the pointers needed to work with strtol and points p at the start of buf. There is no need to attempt conversion on a blank line, so we test the first character in buf and if it is a '\n' we get the next line with:

    ...
    if (*p == '\n')     /* skip blank lines */
        continue;
    ...

Once you have a non-empty line, you start your conversion loop and will attempt conversions until the number of values you have equals first_num or you reach the end of the line. Your loop control is simply:

    while (nval < first_num) {  /* loop until nval == first_num */
        ...
    }

Within the loop you will fully validate your attempted conversions with strtol by resetting errno = 0; before each conversion and assigning the return of the conversion to a temporary long int value (strtol is string-to-long), e.g.

        errno = 0;              /* reset errno for each conversion */
        long tmp = strtol (p, &endptr, 0);  /* call strtol */

Once you make the conversion, you have three conditions to validate before you have a good integer conversion,

  1. if NO digits were converted, then p == endptr (and per the man page the return is set to zero). So to check whether this condition occurred, you can check: if (p == endptr && tmp == 0);
  2. if there was an error during conversion of digits, regardless of which error occurred, errno will be set to a non-zero value allowing you to check for an error in conversion with if (errno). You can also further dive into which occurred as specified in the man page, but for validation purposes here it is enough to know whether an error occurred; and finally
  3. if digits were converted and there was no error, you are still not done. The strtol conversion is to a value of long that may or may not be compatible with int (e.g. long is 8 bytes on x86_64 Linux while int is 4 bytes). So to ensure the converted value will fit in your integer array, you need to check that the value returned is within INT_MIN and INT_MAX before you assign the value to an element of arr.

(note: with 1. above, just because no digits were converted does not mean there were no digits in the line, it just means the first value was not a digit. You should scan forward in the line using your pointer to find the next +/- or [0-9] to determine whether further numeric values exist. That is the purpose of the while loop within that code block)
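The three checks listed above can also be packaged as a single function. What follows is a sketch under stated assumptions, not the answer's own code: the name to_int and its numeric return codes are hypothetical, chosen so that each outcome can be told apart by the caller.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the integer at the start of s.
 * Returns 0 on success (value in *out, *end set just past the digits),
 *         1 if no digits were converted,
 *         2 if the value does not fit in a long (strtol set errno),
 *         3 if the value fits in a long but not in an int. */
int to_int (const char *s, int *out, const char **end)
{
    char *endptr;

    errno = 0;                          /* reset before every conversion */
    long tmp = strtol (s, &endptr, 0);  /* base 0, as in the answer's code */

    if (endptr == s)                    /* check 1: nothing converted */
        return 1;
    if (errno)                          /* check 2: overflow/underflow of long */
        return 2;
    if (tmp < INT_MIN || tmp > INT_MAX) /* check 3: outside the range of int */
        return 3;

    *out = (int)tmp;
    *end = endptr;
    return 0;
}
```

One consequence of base 0 worth knowing: a leading 0 selects octal and a leading 0x selects hexadecimal, so text such as 078 converts only the 07 and stops at the 8. Passing 10 as the base avoids that if the data is always decimal.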

Once you have a good integer value, recall that endptr will be set to the next character after the last digit converted. A quick check whether *endptr is not the nul-terminating character and not the line-ending will tell you whether characters remain that are available for conversion. If so, simply update p = endptr so that your pointer now points one past the last digit converted and repeat. (you can also scan forward at this point with the same while loop used above to determine if another numeric value exists -- this is left to you)

Once the loop completes, all you need do is check if nval == first_num to know if you need to continue collecting values.

Putting it all together, you could do something similar to:

#include <stdio.h>
#include <stdlib.h> /* for malloc/free & EXIT_FAILURE */
#include <errno.h>  /* for strtol validation */
#include <limits.h> /* for INT_MIN/INT_MAX */

#define MAXC 1024   /* don't skimp on buffer size */

int main (int argc, char **argv) {

    int first_num,      /* your first_num */
        *arr = NULL,    /* a pointer to block to fill with int values */
        nval = 0;       /* the number of values converted */
    char buf[MAXC];     /* buffer to hold subsequent lines read */
    /* open file passed as 1st argument (default: stdin if no argument) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r"): stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("fopen-file");
        exit (EXIT_FAILURE);
    }

    if (fscanf (fp, "%d", &first_num) != 1) {   /* read/validate int */
        fputs ("error: invalid file format, integer not first.\n", stderr);
        exit (EXIT_FAILURE);
    }

    if (!fgets (buf, MAXC, fp)) {   /* read/discard '\n' */
        fputs ("error: non-POSIX ending after 1st integer.\n", stderr);
        exit (EXIT_FAILURE);
    }

    /* allocate/validate storage for first_num integers */
    if (!(arr = malloc (first_num * sizeof *arr))) {
        perror ("malloc-arr");
        exit (EXIT_FAILURE);
    }

    while (fgets (buf, MAXC, fp)) { /* read lines until conversions made */
        char *p = buf,  /* nptr & endptr for strtol conversion */
            *endptr;
        if (*p == '\n')     /* skip blank lines */
            continue;
        while (nval < first_num) {  /* loop until nval == first_num */
            errno = 0;              /* reset errno for each conversion */
            long tmp = strtol (p, &endptr, 0);  /* call strtol */
            if (p == endptr && tmp == 0) {  /* validate digits converted */
                /* no digits converted - scan forward to next +/- or [0-9] */
                do
                    p++;
                while (*p && *p != '+' && *p != '-' && 
                        ( *p < '0' || '9' < *p));
                if (*p)     /* valid start of numeric sequence? */
                    continue;   /* go attempt next conversion */
                else
                    break;      /* go read next line */
            }
            else if (errno) {   /* validate successful conversion */
                fputs ("error: overflow/underflow in conversion.\n", stderr);
                exit (EXIT_FAILURE);
            }
            else if (tmp < INT_MIN || INT_MAX < tmp) {  /* validate int */
                fputs ("error: value exceeds range of 'int'.\n", stderr);
                exit (EXIT_FAILURE);
            }
            else {  /* valid conversion - in range of int */
                arr[nval++] = tmp;      /* add value to array */
                if (*endptr && *endptr != '\n') /* if chars remain */
                    p = endptr;         /* update p to endptr */
                else        /* otherwise */
                    break;  /* bail */
            }
        }
        if (nval == first_num)  /* are all values filled? */
            break;
    }
    if (nval < first_num) { /* validate required integers found */
        fputs ("error: EOF before all integers read.\n", stderr);
        exit (EXIT_FAILURE);
    }

    for (int i = 0; i < nval; i++)  /* loop outputting each integer */
        printf ("arr[%2d] : %d\n", i, arr[i]);

    free (arr);         /* don't forget to free the memory you allocate */

    if (fp != stdin)    /* and close any file streams you have opened */
        fclose (fp);

    return 0;
}

(note: the final check of if (nval < first_num) after exiting the read and conversion loop)

Example Use/Output

With your example file, you would get the following:

$ ./bin/fgets_int_file dat/int_file.txt
arr[ 0] : 1
arr[ 1] : 78
arr[ 2] : 45
arr[ 3] : 32
arr[ 4] : 2

Why Go To The Extra Trouble?

By thoroughly understanding the conversion process and going to the few additional lines of trouble, you end up with a routine that can provide flexible input handling for whatever number of integers you need regardless of the input file format. Let's look at another variation of your input file:

A More Challenging Input File

$ cat dat/int_file2.txt
5

1 78

      45

32 2 144 91 270

foo

What changes are needed to handle retrieving the same first five integer values from this file? (hint: none - try it)

An Even More Challenging Input File

What if we up the ante again?

$ cat dat/int_file3.txt
5

1 two buckle my shoe, 78 close the gate

      45 is half of ninety
foo bar
32 is sixteen times 2 and 144 is a gross, 91 is not prime and 270 caliber

baz

What changes are needed to read the first 5 integer values from this file? (hint: none)

But I Want To Specify the Line To Start Reading From

OK, let's take another input file to go along with the example. Say:

An Example Input Reading From A Given Line

$ cat dat/int_file4.txt
5

1,2 buckle my shoe, 7,8 close the gate

      45 is half of ninety
foo bar
32 is sixteen times 2 and 144 is a gross, 91 is not prime and 270 caliber

baz

   1 78 45 32 2 27 41 39 1111

a quick brown fox jumps over the lazy dog

What would I have to change? The only changes needed are changes to skip the first 10 lines and begin your conversion loop at line 11. To do that you would need to add a variable to hold the value of the line to start reading integers on (say rdstart) and a variable to hold the line count so we know when to start reading (say linecnt), e.g.

    int first_num,
        *arr = NULL,
        nval = 0,
        rdstart = argc > 2 ? strtol(argv[2], NULL, 0) : 2,
        linecnt = 1;

(note: the line to start the integer read from is taken as the 2nd argument to the program or a default of line 2 is used if none is specified -- and yes, you should apply the same full validations to this use of strtol, but that I leave to you)

What else needs changing? Not much. Instead of reading and discarding the '\n' left by fscanf just once, do it rdstart - 1 times (the loop below runs while linecnt < rdstart, and linecnt is initialized to 1). To accomplish that, simply wrap your first call to fgets in a loop (and change the error message to make sense), e.g.

    while (linecnt < rdstart) { /* loop until linecnt == rdstart */
        if (!fgets (buf, MAXC, fp)) {   /* read/discard line */
            fputs ("error: less than requested no. of lines.\n", stderr);
            exit (EXIT_FAILURE);
        }
        linecnt++;  /* increment linecnt */
    }

That's it. (and note it will continue to handle the first 3 input files as well just by omitting the second parameter...)

Example Output Start At Line 11

Does it work?

$ ./bin/fgets_int_file_line dat/int_file4.txt 11
arr[ 0] : 1
arr[ 1] : 78
arr[ 2] : 45
arr[ 3] : 32
arr[ 4] : 2
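One caveat about the line-skipping loop: it counts calls to fgets, and a line longer than MAXC - 1 characters takes more than one call to consume, so it would be counted as several lines. A possible alternative, sketched here with a hypothetical helper named skip_lines, counts newline characters instead.

```c
#include <stdio.h>

/* Hypothetical helper: advance fp past the next n line endings, reading
 * character by character so that a very long line is still counted once.
 * Returns 0 on success, -1 if EOF is reached before n newlines were seen. */
int skip_lines (FILE *fp, int n)
{
    int c;

    while (n > 0) {
        c = getc (fp);
        if (c == EOF)       /* ran out of data before reaching the line */
            return -1;
        if (c == '\n')      /* one more line fully consumed */
            n--;
    }
    return 0;
}
```

After fscanf has read the first number the stream is still on line 1, so skip_lines (fp, rdstart - 1) would leave it at the start of line rdstart, ready for the conversion loop.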

Look things over and let me know if you have further questions. There are many ways to do this, but by far, if you learn how to use strtol (all the strtoX functions work very much the same), you will be well ahead of the game in handling numeric conversion.




Answer 2:


My real question is for how to read integers separated by space from a specific line from a file ... (?)

Step 1: Say you expect up to N integers. Read the line with a generous yet sane maximum length. I'd use 2x the anticipated maximum size. It is good to allow for extra spacing, yet lines of extreme length are likely errant or hostile.

#include <limits.h>  // for CHAR_BIT

#define INT_PER_LINE_MAX 20

// About 1 digit per 3 bits, 28/93 is just above log10(2)
#define CHARACTERS_PER_INT_MAX ((sizeof(int)*CHAR_BIT - 1)*28/93 + 2)

// Room for N int, separators and \0
#define LINE_SIZE (INT_PER_LINE_MAX * (CHARACTERS_PER_INT_MAX + 1) + 1)

// I like 2x to allow extra spaces, leading zeros, etc.
char buf[LINE_SIZE * 2];

while (fgets(buf, sizeof buf, file)) {
  ...
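As a worked check of the arithmetic above, assuming a 32-bit int and a CHAR_BIT of 8: CHARACTERS_PER_INT_MAX is (32 - 1)*28/93 + 2 = 868/93 + 2 = 9 + 2 = 11 with integer division, which matches the 11 characters of -2147483648. LINE_SIZE is then 20 * (11 + 1) + 1 = 241, and buf is 482 bytes. The sketch below repeats the macros so that it stands alone; longest_int_text is a hypothetical helper that asks snprintf how many characters INT_MIN needs.

```c
#include <limits.h>
#include <stdio.h>

#define INT_PER_LINE_MAX 20
#define CHARACTERS_PER_INT_MAX ((sizeof(int)*CHAR_BIT - 1)*28/93 + 2)
#define LINE_SIZE (INT_PER_LINE_MAX * (CHARACTERS_PER_INT_MAX + 1) + 1)

/* Hypothetical check: the number of characters needed to print INT_MIN,
 * which is the longest textual form an int can take (sign plus digits). */
size_t longest_int_text (void)
{
    /* with a NULL buffer and size 0, snprintf only reports the length
     * that would have been written, excluding the terminating '\0' */
    int len = snprintf (NULL, 0, "%d", INT_MIN);

    return len < 0 ? 0 : (size_t)len;
}
```

Comparing longest_int_text() against CHARACTERS_PER_INT_MAX on the target platform is a quick way to confirm the estimate is large enough there.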

Step 2: Call a function to parse the N integers from the string

while (fgets(buf, sizeof buf, file)) {
  int a[INT_PER_LINE_MAX];
  int count = parse_ints(a, INT_PER_LINE_MAX, buf);
  if (count < 0) {
    puts("bad input");
  } else {
    printf("%d int found\n", count);
  }
}

Step 3: Make parse_ints(). Parse the string using strtol(), as well suggested by @David C. Rankin, or use sscanf() with "%d %n". The latter approach lacks robust overflow protection.

int parse_ints(int *a, int n, const char *buf) {

  int i;
  // enough room? and more to parse?
  for (i=0; i<n && *buf; i++) {
    int value;
    int offset;  // offset where scanning stopped
    if (sscanf(buf, "%d %n", &value, &offset) != 1) {
      return -1;  // No int scanned.
    }
    a[i] = value;
    buf += offset;  // advance the buffer
  }
  if (*buf) {
    return -1;  // Unexpected extra text left over
  }
  return i;
}
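For comparison, the strtol() route mentioned in Step 3 might look like the sketch below. It is an illustration rather than part of the original answer: the name parse_ints_strtol is hypothetical, the base is fixed at 10 as an assumption, and unlike parse_ints() it returns 0 rather than -1 for a line holding only whitespace.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical strtol-based variant of parse_ints(): returns the number
 * of ints stored, or -1 on bad input. Values outside the range of int
 * are rejected instead of being left to sscanf's unspecified handling. */
int parse_ints_strtol (int *a, int n, const char *buf)
{
    int i = 0;

    for (;;) {
        while (isspace ((unsigned char)*buf))   /* skip separators */
            buf++;
        if (*buf == '\0')                       /* end of the line reached */
            return i;
        if (i == n)                             /* more text than room */
            return -1;

        char *end;
        errno = 0;
        long v = strtol (buf, &end, 10);
        if (end == buf || errno || v < INT_MIN || v > INT_MAX)
            return -1;                          /* not a number, or out of range */

        a[i++] = (int)v;
        buf = end;                              /* continue after the digits */
    }
}
```

It can be dropped into the Step 2 loop in place of parse_ints(a, INT_PER_LINE_MAX, buf) with the same meaning for a negative return.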


Source: https://stackoverflow.com/questions/53798736/read-integers-from-a-specific-line-from-a-file-in-c
