Question
I have a question about reading a file character by character in C and counting the characters.
Here's my code:
void read_in(char** quotes){
    FILE *frp = fopen(IN_FILE, "r");
    char c;
    size_t tmp_len =0, i=0;
    //char* tmp[100];
    //char* quotes[MAX_QUOTES];
    //char str = fgets(str, sizeof(quotes),frp);
    while((c=fgetc(frp)) != EOF){
        if(frp == NULL){
            printf("File is empty!");
            fclose(frp); exit(1);
        }
        else{
            if(c != '\n'){
                printf("%c",c);
                c=fgetc(frp);
                tmp_len++;
            }
        }
        char* tmp = (char*)calloc(tmp_len+1, sizeof(char));
        fgets(tmp, sizeof(tmp), frp);
        strcpy((char*)quotes[i], tmp);
        printf("%s\n", (char*)quotes[i]);
        i++;
    }
}
It doesn't work, but I don't understand why.
Thank you
Answer 1:
If you are using Linux (or another POSIX system), you can try using getline() instead of fgetc() and fgets(), because getline() takes care of memory allocation.
Example:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    FILE *fp;
    char *line = NULL;
    size_t len = 0;
    ssize_t read;

    if (argc != 2)
    {
        printf("usage: rf <filename>\n");
        exit(EXIT_FAILURE);
    }

    fp = fopen(argv[1], "r");
    if (fp == NULL)
    {
        perror("fopen");
        exit(EXIT_FAILURE);
    }

    /* getline() allocates/grows 'line' as needed and updates 'len' */
    while ((read = getline(&line, &len, fp)) != -1) {
        printf("Retrieved line of length %zd :\n", read);  /* ssize_t -> %zd */
        printf("%s", line);
    }

    free(line);    /* one free for the buffer getline() managed */
    fclose(fp);
    exit(EXIT_SUCCESS);
}
Answer 2:
From your question and the comments, it is relatively clear that you want to read all quotes (lines) in a file into dynamically allocated storage and display them (screen 1), then sort the lines by length and display the 5 shortest (screen 2), saving those 5 lines to a second output file (that part is left to you). Reading and storing all lines from a file isn't difficult, but it isn't trivial either. It sounds basic, and it is, but it requires that you correctly use all of the basic tools for interfacing with persistent storage (reading the file from disk or other storage media) and with your computer's memory subsystem (RAM).
Reading each line from a file isn't difficult, but like anything in C, it requires you to pay attention to the details. You can read from a file using character-oriented input functions (fgetc(), getc(), etc.), formatted-input functions (fscanf()), or line-oriented input functions (fgets() or POSIX getline()). Reading lines from a file is generally done with line-oriented functions, but there is nothing wrong with a character-oriented approach either. In fact, you can relatively easily write a function based around fgetc() that reads each line from a file for you.
In the trivial case where you know the maximum number of characters in the longest line of the file, you can use a 2D array of characters to store the entire file. This simplifies the process by eliminating the need to allocate storage dynamically, but it has a number of disadvantages: every line requires the same storage as the longest line in the file, and, when the array is a local (automatic) variable, the size of the file that can be stored is limited by the size of your program stack. Allocating storage dynamically (with malloc, calloc, or realloc) eliminates these disadvantages and inefficiencies, allowing you to store files up to the limit of the memory available on your computer. (There are also sliding-window techniques that can handle files of any size, but they are well beyond your needs here.)
There is nothing difficult about handling dynamically allocated memory, or about copying or storing data in it one character at a time. That said, the responsibility for each allocation is yours, the programmer's: tracking the amount of data written to each allocated block, reallocating to resize the block so that no data is written outside its bounds, and freeing each allocated block when it is no longer needed. C gives you the power to use every byte of memory available, and also places on you the responsibility to use that memory correctly.
The basic approach to storing a file is simple. You read each line from the file, allocating (and reallocating as needed) storage for its characters until a '\n' or EOF is encountered. To keep track of all lines, you allocate a block of pointers and assign the address of each block holding a line to the next pointer in sequence, reallocating the block of pointers as needed to hold all lines.
Sometimes a picture really is worth 1000 words. With the basic approach, you declare a pointer to (what?) a pointer, so you can allocate a block of memory containing pointers to which you will assign each allocated line. For example, you could declare char **lines;
A pointer-to-pointer is a single pointer that points to a block of memory containing pointers. Each pointer in the block pointed to by lines has type char * and points to an allocated block holding one line from the file, e.g.
char **lines;
  |
  |       allocated
  |       pointers   allocated blocks holding each line
lines --> +----+     +-----+
          | p1 | --> | cat |
          +----+     +-----+--------------------------------------+
          | p2 | --> | Four score and seven years ago our fathers |
          +----+     +-------------+------------------------------+
          | p3 | --> | programming |
          +----+     +-------------+-----+
          | .. |     |        ...        |
          +----+     +-------------------+
          | pn | --> | last line read |
          +----+     +----------------+
You can make lines a bit more flexible to use by allocating one additional pointer and initializing it to NULL. That sentinel lets you iterate over lines without knowing how many lines there are: you simply stop when NULL is encountered, e.g.
          | .. |     |        ...        |
          +----+     +-------------------+
          | pn | --> | last line read |
          +----+     +----------------+
          |pn+1|     | NULL |
          +----+     +------+
While you can put this all together in a single function, to help the learning process (and for practical reusability) it is often easier to break it up into two functions: one that reads a line and allocates storage for it, and a second that calls the first repeatedly, allocating pointers and assigning the address of each allocated block holding a line to the next pointer in turn. When you are done, you have an allocated block of pointers where each pointer holds the address of (points to) an allocated block holding a line from the file.
You have indicated you want to read from the file with fgetc(), one character at a time. There is nothing wrong with that, and there is little penalty to this approach, since the stdio library buffers input: each fgetc() call reads from an in-memory buffer rather than from disk one character at a time. (The buffer size varies between implementations; <stdio.h> provides the BUFSIZ macro as a typical value, and both Linux and Windows compilers define it.)
There are virtually unlimited ways to write a function that allocates storage to hold a line and then reads the line from the file one character at a time until a '\n' or EOF is encountered. You can return a pointer to the allocated block holding the line and pass a pointer parameter to be updated with the number of characters in the line, or you can have the function return the line length and pass the address of a pointer as a parameter to be allocated and filled within the function. It is up to you. One way would be:
#define NSHORT 5   /* no. of shortest lines to display */
#define LINSZ  128 /* initial allocation size for each line */
...
/** read line from 'fp' stored in allocated block assigned to '*s' and
 *  return length of string stored on success, on EOF with no characters
 *  read, or on failure, return -1 (and set '*s' to NULL). Block of memory
 *  sized to accommodate exact length of string with nul-terminating char.
 *  unless -1 returned, *s guaranteed to contain nul-terminated string
 *  (empty-string allowed). caller responsible for freeing allocated memory.
 */
ssize_t fgetcline (char **s, FILE *fp)
{
    int c;                              /* char read from fp */
    size_t n = 0, size = LINSZ;         /* no. of chars and allocation size */
    void *tmp = realloc (NULL, size);   /* tmp pointer for realloc use */

    if (!tmp) {                         /* validate every allocation/reallocation */
        *s = NULL;
        return -1;
    }
    *s = tmp;                           /* assign allocated block to pointer */

    while ((c = fgetc(fp)) != '\n' && c != EOF) {   /* read chars until \n or EOF */
        if (n + 1 == size) {            /* check if realloc required */
            /* realloc using temporary pointer */
            if (!(tmp = realloc (*s, size + LINSZ))) {
                free (*s);              /* on failure, free partial line */
                *s = NULL;              /* don't leave a dangling pointer */
                return -1;
            }
            *s = tmp;                   /* assign reallocated block to pointer */
            size += LINSZ;              /* update allocated size */
        }
        (*s)[n++] = c;                  /* assign char to index, increment */
    }
    (*s)[n] = 0;                        /* nul-terminate string */

    if (n == 0 && c == EOF) {           /* if nothing read and EOF, free mem return -1 */
        free (*s);
        *s = NULL;                      /* caller is left with NULL, not a freed ptr */
        return -1;
    }
    if ((tmp = realloc (*s, n + 1)))    /* final realloc to exact length */
        *s = tmp;                       /* assign reallocated block to pointer */

    return (ssize_t)n;                  /* return length (excluding nul-terminating char) */
}
(note: ssize_t is a POSIX type, the signed counterpart of size_t, which lets the function return -1 on failure. It is provided in the sys/types.h header. Since it is not part of standard C, you can adjust the type as desired.)
The fgetcline() function makes one final call to realloc to shrink the allocation to exactly the number of characters needed to hold the line plus the nul-terminating character.
The function that reads all lines in the file, allocating and reallocating pointers as required, does essentially the same thing for pointers that fgetcline() does for characters. It allocates some initial number of pointers and then reads lines from the file, doubling the number of pointers each time more are needed. It also keeps one additional pointer set to NULL as a sentinel, which allows iterating over all pointers until NULL is reached (this is optional). The parameter n is updated with the number of lines stored, making it available back in the calling function. This function, too, can be written in a number of different ways; one would be:
/** read each line from `fp` and store in allocated block returning pointer to
 *  allocated block of pointers to each stored line with the final pointer
 *  after the last stored string set to NULL as a sentinel. 'n' is set to
 *  the number of allocated and stored lines (excluding the sentinel NULL).
 *  returns valid pointer on success, NULL otherwise. caller is responsible for
 *  freeing both allocated lines and pointers.
 */
char **readfile (FILE *fp, size_t *n)
{
    size_t nptrs = LINSZ;                           /* no. of allocated pointers */
    char **lines = malloc (nptrs * sizeof *lines);  /* allocated block of pointers */
    void *tmp = NULL;                               /* temp pointer for realloc use */

    if (!lines) {                   /* validate the initial allocation */
        perror ("malloc-lines");
        return NULL;
    }
    *n = 0;                         /* start counting from zero */

    /* read each line from 'fp' into allocated block, assign to next pointer */
    while (fgetcline (&lines[*n], fp) != -1) {
        lines[++(*n)] = NULL;       /* set next pointer NULL as sentinel */
        if (*n + 1 >= nptrs) {      /* check if realloc required */
            /* reallocate using temporary pointer to prevent memory leak on failure */
            if (!(tmp = realloc (lines, 2 * nptrs * sizeof *lines))) {
                perror ("realloc-lines");
                return lines;       /* return original pointer on failure */
            }
            lines = tmp;            /* assign reallocated block to pointer */
            nptrs *= 2;             /* update no. of pointers allocated */
        }
    }
    lines[*n] = NULL;               /* ensure sentinel after the final (failed) read */

    /* final realloc sizing exact no. of pointers required */
    if (!(tmp = realloc (lines, (*n + 1) * sizeof *lines)))
        return lines;               /* return original block on failure */

    return tmp;                     /* return updated block of pointers on success */
}
Note that the function above takes an open FILE* parameter rather than a filename to open within the function. You generally want to open the file in the calling function and validate that it is open for reading before calling a function to read all the lines. If the file cannot be opened in the caller, there is no reason to call the function to read the lines at all.
With a way to read and store all lines from your file in place, you next need to sort the lines by length so you can output the 5 shortest lines (quotes). Since you will normally want to preserve the lines from your file in their original order, the easiest way to sort by length is to make a copy of the pointers and sort the copy by line length. The pointers in lines stay in the original order, while the copied pointers in sortedlines are sorted by line length, e.g.
int main (int argc, char **argv) {

    char **lines = NULL,            /* pointer to allocated block of pointers */
         **sortedlines = NULL;      /* copy of lines pointers to sort by length */
After reading the file and filling lines, you can copy the pointers to sortedlines (including the sentinel NULL), e.g.
    /* allocate storage for copy of lines pointers (plus sentinel NULL) */
    if (!(sortedlines = malloc ((n + 1) * sizeof *sortedlines))) {
        perror ("malloc-sortedlines");
        return 1;
    }
    /* copy pointers from lines to sortedlines (plus sentinel NULL) */
    memcpy (sortedlines, lines, (n + 1) * sizeof *sortedlines);
Then you simply call qsort to sort the pointers in sortedlines by length. Your only job with qsort is to write the compare function. The prototype for the compare function is:
int compare (const void *a, const void *b);
Both a and b are pointers to elements of the array being sorted. In your case, where sortedlines is a char **, each element is a char *, so a and b are really char ** values passed in as const void *. Write the compare function so it returns less than zero if the length of the line pointed to by a is less than that of b (a sorts first), zero if the lengths are equal, and greater than zero if the length of a is greater than that of b (b sorts first). Returning the difference of two comparisons, (a > b) - (a < b), rather than simply a - b, avoids the overflow (or, for unsigned types such as size_t, the wraparound) that a plain subtraction can produce, e.g.
/** compare function for qsort, takes pointer-to-element in a & b */
int complength (const void *a, const void *b)
{
    /* a & b are pointer-to-pointer to char */
    char *pa = *(char * const *)a,  /* pa is pointer to string */
         *pb = *(char * const *)b;  /* pb is pointer to string */
    size_t lena = strlen(pa),       /* length of pa */
           lenb = strlen(pb);       /* length of pb */

    /* for numeric types returning result of (a > b) - (a < b) instead
     * of result of a - b avoids potential overflow. returns -1, 0, 1.
     */
    return (lena > lenb) - (lena < lenb);
}
Now you simply pass qsort the collection of objects, the number of objects, the size of each object, and the compare function to use. It doesn't matter what you need to sort; it works the same way every time. There is no reason you should ever need to write your own sort (except for educational purposes); that is what qsort is provided for. For example, here with sortedlines, all you need is:
qsort (sortedlines, n, sizeof *sortedlines, complength); /* sort by length */
Now you can display all lines in their original order by iterating through lines, and all lines in ascending order of length by iterating through sortedlines. To display the 5 shortest lines, just iterate over the first 5 valid pointers in sortedlines. The same applies to opening another file for writing and writing those 5 lines to it. (That is left to you.)
That's it. Is any of it difficult? No. Is it trivial? No. It is a basic part of programming in C that takes work to learn and understand, but that is no different from anything worth learning. Putting all the pieces together into a working program that reads and displays all lines in a file, then sorts them and displays the 5 shortest, you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define NSHORT 5   /* no. of shortest lines to display */
#define LINSZ  128 /* initial allocation size for each line */

/** compare function for qsort, takes pointer-to-element in a & b */
int complength (const void *a, const void *b)
{
    /* a & b are pointer-to-pointer to char */
    char *pa = *(char * const *)a,  /* pa is pointer to string */
         *pb = *(char * const *)b;  /* pb is pointer to string */
    size_t lena = strlen(pa),       /* length of pa */
           lenb = strlen(pb);       /* length of pb */

    /* for numeric types returning result of (a > b) - (a < b) instead
     * of result of a - b avoids potential overflow. returns -1, 0, 1.
     */
    return (lena > lenb) - (lena < lenb);
}

/** read line from 'fp' stored in allocated block assigned to '*s' and
 *  return length of string stored on success, on EOF with no characters
 *  read, or on failure, return -1 (and set '*s' to NULL). Block of memory
 *  sized to accommodate exact length of string with nul-terminating char.
 *  unless -1 returned, *s guaranteed to contain nul-terminated string
 *  (empty-string allowed). caller responsible for freeing allocated memory.
 */
ssize_t fgetcline (char **s, FILE *fp)
{
    int c;                              /* char read from fp */
    size_t n = 0, size = LINSZ;         /* no. of chars and allocation size */
    void *tmp = realloc (NULL, size);   /* tmp pointer for realloc use */

    if (!tmp) {                         /* validate every allocation/reallocation */
        *s = NULL;
        return -1;
    }
    *s = tmp;                           /* assign allocated block to pointer */

    while ((c = fgetc(fp)) != '\n' && c != EOF) {   /* read chars until \n or EOF */
        if (n + 1 == size) {            /* check if realloc required */
            /* realloc using temporary pointer */
            if (!(tmp = realloc (*s, size + LINSZ))) {
                free (*s);              /* on failure, free partial line */
                *s = NULL;              /* don't leave a dangling pointer */
                return -1;
            }
            *s = tmp;                   /* assign reallocated block to pointer */
            size += LINSZ;              /* update allocated size */
        }
        (*s)[n++] = c;                  /* assign char to index, increment */
    }
    (*s)[n] = 0;                        /* nul-terminate string */

    if (n == 0 && c == EOF) {           /* if nothing read and EOF, free mem return -1 */
        free (*s);
        *s = NULL;                      /* caller is left with NULL, not a freed ptr */
        return -1;
    }
    if ((tmp = realloc (*s, n + 1)))    /* final realloc to exact length */
        *s = tmp;                       /* assign reallocated block to pointer */

    return (ssize_t)n;                  /* return length (excluding nul-terminating char) */
}

/** read each line from `fp` and store in allocated block returning pointer to
 *  allocated block of pointers to each stored line with the final pointer
 *  after the last stored string set to NULL as a sentinel. 'n' is set to
 *  the number of allocated and stored lines (excluding the sentinel NULL).
 *  returns valid pointer on success, NULL otherwise. caller is responsible for
 *  freeing both allocated lines and pointers.
 */
char **readfile (FILE *fp, size_t *n)
{
    size_t nptrs = LINSZ;                           /* no. of allocated pointers */
    char **lines = malloc (nptrs * sizeof *lines);  /* allocated block of pointers */
    void *tmp = NULL;                               /* temp pointer for realloc use */

    if (!lines) {                   /* validate the initial allocation */
        perror ("malloc-lines");
        return NULL;
    }
    *n = 0;                         /* start counting from zero */

    /* read each line from 'fp' into allocated block, assign to next pointer */
    while (fgetcline (&lines[*n], fp) != -1) {
        lines[++(*n)] = NULL;       /* set next pointer NULL as sentinel */
        if (*n + 1 >= nptrs) {      /* check if realloc required */
            /* reallocate using temporary pointer to prevent memory leak on failure */
            if (!(tmp = realloc (lines, 2 * nptrs * sizeof *lines))) {
                perror ("realloc-lines");
                return lines;       /* return original pointer on failure */
            }
            lines = tmp;            /* assign reallocated block to pointer */
            nptrs *= 2;             /* update no. of pointers allocated */
        }
    }
    lines[*n] = NULL;               /* ensure sentinel after the final (failed) read */

    /* final realloc sizing exact no. of pointers required */
    if (!(tmp = realloc (lines, (*n + 1) * sizeof *lines)))
        return lines;               /* return original block on failure */

    return tmp;                     /* return updated block of pointers on success */
}

/** free all allocated memory (both lines and pointers) */
void freelines (char **lines, size_t nlines)
{
    for (size_t i = 0; i < nlines; i++) /* loop over each pointer */
        free (lines[i]);                /* free allocated line */
    free (lines);                       /* free pointers */
}

int main (int argc, char **argv) {

    char **lines = NULL,            /* pointer to allocated block of pointers */
         **sortedlines = NULL;      /* copy of lines pointers to sort by length */
    size_t n = 0;                   /* no. of pointers with allocated lines */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {                      /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    lines = readfile (fp, &n);      /* read all lines in file, fill lines */
    if (fp != stdin)                /* close file if not stdin */
        fclose (fp);
    if (!lines)
        return 1;

    /* allocate storage for copy of lines pointers (plus sentinel NULL) */
    if (!(sortedlines = malloc ((n + 1) * sizeof *sortedlines))) {
        perror ("malloc-sortedlines");
        freelines (lines, n);
        return 1;
    }
    /* copy pointers from lines to sortedlines (plus sentinel NULL) */
    memcpy (sortedlines, lines, (n + 1) * sizeof *sortedlines);

    qsort (sortedlines, n, sizeof *sortedlines, complength);  /* sort by length */

    /* output all lines from file (first screen) */
    puts ("All lines:\n\nline : text");
    for (size_t i = 0; i < n; i++)
        printf ("%4zu : %s\n", i + 1, lines[i]);

    /* output first five shortest lines (second screen) */
    puts ("\n5 shortest lines:\n\nline : text");
    for (size_t i = 0; i < (n >= NSHORT ? NSHORT : n); i++)
        printf ("%4zu : %s\n", i + 1, sortedlines[i]);

    freelines (lines, n);           /* free all allocated memory for lines */
    free (sortedlines);             /* free block of pointers */
    return 0;
}
(note: the program reads from the filename given as the first argument, or from stdin if no argument is given)
Example Input File
$ cat dat/fleascatsdogs.txt
My dog
My fat cat
My snake
My dog has fleas
My cat has none
Lucky cat
My snake has scales
Example Use/Output
$ ./bin/fgetclinesimple dat/fleascatsdogs.txt
All lines:

line : text
   1 : My dog
   2 : My fat cat
   3 : My snake
   4 : My dog has fleas
   5 : My cat has none
   6 : Lucky cat
   7 : My snake has scales

5 shortest lines:

line : text
   1 : My dog
   2 : My snake
   3 : Lucky cat
   4 : My fat cat
   5 : My cat has none
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so that (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through them.
$ valgrind ./bin/fgetclinesimple dat/fleascatsdogs.txt
==5900== Memcheck, a memory error detector
==5900== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5900== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5900== Command: ./bin/fgetclinesimple dat/fleascatsdogs.txt
==5900==
All lines:

line : text
   1 : My dog
   2 : My fat cat
   3 : My snake
   4 : My dog has fleas
   5 : My cat has none
   6 : Lucky cat
   7 : My snake has scales

5 shortest lines:

line : text
   1 : My dog
   2 : My snake
   3 : Lucky cat
   4 : My fat cat
   5 : My cat has none
==5900==
==5900== HEAP SUMMARY:
==5900== in use at exit: 0 bytes in 0 blocks
==5900== total heap usage: 21 allocs, 21 frees, 7,938 bytes allocated
==5900==
==5900== All heap blocks were freed -- no leaks are possible
==5900==
==5900== For counts of detected and suppressed errors, rerun with: -v
==5900== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
There is a lot here, and as with any "how do I do X?" question, the devil is in the details: the proper use of each function and the proper validation of each input and each allocation/reallocation. Each part is just as important as the others to ensure your code does what you need it to do, in a defined way. Look things over, take your time to digest the parts, and let me know if you have further questions.
Source: https://stackoverflow.com/questions/62650192/need-help-for-reading-a-file-character-by-character-in-c