I wrote a program that is supposed to identify any words of a given file and index each one properly so that I can print a word that is assigned a specific index. The last print
Your program's logic of trying to be both an array of pointers and a linked-list is a recipe for confusion. It also prevents you from simply allocating a block of nodes, because any reallocation may move the block and leave the `->next` addresses stored in it pointing at the old memory. If you declare an array, then `->next` is superfluous: the next node is simply `&array[n+1]`. If you want a linked-list, then forget about the array; it is superfluous to the list. (See the link to a linked-list implementation at the end.)
Either way, your use of `fgetc` doesn't handle leading or multiple intervening whitespace. However, there is a C standard library function that skips leading whitespace and then reads characters up to the next whitespace: the very simple `fscanf` with the `"%s"` format specifier.
Replacing `fgetc` with `fscanf` reduces the reading of words to:
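```c
#define MAXW 128    /* max word length, including the nul-terminator */
...
    char buf[MAXW];
    FILE *fp = fopen (argv[1], "r");
    ...
    while (fscanf (fp, "%127s", buf) == 1) {  /* read each word from file */
        /* field width 127 = MAXW - 1 protects buf from overflow */
        ...
    }
```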
That greatly simplifies the logic of reading space-separated words.
If you are reading an unknown number of words of unknown length and want to allocate only what is needed to store each word, then the normal approach is:

1. declare a pointer-to-pointer (e.g. `char **words = NULL;`) and allocate some initial number of pointers;
2. read each word into a temporary buffer and get its length;
3. allocate `length + 1` chars of storage, copy the word to that storage, and assign its starting address to the next unused pointer;
4. when you have used all the pointers allocated, `realloc` more and keep going.
Implementing that logic, tracking the number of allocated pointers with `avail` and the number of used pointers with `used`, and reallocating when `used == avail`, you could do:
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORDS 2     /* if you need a constant, #define one (or more) */
#define MAXW  128

int main (int argc, char **argv) {

    char buf[MAXW];
    size_t used = 0, avail = WORDS; /* used & available (allocated) pointers */
    char **words = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* allocate/validate initial avail no. of pointers */
    if ((words = malloc (avail * sizeof *words)) == NULL) {
        perror ("malloc-words");
        return 1;
    }

    while (fscanf (fp, "%127s", buf) == 1) {    /* read each word from file */
        size_t len;
        if (used == avail) {    /* realloc 2X pointers if used == avail */
            void *tmp = realloc (words, 2 * avail * sizeof *words);
            if (!tmp) {         /* validate reallocation */
                perror ("realloc-words");
                break;          /* don't exit, original pointer words still valid */
            }
            words = tmp;        /* assign reallocated block of ptrs to words */
            avail *= 2;         /* update the number of pointers available */
        }
        len = strlen (buf);     /* get word length */
        if (!(words[used] = malloc (len + 1))) {    /* allocate len + 1 chars */
            perror ("malloc-words[used]");
            break;              /* ditto */
        }
        memcpy (words[used++], buf, len + 1);       /* copy word to storage */
    }

    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    for (size_t i = 0; i < used; i++) {     /* loop outputting words */
        printf ("words[%2zu] : %s\n", i, words[i]);
        free (words[i]);    /* free storage for words */
    }
    free (words);   /* free pointers */
}
```
If you have `strdup` available (it is POSIX, and part of ISO C as of C23), then you can replace the manual allocation and copy above:

```c
    len = strlen (buf);     /* get word length */
    if (!(words[used] = malloc (len + 1))) {    /* allocate len + 1 chars */
        perror ("malloc-words[used]");
        break;
    }
    memcpy (words[used++], buf, len + 1);       /* copy word to storage */
```

with a simple assignment from `strdup (buf)`, but as with any function that allocates memory, you must check the return:

```c
    if (!(words[used] = strdup (buf))) {    /* strdup allocates - you validate */
        perror ("strdup-words[used]");
        break;
    }
    used++;
```
Example Input File

```
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
```
Example Use/Output

```
$ ./bin/fscanf_words dat/captnjack.txt
words[ 0] : This
words[ 1] : is
words[ 2] : a
words[ 3] : tale
words[ 4] : Of
words[ 5] : Captain
words[ 6] : Jack
words[ 7] : Sparrow
words[ 8] : A
words[ 9] : Pirate
words[10] : So
words[11] : Brave
words[12] : On
words[13] : the
words[14] : Seven
words[15] : Seas.
```
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so that (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux, `valgrind` is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through one.
```
$ valgrind ./bin/fscanf_words dat/captnjack.txt
==12872== Memcheck, a memory error detector
==12872== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12872== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==12872== Command: ./bin/fscanf_words dat/captnjack.txt
==12872==
words[ 0] : This
words[ 1] : is
words[ 2] : a
words[ 3] : tale
words[ 4] : Of
words[ 5] : Captain
words[ 6] : Jack
words[ 7] : Sparrow
words[ 8] : A
words[ 9] : Pirate
words[10] : So
words[11] : Brave
words[12] : On
words[13] : the
words[14] : Seven
words[15] : Seas.
==12872==
==12872== HEAP SUMMARY:
==12872==     in use at exit: 0 bytes in 0 blocks
==12872==   total heap usage: 23 allocs, 23 frees, 5,988 bytes allocated
==12872==
==12872== All heap blocks were freed -- no leaks are possible
==12872==
==12872== For counts of detected and suppressed errors, rerun with: -v
==12872== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
```
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
If you want a linked-list and want to get rid of the array, then see Singly Linked List of Strings With Sorted Insertion for an implementation.
Let me know if you have further questions.