Question
I have two kinds of text files that I want to split into words.
The first kind is just words separated by newlines.
Milk
Work
Chair
...
The second kind is text from a book, with the words separated only by whitespace (no commas, question marks, etc.).
And then she tried to run
but she was stunned by the view of
...
What is the best way to do this?
I tried the following two approaches, but I seem to be getting segmentation faults.
For the first type of text I use:
while(fgets(line,sizeof(line),wordlist) != NULL)
{
    /* Checks Words |
    printf("%s",line);*/
    InsertWord(W,line);/*Function that inserts the word to a tree*/
}
And for the second type of text I use:
while(fgets(line,sizeof(line),out) != NULL)
{
    bp = line ;
    while(1)
    {
        cp = strtok(bp," ");
        bp = NULL ;
        if(cp == NULL)
            break;
        /*printf("Word by Word : %s \n",cp);*/
        CheckWord(Words, cp);/*Function that checks if the word from the book is the same with one in a tree */
    }
}
Can you suggest anything better, or correct me if I am wrong on these?
EDIT (about the segmentation fault):
InsertWord is a function that inserts words into a tree. When I used this code:
for (i = 0 ; i <= 2 ; i++)
{
    if (i==0)
        InsertWord(W,"A");
    if (i==1)
        InsertWord(W,"B");
    if (i==2)
        InsertWord(W,"c");
}
The tree inserts the words just fine and prints them too, which means my tree and its functions work (they were given to us by our teachers). But when I try to do the same thing like this:
char this_word[15];
while (fscanf(wordlist, "%14s", this_word) == 1)
{
    printf("Latest word that was read: '%s'\n", this_word);
    InsertWord(W,this_word);
}
I get errors from the tree, so I guessed it was some kind of segmentation fault. Any ideas?
Answer 1:
You want to read from a file, so fgets() might come to mind.
You want to split into tokens by a delimiter (whitespace), so strtok() should come to mind.
So, you could do it like this:
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE * pFile;
    char mystring [100];
    char* pch;

    pFile = fopen ("text_newlines.txt" , "r");
    if (pFile == NULL) perror ("Error opening file");
    else {
        while ( fgets (mystring , 100 , pFile) != NULL )
            printf ("%s", mystring);
        fclose (pFile);
    }

    pFile = fopen ("text_wspaces.txt" , "r");
    if (pFile == NULL) perror ("Error opening file");
    else {
        while ( fgets (mystring , 100 , pFile) != NULL ) {
            printf ("%s", mystring);
            pch = strtok (mystring," ");
            while (pch != NULL)
            {
                printf ("%s\n",pch);
                pch = strtok (NULL, " ");
            }
        }
        fclose (pFile);
    }
    return 0;
}
Output:
linux25:/home/users/grad1459>./a.out
Milk
Work
Chair
And then she tried to run
And
then
she
tried
to
run
but she was stunned by the view of
but
she
was
stunned
by
the
view
of
//newline here as well
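The extra newline in the output appears because fgets() keeps the trailing '\n' of each line, and strtok() with only " " as delimiter leaves it attached to the last token. A minimal sketch of one way around that is to add the newline (and other whitespace) to the delimiter set. The helper name split_words and its array-of-pointers interface are assumptions made for illustration, not part of the answer's code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the answer): splits `line` in place and
 * stores up to `max` token pointers in `words`. Returns the number stored.
 * The delimiter set includes '\n' so the newline kept by fgets() never
 * ends up inside a word. */
size_t split_words(char *line, char *words[], size_t max)
{
    static const char delims[] = " \t\r\n";
    size_t count = 0;

    for (char *tok = strtok(line, delims);
         tok != NULL && count < max;
         tok = strtok(NULL, delims)) {
        words[count++] = tok;
    }
    return count;
}
```

Calling split_words(mystring, words, 32) right after each fgets() would yield the words without a trailing newline. Note that strtok() modifies the buffer, so the returned pointers are only valid until the next fgets() overwrites it.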
Answer 2:
This is the type of input that fscanf and %s were made for:
char this_word[15];
while (fscanf(tsin, "%14s", this_word) == 1) {
    printf("Latest word that was read: '%s'.\n", this_word);
    // Process the word...
}
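One thing to watch with this loop is that this_word is overwritten on every iteration. If the code that processes the word keeps the pointer instead of copying the string (an assumption here, since InsertWord is not shown in the question), every stored entry would end up pointing at the same buffer. That would be consistent with string literals working while words read from a file do not. A hedged sketch of reading words and giving each its own storage follows; the helper name read_words is hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads whitespace-separated words from `in` and
 * stores a freshly allocated copy of each in `words` (at most `max`).
 * Returns the number of words stored; the caller frees each copy. */
size_t read_words(FILE *in, char *words[], size_t max)
{
    char buf[15];               /* same size as this_word in the answer */
    size_t count = 0;

    while (count < max && fscanf(in, "%14s", buf) == 1) {
        size_t len = strlen(buf) + 1;
        char *copy = malloc(len);
        if (copy == NULL)
            break;              /* out of memory: stop early */
        memcpy(copy, buf, len);
        words[count++] = copy;
    }
    return count;
}
```

With "%14s", a word longer than 14 characters is read in several pieces, so the buffer size should be chosen to fit the longest expected word.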
Answer 3:
The easiest way might be to go character-by-character:
char word[50];
size_t len = 0;
int ch;  /* int rather than char, so EOF can be told apart from data */

while ((ch = fgetc(out)) != EOF) {
    if (ch == '\n' || ch == ' ') {
        /* A stop character ends the current word, if there is one */
        if (len > 0) {
            word[len] = '\0';
            CheckWord(Words, word);
            len = 0;
        }
    } else if (len < sizeof word - 1) {
        /* Characters beyond the buffer size are dropped */
        word[len++] = (char)ch;
    }
}
/* Handle a final word that is not followed by a stop character */
if (len > 0) {
    word[len] = '\0';
    CheckWord(Words, word);
}
You're limited by the size of word, and you'd need to add every additional stop character to the if condition.
Source: https://stackoverflow.com/questions/37317612/splitting-a-text-file-into-words-in-c