Splitting a text file into words in C

那年仲夏 submitted on 2019-12-24 00:54:29

Question


I have two types of text files that I want to split into words.

The first type of text file is just words separated by newlines.

Milk
Work
Chair
...

The second type of text file is text from a book, with words separated only by whitespace (no commas, question marks, etc.).

And then she tried to run 
but she was stunned by the view of 
...

Do you know the best way to do this?

I tried the following two approaches, but I seem to be getting segmentation faults.

For the first type of text I use:

while(fgets(line,sizeof(line),wordlist) != NULL)
{
    /* Checks Words |
    printf("%s",line);*/
    InsertWord(W,line);/*Function that inserts the word to a tree*/
}

And for the second type of text I use:

while(fgets(line,sizeof(line),out) != NULL)
{
    bp = line ;
    while(1)
    {
        cp = strtok(bp," ");
        bp = NULL ;

        if(cp == NULL)
            break;

        /*printf("Word by Word : %s \n",cp);*/
        CheckWord(Words, cp);/*Function that checks if the word from the book is the same with one in a tree */
    }
}

Can you suggest anything better, or correct me if I am wrong on these?

EDIT (about the segmentation fault):

InsertWord is a function that inserts words into a tree. When I used this code:

for (i = 0 ; i <=2 ; i++)
{
    if (i==0)
        InsertWord(W,"A");
    if (i==1)
        InsertWord(W,"B");
    if (i==2)
        InsertWord(W,"c");
}

The tree inserts the words just fine and prints them too, which means my tree and its functions work (they were given to us by our teachers). But when I try to do the same thing like this:

char this_word[15];
while (fscanf(wordlist, "%14s", this_word) == 1) 
{
    printf("Latest word that was read: '%s'\n", this_word);
    InsertWord(W,this_word);
}

I get errors from the tree, so I guessed it was some kind of segmentation fault. Any ideas?


Answer 1:


You want to read from a file, so fgets() comes to mind.

You want to split into tokens by a delimiter (whitespace), so strtok() comes to mind.


So, you could do it like this:

#include <stdio.h>
#include <string.h>

int main(void)
{
   FILE * pFile;
   char mystring [100];
   char* pch;

   pFile = fopen ("text_newlines.txt" , "r");
   if (pFile == NULL) perror ("Error opening file");
   else {
     while ( fgets (mystring , 100 , pFile) != NULL )
       printf ("%s", mystring);
     fclose (pFile);
   }

   pFile = fopen ("text_wspaces.txt" , "r");
   if (pFile == NULL) perror ("Error opening file");
   else {
     while ( fgets (mystring , 100 , pFile) != NULL ) {
       printf ("%s", mystring);
       pch = strtok (mystring," ");
       while (pch != NULL)
       {
         printf ("%s\n",pch);
         pch = strtok (NULL, " ");
       }
     }
     fclose (pFile);
   }

   return 0;
}

Output:

linux25:/home/users/grad1459>./a.out
Milk
Work
Chair
And then she tried to run 
And
then
she
tried
to
run


but she was stunned by the view of
but
she
was
stunned
by
the
view
of
//newline here as well
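
The blank lines in the output above appear because fgets() keeps the trailing '\n' and the delimiter string " " does not include it, so the last token of each line still carries the newline. A minimal sketch of one way around that, assuming a hypothetical helper named split_line (not part of the original answer) that also treats tabs and line endings as delimiters:

```c
#include <string.h>

/* Hypothetical helper: splits `line` in place on spaces, tabs, and line
   endings, stores up to `max_words` token pointers in `words`, and
   returns how many were stored. The pointers refer into `line`. */
static size_t split_line(char *line, char *words[], size_t max_words)
{
    static const char delims[] = " \t\r\n";
    size_t count = 0;
    char *token = strtok(line, delims);

    while (token != NULL && count < max_words) {
        words[count++] = token;
        token = strtok(NULL, delims);
    }
    return count;
}
```

Calling split_line(mystring, words, 32) after each fgets() would give tokens without a trailing newline. The same call should also work for the one-word-per-line file, where each line yields a single token.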



Answer 2:


This is the type of input that fscanf and %s were made for:

char this_word[15];
while (fscanf(tsin, "%14s", this_word) == 1) {
    printf("Latest word that was read: '%s'.\n", this_word);
    // Process the word...
}
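
One possible explanation for the tree errors described in the question's EDIT, assuming InsertWord stores the pointer it receives instead of copying the string (its implementation is not shown, so this is a guess): string literals such as "A" each have their own permanent storage, while this_word is a single buffer that gets overwritten on every read. Under that assumption, each word would need its own copy before being stored. A minimal sketch, where read_words is a hypothetical helper and a plain array stands in for the tree:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads whitespace-separated words from `in` and
   stores a separately allocated copy of each one in `words`.
   Returns the number of words stored (at most `max_words`).
   The caller is responsible for freeing each stored copy. */
static size_t read_words(FILE *in, char *words[], size_t max_words)
{
    char this_word[15];
    size_t count = 0;

    while (count < max_words && fscanf(in, "%14s", this_word) == 1) {
        char *copy = malloc(strlen(this_word) + 1);
        if (copy == NULL)
            break;              /* out of memory: stop reading */
        strcpy(copy, this_word);
        words[count++] = copy;  /* keep the copy, not the reused buffer */
    }
    return count;
}
```

Note that "%14s" splits any word longer than 14 characters into several pieces, so the buffer size should be chosen with the longest expected word in mind.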



Answer 3:


The easiest way might be to go character-by-character:

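char word[50];
char *word_pos = word;
int ch; /* int rather than char, so EOF can be told apart from data */

/* Skip leading delimiters until the first word character */
while ((ch = fgetc(out)) != EOF && (ch == '\n' || ch == ' '))
    ;

while (ch != EOF) {
    if (ch == '\n' || ch == ' ') {
        /* End of a word: terminate it and process it */
        *word_pos = '\0';
        word_pos = word;
        CheckWord(Words, word);

        /* Skip the run of delimiters before the next word */
        while ((ch = fgetc(out)) != EOF && (ch == '\n' || ch == ' '))
            ;
    } else {
        /* Store the character, leaving room for the terminator */
        if (word_pos < word + sizeof(word) - 1)
            *word_pos++ = (char)ch;
        ch = fgetc(out);
    }
}

/* Process a final word that is not followed by a delimiter */
if (word_pos != word) {
    *word_pos = '\0';
    CheckWord(Words, word);
}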

Words are limited by the size of word, and every additional stop character has to be added to both the while and the if conditions.
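
To avoid repeating the stop characters in several conditions, they can be kept in one place. A minimal sketch of that idea, assuming two hypothetical helpers, is_stop and next_word, which are not part of the original answer:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if `ch` is a stop character.
   The set of stop characters lives only in this one string. */
static int is_stop(int ch)
{
    return ch != '\0' && strchr(" \t\r\n", ch) != NULL;
}

/* Hypothetical helper: reads the next word from `in` into `word`, whose
   capacity `size` includes the terminator and must be at least 1.
   Returns 1 if a word was read and 0 at end of input. Characters that
   do not fit in the buffer are dropped. */
static int next_word(FILE *in, char *word, size_t size)
{
    size_t len = 0;
    int ch;

    /* Skip stop characters before the word */
    while ((ch = fgetc(in)) != EOF && is_stop(ch))
        ;
    if (ch == EOF)
        return 0;

    /* Collect characters until the next stop character or end of input */
    do {
        if (len + 1 < size)
            word[len++] = (char)ch;
    } while ((ch = fgetc(in)) != EOF && !is_stop(ch));

    word[len] = '\0';
    return 1;
}
```

With this, the processing loop could shrink to something like while (next_word(out, word, sizeof word)) CheckWord(Words, word);, and adding a stop character means editing only the string inside is_stop.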



Source: https://stackoverflow.com/questions/37317612/splitting-a-text-file-into-words-in-c
