String tokenizer without using strtok()

Submitted by 半世苍凉 on 2019-12-22 10:45:45

Question


I'm in the process of writing a string tokenizer without using strtok(). This is mainly for my own betterment and for a greater understanding of pointers. I think I almost have it, but I've been receiving the following errors:

myToc.c:25 warning: assignment makes integer from pointer without a cast
myToc.c:35 (same as above)
myToc.c:44 error: invalid type argument of 'unary *' (have 'int')

What I'm doing is looping through the string sent to the function, finding each delimiter, and replacing it with '\0'. The "ptr" array is supposed to have pointers to the separated substrings. This is what I have so far.

#include <string.h>

void myToc(char * str){
   int spcCount = 0;
   int ptrIndex = 0;

   int n = strlen(str);

   for(int i = 0; i < n; i++){
      if(i != 0 && str[i] == ' ' && str[i-1] != ' '){
         spcCount++;
      }
   }

   //Pointer array; +1 for \0 character, +1 for one word more than number of spaces
   int *ptr = (int *) calloc(spcCount+2, sizeof(char));
   ptr[spcCount+1] = '\0';
   //Used to differentiate separating spaces from unnecessary ones
   char temp;

   for(int j = 0; j < n; j++){
      if(j == 0){
         /*Line 25*/ ptr[ptrIndex] = &str[j];
         temp = str[j];
         ptrIndex++;
      }
      else{
         if(str[j] == ' '){
            temp = str[j];
            str[j] = '\0';
         }
         else if(str[j] != ' ' && str[j] != '\0' && temp == ' '){
            /*Line 35*/ ptr[ptrIndex] = &str[j];
            temp = str[j];
            ptrIndex++;
         }
      }
   }

   int k = 0;
   while(ptr[k] != '\0'){
      /*Line 44*/ printf("%s \n", *ptr[k]);
      k++;
   }
}

I can see where the errors are occurring but I'm not sure how to correct them. What should I do? Am I allocating memory correctly or is it just an issue with how I'm specifying the addresses?


Answer 1:


Your pointer array is wrong. It looks like you want:

char **ptr =  calloc(spcCount+2, sizeof(char*));

Also, if I am reading your code correctly, there is no need for the null byte as this array is not a string.

In addition, you'll need to fix:

while(ptr[k] != '\0'){
  /*Line 44*/ printf("%s \n", *ptr[k]);
  k++;
}

The dereference is not required, and if you drop the terminating null pointer, this should work:

for ( k = 0; k < ptrIndex; k++ ){
  /*Line 44*/ printf("%s \n", ptr[k]);
}
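
To make the type fix concrete, here is a minimal sketch. The helper names make_ptr_array and print_ptr_array and the offsets parameter are hypothetical and exist only for illustration; they are not part of the original code. The sketch shows an array sized with sizeof(char *), pointers stored without a cast, and ptr[k] passed to %s without a dereference.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not from the original post: builds an array of
   `count` char* elements, where element k points at str + offsets[k].
   The caller frees the returned array. */
char **make_ptr_array(char *str, const size_t *offsets, size_t count) {
    assert(str != NULL && offsets != NULL);
    /* Each slot holds a char*, so size the allocation with sizeof(char *),
       not sizeof(char). */
    char **ptr = calloc(count, sizeof(char *));
    if (ptr == NULL) {
        return NULL;
    }
    for (size_t k = 0; k < count; k++) {
        ptr[k] = &str[offsets[k]];   /* char* stored in a char* slot: no warning */
    }
    return ptr;
}

/* ptr[k] is a char*, which is what %s expects. *ptr[k] would be a single
   char, so printf("%s", *ptr[k]) is wrong even once ptr is a char**. */
void print_ptr_array(char **ptr, size_t count) {
    assert(ptr != NULL);
    for (size_t k = 0; k < count; k++) {
        printf("%s \n", ptr[k]);
    }
}
```

With the original int *ptr, ptr[k] is an int, which is why the assignments on lines 25 and 35 warn and why *ptr[k] on line 44 is rejected outright.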



Answer 2:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void myToc(char * str){
    int spcCount = 0;
    int ptrIndex = 0;

    int n = strlen(str);

    for(int i = 0; i < n; i++){
        if(i != 0 && str[i] == ' ' && str[i-1] != ' '){
            spcCount++;
        }
    }

    char **ptr = calloc(spcCount+2, sizeof(char*));
    //ptr[spcCount+1] = '\0';//0 initialized by calloc 
    char temp = ' ';//can simplify the code

    for(int j = 0; j < n; j++){
        if(str[j] == ' '){
            temp = str[j];
            str[j] = '\0';
        } else if(str[j] != '\0' && temp == ' '){//can omit `str[j] != ' ' &&`
            ptr[ptrIndex++] = &str[j];
            temp = str[j];
        }
    }

    int k = 0;
    while(ptr[k] != NULL){//better use NULL
        printf("%s \n", ptr[k++]);
    }
    free(ptr);
}

int main(){
    char test1[] = "a b c";
    myToc(test1);
    char test2[] = "hello world";
    myToc(test2);
    return 0;
}



Answer 3:


Update: I tried this at http://www.compileonline.com/compile_c99_online.php with the fixes for lines 25, 35, and 44, and with a main function that called myToc() twice. I initially encountered segfaults when trying to write null characters to str[], but that was only because the strings I was passing were (apparently non-modifiable) literals. The code below worked as desired when I allocated a text buffer and wrote the strings there before passing them in. This version also could be modified to return the array of pointers, which then would point to the tokens.

(The code below also works even when the string parameter is non-modifiable, as long as myToc() makes a local copy of the string; but that would not have the desired effect if the purpose of the function is to return the list of tokens rather than just print them.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void myToc(char * str){
   int spcCount = 0;
   int ptrIndex = 0;

   int n = strlen(str);

   for(int i = 0; i < n; i++){
      if(i != 0 && str[i] == ' ' && str[i-1] != ' '){
         spcCount++;
      }
   }

   //Pointer array;  +1 for one word more than number of spaces
   char** ptr = (char**) calloc(spcCount+2, sizeof(char*));
   //Used to differentiate separating spaces from unnecessary ones
   char temp;

   for(int j = 0; j < n; j++){
      if(j == 0){
         ptr[ptrIndex] = &str[j];
         temp = str[j];
         ptrIndex++;
      }
      else{
         if(str[j] == ' '){
            temp = str[j];
            str[j] = '\0';
         }
         else if(str[j] != ' ' && str[j] != '\0' && temp == ' '){
            ptr[ptrIndex] = &str[j];
            temp = str[j];
            ptrIndex++;
         }
      }
   }

   for (int k = 0; k < ptrIndex; ++k){
      printf("%s \n", ptr[k]);
   }
   free(ptr); //release the pointer array; the tokens themselves live in str
}

int main (int n, char** v)
{
  char text[256];
  strcpy(text, "a b c");
  myToc(text);
  printf("-----\n");
  strcpy(text, "hello world");
  myToc(text);
}

I would prefer simpler code, however. Basically you want a pointer to the first non-blank character in str[], then a pointer to each non-blank (other than the first) that is preceded by a blank. Your first loop almost gets this idea except it is looking for blanks preceded by non-blanks. (Also you could start that loop at i = 1 and avoid having to test i != 0 on each iteration.)

I might just allocate an array of char* with (n + 1)/2 elements to hold the pointers rather than looping over the string twice (that is, I'd omit the first loop, which is just to figure out the size of the array). In any case, if str[0] is non-blank I would write its address to the array; then, looping for (int j = 1; j < n; ++j), write the address of str[j] to the array if str[j] is non-blank and str[j - 1] is blank. That is basically what you are doing, but with fewer ifs and fewer auxiliary variables. Less code means less opportunity to introduce a bug, as long as the code is clean and makes sense.
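
A rough sketch of that single-pass idea follows. The function name tokenize and its count out-parameter are my own assumptions, not something from the question. It also differs slightly from the description above: because it overwrites blanks with '\0' as it goes, it recognizes "preceded by a blank" by checking for the '\0' it has just written.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and signature are illustrative only).
   Splits str in place on ' ' and returns a calloc'd array of pointers to
   the tokens; *count receives the number of tokens. The caller frees the
   array. str must be modifiable, so do not pass a string literal. */
char **tokenize(char *str, size_t *count) {
    assert(str != NULL && count != NULL);
    size_t n = strlen(str);

    /* A string of length n holds at most (n + 1) / 2 space-separated
       tokens. The extra slot keeps the array NULL-terminated and avoids
       a zero-sized allocation for an empty string. */
    char **ptr = calloc((n + 1) / 2 + 1, sizeof(char *));
    *count = 0;
    if (ptr == NULL) {
        return NULL;
    }

    for (size_t j = 0; j < n; j++) {
        if (str[j] == ' ') {
            str[j] = '\0';               /* end whatever token came before */
        } else if (j == 0 || str[j - 1] == '\0') {
            /* Non-blank at the start, or right after a blank that was
               already overwritten with '\0': a new token begins here. */
            ptr[(*count)++] = &str[j];
        }
    }
    return ptr;
}
```

Called on a writable buffer such as char text[] = "hello world", this yields count == 2 with the two pointers aimed at "hello" and "world" inside text. Leading, trailing, and repeated spaces produce no empty tokens.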

Previous remarks:

int *ptr declares a pointer to int, so the allocated block is treated as an array of int. For an array of pointers to char, you want

char** ptr = (char**) calloc(spcCount+2, sizeof(char*));

The comment prior to that line also seems to indicate some confusion. There is no terminating null in your array of pointers, and you don't need to allocate space for one, so possibly spcCount+2 could be spcCount + 1.

This also is suspect:

while(ptr[k] != '\0')

It looks like it would work, given the way you used calloc (you do need spcCount+2 to make this work), but I would feel more secure writing something like this:

for (k = 0; k < ptrIndex; ++k)

I do not think that is what caused the segfault; it just makes me a little uneasy to compare a pointer (ptr[k]) with '\0' (which you would normally compare against a char).
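
For comparison, here is a small sketch of the two loop styles side by side. The function names print_terminated and print_counted are hypothetical. In C, '\0' is an integer constant with value 0, so comparing a pointer against it compiles and behaves like a comparison with NULL; writing NULL just states the intent.

```c
#include <assert.h>
#include <stdio.h>

/* Sentinel style: walk until a NULL slot. This only works if the array has
   one spare slot that is NULL (hence the spcCount+2 allocation). */
size_t print_terminated(char **ptr) {
    assert(ptr != NULL);
    size_t k = 0;
    while (ptr[k] != NULL) {      /* NULL states the intent better than '\0' */
        printf("%s \n", ptr[k]);
        k++;
    }
    return k;                     /* number of tokens printed */
}

/* Counted style: the caller passes the token count (ptrIndex in the
   question), so no spare slot is needed. */
size_t print_counted(char **ptr, size_t count) {
    assert(ptr != NULL);
    for (size_t k = 0; k < count; k++) {
        printf("%s \n", ptr[k]);
    }
    return count;
}
```

The sentinel version leans on calloc's zero fill reading back as a null pointer. That holds on mainstream platforms, although the C standard does not strictly guarantee that all-zero bytes represent a null pointer, which is one more reason to prefer the counted loop.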



Source: https://stackoverflow.com/questions/25752442/string-tokenizer-without-using-strtok
