How does strtok() split the string into tokens in C?

陌清茗 2020-11-22 14:48

Please explain to me the working of strtok() function. The manual says it breaks the string into tokens. I am unable to understand from the manual what it actua

15 answers
  • 2020-11-22 15:34

    This is how I implemented strtok. It is not that great, but after working on it for two hours I finally got it working. It supports multiple delimiters.

    #include <iostream>
    using namespace std;

    // Simplified strtok: it splits at every delimiter character, so
    // consecutive delimiters produce empty tokens (the real strtok skips them).
    char* mystrtok(char str[], const char filter[])
    {
        if (filter == NULL) {
            return str;
        }
        static char *ptr = NULL;    // where the next token starts
        if (str != NULL) {
            ptr = str;              // a non-NULL str starts a new session
        }
        if (ptr == NULL) {
            return NULL;            // nothing left to tokenize
        }
        char* ptrReturn = ptr;
        for (int j = 0; ptr[j] != '\0'; j++) {
            for (int i = 0; filter[i] != '\0'; i++) {
                if (ptr[j] == filter[i]) {
                    ptr[j] = '\0';  // terminate the current token
                    ptr += j + 1;   // resume after the delimiter next time
                    return ptrReturn;
                }
            }
        }
        ptr = NULL;                 // end of string reached: this is the last token
        return ptrReturn;
    }

    int main()
    {
        char str[200] = "This,is my,string.test";
        char *ppt = mystrtok(str, ", .");
        while (ppt != NULL) {
            cout << ppt << endl;
            ppt = mystrtok(NULL, ", .");
        }
        return 0;
    }
    
  • 2020-11-22 15:36

    strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance that pointer it has kept however it needs to perform its operations.)

    From the POSIX strtok page:

    This function uses static storage to keep track of the current string position between calls.

    There is a thread-safe variant (strtok_r) that doesn't do this type of magic.
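
    As a rough sketch of what that variant looks like in use (assuming a POSIX system; count_words and both state variables are hypothetical names invented for this illustration): because strtok_r keeps its position in a pointer supplied by the caller, two tokenizing sessions can be interleaved, which strtok's single static pointer would not allow.

```c
#define _POSIX_C_SOURCE 200809L  /* asks <string.h> to declare strtok_r on POSIX systems */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the words in every line of text.
 * The outer loop splits on '\n', the inner loop splits each line on ' '.
 * Each loop keeps its position in its own save pointer, so the two
 * sessions do not disturb each other. text is modified in place. */
int count_words(char *text)
{
    int words = 0;
    char *line_state = NULL;   /* position of the outer (line) session */
    char *word_state = NULL;   /* position of the inner (word) session */

    assert(text != NULL);
    for (char *line = strtok_r(text, "\n", &line_state); line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {
        for (char *word = strtok_r(line, " ", &word_state); word != NULL;
             word = strtok_r(NULL, " ", &word_state)) {
            words++;
        }
    }
    return words;
}
```

    The same nesting written with plain strtok would go wrong, since the inner loop would overwrite the one saved position that the outer loop relies on.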

  • 2020-11-22 15:40

    Here is my implementation, which uses a hash table (a 256-entry lookup table) for the delimiters, which means it is O(n) instead of O(n^2) (here is a link to the code):

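    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define DICT_LEN 256

    /* Build a lookup table: d[c] is 1 if byte c is a delimiter. */
    int *create_delim_dict(const char *delim)
    {
        int *d = calloc(DICT_LEN, sizeof(int));
        if (!d) {
            return NULL;
        }
        for (; *delim != '\0'; delim++) {
            /* index by unsigned char: plain char may be negative */
            d[(unsigned char)*delim] = 1;
        }
        return d;
    }

    /* Unlike strtok, this works on a private copy of str, so str itself is
     * not modified. Returned tokens point into that copy and become invalid
     * once the function has returned NULL. */
    char *my_strtok(const char *str, const char *delim)
    {
        static char *last, *to_free;
        char *token;
        int *deli_dict = create_delim_dict(delim);

        if (!deli_dict) {
            /* allocation failed: release the copy kept from earlier calls */
            free(to_free);
            last = to_free = NULL;
            return NULL;
        }

        if (str) {
            free(to_free);   /* drop the copy of an unfinished earlier session */
            last = to_free = malloc(strlen(str) + 1);
            if (!last) {
                free(deli_dict);
                return NULL;
            }
            strcpy(last, str);
        }
        if (!last) {         /* no session in progress */
            free(deli_dict);
            return NULL;
        }

        /* skip leading delimiters */
        while (*last != '\0' && deli_dict[(unsigned char)*last]) {
            last++;
        }
        token = last;
        if (*last == '\0') {
            /* no tokens left: free the copy and reset the state */
            free(deli_dict);
            free(to_free);
            last = to_free = NULL;
            return NULL;
        }
        /* scan to the end of the token */
        while (*last != '\0' && !deli_dict[(unsigned char)*last]) {
            last++;
        }
        if (*last != '\0') {
            *last++ = '\0';  /* terminate the token and step past the delimiter */
        }

        free(deli_dict);
        return token;
    }

    int main(void)
    {
        const char *str = "- This, a sample string.";
        const char *del = " ,.-";
        char *s = my_strtok(str, del);
        while (s) {
            printf("%s\n", s);
            s = my_strtok(NULL, del);
        }
        return 0;
    }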
    
  • 2020-11-22 15:41

    strtok will tokenize a string, i.e. convert it into a series of substrings.

    It does that by searching for delimiters that separate these tokens (or substrings). You specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.

    The programming model for extracting these tokens is that you hand strtok your main string and the set of delimiters, then call it repeatedly. Each call returns the next token it finds, until it reaches the end of the main string, at which point it returns NULL. Another rule is that you pass the string only on the first call and NULL on all subsequent calls. This tells strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from the previous session.

    Note that strtok remembers its state for the tokenizing session, and for this reason it is not reentrant or thread-safe (use strtok_r instead where that matters). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiter that ends each token.

    One way to invoke strtok, succinctly, is as follows:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char str[] = "this, is the string - I want to parse";
        char delim[] = " ,-";
        char *token;

        for (token = strtok(str, delim); token; token = strtok(NULL, delim))
        {
            printf("token=%s\n", token);
        }
        return 0;
    }
    

    Result:

    token=this
    token=is
    token=the
    token=string
    token=I
    token=want
    token=to
    token=parse
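
    The point about strtok modifying the original string can be made visible with a small sketch. The helper show_after_strtok and its parameters are hypothetical names made up for this illustration; it runs strtok to exhaustion and then copies the buffer out with every NUL byte drawn as '|'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes buf with strtok until no tokens remain,
 * then writes a printable picture of the buffer into out, showing each
 * NUL byte as '|'. len must be strlen(buf) measured BEFORE the call, and
 * out must have room for at least len + 1 bytes. */
void show_after_strtok(char *buf, size_t len, const char *delims, char *out)
{
    assert(buf != NULL && delims != NULL && out != NULL);

    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims)) {
        /* empty on purpose: only strtok's side effect on buf matters here */
    }

    for (size_t i = 0; i < len; i++) {
        out[i] = (buf[i] == '\0') ? '|' : buf[i];
    }
    out[len] = '\0';
}
```

    For the buffer "a, b-c" with delimiters " ,-", the picture comes out as "a| b|c": the ',' and '-' that ended a token were overwritten, while the space that was merely skipped before "b" is still there. This is also why the string handed to strtok must be writable (a char array, not a string literal).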
    
  • 2020-11-22 15:43

    To understand how strtok() works, one first needs to know what a static variable is. This link explains it quite well.

    The key to the operation of strtok() is preserving the location of the last separator between successive calls (that's why strtok() continues to parse the original string passed to it when it is invoked with a null pointer in successive calls).
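
    To make that concrete, here is a minimal sketch of a strtok-like function built around a single static pointer. It is only an illustration of the idea, not how any particular C library implements strtok, and the name sketch_strtok is made up for this example. It uses the standard strspn and strcspn functions to skip delimiters and to find the end of a token.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical re-implementation for illustration only. */
char *sketch_strtok(char *str, const char *delims)
{
    static char *next = NULL;   /* where the following call resumes; survives between calls */

    assert(delims != NULL);
    if (str != NULL) {
        next = str;             /* a non-NULL str starts a new session */
    }
    if (next == NULL) {
        return NULL;            /* no session in progress */
    }

    next += strspn(next, delims);       /* skip leading delimiters */
    if (*next == '\0') {
        next = NULL;                    /* nothing but delimiters was left */
        return NULL;
    }

    char *token = next;
    next += strcspn(next, delims);      /* advance to the end of the token */
    if (*next != '\0') {
        *next++ = '\0';                 /* cut the token, step past the delimiter */
    } else {
        next = NULL;                    /* token ran to the end of the string */
    }
    return token;
}
```

    Because runs of delimiters are skipped, tokenizing "A,B,,,C" on "," with this sketch yields A, B and C and then NULL, with no entries for the empty fields.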

    Have a look at my own strtok() implementation, called zStrtok(), which behaves slightly differently from strtok():

    char *zStrtok(char *str, const char *delim) {
        static char *static_str = 0;    /* start of the unparsed remainder */
        int index = 0, strlength = 0;   /* integers for indexes */
        int found = 0;                  /* set when delim is found */

        /* delimiter cannot be NULL
         * if no more chars are left, return NULL as well
         */
        if (delim == 0 || (str == 0 && static_str == 0))
            return 0;

        if (str == 0)
            str = static_str;

        /* get length of string */
        while (str[strlength])
            strlength++;

        /* find the first occurrence of delim (only delim[0] is considered) */
        for (index = 0; index < strlength; index++)
            if (str[index] == delim[0]) {
                found = 1;
                break;
            }

        /* if delim is not contained in str, return str */
        if (!found) {
            static_str = 0;
            return str;
        }

        /* check for consecutive delimiters:
         * if the first char is delim, return delim
         */
        if (str[0] == delim[0]) {
            static_str = (str + 1);
            return (char *)delim;
        }

        /* terminate the token
         * this assignment requires writable storage, so str has to be
         * a char[] rather than a pointer to a string literal
         */
        str[index] = '\0';

        /* save the rest of the string for the next call */
        static_str = (str + index + 1);

        return str;
    }
    

    And here is an example usage

      Example Usage
          char str[] = "A,B,,,C";
          printf("1 %s\n",zStrtok(str,","));
          printf("2 %s\n",zStrtok(NULL,","));
          printf("3 %s\n",zStrtok(NULL,","));
          printf("4 %s\n",zStrtok(NULL,","));
          printf("5 %s\n",zStrtok(NULL,","));
          /* the sixth call returns NULL; passing NULL to %s is undefined behavior, glibc prints "(null)" */
          printf("6 %s\n",zStrtok(NULL,","));
    
      Example Output
          1 A
          2 B
          3 ,
          4 ,
          5 C
          6 (null)
    

    The code is from a string processing library I maintain on GitHub, called zString. Have a look at the code, or even contribute :) https://github.com/fnoyanisi/zString

  • 2020-11-22 15:43

    So, this is a code snippet to help better understand this topic.

    Printing Tokens

    Task: Given a sentence, s, print each word of the sentence on a new line.

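    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *s = malloc(1024);
        /* the width limit keeps scanf from writing past the 1024-byte buffer */
        if (s == NULL || scanf("%1023[^\n]", s) != 1) {
            free(s);
            return 1;
        }
        /* logic to print the tokens of the sentence */
        for (char *p = strtok(s, " "); p != NULL; p = strtok(NULL, " "))
            printf("%s\n", p);
        free(s);
        return 0;
    }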
    

    Input: How is that

    Result:

    How
    is
    that
    

    Explanation: Here the strtok() function is called in a for loop to print each token on a separate line.

    The function takes two parameters, the string and the set of break-point (delimiter) characters, and breaks the string at those break-points to form tokens. Each token is stored in 'p' in turn and then printed.
