Split String in C to recognize consecutive tabs

巧了我就是萌 submitted on 2021-02-17 04:31:12

Question


I have a file that has certain fields separated by tabs. There will always be 17 tabs, but their order can vary, such as:

75104\tDallas\t85\t34.46\t45.64
75205\tHouston\t\t37.34\t87.32
93434\t\t\t1.23\t3.32

When I use strtok in the following fashion:

    while (fgets(buf, sizeof(buf), fp) != NULL) {
        tok = strtok(buf, "\t");

        while (tok != NULL) {
            printf("%s->", tok);
            tok = strtok(NULL, "\t");
        }
    }

I get all the tokens, but runs of two or more tabs (\t\t) are treated as a single delimiter. I need to know when a field is empty: the structure depends on all 17 tabs being counted, with a placeholder used for each empty field, so I cannot have strtok skip consecutive tabs.

I've tried dealing with the problem with an

if(tok == NULL || '')

but I don't think strtok recognizes a tab after a tab. What is the best way to deal with this issue?


Answer 1:


You can't use strtok in your case. From man strtok:

The strtok() function breaks a string into a sequence of zero or more nonempty tokens ... From the above description, it follows that a sequence of two or more contiguous delimiter bytes in the parsed string is considered to be a single delimiter, and that delimiter bytes at the start or end of the string are ignored. Put another way: the tokens returned by strtok() are always nonempty strings. Thus, for example, given the string "aaa;;bbb,", successive calls to strtok() that specify the delimiter string ";," would return the strings "aaa" and "bbb", and then a null pointer
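The behaviour described in the quote can be checked with a small sketch. The helper name `count_strtok_tokens` is hypothetical and used only for illustration; it simply counts how many tokens strtok hands back for a given input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration: counts the tokens strtok() yields.
 * `s` must be writable because strtok() overwrites delimiters with '\0'. */
size_t count_strtok_tokens(char *s, const char *delims)
{
    size_t n = 0;

    for (char *tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;

    return n;
}
```

Called on a writable copy of the third sample line (five fields, two of them empty) with "\t" as the delimiter, this counts only three tokens, which is why the empty fields cannot be detected this way.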

So you will have to find an alternative: either write a function by hand that uses a linear search and strncpy, use sscanf, or use strsep if it is available. The last would very likely be my choice, because it was intended as a replacement for strtok.

From man strsep:

The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields. However, strtok(3) conforms to C89/C99 and hence is more portable.
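Where strsep is not available, the hand-written alternative mentioned above might look roughly like the following. This is a minimal sketch, not the answerer's code: the name `split_tabs` and its parameters are assumptions, and it uses strchr for the linear search and splits in place rather than copying with strncpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): splits `line` in place
 * on every single '\t', so two adjacent tabs produce an empty field "".
 * Stores at most max_fields pointers in fields[] and returns the number
 * stored; any fields beyond max_fields are dropped. */
size_t split_tabs(char *line, char **fields, size_t max_fields)
{
    size_t n = 0;

    /* Drop the trailing newline that fgets() leaves in the buffer. */
    line[strcspn(line, "\r\n")] = '\0';

    while (n < max_fields) {
        fields[n++] = line;               /* current field starts here */
        char *tab = strchr(line, '\t');   /* linear search for the next tab */
        if (tab == NULL)
            break;                        /* that was the last field */
        *tab = '\0';                      /* terminate the current field */
        line = tab + 1;                   /* next field begins after the tab */
    }
    return n;
}
```

Because every tab ends exactly one field, a line with 17 tabs yields 18 entries, and an empty field shows up as a pointer to an empty string that the caller can replace with a placeholder.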




Answer 2:


Here's a solution using strsep, which was introduced specifically to address the fact that strtok skips over consecutive delimiters:

char *cur, *nxt;
while (fgets(buf, sizeof(buf), fp) != NULL)
{
    nxt = buf;
    while ((cur = strsep(&nxt, "\t")) != NULL)
    {
        printf("%s->",cur);
    }
}

NOTE: the string passed to strsep must be writable (passing a literal string specifically does not work). It will be modified by strsep (delimiters are overwritten with NUL characters on consecutive calls).
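Since the question asks for a placeholder when a field is empty, the same strsep loop can be wrapped in a helper that collects the fields. This is a sketch under assumptions: the name `split_with_placeholder` and its parameters are hypothetical, and the `_DEFAULT_SOURCE` define is there only because strsep is a BSD/glibc extension rather than standard C.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes strsep() in strict modes */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores one pointer per tab-separated field of `line`
 * in fields[], substituting `placeholder` for each empty field.
 * `line` must be writable; returns the number of fields stored. */
size_t split_with_placeholder(char *line, const char **fields,
                              size_t max_fields, const char *placeholder)
{
    size_t n = 0;
    char *cur, *nxt = line;

    /* Strip the newline left by fgets() so it is not part of the last field. */
    line[strcspn(line, "\r\n")] = '\0';

    /* strsep() returns "" for an empty field instead of skipping it. */
    while (n < max_fields && (cur = strsep(&nxt, "\t")) != NULL)
        fields[n++] = (*cur == '\0') ? placeholder : cur;

    return n;
}
```

For the second sample line in the question, this would store five entries, with the placeholder in the third position.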




Answer 3:


A good way to start understanding how this could be implemented is to read through the function below:

#include <ctype.h>

int splitLine(char *buf, char **argv, int max_args)
{
    int arg;

    /* skip over initial spaces; the cast avoids undefined behavior
     * when char is signed and holds a negative value */
    while (isspace((unsigned char)*buf)) buf++;

    for (arg = 0; arg < max_args && *buf != '\0'; arg++) {
        argv[arg] = buf;
        /* skip past letters in word */
        while (*buf != '\0' && !isspace((unsigned char)*buf)) {
            buf++;
        }
        /* if not at line's end, mark word's end and continue */
        if (*buf != '\0') {
            *buf = '\0';
            buf++;
        }
        /* skip over extra spaces */
        while (isspace((unsigned char)*buf)) buf++;
    }
    return arg;
}

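This function uses whitespace as the separator and skips over runs of it, so you can reimplement it to use any other separator. To keep empty fields, split on '\t' only and remove the loops that skip extra separators.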



Source: https://stackoverflow.com/questions/36319131/split-string-in-c-to-recognize-consecutive-tabs
