How to get scanf to continue with empty scanset

跟風遠走 submitted on 2021-02-04 06:24:29

Question


I am currently trying to parse UnicodeData.txt, whose format is described at ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html. However, I run into a problem when I try to read a line such as the following:

something;123D;;LINE TABULATION;

I try to extract the fields with code like the following, where in is the current line. The problem is that fields[3] is not filled in, and sscanf returns 2.

char fields[4][256];
sscanf(in, "%255[^;];%255[^;];%255[^;];%255[^;];",
    fields[0], fields[1], fields[2], fields[3]);

I know this is the correct behavior of scanf(), but is there a way to make this work, short of writing my own scanf()?


Answer 1:


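scanf does not handle "empty" fields: the C standard specifies that a %[ conversion matches a nonempty sequence of characters, so an empty field is a matching failure and scanning stops there. So you will have to parse the line on your own.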

The following solution is:

  • fast, as it uses strchr rather than the quite slow sscanf
  • flexible, as it will detect an arbitrary number of fields, up to a given maximum.

The function parse splits the input string str in place, replacing each semicolon with a nul and storing a pointer to the start of each field in out. Four semicolons give five fields, some or all of which can be empty. No provision is made for escaping semicolons.

#include <stdio.h>
#include <string.h>

static int parse(char *str, char *out[], int max_num) {
    int num = 0;
    out[num++] = str;
    while (num < max_num && str && (str = strchr(str, ';'))) {
        *str = 0;           // nul-terminate previous field
        out[num++] = ++str; // save start of next field
    }
    return num;
}

int main(void) {
    char test[] = "something;123D;;LINE TABULATION;";
    char *field[99];
    int num = parse(test, field, 99);
    int i;
    for (i = 0; i < num; i++)
        printf("[%s]", field[i]);
    printf("\n");
    return 0;
}

The output of this test program is:

[something][123D][][LINE TABULATION][]

Update: A slightly shorter version, which doesn't require an extra array to store the start of each substring, is:

#include <stdio.h>
#include <string.h>

static int replaceSemicolonsWithNuls(char *p) {
    int num = 0;
    while ((p = strchr(p, ';'))) {
        *p++ = 0;
        num++; 
    }
    return num;
}

int main(void) {
    char test[] = "something;123D;;LINE TABULATION;";
    int num = replaceSemicolonsWithNuls(test) + 1; /* n semicolons delimit n + 1 fields */
    int i;
    char *p = test;
    for (i = 0; i < num; i++, p += strlen(p) + 1)
        printf("[%s]", p);
    printf("\n");
    return 0;
}



Answer 2:


In case you would like to consider an alternative, here is one using sscanf and the "%n" conversion specifier, which stores the number of characters read so far into an int:

#include <stdio.h>
#define N 4

int main( ){

    const char * str = "something;123D;;LINE TABULATION;";
    const char * wanderer = str;
    char fields[N][256] = { 0 };
    int n;

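    for ( int i = 0; i < N; i++ ) {
        n = 0;   /* stays 0 if the field is empty and %[ fails before %n */
        printf( "%d ", sscanf( wanderer, "%255[^;]%n", fields[i], &n ) );
        if ( wanderer[n] == '\0' )   /* last field: don't step past the end */
            break;
        wanderer += n + 1;
    }

    putchar( '\n' );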

    for ( int i = 0; i < N; i++ )
        printf( "%d: %s\n", i, fields[i] );

    getchar( );
    return 0;
}

On every iteration, sscanf reads at most 255 characters into fields[i], stopping at the first semicolon, and %n then stores the number of characters consumed so far in n. n is reset to 0 beforehand because, when the field is empty, the %[ conversion fails, sscanf stops, and %n is never reached, so n would otherwise keep the previous field's length.

It then advances the pointer into the string by the number of characters read, plus one for the semicolon delimiter.

Printing the return values of sscanf and the final results is only for demonstration. You can see the code running at http://codepad.org/kae8smPF, without the getchar(); and with the loop-variable declarations moved out of the for statements for C90 compliance.




Answer 3:


I don't think sscanf will do what you need: the sscanf format %[^;] matches only a non-empty sequence of non-semicolon characters. The alternative would be to use std::getline with ';' as the delimiter, like this:

#include <iostream>
#include <sstream>
#include <string>

int main() {
  using namespace std;
  istringstream i { "something;123D;;LINE TABULATION;\nsomething;123D;;LINE TABULATION;\nsomething;123D;;LINE TABULATION;\n" };
  string a, b, c, d, newline;
  while( getline(i, a, ';') && getline(i, b, ';') && getline(i, c, ';') && getline (i, d, ';') && getline(i, newline) )
    cout << d << ',' << c << '-' << b << ':' << a << endl; 
}

(I only just noticed that you removed the c++ tag from this question. If your problem is C-only, here is another solution:)

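#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <string.h>
#include <stdio.h>

/* strtok_r returns NULL when a line has fewer fields; print that as "" */
static const char *or_empty(const char *s) { return s ? s : ""; }

int main() {
  typedef char buffer[2048];
  buffer line;
  while( fgets(line, sizeof(line), stdin) != NULL ) {
    printf("(%s)\n", line);
    char *end = line;
    /* strtok_r skips empty fields, so when the next character is ';' the
       empty field is terminated and stepped over by hand instead */
    char *s1 = *end == ';' ? (*end = '\0'), end++ : strtok_r(end, ";", &end);
    char *s2 = *end == ';' ? (*end = '\0'), end++ : strtok_r(end, ";", &end);
    char *s3 = *end == ';' ? (*end = '\0'), end++ : strtok_r(end, ";", &end);
    char *s4 = *end == ';' ? (*end = '\0'), end++ : strtok_r(end, ";", &end);
    printf("[%s][%s][%s][%s]\n", or_empty(s4), or_empty(s3), or_empty(s2), or_empty(s1));
  }
}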


Source: https://stackoverflow.com/questions/22974628/how-to-get-scanf-to-continue-with-empty-scanset
