I have snprintf
and it can avoid a buffer overflow, but why is there no function called snscanf
?
Code:
int main()
{
ch
In sscanf(s, format, ...)
, the array of characters scanned is a const char *
. There is no writing to s
. The scanning stops when s[i]
is NUL. Little need for an n
parameter as an auxiliary limit to the scan.
In sprintf(s, format, ...)
, the array s
is a destination. snprintf(s, n, format, ...)
ensures that data is not written to s[n]
and beyond.
What would be useful is a flag extension to sscanf()
conversion specifiers so a limit could easily be specified at compile time. (It can be done in a cumbersome fashion today, below, with a dynamic format or with sscanf(src,"%4s",buf1)
.)
// This is a proposed idea for C. - Not valid code today.
sscanf(src, "%!s", sizeof(buf1), buf1)
Here !
would tell sscanf()
to read a size_t
variable for the size limit of the upcoming string. Maybe in C17?
Cumbersome method that works today.
char * src = "helloeveryone";
char buf1[5];
char format[1+20+1+1];
sprintf(format, "%%" "%zu" "s", sizeof(buf1) - 1);
sscanf(src, format, buf1);
A few more wrinkles: the 'n' usually refers to the first argument of snprintf. Now, it is true that the first string argument of sscanf is not written to. However, it is read. Thus, the following could segfault:
char s[2];
s[0]='1'; s[1]='3';
int x;
sscanf(s, "%d", &x);
because stepping one char beyond s could inadvertently read from undefined memory (or continue the integer from another variable). So, something like this would be useful:
snscanf(s, 2, "%d", &x);
s is not a string, of course, but it is a character array. The 'n' in snscanf would prevent overstepping (reading beyond) the first (source) argument; it would not be related to the destination argument.
The way to avoid this is to first make sure that s is terminated by a '\0' within 2 characters. You can't use strlen, of course. You need strnlen, and a test of whether the result is less than 2. If it is 2, then more copying effort is needed first.
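A minimal sketch of that check-then-copy idea, for illustration only. The helper name scan_int_bounded and the 32-byte scratch buffer are assumptions of this sketch, and memchr is used in place of strnlen because memchr is part of ISO C while strnlen comes from POSIX:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a standard function: parse an int from the
 * first n bytes of s, which need not be NUL-terminated.
 * Returns 1 if an int was converted, 0 otherwise. */
static int scan_int_bounded(const char *s, size_t n, int *out)
{
    char tmp[32];   /* assumed large enough for the text of any int */

    assert(s != NULL && out != NULL);

    /* Is there a terminator within the first n bytes? */
    if (memchr(s, '\0', n) != NULL)
        return sscanf(s, "%d", out) == 1;   /* already a string */

    if (n >= sizeof tmp)
        return 0;                           /* too long for this sketch */

    memcpy(tmp, s, n);                      /* the extra copying effort */
    tmp[n] = '\0';
    return sscanf(tmp, "%d", out) == 1;
}
```

Called as scan_int_bounded(s, 2, &x) on the two-byte array from the example, it never reads past s[1].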
There's no need for an snscanf()
because there's no writing to the first buffer argument. The buffer length in snprintf()
specifies the size of the buffer where the writing goes to:
char buffer[256];
snprintf(buffer, sizeof(buffer), "%s:%d", s, n);
The buffer in the corresponding position for sscanf()
is a null-terminated string; there's no need for an explicit length as you aren't going to write to it (it's a const char * restrict buffer
in C99 and C11).
char buffer[256];
char string[100];
int n;
if (sscanf(buffer, "%s %d", string, &n) != 2)
...oops...
For the output arguments, you are already expected to specify the length of the strings in the format (though you're probably in the majority if you use %s
rather than %99s
or whatever is strictly appropriate):
if (sscanf(buffer, "%99s %d", string, &n) != 2)
...oops...
It would be nice/useful if you could use %*s
as you can with snprintf()
, but you can't — in sscanf()
, the *
means 'do not assign scanned value', not the length. Note that you wouldn't write snscanf(src, sizeof(buf1), "%s", buf1)
, not least because you can have multiple %s
conversion specifications in a single call. Writing snscanf(src, sizeof(buf1), sizeof(buf2), "%s %s", buf1, buf2)
makes no sense, not least because it leaves an insoluble problem in parsing the varargs list. It would be nice to have a notation such as snscanf(src, "%@s %@s", sizeof(buf1), buf1, sizeof(buf2), buf2)
to obviate the need to specify the field size (minus one) in the format string. Unfortunately, you can't do that with sscanf()
et al now.
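When the limit is a compile-time constant, one partial workaround is to build the width into the format with preprocessor stringification, so the array size and the field width come from the same macro. This is only a sketch: WORD_MAX, STR and scan_two_words are names made up for the example, and WORD_MAX has to be a plain integer literal (not a sizeof expression) for the stringification to produce a usable width.

```c
#include <assert.h>
#include <stdio.h>

#define WORD_MAX 4       /* hypothetical limit: most characters per word */
#define STR_(x) #x
#define STR(x) STR_(x)   /* expands WORD_MAX before stringifying it */

/* Splits src into two words of at most WORD_MAX characters each.
 * Returns the number of words stored (sscanf's return value). */
static int scan_two_words(const char *src,
                          char first[WORD_MAX + 1],
                          char second[WORD_MAX + 1])
{
    assert(src != NULL && first != NULL && second != NULL);
    /* After string-literal concatenation the format is "%4s %4s". */
    return sscanf(src, "%" STR(WORD_MAX) "s %" STR(WORD_MAX) "s",
                  first, second);
}
```

Changing WORD_MAX then updates both the buffers and the format in one place.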
Annex K of ISO/IEC 9899:2011 (previously TR24731) provides sscanf_s()
, which does take lengths for character strings, and which might be used as:
if (sscanf_s(buffer, "%s %d", string, sizeof(string), &n) != 2)
...oops...
(Thanks to R.. for reminding me of this theoretical option; theoretical because only Microsoft has implemented the 'safe' functions, and they did not implement them exactly as the standard requires.)
Note that §K.3.3 Common definitions <stddef.h>
says: '... The type is rsize_t
which is the type size_t
.385)' (and footnote 385 says: 'See the description of the RSIZE_MAX macro in <stdint.h>.
') That means that in fact you can pass size_t
without needing a cast — as long as the value passed is within the range defined by RSIZE_MAX
in <stdint.h>
. (The general intention is that RSIZE_MAX
is a largish number but smaller than SIZE_MAX
. For more details, read the 2011 standard, or get TR 24731 from the Open Standards web site.)
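A hedged sketch of how code might use sscanf_s only where the library advertises Annex K, and fall back to a width-limited sscanf elsewhere. The helper name scan_word_and_int and the fixed 100-byte destination are assumptions of this example; __STDC_WANT_LIB_EXT1__ and __STDC_LIB_EXT1__ are the feature macros Annex K defines.

```c
#define __STDC_WANT_LIB_EXT1__ 1   /* ask for Annex K, if the library has it */
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads a word and an int from buffer.
 * word must point to at least 100 bytes. Returns 1 on success. */
static int scan_word_and_int(const char *buffer, char word[100], int *n)
{
    assert(buffer != NULL && word != NULL && n != NULL);
#ifdef __STDC_LIB_EXT1__
    /* Annex K: the size follows the pointer for each %s, %c and %[.
     * 100 is an int literal, hence the conversion; a size_t such as
     * sizeof(array) could be passed as it is. */
    return sscanf_s(buffer, "%s %d", word, (rsize_t)100, n) == 2;
#else
    /* Fallback: the field width is the destination size minus one. */
    return sscanf(buffer, "%99s %d", word, n) == 2;
#endif
}
```

On most current libraries the fallback branch is the one that gets compiled.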
How to use sscanf correctly and safely
Note that snprintf is not alone, and most array functions have a secure variation.
Why don't you try fgets()
(with the standard input file stdin
)?
fgets()
lets you specify the maximum size for your buffer.
(In all what follows, I'll be using standard ISO C99 compatible syntax.)
Thus, you can write this code:
#include <stdio.h>
#define MAXBUFF 20 /* Small just for testing... */
int main(void) {
char buffer[MAXBUFF+1]; /* Add 1 byte since fgets() inserts '\0' at end */
fgets(buffer, MAXBUFF+1, stdin);
printf("Your input was: %s\n", buffer);
return 0;
}
fgets()
reads at most MAXBUFF characters from stdin
,
which is the standard input (usually the keyboard).
The result is held in the array buffer
.
If a '\n' character is found, the reading stops and '\n' is also held in buffer
(as the last character). In addition, a '\0' is always added at the end of buffer
, so enough storage is needed.
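Since the '\n' is kept in the array, a small helper can remove it and, at the same time, report whether it was there at all. This is a sketch only; strip_newline is a hypothetical name, not a library function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cut the string at its first '\n', if any.
 * Returns 1 if a newline was removed, 0 if the line had none
 * (input longer than the buffer, or end of input without a newline). */
static int strip_newline(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strcspn(line, "\n");   /* index of '\n', or of the terminator */
    if (line[len] == '\n') {
        line[len] = '\0';
        return 1;
    }
    return 0;
}
```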
You can use a combination of fgets()
followed by sscanf()
in order to process the string:
char buffer[MAXBUFF+1];
fgets(buffer, MAXBUFF+1, stdin); /* Plain read */
int x; float f;
sscanf(buffer, "%d %g", &x, &f); /* Specialized read */
Thus, you have a "safe" scanf()
-like method.
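The same two steps with the return values checked, as a sketch. The function name read_int_float and the 64-byte line buffer are assumptions of this example; the point is only that fgets() can return NULL and sscanf() reports how many conversions succeeded.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one line from in and parse "<int> <float>".
 * Returns 1 only if the line was read and both conversions succeeded. */
static int read_int_float(FILE *in, int *x, float *f)
{
    char buffer[64];   /* assumed long enough for one input line */

    assert(in != NULL && x != NULL && f != NULL);
    if (fgets(buffer, sizeof buffer, in) == NULL)
        return 0;      /* end of file or read error: nothing to parse */
    return sscanf(buffer, "%d %g", x, f) == 2;
}
```

With stdin as the first argument this behaves like the fragment above, except that the caller can tell when x and f were not filled in.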
Note: This approach has a potential problem. If fgets()
reaches MAXBUFF characters before the end-of-line character '\n' is obtained, the rest of the input will not be discarded, and it will be taken as part of the next keyboard reading.
Hence, one has to add a flush mechanism, that actually is very simple:
while (getchar() != '\n')
; /* Flushing stdin... */
However: If you just add that last piece of code after the fgets()
line,
the user will be forced to press ENTER twice each time (s)he enters fewer than MAXBUFF characters. Worse: this is the most typical situation!
To fix this new problem, observe that a simple logical condition, completely equivalent to the fact that the character '\n' was not reached, is the following:
(buffer[MAXBUFF - 1] != '\0') && (buffer[MAXBUFF - 1] != '\n')
(Prove it!)
Thus, we write:
fgets(buffer, MAXBUFF+1, stdin);
if ((buffer[MAXBUFF - 1] != '\0') && (buffer[MAXBUFF - 1] != '\n'))
while(getchar() != '\n')
;
A final touch is needed: since the array buffer could contain garbage,
it seems that some kind of initialization is needed.
However, let us observe that only the position [MAXBUFF - 1]
has to be cleaned:
char buffer[MAXBUFF + 1] = { [MAXBUFF - 1] = '\0' }; /* ISO C99 syntax */
Finally, we can gather all these facts in a quick macro, as this program shows:
#include <stdio.h>
#define safe_scanf(fmt, maxb, ...) { \
char buffer[(maxb) + 1] = { [(maxb) - 1] = '\0' }; \
fgets(buffer, (maxb) + 1, stdin); \
if ((buffer[(maxb) - 1] != '\0') && (buffer[(maxb) - 1] != '\n')) { \
int c_; /* discard the rest of the line, stopping at EOF too */ \
while ((c_ = getchar()) != '\n' && c_ != EOF) \
; \
} \
sscanf(buffer, fmt, __VA_ARGS__); \
}
#define MAXBUFF 20
int main(void) {
int x; float f;
safe_scanf("%d %g", MAXBUFF+1, &x, &f);
printf("Your input was: x == %d\t\t f == %g", x, f);
return 0;
}
This uses a variable number of parameters in a macro,
under the ISO C99 norms: Variadic macros
__VA_ARGS__
replaces the variable list of parameters.
(We need a variable number of parameters in order to mimic the scanf()
-like behaviour.)
Notes: The macro-body was enclosed inside a block with { }. This is not completely satisfactory, and it is easily improved, but it is part of another topic...
In particular, the macro safe_scanf()
does not "return" a value (it is not an expression, but a block statement).
Remark: Inside the macro I have declared an array buffer
which is created at the time of entering the block, and then is destroyed when the block is exited. The scope of buffer
is limited to the block of the macro.
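One way the read-and-discard step could be turned into something that does return a value is to write it as a function and leave the sscanf() call to the caller. The following is a sketch under that assumption; read_line and its return codes are invented for the example, and it looks for the '\n' with strcspn() rather than testing a fixed array position.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function counterpart of the macro's read step.
 * Reads one line from in into buf (size bytes), drops the '\n', and
 * discards whatever did not fit.
 * Returns 1 for a complete line, 0 if characters were discarded,
 * -1 at end of input. */
static int read_line(FILE *in, char *buf, size_t size)
{
    size_t len;
    size_t dropped = 0;
    int c;

    assert(in != NULL && buf != NULL && size > 1);
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';          /* the whole line fitted */
        return 1;
    }
    /* No newline stored: skip the rest, stopping at EOF as well. */
    while ((c = getc(in)) != '\n' && c != EOF)
        dropped++;
    return dropped == 0;
}
```

A caller could then write: if (read_line(stdin, buffer, sizeof buffer) == 1 && sscanf(buffer, "%d %g", &x, &f) == 2) and handle the failure cases explicitly.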
The controversial (and optional) Annex K to C11 adds a sscanf_s
function which takes an additional argument of type rsize_t
(also defined in Annex K) after the pointer argument, specifying the size of the pointed-to array. For better or worse, these functions are not widely supported. You can achieve the same results by putting the size in the conversion specifier, e.g.
char out[20];
sscanf(in, "%19s", out);
but this is awkward and error-prone if the size of the destination object may vary at runtime (you would have to construct the conversion specifier programmatically with snprintf
). Note that the field width in the conversion specifier is the maximum number of input characters to read, and sscanf
also writes a terminating null byte for %s
conversions, so the field width you pass must be strictly less than the size of the destination object.
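For the runtime case, a rough sketch of building the conversion specifier with snprintf and keeping the width one less than the destination size. The helper name scan_word and the 32-byte format buffer are assumptions of this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: scan one whitespace-delimited word from src into
 * dst, whose size is only known at run time. Returns 1 on success. */
static int scan_word(const char *src, char *dst, size_t dst_size)
{
    char format[32];   /* room for '%', the digits of a size_t, 's', NUL */
    int len;

    assert(src != NULL && dst != NULL);
    if (dst_size < 2)
        return 0;      /* no room for one character plus the null byte */

    /* Field width is dst_size - 1, leaving a byte for the terminator. */
    len = snprintf(format, sizeof format, "%%%zus", dst_size - 1);
    if (len < 0 || (size_t)len >= sizeof format)
        return 0;
    return sscanf(src, format, dst) == 1;
}
```

With char out[5], scan_word("helloeveryone", out, sizeof out) builds the format "%4s" and stores "hell".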