What would be an efficient way of converting a delimited string into an array of strings in C (not C++)? For example, I might have:
char *input = "valgrind
From the strsep(3) manpage on OS X:
char **ap, *argv[10], *inputstring;

for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
    if (**ap != '\0')
        if (++ap >= &argv[10])
            break;
Edited for an arbitrary number of tokens:
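char **ap, **argv, *inputstring;
int arglen = 10;

argv = calloc(arglen, sizeof(char *));
for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
    if (**ap != '\0')
        if (++ap >= &argv[arglen])
        {
            /* realloc takes a size in bytes, not a number of elements */
            char **tmp = realloc(argv, (arglen + 10) * sizeof(char *));
            if (tmp == NULL)
                break;          /* out of memory: keep the tokens found so far */
            argv = tmp;
            ap = &argv[arglen]; /* first of the 10 new slots */
            arglen += 10;
        }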
Or something close to that. The above may not work, but if not, it's not far off. Building a linked list would be more efficient than continually calling realloc, but that's really beside the point: the point is how best to make use of strsep.
What about something like:
char string[] = "valgrind --leak-check=yes --track-origins=yes ./a.out"; /* an array: strtok writes into it */
char **args = calloc(MAX_ARGS, sizeof(char *));  /* zeroed, so args stays NULL-terminated */
char *curToken = strtok(string, " \t");
for (int i = 0; curToken != NULL && i < MAX_ARGS - 1; ++i)
{
    args[i] = strdup(curToken);
    curToken = strtok(NULL, " \t");
}
Were you remembering to malloc an extra byte for the terminating null that marks the end of the string?
If you have all of the input in the string input to begin with, then you can never have more tokens than strlen(input). If you don't allow "" as a token, then you can never have more than (strlen(input)+1)/2 tokens, because every token after the first needs at least one delimiter in front of it. So unless input is huge, you can safely write:
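char **myarray = malloc(((strlen(input) + 1) / 2) * sizeof(char *));
int NumActualTokens = 0;
char *pToken;

/* get_token_copy() and skip_token() are hypothetical helpers you write yourself:
   the first returns a malloc'd copy of the next token (NULL when none remain),
   the second returns input advanced past that token. */
while ((pToken = get_token_copy(input)) != NULL)
{
    myarray[NumActualTokens++] = pToken;
    input = skip_token(input);
}
myarray = realloc(myarray, NumActualTokens * sizeof(char *));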
As a further optimization, you can keep input around, replace the spaces with \0, and store pointers into the input buffer in myarray[]. There is no need for a separate malloc for each token unless for some reason you need to free them individually.
Looking at the other answers, a beginner in C might find them complex because the code is so compact, so I thought I would put this in for beginners: it might be easier to actually parse the string yourself instead of using strtok, something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char **parseInput(const char *str, int *nLen);
int resizeptr(char ***ptr, int nLen);

int main(int argc, char **argv)
{
    int maxLen = 0;
    int i = 0;
    char **ptr = NULL;
    const char *str = "valgrind --leak-check=yes --track-origins=yes ./a.out";

    ptr = parseInput(str, &maxLen);
    if (!ptr)
        printf("Error!\n");
    else {
        for (i = 0; i < maxLen; i++)
            printf("%s\n", ptr[i]);
    }
    for (i = 0; i < maxLen; i++)
        free(ptr[i]);
    free(ptr);
    return 0;
}

/* Returns an array of *Index malloc'd tokens, or NULL on failure or empty input. */
char **parseInput(const char *str, int *Index)
{
    char **pStr = NULL;
    const char *ptr = str;
    int charPos = 0, indx = 0;

    for (;; ptr++) {
        /* charPos counts the characters of the token that ends just before ptr */
        if (*ptr && !isspace((unsigned char)*ptr)) {
            charPos++;
            continue;
        }
        if (charPos > 0) {  /* a token just ended (space or end of string) */
            char *tok = NULL;
            if (resizeptr(&pStr, indx + 1) != 0 || (tok = malloc(charPos + 1)) == NULL) {
                while (indx > 0)  /* free what was built so far */
                    free(pStr[--indx]);
                free(pStr);
                return NULL;
            }
            strncpy(tok, ptr - charPos, charPos);
            tok[charPos] = '\0';  /* strncpy does not terminate here */
            pStr[indx++] = tok;
            charPos = 0;
        }
        if (*ptr == '\0')
            break;
    }
    *Index = indx;
    return pStr;
}

/* Grow *ptr to hold nLen pointers. Returns 0 on success, -1 on failure
   (in which case *ptr is left unchanged). */
int resizeptr(char ***ptr, int nLen)
{
    char **tmp;

    if (*ptr == NULL)
        tmp = malloc(nLen * sizeof(char *));
    else
        tmp = realloc(*ptr, nLen * sizeof(char *));
    if (!tmp) {
        perror("error!");
        return -1;
    }
    *ptr = tmp;
    return 0;
}
I slightly modified the code to make it easier. The only string function I used was strncpy. Sure, it is a bit long-winded, but it reallocates the array of strings dynamically instead of using a hard-coded MAX_ARGS, where the double pointer already hogs memory for MAX_ARGS entries when only 3 or 4 would do; growing it with realloc keeps the memory usage small. The parsing itself is handled by isspace as the loop walks the string with a pointer. When a space (or the end of the string) is reached after a token, the code reallocates the double pointer and mallocs a buffer just large enough to hold that token.
Notice how the triple pointer is used in the resizeptr function. In fact, I thought this would serve as a good example of a simple C program covering pointers, realloc, malloc, passing by reference, and the basics of parsing a string.
Hope this helps, Best regards, Tom.