Question
I'm implementing a function which, given a string, a character and another string (from now on, the "substring"), puts the substring everywhere the character occurs in the string. To explain it better, given these parameters this is what the function should return (pseudocode):
func ("aeiou", 'i', "hello") -> aehelloou
I'm using some functions from the string.h library. I have tested it with pretty good results:
char *somestring= "this$ is a tes$t wawawa$wa";
printf("%s", strcinsert(somestring, '$', "WHAT?!") );
Outputs: thisWHAT?! is a tesWHAT?!t wawawaWHAT?!wa
So far everything is all right. The problem appears when I try to do the same with, for example, this string:
char *somestring= "this \"is a test\" wawawawa";
printf("%s", strcinsert(somestring, '"', "\\\"") );
since I want to replace every " with \". When I do this, the PC collapses. I don't know why, but it stops working and then shuts down. I've heard something about the bad behavior of some functions of the string.h library, but I couldn't find any information about this. I'd really appreciate any help.
My code:
#define salloc(size) (str)malloc(size+1) //i'm lazy
typedef char* str;
str strcinsert (str string, char flag, str substring)
{
    int nflag= 0; //this is the number of times the character appears
    for (int i= 0; i<strlen(string); i++)
        if (string[i]==flag)
            nflag++;
    str new=string;
    int pos;
    while (strchr(string, flag)) //since when its not found returns NULL
    {
        new= salloc(strlen(string)+nflag*strlen(substring)-nflag);
        pos= strlen(string)-strlen(strchr(string, flag));
        strncpy(new, string, pos);
        strcat(new, substring);
        strcat(new, string+pos+1);
        string= new;
    }
    return new;
}
Thanks for any help!
Answer 1:
Some advice:
- refrain from typedef char* str;. The char * type is common in C and masking it will just make your code harder to review
- refrain from #define salloc(size) (str)malloc(size+1) for the exact same reason. In addition, don't cast malloc in C
- each time you write a malloc (or calloc or realloc) there should be a corresponding free: C has no garbage collection
- dynamic allocation is expensive, use it only when needed. Said differently, a malloc inside a loop should be looked at twice (especially if there is no corresponding free)
- always test the result of allocation functions (unrelated: and of I/O functions): malloc will simply return NULL when you exhaust memory. A nice error message is then easier to understand than a crash
- learn to use a debugger: if you had executed your code under a debugger the error would have been evident
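As a minimal sketch of the allocation advice above (the helper name copy_string is hypothetical and not part of the question's code): allocate once, test the result, and leave exactly one free to the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of src, or NULL on
   allocation failure. The caller owns the result and must free() it. */
char *copy_string(const char *src)
{
    size_t len = strlen(src);
    char *copy = malloc(len + 1);   /* no cast needed in C; +1 for the '\0' */
    if (copy == NULL) {
        fprintf(stderr, "copy_string: out of memory\n");
        return NULL;
    }
    memcpy(copy, src, len + 1);     /* copies the terminator as well */
    return copy;
}
```

A caller would pair each successful call with one free, for example: char *s = copy_string("abc"); if (s) { puts(s); free(s); }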
Next, the cause: if the replacement string contains the original character, the next search finds it again and you end up in an endless loop.
A possible workaround: allocate the result string once, before the loop, and advance through both the original string and the result. This saves you from unnecessary allocations and de-allocations, and is immune to the original char being present in the replacement string.
Possible code:
// the result is an allocated string that must be freed by the caller
str strcinsert(str string, char flag, str substring)
{
    int nflag = 0; // this is the number of times the character appears
    for (int i = 0; i < strlen(string); i++)
        if (string[i] == flag)
            nflag++;
    int pos;
    str new_ = salloc(strlen(string) + nflag*strlen(substring) - nflag);
    if (new_ == NULL) // allocation failed
        return NULL;
    if (nflag == 0) // the loop below never runs, so return a plain copy
        strcpy(new_, string);
    char *cur = new_;
    char *old = string;
    while (NULL != (string = strchr(string, flag))) // strchr returns NULL when nothing is found
    {
        pos = string - old;
        strncpy(cur, old, pos);
        cur[pos] = '\0'; // strncpy does not null terminate the dest. string here
        strcat(cur, substring);
        strcat(cur, string + 1);
        cur += strlen(substring) + pos; // advance the result
        old = ++string; // and the input string
    }
    return new_;
}
Note: I have not reverted the str typedef and the salloc macro, but you really should.
Answer 2:
In your second loop, you always look for the first flag character in the string. In this case, that'll be the one you just inserted from substring. The strchr function will always find that quote and never return NULL, so your loop will never terminate and just keep allocating memory (and not enough of it, since your string grows arbitrarily large).
Speaking of allocating memory, you need to be more careful with that. Unlike in Python, C doesn't automatically notice when you're no longer using memory; anything you malloc must be freed. You also allocate far more memory than you need: even in your working "this$ is a tes$t wawawa$wa" example, you allocate enough space for the full string on each iteration of the loop, and never free any of it. You should just run the allocation once, before the second loop.
This isn't as important as the above, but you should also pay attention to performance. Each call to strcat and strlen iterates over the entire string, meaning you look at it far more often than you need to. You should instead save the result of strlen, and copy the new string directly to where you know the NUL terminator is. The same goes for strchr; you already replaced the beginning of the string and don't want to waste time looking at it again, quite apart from the fact that re-scanning it is what causes your current bug.
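The points above (allocate once, remember lengths, write at a tracked position, and resume the search after the last match) can be sketched roughly as follows. This is only one possible arrangement; the name replace_char is hypothetical, and the sketch assumes flag is not '\0'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name. Replaces every occurrence of flag in src with repl.
   Assumes flag != '\0'. Returns a newly allocated string that the caller
   must free(), or NULL if malloc fails. */
char *replace_char(const char *src, char flag, const char *repl)
{
    size_t src_len = strlen(src);      /* computed once and reused */
    size_t repl_len = strlen(repl);

    /* Count the flags, always continuing the search after the last hit. */
    size_t count = 0;
    for (const char *p = strchr(src, flag); p != NULL; p = strchr(p + 1, flag))
        count++;

    /* Single allocation: original length, minus flags, plus replacements. */
    char *result = malloc(src_len - count + count * repl_len + 1);
    if (result == NULL)
        return NULL;

    char *out = result;                /* next write position in the result */
    const char *in = src;              /* next unread position in the source */
    const char *hit;
    while ((hit = strchr(in, flag)) != NULL) {
        size_t chunk = (size_t)(hit - in);
        memcpy(out, in, chunk);        /* text before the flag */
        out += chunk;
        memcpy(out, repl, repl_len);   /* the replacement itself */
        out += repl_len;
        in = hit + 1;                  /* resume after the flag, in the source */
    }
    strcpy(out, in);                   /* remaining tail plus the terminator */
    return result;
}
```

Because the search only ever runs over the untouched source string, a replacement that itself contains the flag character (such as \" for ") cannot be matched again.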
In comparison to these issues, the style issues mentioned in the comments with your typedef and macro are relatively minor, but they are still worth mentioning. A char* in C is different from a str in Python; trying to typedef it to the same name just makes it more likely you'll try to treat them as the same and run into these issues.
Answer 3:
"I don't know why but it stops working"
strchr(string, flag) is looking over the whole string for flag. The search needs to be limited to the portion of the string not yet examined/updated. By re-searching the partially replaced string, the code finds the flag over and over.
The whole string management approach needs rework. As OP reported a Python background, I've posted a very C-style approach, since mimicking Python is not a good idea here. C is different, especially in the management of memory.
Untested code
#include <stdlib.h>
#include <string.h>

// Look for needles in a haystack and replace them
// Note that replacement may be "" and result in a shorter string than haystack
char *strcinsert_alloc(const char *haystack, char needle, const char *replacement) {
    size_t n = 0;
    const char *s = haystack;
    while (*s) {
        if (*s == needle) n++; // Find needle count
        s++;
    }
    size_t replacement_len = strlen(replacement);
    // string length - needles + replacements + \0
    size_t new_size = (size_t)(s - haystack) - n*1 + n*replacement_len + 1;
    char *dest = malloc(new_size);
    if (dest) {
        char *d = dest;
        s = haystack;
        while (*s) {
            if (*s == needle) {
                memcpy(d, replacement, replacement_len); // copy the replacement text
                d += replacement_len;
            } else {
                *d = *s;
                d++;
            }
            s++;
        }
        *d = '\0';
    }
    return dest; // NULL on allocation failure; otherwise the caller must free it
}
Answer 4:
In your program, you are facing a problem with this input:
char *somestring= "this \"is a test\" wawawawa";
as you want to replace " with \".
The first problem is that whenever you replace " with \" in string, in the next iteration strchr(string, flag) will find the " of the \" that was just inserted. So, in subsequent iterations your string will look like this:
this \"is a test" wawawawa
this \\"is a test" wawawawa
this \\\"is a test" wawawawa
So, for the input string "this \"is a test\" wawawawa" your while loop will run forever, as every time strchr(string, flag) finds the " of the last inserted \".
The second problem is the memory allocation you are doing in every iteration of your while loop. There is no free() for the memory allocated to new. So when the while loop runs infinitely, it eats up all the memory, which leads to "the PC collapses".
To resolve this, in every iteration you should search for flag only in the part of the string that starts at the character after the last inserted substring and runs to the end of the string. Also, make sure to free() the dynamically allocated memory.
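A minimal sketch of that fix, keeping the question's one-replacement-per-iteration structure, might look like the following. The name strcinsert_fixed and the start offset variable are hypothetical, and the sketch assumes flag is not '\0'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name. One replacement per iteration, as in the question,
   but the search resumes after the inserted text and every intermediate
   buffer is freed. Assumes flag != '\0'.
   Returns a newly allocated string (caller frees), or NULL if malloc fails. */
char *strcinsert_fixed(const char *string, char flag, const char *substring)
{
    size_t sub_len = strlen(substring);
    size_t cur_len = strlen(string);
    char *cur = malloc(cur_len + 1);          /* working copy that we own */
    if (cur == NULL)
        return NULL;
    memcpy(cur, string, cur_len + 1);

    size_t start = 0;                         /* first index not yet examined */
    char *hit;
    while ((hit = strchr(cur + start, flag)) != NULL) {
        size_t pos = (size_t)(hit - cur);
        /* One flag character goes away, sub_len characters come in. */
        char *next = malloc(cur_len - 1 + sub_len + 1);
        if (next == NULL) {
            free(cur);
            return NULL;
        }
        memcpy(next, cur, pos);                       /* text before the flag */
        memcpy(next + pos, substring, sub_len);       /* inserted substring */
        strcpy(next + pos + sub_len, cur + pos + 1);  /* text after the flag */
        free(cur);                                    /* release the old buffer */
        cur = next;
        cur_len = cur_len - 1 + sub_len;
        start = pos + sub_len;                        /* skip what was just inserted */
    }
    return cur;
}
```

This still allocates once per replacement, so it is meant only to show the two corrections named above (limiting the search and freeing memory), not the most efficient layout.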
Source: https://stackoverflow.com/questions/46958109/inserting-strings-into-another-string-in-c