Question
During a university exercise I came across strange behaviour of a variable.
/* Main parameters */
double sizeX, sizeY; /* Size of the global domain */
int nPartX, nPartY; /* Particle number in x, y direction */
int nPart; /* Total number of particles */
int nCellX, nCellY; /* (Global) number of cells in x, y direction */
int steps; /* Number of timesteps */
double dt; /* Stepsize for timesteps */
int logs; /* Whether or not we want to keep logfiles */
void ReadInput(const char *fname)
{
    FILE *fp;
    char c;
    Debug("ReadInput", 0);
    if(rank == 0)
    {
        fp = fopen(fname, "r");
        if(!fp) Debug("Cannot open input file", 1);
        if(fscanf(fp, "sizeX: %lf\n", &sizeX) != 1) Debug("sizeX?", 1);
        if(fscanf(fp, "sizeY: %lf\n", &sizeY) != 1) Debug("sizeY?", 1);
        if(fscanf(fp, "nPartX:%i\n", &nPartX) != 1) Debug("nPartX?", 1);
        if(fscanf(fp, "nPartY:%i\n", &nPartY) != 1) Debug("nPartY?", 1);
        if(fscanf(fp, "nCellX:%i\n", &nCellX) != 1) Debug("nCellX?", 1); //read value is 10
        if(fscanf(fp, "nCellY:%i\n", &nCellY) != 1) Debug("nCellY?", 1);
        if(fscanf(fp, "steps: %li\n", &steps) != 1) Debug("steps?", 1);
        //here the nCellX variable value 10 is changed somehow to 0
        if(fscanf(fp, "dt: %lf\n", &dt) != 1) Debug("dt?", 1);
        if(fscanf(fp, "logs: %c\n", &c) != 1) Debug("logs?", 1);
        logs = (c == 'y');
        fclose(fp);
    }
    printf("(%i) reporting in...\n", rank);
    MPI_Bcast(&sizeX, 1, MPI_DOUBLE, 0, grid_comm);
    MPI_Bcast(&sizeY, 1, MPI_DOUBLE, 0, grid_comm);
    MPI_Bcast(&nPartX,1, MPI_INT, 0, grid_comm);
    MPI_Bcast(&nPartY,1, MPI_INT, 0, grid_comm);
    MPI_Bcast(&nCellX,1, MPI_INT, 0, grid_comm);
    MPI_Bcast(&nCellY,1, MPI_INT, 0, grid_comm);
    MPI_Bcast(&steps, 1, MPI_INT, 0, grid_comm);
    MPI_Bcast(&dt, 1, MPI_DOUBLE, 0, grid_comm);
    MPI_Bcast(&logs, 1, MPI_INT, 0, grid_comm);
    nPart = nPartX * nPartY;
    dt2 = dt * dt;
}
My teacher and I found that if we change the variable name from "nCellX" to "nCellX_2", the problem disappears and the code works as expected. Another interesting thing is that only this single global variable has this problem; the other variables work correctly. Has anyone come across this type of problem as well? Any guidance or explanation would be appreciated.
If the problem is not clear enough, let me know; I can also provide the full code if required. In general, the code is a parallel implementation of a Particle-in-Cell algorithm.
Answer 1:
It is possible that the following line of code is causing a problem:
if(fscanf(fp, "steps: %li\n", &steps) != 1) Debug("steps?", 1);
The %li indicates a long integer, which might be 64 bits, while steps is an int, which might be 32 bits. The format specifier should be %i instead of %li.
Whether there is an actual problem depends on the environment (e.g., it is most likely an issue when building a 64-bit application). If there is such a 64-bit vs 32-bit mismatch, the fscanf call will overwrite memory and possibly destroy whatever variable follows steps in the memory layout (and that could be nCellX). Note that compiling with the -Wall option should warn you about this situation. Why renaming nCellX should mask the problem is not clear, but it seems that changing the name may change the layout of the variables in memory; I doubt that is disallowed by the C standard (although I have not looked).
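A minimal sketch of the fix, assuming the value is parsed from a single line of text. The helper names parse_steps and parse_steps_long are hypothetical and not part of the original code; the point is only that the conversion specifier has to match the type of the pointer that is passed.

```c
#include <stdio.h>

/* Hypothetical helper: parses a line such as "steps: 42" into an int.
 * %i expects an int *, which is exactly what is passed here. */
int parse_steps(const char *line, int *steps)
{
    return sscanf(line, "steps: %i", steps) == 1;
}

/* Same idea for a variable that really is declared long:
 * %li expects a long *, so the destination must be a long. */
int parse_steps_long(const char *line, long *steps)
{
    return sscanf(line, "steps: %li", steps) == 1;
}
```

Either pairing is fine; it is only the mix of %li with an int destination that is undefined behaviour.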
Answer 2:
As a confirmation of the comment by @Mark Wilkins & Co., I'll try to show that naming definitely can have an effect.
On the case: fscanf() takes a pointer where it stores what it read. It does not know the type it points to, but takes the definition from the format and casts the argument accordingly. Something like:
sscanf("36", "%i", &my_dest);
->
number = va_arg(vl, int*);
Use the correct flags for your compiler to catch this.
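To make the va_arg point concrete, here is a rough sketch of a scanf-like function. The name mini_scan and its two supported formats are made up for illustration; real scanf implementations are far more involved. What it shows is that the callee never sees the type of the destination and simply trusts the format string.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified scanf-like function.
 * It understands only the formats "%i" and "%li". */
int mini_scan(const char *text, const char *fmt, ...)
{
    va_list ap;
    char *end;
    long value = strtol(text, &end, 0);
    int stored = 0;

    if (end == text)
        return 0;                         /* no number found */

    va_start(ap, fmt);
    if (strcmp(fmt, "%i") == 0) {
        /* The format claims the caller passed an int *. */
        *va_arg(ap, int *) = (int)value;
        stored = 1;
    } else if (strcmp(fmt, "%li") == 0) {
        /* The format claims the caller passed a long *, so
         * sizeof(long) bytes are written, whatever was really passed. */
        *va_arg(ap, long *) = value;
        stored = 1;
    }
    va_end(ap);
    return stored;
}
```

Calling mini_scan("36", "%li", &some_int) would compile without complaint in this sketch and write sizeof(long) bytes to an int, which is the same mistake as in the question.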
When exec starts up a program, it typically assigns addresses for uninitialized data (i.e. int foo;) in a region known as BSS. (See Fig. 1 down below for a figure.)
On many systems this would be from a low memory address and up.
To demonstrate what happens (on a given system) we have as follows:
I start out with the following:
/* global scope */
unsigned char unA;
unsigned char unB;
unsigned char unC;
unsigned int unD;
List 1
In main() I say:
unA = '1';
unB = '2';
unC = '3';
/* bit shifting the "string" NAC! into unD, reverse order as my system is LSB
* first (little-endian), unD becomes 558055758 => by byte ASCII !CNA */
unD = 0 | ('!' << 24) | ('C' << 16) | ('A' << 8) | 'N';
List 2
Then I point an unsigned char pointer to unA and dump the following 16 bytes, which results in:
Dumps are in the format [char<dot>], or hex with leading zero (%c. or %02x)
           +-- Address of unA
           |
0x804b06c: 1.3.0000N.A.C.!. 2.00000000000000
           | |     |_____|  |
           | |        |     +--- unB
           | |        +--------- unD
           | +------------------ unC
           +-------------------- unA
List 3
Then I change the name of unB to un2, same order in the file:
unsigned char unA;
unsigned char un2;
unsigned char unC;
unsigned int unD;
List 4
Now my dump gives:
           +-- Address of unA
           |
0x804b06c: 1.3.2.00N.A.C.!. 0000000000000000
           | | |   |_____|
           | | |      +--------- unD
           | | +---------------- unB (now named un2)
           | +------------------ unC
           +-------------------- unA
List 5
As one can see, the order of the addresses / alignment has changed. There was no change in type, only in name.
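The dump routine itself is not shown in this answer. A sketch of a helper that produces the same format might look like the following; dump_bytes is a hypothetical name, and it writes into a caller-supplied buffer rather than printing, purely to keep the example easy to check.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats n bytes starting at p into out.
 * Printable bytes become "c." and all others two hex digits,
 * matching the (%c. or %02x) format used in the dumps above. */
void dump_bytes(const void *p, size_t n, char *out, size_t outsz)
{
    const unsigned char *b = p;
    size_t i, used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';
    /* Each byte needs two characters plus the terminating NUL. */
    for (i = 0; i < n && used + 3 <= outsz; i++) {
        if (isprint(b[i]))
            sprintf(out + used, "%c.", b[i]);
        else
            sprintf(out + used, "%02x", b[i]);
        used += 2;
    }
}
```

Note that dumping 16 bytes starting at the address of a single unsigned char reads past that object, which the C standard does not guarantee to work; it is done here only as a debugging trick on a known system.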
Assigning wrong type:
The next step is to cast and overflow the range of a type. Change un2 back to unB. We have the alignment as in List 3.
We create a function that sets the bytes (on a system with 4-byte/32-bit int), high order as:
void set_what(unsigned int *n)
{
    *n = 0 | ('t' << 24) | ('a' << 16) | ('h' << 8) | 'w';
    /* or *n = 0x74616877; in an ASCII environment
     * 0x74 0x61 0x68 0x77 == tahw */
}
List 6
In main() we say:
/* dump */
set_what((unsigned int*)&unA);
/* dump */
List 7
And get:
0x804b06c: 1.3.0000N.A.C.!. 2.00000000000000
0x804b06c: w.h.a.t.N.A.C.!. 2.00000000000000
List 8
Or:
set_what((unsigned int*)&unB); -> Yield:
0x804b06c: 1.3.0000N.A.C.!. 2.00000000000000
0x804b06c: 1.3.0000N.A.C.!. w.h.a.t.00000000
set_what((unsigned int*)&unC); -> Yield:
0x804b06c: 1.3.0000N.A.C.!. 2.00000000000000
0x804b06c: 1.w.h.a.t.A.C.!. 2.00000000000000
List 9
As one can see, data is overwritten regardless of type.
Under some conditions this would result in SIGSEGV.
As for the problem in your code, it was stated in the earlier answer, but I repeat it: in the declarations you say int steps, and in fscanf() you specify %li, which is a long int and not an int. On quite a few systems this could have little effect, but on a 64-bit system everything goes bad.
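The effect can be imitated in a well-defined way. The sketch below is only a model: struct vars and store_wide are hypothetical, and the real globals are laid out by the compiler and linker, not as a struct. It assumes steps happens to sit directly in front of nCellX, and it copies an 8-byte value to the location of the 4-byte steps, which is roughly what %li does when long is 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one possible memory layout of the globals. */
struct vars {
    int32_t nCellY;
    int32_t steps;
    int32_t nCellX;
};

/* Stores a 64-bit value at the position of the 32-bit member steps.
 * The copy stays inside the struct, so the behaviour is defined,
 * but the extra 4 bytes land on nCellX. */
void store_wide(struct vars *v, int64_t value)
{
    unsigned char *base = (unsigned char *)v;
    memcpy(base + offsetof(struct vars, steps), &value, sizeof value);
}
```

On a little-endian machine, storing the value 5 this way leaves steps equal to 5 and sets nCellX to 0, which matches the symptom in the question (10 turning into 0). On a big-endian machine the two halves would be swapped.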
Check by asm:
We copy the code and make two copies, one with long int steps; and one with int steps; named A: lin_ok.c and B: lin_bad.c. Then we create some asm output.
A $ cpp lin_ok.c > lin_ok_m32.i
A $ cpp lin_ok.c > lin_ok_m64.i
B $ cpp lin_bad.c > lin_bad_m32.i
B $ cpp lin_bad.c > lin_bad_m64.i
A $ gcc -std=c89 -m32 -S lin_ok_m32.i
A $ gcc -std=c89 -m64 -S lin_ok_m64.i
B $ gcc -std=c89 -m32 -S lin_bad_m32.i
B $ gcc -std=c89 -m64 -S lin_bad_m64.i
$ diff lin_ok_m32.s lin_ok_m64.s | head
9c9
< .comm steps,4,4 ; reserve 4 bytes
---
> .comm steps,8,8 ; reserve 8 bytes
...
As one can see, the code instructs the assembler to reserve 8 bytes on 64-bit and 4 bytes on 32-bit (on this system) for steps.
If you use gcc, compile with more flags. Personally I typically use:
gcc -Wall -Wextra -pedantic -std=c89 -o main main.c
or -std=c99 if needed.
This will give you warnings on such problems as a wrong type in scanf.
An example of the layout of a running application. It can be completely different depending on the system, etc., but it is an approximation AFAIK. Hopefully I've gotten most of it right.
 ________________                       _________________
[                ]                     [                 ]
[                ]                     [ Physical memory ]
[ Virtual memory ] <-- Translation --> [                 ]
[     range      ]        table        { - - - - - - - - }
[________________]                     [                 ]
     |                                 [_________________]
     |
     +--+ high address : Virtual address
        |
0xF00   +-------------------+''''''''''''''''''''''''' Running env
        | argv, env-vars, ..|                               |
0xBFF   +-------------------+                               | ptr
        |       stack       | <- Running storage, where     |
        |... grows down ... |    fun_a should return, local | 0xC0000000 on
        |                   |    variables, env, ...        | linux Intel x86
        |   < huge area >   |    New frame allocated for    |
        |                   |    recursive calls etc.       |
        |... grows up ...   |                               |
        |                   | <- Dynamic memory alloc.      |
        |       heap        |    malloc, etc                |
0x9e49  +-------------------+                               |
        | double sizeX;     | <- Uninitialized data         |
bss     | ...               |    BSS 000000 ...             |
seg.    | int nCellY        |                               |
        | int steps;        |                               |
0x804c  +-------------------+'''''''''''''''''''''' Stored '| --- edata
data    |                   |                         on    |
seg.    | int rank = 0;     | <- Initialized data    disk   |
0x804b  +-------------------+                         :     | --- etext
        | main()            |                         :     |
text    | mov ecx, edx      | <- Instructions         :     | 0x08048000 on
seg.    | ELF, or the like  |    Layout, link, etc    :     | linux Intel x86
0x8040  +-------------------+ '''''''''''''''''''''''''''''''
        |
        +--- low address : Virtual address
Fig 1.
Source: https://stackoverflow.com/questions/9839969/strange-global-variable-behaviour-once-variable-name-is-changed-issue-disappear