Strange global variable behaviour, once variable name is changed issue disappears

Posted by 醉酒当歌 on 2019-12-12 10:19:36

Question


During my university exercise I have come across strange behaviour of a variable.

/* Main parameters                                                          */
double sizeX, sizeY;      /* Size of the global domain                      */
int nPartX, nPartY;       /* Particle number in x, y direction              */
int nPart;                /* Total number of particles                      */
int nCellX, nCellY;       /* (Global) number of cells in x, y direction     */
int steps;                /* Number of timesteps                            */
double dt;                /* Stepsize for timesteps                         */
int logs;                 /* Whether or not we want to keep logfiles        */

void ReadInput(const char *fname)
{
  FILE *fp;
  char c;

  Debug("ReadInput", 0);
  if(rank == 0)
  {
    fp = fopen(fname, "r");
    if(!fp) Debug("Cannot open input file", 1);
    if(fscanf(fp, "sizeX: %lf\n", &sizeX) != 1) Debug("sizeX?",  1);
    if(fscanf(fp, "sizeY: %lf\n", &sizeY) != 1) Debug("sizeY?",  1);
    if(fscanf(fp, "nPartX:%i\n", &nPartX) != 1) Debug("nPartX?", 1);
    if(fscanf(fp, "nPartY:%i\n", &nPartY) != 1) Debug("nPartY?", 1);
    if(fscanf(fp, "nCellX:%i\n", &nCellX) != 1) Debug("nCellX?", 1); //read value is 10
    if(fscanf(fp, "nCellY:%i\n", &nCellY) != 1) Debug("nCellY?", 1);    
    if(fscanf(fp, "steps: %li\n", &steps) != 1) Debug("steps?",  1);    
//here the nCellX variable value 10 is changed somehow to 0
    if(fscanf(fp, "dt:    %lf\n", &dt)    != 1) Debug("dt?",     1);
    if(fscanf(fp, "logs:  %c\n",  &c)     != 1) Debug("logs?",   1);
    logs = (c == 'y');
    fclose(fp);
  }

  printf("(%i) reporting in...\n", rank);

  MPI_Bcast(&sizeX, 1, MPI_DOUBLE, 0, grid_comm);  
  MPI_Bcast(&sizeY, 1, MPI_DOUBLE, 0, grid_comm);
  MPI_Bcast(&nPartX,1, MPI_INT,    0, grid_comm);  
  MPI_Bcast(&nPartY,1, MPI_INT,    0, grid_comm);
  MPI_Bcast(&nCellX,1, MPI_INT,    0, grid_comm);
  MPI_Bcast(&nCellY,1, MPI_INT,    0, grid_comm);
  MPI_Bcast(&steps, 1, MPI_INT,    0, grid_comm);
  MPI_Bcast(&dt,    1, MPI_DOUBLE, 0, grid_comm);
  MPI_Bcast(&logs,  1, MPI_INT,    0, grid_comm);
  nPart = nPartX * nPartY;
  dt2 = dt * dt;
}

My teacher and I have found that if we rename the variable from "nCellX" to "nCellX_2", the problem disappears and the code works as expected. Another interesting thing is that only this single global variable has the problem; the other variables work correctly. Has anyone come across this type of problem as well? Any guidance or explanation would be appreciated.

If the problem is not clear enough, let me know; I can also provide the full code if required. In general, the code is a parallel Particle-in-Cell algorithm.


Answer 1:


It is possible that the following line of code is causing a problem:

if(fscanf(fp, "steps: %li\n", &steps) != 1) Debug("steps?",  1);

The %li specifier tells fscanf to store a long int, which might be 64 bits wide, while steps is an int, which might be 32 bits. The format specifier should be %i instead of %li.

Whether there is an actual problem depends on the environment (e.g., it is most likely an issue when building a 64-bit application). If there is that 64-bit vs 32-bit mismatch, the fscanf call will write past the end of steps and possibly destroy whatever variable follows steps in the memory layout (and that could be nCellX). Note that compiling with the -Wall option should warn you about this situation. Why renaming nCellX should mask the problem is not clear, but it would seem that changing the name changes the layout of the variables in memory; I doubt that is disallowed by the C standard (although I have not looked).




Answer 2:


As a confirmation of the comment by @Mark Wilkins & Co., I'm trying to show that naming definitely can have an effect.

On the case:
fscanf() takes a pointer to the location where it stores what it reads. It does not know the type the pointer refers to; it takes the type from the format string and interprets the argument accordingly. Something like sscanf("36", "%i", &my_dest); -> number = va_arg(vl, int*);

Use the correct flags for your compiler to catch this.


When exec starts up a program, it typically assigns addresses for uninitialized data (i.e. int foo;) in a region known as BSS. (See Fig. 1 below.)

On many systems this would be from a low memory address and up.

To demonstrate what happens (on a given system) we have as follows:

I start out with the following:

/* global scope */
unsigned char unA;
unsigned char unB;
unsigned char unC;
unsigned int  unD;

List 1

In main() I say:

unA = '1';
unB = '2';
unC = '3';
/* bit shifting the "string" NAC! into unD, reverse order as my system is LSB 
 * first (little-endian), unD becomes 558055758 => by byte ASCII !CNA */
unD = 0 | ('!' << 24) | ('C' << 16) | ('A' << 8) | 'N';

List 2

Then I point an unsigned char pointer at unA and dump the following 16 bytes, which results in:
Dumps are in the format [char<dot>], or hex with a leading zero (%c. or %02x)

 +-- Address of unA
 |
0x804b06c: 1.3.0000N.A.C.!. 2.00000000000000 
           | |     |_____|  |
           | |        |     +--- unB
           | |        +--------- unD
           | +------------------ unC
           +-------------------- unA

List 3

Then I change the name of unB to un2, keeping the same order in the file:

unsigned char unA;
unsigned char un2;
unsigned char unC;
unsigned int  unD;

List 4

Now my dump gives:

 +-- Address of unA
 |
0x804b06c: 1.3.2.00N.A.C.!. 0000000000000000
           | | |   |_____|  
           | | |      +--------- unD
           | | +---------------- un2 (formerly unB)
           | +------------------ unC
           +-------------------- unA

List 5

As one can see, the order of the addresses has changed. There was no change in type, only in name.


Assigning wrong type:

The next step is to cast a pointer and write beyond the size of the target type. Change un2 back to unB, so that we have the layout from List 3 again.

We create a function that sets the bytes of an unsigned int (on a system with a 4-byte/32-bit int), high-order byte first:

void set_what(unsigned int *n)
{
    *n = 0 | ('t' << 24) | ('a' << 16) | ('h' << 8) | 'w';
    /* or *n = 0x74616877; in an ASCII environment 
     * 0x74 0x61 0x68 0x77 == tahw */
}

List 6

In main() we say:

/* dump */
set_what((unsigned int*)&unA);
/* dump */

List 7

And get:

0x804b06c: 1.3.0000N.A.C.!. 2.00000000000000
0x804b06c: w.h.a.t.N.A.C.!. 2.00000000000000

List 8

Or:

set_what((unsigned int*)&unB); -> Yield:
0x804b06c: 1.3.0000N.A.C.!. 2.00000000000000
0x804b06c: 1.3.0000N.A.C.!. w.h.a.t.00000000

set_what((unsigned int*)&unC); -> Yield:
0x804b06c: 1.3.0000N.A.C.!. 2.00000000000000
0x804b06c: 1.w.h.a.t.A.C.!. 2.00000000000000

List 9

As one can see, the data is overwritten regardless of the declared types.

Under some conditions this would result in SIGSEGV.


As for the problem in your code: it was already stated in an earlier comment, but I repeat it here.

In the declarations you say int steps, and in fscanf() you specify %li, which is a long int and not an int. On systems where int and long have the same size this may have little visible effect, but on a 64-bit system where long is 8 bytes everything goes bad.

Check by asm:

We make two copies of the code, one with long int steps; and one with int steps;, named A: lin_ok.c and B: lin_bad.c. Then we generate some asm output.

A $ cpp lin_ok.c > lin_ok_m32.i
A $ cpp lin_ok.c > lin_ok_m64.i
B $ cpp lin_bad.c > lin_bad_m32.i
B $ cpp lin_bad.c > lin_bad_m64.i

A $ gcc -std=c89 -m32 -S lin_ok_m32.i
A $ gcc -std=c89 -m64 -S lin_ok_m64.i
B $ gcc -std=c89 -m32 -S lin_bad_m32.i
B $ gcc -std=c89 -m64 -S lin_bad_m64.i


$ diff lin_ok_m32.s lin_ok_m64.s | head
9c9
<   .comm   steps,4,4   ; reserve 4 bytes
---
>   .comm   steps,8,8   ; reserve 8 bytes
...

As one can see, the code reserves 8 bytes for the long int steps on 64-bit and 4 bytes on 32-bit (on this system).


If you use gcc, compile with more flags. Personally I use, typically:

gcc -Wall -Wextra -pedantic -std=c89 -o main main.c or -std=c99 if in need.

This will give you warnings on such problems as wrong type in scanf.


An example of the layout of a running application. It can be completely different depending on the system, etc., but it is an approximation AFAIK. Hopefully I've gotten most of it right.

 ________________                       _________________
[                ]                     [                 ]
[                ]                     [ Physical memory ]
[ Virtual memory ] <-- Translation --> [                 ]
[     range      ]        table        { - - - - - - - - }
[________________]                     [                 ]
    |                                  [_________________]
    |
 +--+ high address : Virtual address
 |
0xF00 +-------------------+''''''''''''''''''' Running env
      | argv, env-vars, ..|                              |
0xBFF +-------------------+                              | ptr
      |       stack       | <- Running storage, where    |
      |...  grows down ...|  fun_a should return, local  | 0xC0000000 on
      |                   |  variables, env, ...         | linux Intel x86
      |  < huge area  >   |  New frame allocated for     |
      |                   |  recursive calls etc.        |
      |...   grows up  ...|                              |     
      |                   | <- Dynamic memory alloc.     |
      |       heap        |  malloc, etc                 |
0x9e49+-------------------+                              | 
      | double sizeX;     | <- Uninitialized data        |
bss   | ...               |           BSS 000000 ...     |
seg.  | int nCellY        |                              |
      | int steps;        |                              |
0x804c+-------------------+''''''''''''''''''''' Stored '| --- edata
data  |                   |                        on    |
seg.  | int rank = 0;     | <- Initialized data   disk   |
0x804b+-------------------+                         :    | --- etext
      | main()            |                         :    |
text  | mov ecx, edx      | <- Instructions         :    | 0x08048000 on
seg.  | ELF, or the like  |   Layout, link, etc     :    | linux Intel x86
0x8040+-------------------+ ''''''''''''''''''''''''''''''
 |
 +--- low address : Virtual address

Fig 1.



Source: https://stackoverflow.com/questions/9839969/strange-global-variable-behaviour-once-variable-name-is-changed-issue-disappear
