When compiling programs to run inside a VM, what should march and mtune be set to?

Question

With VMs being slave to whatever the host machine is providing, what compiler flags should be provided to gcc?

I would normally think that -march=native is what you would use when compiling for a dedicated box, but the level of fine detail that -march=native goes into, as shown in this article, makes me extremely wary of using it.

So... what to set -march and -mtune to inside a VM?

For a specific example...

My specific case right now is compiling Python (and more) in a Linux guest inside a KVM-based "cloud" host where I have no real control over the host hardware (aside from 'simple' stuff like CPU GHz, CPU count, and available RAM). Currently, cpuinfo tells me I've got an "AMD Opteron(tm) Processor 6176", but I honestly don't know (yet) whether that is reliable, or whether the guest can get moved around to different architectures on me to meet the host's infrastructure shuffling needs (sounds hairy/unlikely).

All I can really guarantee is my OS, which is a 64-bit linux kernel where uname -m yields x86_64.

Answer 1:

Some incomplete and out-of-order excerpts from section 3.17.14, "Intel 386 and AMD x86-64 Options", of the GCC 4.6.3 manual (which I hope are pertinent).

-march=cpu-type
  Generate instructions for the machine type cpu-type.  
  The choices for cpu-type are the same as for -mtune.  
  Moreover, specifying -march=cpu-type implies -mtune=cpu-type. 

-mtune=cpu-type
  Tune to cpu-type everything applicable about the generated code,  
  except for the ABI and the set of available instructions.  
  The choices for cpu-type are:
    generic
      Produce code optimized for the most common IA32/AMD64/EM64T processors. 
    native
      This selects the CPU to tune for at compilation time by determining
      the processor type of the compiling machine. 
      Using -mtune=native will produce code optimized for the local machine
      under the constraints of the selected instruction set.
      Using -march=native will enable all instruction subsets supported by
      the local machine (hence the result might not run on different machines).

What I found most interesting is that specifying -march=cpu-type implies -mtune=cpu-type. My take on the rest was that if you are specifying both -march & -mtune you're probably getting too close to tweak overkill.

My suggestion would be to just use -m64, and you should be safe enough since you're running inside an x86-64 Linux guest, correct?

~~But if you don't need to run in another environment and you're feeling lucky and fault tolerant then -march=native might also work just fine for you.~~

-m32
  The 32-bit environment sets int, long and pointer to 32 bits  
  and generates code that runs on any i386 system.     
-m64
  The 64-bit environment sets int to 32 bits and long and pointer  
  to 64 bits and generates code for AMD's x86-64 architecture.
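
As a quick sanity check of the data model described in the excerpt above, here is a minimal Python sketch that reports the C type widths the running interpreter was built with. The function name c_data_model is made up for this illustration. It only tells you about the build of the Python interpreter itself, so treat it as a rough indicator rather than a statement about what gcc will do with any particular flags.

```python
import ctypes


def c_data_model():
    """Return the width in bits of C int, long and pointer for this interpreter's build."""
    return {
        "int": ctypes.sizeof(ctypes.c_int) * 8,
        "long": ctypes.sizeof(ctypes.c_long) * 8,
        "pointer": ctypes.sizeof(ctypes.c_void_p) * 8,
    }


# Print the widths so they can be compared with the -m32 / -m64 descriptions.
print(c_data_model())
```

On a 64-bit Linux guest I would expect this to show a 32-bit int with 64-bit long and pointer, matching the -m64 description.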

For what it's worth ...

Out of curiosity I tried using the technique described in the article you referenced. I tested gcc v4.6.3 in 64-bit Ubuntu 12.04 which was running as a VMware Player guest. The VMware VM was running in Windows 7 on a desktop using an Intel Pentium Dual-Core E6500 CPU.

When compiling with -m64, the options gcc actually passed on were just -march=x86-64 -mtune=generic.

However, compiling with -march=native resulted in gcc using all of the much more specific compiler options below.

-march=core2 -mtune=core2 -mcx16 
-mno-abm -mno-aes -mno-avx -mno-bmi -mno-fma -mno-fma4 -mno-lwp 
-mno-movbe -mno-pclmul -mno-popcnt -mno-sse4.1 -mno-sse4.2 
-mno-tbm -mno-xop -msahf --param l1-cache-line-size=64 
--param l1-cache-size=32 --param l2-cache-size=2048
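
To make a list like this easier to read, here is a small Python sketch that sorts such an option line into the chosen architecture, the enabled and disabled instruction subsets, and the --param values. The function name split_native_flags and the SAMPLE constant are made up for this illustration; SAMPLE is simply the option list quoted above. It assumes you have already captured the option line yourself (for instance from gcc's verbose output with -v).

```python
import shlex


def split_native_flags(cc1_options):
    """Group the options gcc substituted for -march=native."""
    tokens = shlex.split(cc1_options)
    result = {"arch": None, "tune": None, "enabled": [], "disabled": [], "params": {}}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "--param" and i + 1 < len(tokens):
            # "--param name=value" arrives as two tokens.
            name, _, value = tokens[i + 1].partition("=")
            result["params"][name] = value
            i += 2
            continue
        if tok.startswith("-march="):
            result["arch"] = tok[len("-march="):]
        elif tok.startswith("-mtune="):
            result["tune"] = tok[len("-mtune="):]
        elif tok.startswith("-mno-"):
            result["disabled"].append(tok[len("-mno-"):])
        elif tok.startswith("-m"):
            result["enabled"].append(tok[2:])
        i += 1
    return result


# The option list quoted in this answer, copied verbatim.
SAMPLE = """
-march=core2 -mtune=core2 -mcx16
-mno-abm -mno-aes -mno-avx -mno-bmi -mno-fma -mno-fma4 -mno-lwp
-mno-movbe -mno-pclmul -mno-popcnt -mno-sse4.1 -mno-sse4.2
-mno-tbm -mno-xop -msahf --param l1-cache-line-size=64
--param l1-cache-size=32 --param l2-cache-size=2048
"""

info = split_native_flags(SAMPLE)
print(info["arch"], info["enabled"], info["params"])
```

Running the same function on option lines captured on two different machines would show at a glance which instruction subsets differ between them.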

So, yes, as the gcc documentation states when "Using -march=native ... the result might not run on different machines". To play it safe you should probably only use -m64 or its apparent equivalent -march=x86-64 -mtune=generic for your compiles.

I can't see how you would have any problem with this, since the intent of those compiler options is that gcc will produce code capable of running correctly on any x86-64/amd64 compliant CPU. (No?)

I am frankly astounded at how specific the gcc -march=native CPU options turned out to be. I have no idea how a CPU's L1 cache size being 32k could be used to fine tune the generated code. But apparently if there is a way to do this, then using -march=native will allow gcc to do it.

I wonder if this might result in any noticeable performance improvements?

Answer 2:

One would like to think that the CPU architecture reported by the guest OS is what you should optimize for. Otherwise, I'd call it a bug. There can be decent reasons for bugs sometimes, but...

Note that not all hypervisors will necessarily be the same.

It might be a good idea to check on a mailing list for your specific hypervisor.
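
If you want to see what the guest actually reports, a minimal Python sketch along these lines could help. It pulls the model name and feature flags out of /proc/cpuinfo-style text, and intersects the flags of several snapshots (say, taken before and after a suspected migration). The names parse_cpuinfo, common_flags, BEFORE and AFTER are made up for this illustration, and the two sample excerpts are invented, not real output from any host.

```python
def parse_cpuinfo(text):
    """Return (model name, set of feature flags) for the first CPU listed."""
    model, flags = None, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key : value"
        key = key.strip()
        if key == "model name" and model is None:
            model = value.strip()
        elif key == "flags" and flags is None:
            flags = set(value.split())
    return model, (flags or set())


def common_flags(snapshots):
    """Intersect the feature flags of several cpuinfo snapshots."""
    flag_sets = [parse_cpuinfo(s)[1] for s in snapshots]
    return set.intersection(*flag_sets) if flag_sets else set()


# Invented excerpts for illustration only; not real output from any machine.
BEFORE = "model name\t: AMD Opteron(tm) Processor 6176\nflags\t\t: fpu sse sse2 cx16 popcnt hypervisor\n"
AFTER = "model name\t: Hypothetical CPU\nflags\t\t: fpu sse sse2 cx16 hypervisor\n"

model, flags = parse_cpuinfo(BEFORE)
# Linux lists a "hypervisor" flag when the kernel sees it is running as a guest.
print(model, "hypervisor" in flags)
print(sorted(common_flags([BEFORE, AFTER])))
```

On a real guest you would feed it the contents of /proc/cpuinfo instead of the sample strings. Any feature missing from the intersection is one that a -march=native build made on one host could not safely rely on after a move to the other.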

Source: https://stackoverflow.com/questions/10132904/when-compiling-programs-to-run-inside-a-vm-what-should-march-and-mtune-be-set-t

Tags

Linux

gcc

virtualization

compiler-optimization