On today's modern processors, is there any performance difference between a greater-than and a greater-than-or-equal comparison used as a branch condition? If I have a condition t…
I seriously doubt there's a difference.
There shouldn't be any noticeable difference between the various comparison predicates, because of the way they're computed (caveat: I haven't read the x86 manuals in detail, so it may work somewhat differently there):
Most instructions produce several flags as a byproduct; usually you have at least carry (c), overflow (o), zero (z) and negative (n).
Using the flags produced by an x - y instruction (which sets all four of the above reliably), we can trivially derive all the comparisons we want. For unsigned numbers, with + meaning OR and . meaning AND (see the sketch after the table):
x =  y   ->  z
x != y   ->  !z
x <  y   ->  !c
x <= y   ->  !c + z
x >  y   ->  c . !z
x >= y   ->  c
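To tie this back to the original question, here is a minimal C sketch of my own (not part of the answer); the assembly in the comments is roughly what gcc or clang at -O2 emit on x86-64. Note that x86 inverts the carry convention used in the table above: after CMP, CF set means a borrow occurred, i.e. x < y.

    /* Both comparisons cost one CMP plus one flag-testing instruction;
       only the condition tested differs. */
    unsigned gt_u(unsigned x, unsigned y) { return x >  y; }  /* cmp edi,esi ; seta  al  (CF=0 and ZF=0) */
    unsigned ge_u(unsigned x, unsigned y) { return x >= y; }  /* cmp edi,esi ; setae al  (CF=0)          */
    /* Used as a branch, the same comparisons become CMP + JA / JAE. */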
So it hardly makes any difference. There are some differences, though, and they mostly come down to whether we can use TEST (which is an AND instead of a full-blown subtraction) or have to use CMP (which is the subtraction). TEST is more limited but (usually) faster.
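A small illustration of that point (my own sketch, assuming a typical x86-64 compiler at -O2): a comparison against zero can be done with TEST, which just ANDs the register with itself to set the flags, while a comparison between two values needs the full CMP.

    int is_zero (int x)        { return x == 0; }  /* test edi,edi ; sete al */
    int is_equal(int x, int y) { return x == y; }  /* cmp  edi,esi ; sete al */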
Also, modern architectures (starting with Core 2 on the Intel side) can sometimes fuse a compare and the conditional branch that follows it into a single µop - so-called macro-op fusion - which has some nice advantages. The rules for it change from one architecture to the next and are somewhat lengthy. For example, branches that test only the overflow, parity or sign flag (JO, JNO, JP, JNP, JS, JNS) can fuse with TEST but not with CMP on Core 2 and Nehalem (you bet I looked that one up - section 7.5).
So can we just say it's complicated and not worry about such things? Yes - unless you're writing an optimizer for a compiler, because regardless of WHAT you write in your source code, the compiler will do what it wants anyhow - and for good reason (e.g. if JGE were theoretically faster, you'd usually have to write if (x < y) to get it, since the compiler branches on the negated condition to skip the if-body). And if you really want one piece of advice: comparing against 0 is often faster.
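To make that last piece of advice concrete, here is a sketch of my own (the function name and types are mine): counting a loop down to zero lets the decrement itself produce the flags the loop branch needs, so no separate CMP against a limit is required.

    /* Assumes n > 0. The loop back-edge is typically sub/dec + jnz - no CMP. */
    long sum_down(const long *a, long n) {
        long s = 0;
        do {
            s += a[n - 1];
        } while (--n != 0);   /* branch tests the flags set by the decrement */
        return s;
    }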
I'm not quite sure how the underlying implementation works in the ALU/FPU, but there should be only a single operation for all of them (for primitive types, that is).
I really hope this question is out of curiosity and not an attempt to optimize: this will never give you a noticeable performance boost, and your code most likely contains far, far worse performance issues elsewhere.
You can even implement all the relational operators using just one:
a < b is the base:
a >  b  ==  b < a
a >= b  ==  !(a < b)
a <= b  ==  !(a > b)
This is of course not how it's implemented in the CPU; it's just a piece of trivia.
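A tiny C sketch of that trivia (the function names and the int type are my own choices):

    /* All comparisons expressed through a single "less than". */
    static int lt(int a, int b) { return a < b; }

    static int gt(int a, int b) { return lt(b, a); }               /* a > b  ==  b < a     */
    static int ge(int a, int b) { return !lt(a, b); }              /* a >= b ==  !(a < b)  */
    static int le(int a, int b) { return !lt(b, a); }              /* a <= b ==  !(a > b)  */
    static int eq(int a, int b) { return !lt(a, b) && !lt(b, a); } /* equal: neither side is less */

This is essentially how C++'s ordered containers get by with a single user-supplied less-than comparator.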